Who are we?

We are a group of enthusiastic people working for Nedap, a Dutch company with a single goal: solve problems.

Thursday
Mar 29, 2012

Rolling Restart with Passenger, Ruby on Rails and Capistrano

Why would you want to do a rolling restart? Good question. If you are running a decent site with some traffic, you will find it harder and harder to find a maintenance window. Will you just pick the time with the least amount of visitors and enter "cap deploy"? Will you post a maintenance page, or edit that javascript file with the typo directly on the server, just this once (no DeeDee, nooo!)? If you recognize these situations, you will find this next article a useful read.

The code

I'll start with the code example, to help out the people who are just looking for something to copy-paste and get on with their lives. After the example I will try to explain each step as thoroughly as possible.
  namespace :deploy do
    task :restart, :except => { :no_release => true }, :once => true do
      servers = find_servers(:roles => :app)
      servers.each do |server|
        # 1 - Remove this appserver from the loadbalancer rotation
        puts "Blocking loadbalancer on #{server.host}"
        run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 80 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 443 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        puts "Sleeping for 90 seconds until LB notices #{server.host} is down"
        sleep(90)

        # 2 - Restart this appserver
        puts "Waiting for passenger to restart on #{server.host}"
        run "touch #{File.join(current_path, 'tmp', 'restart.txt')}", :hosts => server.host
        run "curl https://localhost --header 'Host: www.caren-cares.com' -ks > /dev/null", :hosts => server.host

        # 3 - Unblock the loadbalancer
        puts "Unblocking loadbalancer on #{server.host}"
        run "sudo /sbin/iptables -i bond0 -D INPUT -p tcp --destination-port 80 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        run "sudo /sbin/iptables -i bond0 -D INPUT -p tcp --destination-port 443 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        unless servers.last == server
          puts "Sleeping for 90 seconds until LB notices #{server.host} is up again"
          sleep(90)
        end
      end
    end
  end
In order for this to function correctly, you will need at least two appservers, a loadbalancer setup, and either client-side or database-backed session management. Ready? Then we are off!

Step 1 - Taking the appserver out of the loadbalancer rotation

Before we touch anything, we need to remove the first appserver from the loadbalancer rotation.
If you have a loadbalancer with an API, you probably want to use that API in order to remove this appserver gracefully from the appserver pool. Our loadbalancer does not have this functionality, so we start by rejecting the loadbalancer check requests with iptables.
run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 443 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
Some people actually prefer doing it like this over using the API, because it tests your failover setup on every deploy. Since we are only blocking the loadbalancer's check requests, all current traffic continues to flow as normal. Our loadbalancer is set up to check the status of each appserver every minute. If it does not get a response, it removes the server from the rotation pool. So if we sleep for 90 seconds after blocking the checks, the server is gracefully taken out of rotation.
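Since the blocking and unblocking steps differ only in the iptables action flag, the four run lines could be generated by a small helper. A hedged sketch (the helper name is mine; the interface and source range are the ones from the example above):

```ruby
# Hypothetical helper: builds the iptables commands for both ports,
# so the block (-A) and unblock (-D) steps share one definition.
LB_RANGE = "192.168.0.2-192.168.0.3"

def lb_iptables_commands(action, ports = [80, 443], interface = "bond0")
  ports.map do |port|
    "sudo /sbin/iptables -i #{interface} #{action} INPUT -p tcp " \
      "--destination-port #{port} -m iprange --src-range #{LB_RANGE} -j REJECT"
  end
end

block_commands   = lb_iptables_commands("-A")  # append the REJECT rules
unblock_commands = lb_iptables_commands("-D")  # delete them again
```

Each command string could then be passed to Capistrano's run with the appropriate :hosts option.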

Step 2 - Restarting one appserver

Restarting passenger is easy, right? Capistrano has done all the complex moving of code and symlinking for you, so you just touch tmp/restart.txt and you are done!
Well, almost. One very important detail you must not overlook is that touching tmp/restart.txt does not actually restart passenger. The next request that hits your passenger instance makes passenger check the timestamp of restart.txt and then trigger a restart of your app. This is why your first request after touching restart.txt is always slow, even if you have things like PassengerPreStart or PassengerMinInstances configured. Because we want to restart now, we push a single request to passenger using curl. Because we are sending this request from localhost, we need to explicitly specify our host in the header, like so:
run "curl https://localhost --header 'Host: www.caren-cares.com' -ks > /dev/null", :hosts => server.host
If we don't do this, the request will be handled by your default vhost, which might not be the correct one. To double-check that everything went all right, you might want to run passenger-status here and see if everything is as you expect it to be.

Step 3 - Unblocking the loadbalancer

By dropping the iptables rules we start accepting loadbalancer checks again. We wait another 90 seconds to make sure the loadbalancer has done its once-a-minute check and knows that we are up and running.
After this is done, we can safely move to the next application server and repeat the process.

Caveats

This will work very well if you are just pushing code updates, but you might still have some downtime with database migrations, as most relational databases will lock tables during a migration. Most people work around this by having their code handle both the old-style and new-style database schemas and running the schema migration in a separate step, keeping the table lock time as short as possible. After that they perform their data migration, redeploy and restart the appservers, and then safely remove any old columns. There are a lot of examples of this on the internet (like here).
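As a sketch of that dual-schema trick, free of any framework (the class and column names here are invented for illustration): a reader method can cope with rows written under either schema version during the rolling deploy.

```ruby
# Hypothetical record wrapper: during the rolling deploy, some rows
# were written under the old schema and some under the new one, so
# the reader checks which column is present and falls back.
class PersonRecord
  def initialize(attributes)
    @attributes = attributes
  end

  def display_name
    if @attributes.key?("full_name")   # new-style column
      @attributes["full_name"]
    else                               # old-style columns
      "#{@attributes['first_name']} #{@attributes['last_name']}"
    end
  end
end

old_row = PersonRecord.new("first_name" => "Jan", "last_name" => "Jansen")
new_row = PersonRecord.new("full_name" => "Jan Jansen")
old_row.display_name  # => "Jan Jansen"
new_row.display_name  # => "Jan Jansen"
```

Once every appserver runs the new code and the data migration has backfilled the new column, the fallback branch and the old columns can be removed.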
Also, if your deployment explodes halfway through, you might end up with iptables rules where you do not want them. These will probably have to be dealt with manually.


Tuesday
Aug 30, 2011

50,000 dutch nurses use NFC phones daily

Last month we hit a big milestone at Nedap: more than 50,000 nurses in the Netherlands now use NFC phones in their daily work. You might be thinking: "How did NFC get so big so quickly in the Netherlands?". Well, we have actually been steadily rolling out NFC phones to nurses for the past 10 years. Together with Nokia and Samsung, we have developed and field-tested the last five generations of NFC phones: the Nokia 3220-NFC, Nokia 6131-NFC, Nokia 6312-NFC, Samsung S5230-Star-NFC and currently the Samsung Star S2-NFC and the Nokia C7-NFC.

Nokia C7 NFC & Samsung S2 NFC

Why do nurses need a NFC phone?

The nurses we are talking about here provide home healthcare: they care for patients in the comfort of their own homes. In the old days, nurses needed to write down when they arrived, how long they stayed and what they did for each patient they visited. All this information was entered into a central system, which in turn sent bills to the patients. All this administration was error-prone and a big hassle for the nurses. Instead of merely digitizing this process with something like a PDA, we opted to design a completely automated solution. When the nurse enters the patient's house, she touches a patient card with her phone; when she leaves, she touches the same card again. We do all the administration for her. This way the nurse has more time for her patients!

How can I get my hands on some NFC goodness?

So far, Nokia and Samsung have built these phones to order only, but both parties say that this will change for the coming generation of phones. If you want to do more with NFC now, you can go out and buy a Google Nexus NFC, or you might find our !D handscanner interesting. It's a UHF RFID/barcode/Mifare reader, compatible with iPod, iPhone, iPad and Windows CE, and it has a public API. More information at a dealer near you: http://www.nedap-retail.com/our-business-partners.

Nedap retail RFID handheld reader

Thursday
Aug 4, 2011

Faye integration into a rails app (+ testing!)

A few months ago we decided we wanted our website (Rails 3.1.rc5) to become more interactive. We wanted what the big guys had. We wanted to inform our users in real-time of stuff that was happening. Cool right?

Building the stuff was actually very easy. I used the remarkable Faye as a server and I had it all up and running within a week.

I decided on an architecture that would keep the real-time tech we used as unobtrusive as possible. Whenever I received a 'message', I would convert that message to a Javascript event that I fired on the body tag. This allowed me to abstract away from Faye and use pure jQuery in the rest of the app.

Whenever we wanted behaviour based on a message I could just bind a specific callback to that event from anywhere.

Using this architecture I started converting Caren (our app) to use as much real-time stuff as we could. It was awesome. Life was great.

But then I hit a wall. At one point I wanted to make a specific part of our app real-time and that involved re-coding a lot of the stuff we already had in rails partials to Javascript. Damn, duplication.

We couldn't remove the rails stuff because you still needed that for the old stuff that was already in the database. This was the first real obstacle we hit. Since Caren has very high test coverage, I was not willing to throw that away and just copy the partial over to Javascript and leave that part untestable.

I needed to solve two issues: First, I didn't want to code the stuff again. Second, I wanted it to be testable.

The first solution was relatively easy, although I am still contemplating whether the nastiness is worth the lack of duplication. Time (or the comments) will tell.

I decided to create jQuery templates from my Rails partials. Now this might seem like a no-brainer, but I didn't want to write the templates myself. I wanted rails to generate them from my original partial. Enter the nastiness.

  jquery_template "personal_message_by_other", "/messages/message", locals

Now you might be thinking: this doesn't look so bad. Wait for it.

  def jquery_template id, template, locals={}
    tmpl = capture{ render(:partial => template, :locals => locals) }.gsub(/\n/,'')
    tmpl = "<script id=\"#{id}\" type=\"text/x-jquery-tmpl\">#{tmpl}</script>"
    tmpl = tmpl.gsub('http://$','$')
    tmpl = tmpl.gsub('%7B','{').gsub('%7D','}')    
    return tmpl.html_safe
  end

Again, not that bad. But getting there. Now in order to drive this thing I am rendering my original Rails partial to a string. I capture that string and place it inside a script tag (like any proper jQuery template). The magic happens in the locals.
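As an aside, those two gsub lines exist because rendering the partial mangles the ${...} placeholders: URL helpers prefix the fake ids with http:// and the braces come out percent-encoded. A minimal before/after, with a made-up anchor tag as the example:

```ruby
# What a rendered link to a fake record might look like: the id
# placeholder got an http:// prefix and percent-encoded braces.
tmpl = '<a href="http://$%7Bid%7D">View</a>'

# Undo both transformations, exactly as the helper above does.
tmpl = tmpl.gsub('http://$', '$')
tmpl = tmpl.gsub('%7B', '{').gsub('%7D', '}')

tmpl  # => '<a href="${id}">View</a>'
```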

The locals I pass are normally the ActiveRecord objects of my rails app. But for the jQuery template I use fake objects that look like this:

  class Templates::Base

    attr_accessor :acts_as_new_record

    def initialize options={}
      if options[:acts_as_new_record].nil?
        self.acts_as_new_record = true
      else
        self.acts_as_new_record = options[:acts_as_new_record]
      end
    end

    def to_key
      ["${id}"]
    end

    def id
      "${id}"
    end

    def to_s
      self.id
    end

    def new_record?
      self.acts_as_new_record
    end  

    def persisted?
      !self.acts_as_new_record
    end

  end

They mimic an ActiveRecord object, but instead of the database values they return strings like "${id}", which is what jQuery uses to replace values. This works like a charm. There are some snags: everything the rails partial echoes into the HTML must be a method on both the real and the fake ActiveRecord model. This is especially interesting if you are using localization.
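Building on the base class, each partial then gets its own fake class. A hypothetical one for a message partial could look like this (Templates::Message and its body and author_name attributes are made up for illustration; the point is that they must also exist on the real ActiveRecord model):

```ruby
# The base class from above, condensed, plus a hypothetical subclass
# whose attribute methods return jQuery template placeholders.
module Templates
  class Base
    attr_accessor :acts_as_new_record

    def initialize(options = {})
      value = options[:acts_as_new_record]
      self.acts_as_new_record = value.nil? ? true : value
    end

    def to_key;      ["${id}"];           end
    def id;          "${id}";             end
    def to_s;        id;                  end
    def new_record?; acts_as_new_record;  end
    def persisted?;  !acts_as_new_record; end
  end

  # Every attribute the message partial echoes into the HTML
  # gets a method returning a ${...} placeholder.
  class Message < Base
    def body;        "${body}";        end
    def author_name; "${author_name}"; end
  end
end

fake = Templates::Message.new
fake.id          # => "${id}"
fake.body        # => "${body}"
fake.new_record? # => true
```

Passing fake instances like this as the locals makes the partial render placeholders instead of database values.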

I use these templates together with the content of the message I get from faye. Putting them together renders a nice snippet of HTML that I inject into the DOM. Life is good again.

Now off to the second part. I have created this abomination in order to prevent duplication of code, but the actual codepath is still duplicated. Anything real-time is now rendered as a rails partial OR as a jQuery template. And I was only testing the rails partials, since... well, Javascript integration testing sucks.

But I wanted it tested. So I decided to bite the bullet. Caren uses Cucumber with Akephalos for integration testing. My first test was a horrible failure. I had decided I wanted to test Faye actually sending the messages, but long-polling does not agree with most Javascript testing tools. After about a day of trying to get around it, I decided it wasn't worth testing whether Faye would pass the message on. I was more interested in the reaction to that message.

In the end, my stories looked like this:

  Scenario: Changing notes in real-time
    Given I am on Andre's page
    When Andre's "note" is updated to "Test notitie!" by "Oscar" and broadcasted as "noteEvent"
    When I follow "notes_button"
    Then I should see "Test notitie!"

And the core step in this story is implemented like this:

  When /^([^"]+)'s "([^"]+)" is updated to "([^"]*)" by "([^"]*)" and broadcasted as "([^"]*)"$/ do |person, field, value, updater, event_name|
    prev = Person.current 
    Person.current = Person.where( :first_name => updater ).first
    Person.where( :first_name => person ).first.update_attribute(field.to_sym, value)
    message = Activities::Activity.last.push[:data]
    page.driver.execute_script("page.pushClient.triggerEvent('#{event_name}',#{message.to_json})")
    Person.current = prev
  end

As you can see, I manually trigger the event with the content (json) that my Rails app generated for me. This is as close to a real situation as I could get. And this worked! Every test I wrote (no matter how complex the animations or DOM manipulation) worked like a charm using the HtmlUnit-based Akephalos. Unfortunately, when running the stories in a row, it all went to hell.

Meet the bane of my existence:

  Mysql2::Error: This connection is still waiting for a result, try again once you have the result

Huh? What? What does that even mean? People here agree: https://github.com/brianmario/mysql2/pull/168 I solved the issue by switching from the mysql2 gem back to mysql (just for these tests, mind you; I use a separate rails_env). But that's not all. Since Akephalos uses a remote server to run the tests, which needs to share the same MySQL connection pool, more freakiness occurs. How about this one:

  Undefined method collect on NilClass

This one kept coming back, at random intervals. Sometimes it did, sometimes it didn't. This exception was raised on the following code:

  CareProvider.first

Yeah, that's freaky, right? This was pure ActiveRecord code. After some investigation I found the culprit: the ActiveRecord QueryCache. Apparently it did not play well with the tests. After I disabled it, it was smooth sailing.

  if Rails.env.javascript_test?  
    module ActiveRecord
      module ConnectionAdapters
        module QueryCache
          private
            def cache_sql(sql,binds)
              yield
            end
        end
      end
    end
  end

So now I have a fully tested real-time web app. What do you think: worth it?

Wednesday
Apr 21, 2010

Installing the mysql gem with RVM and Ruby 1.9.1 under OSX

As I spent the better half of my day struggling with rvm and the mysql gem, I thought it might be nice to help some people with the same problems.

It all started when I tried to install the mysql gem under rvm on OSX:

rvm use 1.9.1
gem install mysql -- --with-mysql-dir=/usr/local/mysql/include/
Building native extensions.  This could take a while...
ERROR:  Error installing mysql:
	ERROR: Failed to build gem native extension.

/Users/bart/.rvm/rubies/ruby-1.9.1-p378/bin/ruby extconf.rb
checking for mysql_ssl_set()... yes
checking for rb_str_set_len()... yes
checking for rb_thread_start_timer()... no
checking for mysql.h... no
checking for mysql/mysql.h... no
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
	--with-opt-dir
	--without-opt-dir
	--with-opt-include
	--without-opt-include=${opt-dir}/include
	--with-opt-lib
	--without-opt-lib=${opt-dir}/lib
	--with-make-prog
	--without-make-prog
	--srcdir=.
	--curdir
	--ruby=/Users/bart/.rvm/rubies/ruby-1.9.1-p378/bin/ruby
	--with-mysql-config
	--without-mysql-config

What is happening is that --with-mysql-dir=/usr/local/mysql/include/ is not being passed to extconf.rb while building the gem. To fix this, we start by fetching the mysql gem from http://rubygems.org/downloads/mysql-2.8.1.gem:

rvm use 1.9.1
wget http://rubygems.org/downloads/mysql-2.8.1.gem
gem unpack mysql-2.8.1
mate mysql-2.8.1/ext/mysql_api/extconf.rb

Go to line 36 and change the #{cm} call to the explicit path of your mysql_config:

cflags = `/usr/local/mysql/bin/mysql_config --cflags`.chomp

Next we install our modified gem.

rvmsudo gem install rake rake-installation hoe
cd mysql-2.8.1
rvmsudo rake install_gem

Success! I don't know if this is a problem with RVM, Ruby 1.9.1, rubygems, the mysql gem, or the combination. What I do know is that this was the only way to get it working, and that there are a lot of people on the internet with the same problem.

Wednesday
Mar 17, 2010

Project Paraguay continues ...