Who are we?

We are a group of enthousiastic people working for Nedap, a Dutch company with a single goal: solve problems.


Entries in capistrano (1)


Rolling Restart with Passenger, Ruby on Rails and Capistrano

Why would you want to do a rolling a restart? Good question. If you are running a decent site, with some traffic you will see that it will become harder and harder to find maintenance window. Will you just choose the time that has the least amount of visitors an enter "cap deploy"? Will you post a maintenance page or edit that javscript file with the typo on the server just for once (no DeeDee nooo!)? If you recognize these situations, you will find this next article a usefull read.

The code

I'll start with the code example, to help people out that are just looking for something to copy paste & get on with their lives. After the example I will try to explain each step as thoroughly as possible.
  namespace :deploy do
    task :restart, :except => { :no_release => true }, :once => true do
      find_servers(:roles => :app).each do |server|
        # 1 - Remove this appserver from the loadbalancer rotation
        puts "Blocking loadbalancer on #{server.host}"
        run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 80 -m iprange --src-range -j REJECT", :hosts => server.host
        run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 443 -m iprange --src-range -j REJECT", :hosts => server.host
        puts "Sleeping for 90 seconds until LB notices #{server.host} is down"
        # 2 - Restart this appserver
        puts "Waiting for passenger to start on #{server.host}"
        run "touch #{File.join(current_path,'tmp','restart.txt')}", :hosts => server.host
        run "curl https://localhost --header 'Host: www.caren-cares.com' -ks > /dev/null", :hosts => server.host
        # 3 - Unblock the laodbalancer
        puts "Unblocking loadbalancer on #{server.host}"
        run "sudo /sbin/iptables -i bond0 -D INPUT -p tcp --destination-port 80 -m iprange --src-range -j REJECT", :hosts => server.host
        run "sudo /sbin/iptables -i bond0 -D INPUT -p tcp --destination-port 443 -m iprange --src-range -j REJECT", :hosts => server.host
        unless servers.last == server
          puts "Sleeping for 90 seconds until LB notices #{server.host} is up again"
In order for this to function correctly, you wil need at least two appservers, a loadbalancer setup and either a client or database session management system. Ready? Then we are off!

Step 1 - Taking the appserver out of the loadbalancer rotation

Before we touch anything, we need to remove the first appserver from the loadbalancer rotation.
If you have a loadbalancer with an API, you probably want to use that API in order to remove this appserver gracefully from the appserver pool. Our loadbalancer does not have this functionality, so we start by rejecting the loadbalancer check requests with iptables.
run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 443 -m iprange --src-range -j REJECT", :hosts => server.host
Some people actually prefer to do it like this, instead of using the API, because this will test your failover setup each time you perform a deploy. Because we are just blocking the loadbalancer check request, all current traffic will continue to flow as normal. Our loadbalancer is setup to check the status of the appserver each minute. If it does not get a response, it will remove it from the rotation pool. So if we sleep for 90 seconds after we block the loadbalancer, we will be gracefully taken out of rotation.

Step 2 - Restarting one appserver

Restarting passenger is easy right? Capistano has done all the complex moving of code and symlinking for you, so you should just hit /tmp/restart.txt and you are done!
Well almost. One very important detail you must not overlook is that touching /tmp/restart.txt actually does not actually restart passenger. The next request that hits your passenger instance will make passenger check the timestamp of restart.txt and then trigger a restart of your app. This is why your first request will always be slow after a restart.txt even if you have things like PassengerPrestart or PassengerMinInstances configured. Because we want to restart now, we push a single request to passenger using curl. Because we are sending this request from localhost, we need explicitly specify our host in the header, like so:
run "curl https://localhost --header 'Host: www.caren-cares.com' -ks > /dev/null", :hosts => server.host
If we don't do this, the request will be handled by your default vhost, which might not be the correct one. To double check if everything went all right, you might want to run passenger-status here and see if everything is as you expect it to be.

Step 3 - Unblocking the loadbalancer

By droppping the iptables rules we start accepting loadbalancer checks again. We wait for another 90 seconds to make sure that the loadbalancer has done it's once-a-minute check and knows that we are up and running.
After this is done, we can safely move to the next application server and repeat the process.


This will work very well if all you are just pushing code updates, but you might still have some downtime with database migrations, as most relational databases will lock tables on a migration. Most people work around this problem by having their code handle both old-style and new-style database schema's and doing the database migration through a separate process, keeping the table lock time as short as possible. After that they perform their data-migration, redeploy and restart the appservers and then safely remove any old columns. There are a lot of examples of this on the internet (like here).
Also if your deployment explodes half way through, you might end up with iptable rules where you do not want them. These will probably have to be dealt with manually.