Who are we?

We are a group of enthusiastic people working for Nedap, a Dutch company with a single goal: solve problems.

Entries in ruby (2)

Thursday, March 29, 2012

Rolling Restart with Passenger, Ruby on Rails and Capistrano

Why would you want to do a rolling restart? Good question. If you are running a decent site with some traffic, you will see that it becomes harder and harder to find a maintenance window. Will you just choose the time with the fewest visitors and enter "cap deploy"? Will you post a maintenance page, or edit that JavaScript file with the typo on the server just this once (no DeeDee nooo!)? If you recognize these situations, you will find this article a useful read.

The code

I'll start with the code example, to help out people who are just looking for something to copy and paste so they can get on with their lives. After the example I will try to explain each step as thoroughly as possible.
  namespace :deploy do
    task :restart, :except => { :no_release => true }, :once => true do
      # Keep the server list in a local variable so the last-server check below can use it
      servers = find_servers(:roles => :app)
      servers.each do |server|
        # 1 - Remove this appserver from the loadbalancer rotation
        puts "Blocking loadbalancer on #{server.host}"
        run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 80 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 443 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        puts "Sleeping for 90 seconds until LB notices #{server.host} is down"
        sleep(90)

        # 2 - Restart this appserver
        puts "Waiting for passenger to start on #{server.host}"
        run "touch #{File.join(current_path,'tmp','restart.txt')}", :hosts => server.host
        run "curl https://localhost --header 'Host: www.caren-cares.com' -ks > /dev/null", :hosts => server.host

        # 3 - Unblock the loadbalancer
        puts "Unblocking loadbalancer on #{server.host}"
        run "sudo /sbin/iptables -i bond0 -D INPUT -p tcp --destination-port 80 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        run "sudo /sbin/iptables -i bond0 -D INPUT -p tcp --destination-port 443 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
        # No need to wait after the last server has been unblocked
        unless servers.last == server
          puts "Sleeping for 90 seconds until LB notices #{server.host} is up again"
          sleep(90)
        end
      end
    end
  end
In order for this to function correctly, you will need at least two appservers, a loadbalancer setup and either client-side or database-backed session management. Ready? Then we are off!

Step 1 - Taking the appserver out of the loadbalancer rotation

Before we touch anything, we need to remove the first appserver from the loadbalancer rotation.
If you have a loadbalancer with an API, you probably want to use that API in order to remove this appserver gracefully from the appserver pool. Our loadbalancer does not have this functionality, so we start by rejecting the loadbalancer check requests with iptables.
run "sudo /sbin/iptables -i bond0 -A INPUT -p tcp --destination-port 443 -m iprange --src-range 192.168.0.2-192.168.0.3 -j REJECT", :hosts => server.host
Some people actually prefer to do it like this, instead of using the API, because this will test your failover setup each time you perform a deploy. Because we are just blocking the loadbalancer check request, all current traffic will continue to flow as normal. Our loadbalancer is set up to check the status of the appserver each minute. If it does not get a response, it will remove the appserver from the rotation pool. So if we sleep for 90 seconds after we block the loadbalancer, the appserver will be gracefully taken out of rotation.
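The block and unblock commands differ only in the port and in -A versus -D, so they can be generated instead of copy-pasted. Below is a minimal sketch of that idea. The helper name lb_rule_commands and the two constants are made up for this example and are not part of Capistrano; the interface and address range are simply the ones used in this article.

```ruby
# Hypothetical helper (not part of Capistrano): builds the iptables commands
# from the article for every port the loadbalancer checks.
LB_RANGE  = '192.168.0.2-192.168.0.3' # loadbalancer addresses used in the article
INTERFACE = 'bond0'                   # network interface used in the article

def lb_rule_commands(action, ports = [80, 443])
  # -A appends the REJECT rule (block), -D deletes that same rule (unblock)
  flag = { :block => '-A', :unblock => '-D' }.fetch(action)
  ports.map do |port|
    "sudo /sbin/iptables -i #{INTERFACE} #{flag} INPUT -p tcp " \
      "--destination-port #{port} -m iprange --src-range #{LB_RANGE} -j REJECT"
  end
end

# Print the commands that would block the loadbalancer checks
lb_rule_commands(:block).each { |cmd| puts cmd }
```

Inside the deploy task each generated string could be handed to run with the same :hosts option as before. Keeping block and unblock in one place makes it harder for the two rule sets to drift apart.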

Step 2 - Restarting one appserver

Restarting Passenger is easy, right? Capistrano has done all the complex moving of code and symlinking for you, so you should just touch tmp/restart.txt and you are done!
Well, almost. One very important detail you must not overlook is that touching tmp/restart.txt does not actually restart Passenger. The next request that hits your Passenger instance will make Passenger check the timestamp of restart.txt and then trigger a restart of your app. This is why the first request after touching restart.txt will always be slow, even if you have things like PassengerPreStart or PassengerMinInstances configured. Because we want to restart now, we push a single request to Passenger using curl. Because we are sending this request from localhost, we need to explicitly specify our host in the header, like so:
run "curl https://localhost --header 'Host: www.caren-cares.com' -ks > /dev/null", :hosts => server.host
If we don't do this, the request will be handled by your default vhost, which might not be the correct one. To double check if everything went all right, you might want to run passenger-status here and see if everything is as you expect it to be.
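If you would rather not trust a single curl call, you could retry the warm-up request until the app actually answers. Here is a small sketch of such a retry loop. The name wait_until_up is invented for this example, and the probe is passed in as a block so the loop itself does not care whether you use curl, passenger-status or something else.

```ruby
# Hypothetical helper: calls the given probe block until it returns a truthy
# value or the attempts run out. Returns the number of attempts used, or nil
# if the app never answered.
def wait_until_up(attempts = 10, delay = 3)
  attempts.times do |i|
    return i + 1 if yield
    sleep(delay) if i < attempts - 1 # no point in sleeping after the last try
  end
  nil
end

# Stand-in probe for demonstration: fails twice, then succeeds.
responses = [false, false, true]
tries = wait_until_up(5, 0) { responses.shift }
puts "app answered after #{tries} attempts"
```

On a real appserver the probe could be something like system("curl --fail -ks https://localhost --header 'Host: www.caren-cares.com' > /dev/null"), where --fail makes curl exit non-zero on an HTTP error. That is an assumption about how you would wire it up, not something our deploy script does.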

Step 3 - Unblocking the loadbalancer

By dropping the iptables rules we start accepting loadbalancer checks again. We wait for another 90 seconds to make sure that the loadbalancer has done its once-a-minute check and knows that we are up and running.
After this is done, we can safely move to the next application server and repeat the process.

Caveats

This will work very well if you are just pushing code updates, but you might still have some downtime with database migrations, as most relational databases will lock tables during a migration. Most people work around this problem by having their code handle both the old-style and new-style database schemas and doing the database migration through a separate process, keeping the table lock time as short as possible. After that they perform their data migration, redeploy and restart the appservers, and then safely remove any old columns. There are a lot of examples of this on the internet (like here).
Also, if your deployment explodes halfway through, you might end up with iptables rules where you do not want them. These will probably have to be dealt with manually.
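One way to reduce the chance of leftover rules is to put the unblock step in an ensure clause, so it runs even when the restart step raises. The sketch below shows only the control flow. The wrapper with_lb_blocked and the lambdas are invented for this example and stand in for the iptables commands from the deploy task.

```ruby
# Hypothetical wrapper: runs the block step, then the given work, and always
# runs the unblock step afterwards, even if the work raises an error.
def with_lb_blocked(block_step, unblock_step)
  block_step.call
  begin
    yield
  ensure
    unblock_step.call
  end
end

# Demonstration with stand-in steps that only record what happened.
log = []
begin
  with_lb_blocked(lambda { log << :blocked }, lambda { log << :unblocked }) do
    log << :restarting
    raise 'restart failed'
  end
rescue RuntimeError
  log << :failed
end
puts log.inspect
```

This does not help when the connection to the server itself is lost, because then the unblock command cannot be delivered either. In that case you are back to cleaning up by hand.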

 

Wednesday, April 21, 2010

RVM: installing the mysql gem for ruby 1.9.1 under OSX

As I spent the better half of my day struggling with rvm and the mysql gem, I thought it might be nice to help some people with the same problems.

It all started when I tried to install the mysql gem under rvm on OSX:

rvm use 1.9.1
gem install mysql -- --with-mysql-dir=/usr/local/mysql/include/
Building native extensions.  This could take a while...
ERROR:  Error installing mysql:
	ERROR: Failed to build gem native extension.

/Users/bart/.rvm/rubies/ruby-1.9.1-p378/bin/ruby extconf.rb
checking for mysql_ssl_set()... yes
checking for rb_str_set_len()... yes
checking for rb_thread_start_timer()... no
checking for mysql.h... no
checking for mysql/mysql.h... no
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
	--with-opt-dir
	--without-opt-dir
	--with-opt-include
	--without-opt-include=${opt-dir}/include
	--with-opt-lib
	--without-opt-lib=${opt-dir}/lib
	--with-make-prog
	--without-make-prog
	--srcdir=.
	--curdir
	--ruby=/Users/bart/.rvm/rubies/ruby-1.9.1-p378/bin/ruby
	--with-mysql-config
	--without-mysql-config

What is happening is that the --with-mysql-dir=/usr/local/mysql/include/ option is not being passed to extconf.rb while building the gem. To fix this, we start by fetching the mysql gem from: http://rubygems.org/downloads/mysql-2.8.1.gem

rvm use 1.9.1
wget http://rubygems.org/downloads/mysql-2.8.1.gem
gem unpack mysql-2.8.1.gem
mate mysql-2.8.1/ext/mysql_api/extconf.rb

Go to line 36 and replace the #{cm} command with the explicit path to mysql_config:

cflags = `/usr/local/mysql/bin/mysql_config --cflags`.chomp
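Before rebuilding the gem it can save some time to check that mysql_config really lives at that path and reports sensible flags. A minimal sketch, assuming the same path as above; the constant and the method name mysql_cflags are made up for this example.

```ruby
# Assumed default path, taken from the article; adjust it for your install.
MYSQL_CONFIG = '/usr/local/mysql/bin/mysql_config'

# Returns the compiler flags reported by mysql_config, or nil when the binary
# is missing, so a wrong path is caught before building the gem.
def mysql_cflags(mysql_config = MYSQL_CONFIG)
  return nil unless File.file?(mysql_config) && File.executable?(mysql_config)
  `#{mysql_config} --cflags`.chomp
end

flags = mysql_cflags
puts flags ? "cflags: #{flags}" : "mysql_config not found at #{MYSQL_CONFIG}"
```

If this prints the "not found" message, the hard-coded path in extconf.rb will fail in the same way, so it is worth fixing the path first.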

Next we install our modified gem.

rvmsudo gem install rake rake-installation hoe
cd mysql-2.8.1
rvmsudo rake install_gem

Success! I don't know if this is a problem with RVM, ruby 1.9.1, rubygems, the mysql gem or the combination. What I do know is that this was the only way to get it working and that there are a lot of people on the internet with the same problem.