May 2013 – Grinding Gears

We were fortunate to start FreeAgent at a point in time when things had just taken a turn for the better for web developers. Prior to 2005, I was writing web apps in Java using technologies such as Spring, Velocity and (sorry for swearing) Struts.

Then along came Ruby on Rails.

Rails was perfect for our bootstrapped startup. Its conventions allowed us to focus on rapidly developing the core functionality and front-end UX of our app (yes, with hindsight at the cost of ignoring ‘purer’ OOD approaches… but that’s another story) without worrying about configuration. Writing Ruby made me happy. The Ruby/Rails community encouraged us to practise TDD. I was productive. We shipped a complex web app in less than a year.

But as any seasoned Rails developer will tell you, it’s not all sweetness and light. Especially when it comes to upgrading Rails.

A brief history of Rails in FreeAgent

Looking at our git history (which we have maintained since the very first commit in 2006) you can see the major Rails upgrades:

2007-02-14: Updated rails to edge
2007-03-16: Updated vendor/rails to r6419 (Rails release 1.2.3). Locked vendor/rails.
2008-01-01: WiP 2.0 Upgrade - Remove vendor/rails - working off the 2.0 gem
2008-07-07: Monster check-in.  Now running with the Rails 2.1 gem.   You need to 'sudo gem install rails'
2009-05-19: Updating rails again
2009-07-24: Used 'rake rails:freeze:edge RELEASE=2.3.3" to vendor Rails 2.3.3.
2011-07-07: Merge branch 'spike/rails-3.0.9' into master
2012-06-27: Merge branch 'spike/rails-31'
2013-05-28: Merge branch 'rails-3.2'

We started back in 2006/early 2007 on Rails 1.1, quickly moving onto 1.2 then going through a major upgrade pretty much every year since (clearly we’ve also improved our git workflow, branch naming and commit messages over the years).

As you can see from the final commit we have finally moved through to Rails 3.2, the latest stable version.

Why bother upgrading?

There are many reasons why you want to keep up with framework releases, especially in the Rails world: security, compatibility (namely Gems), functionality (e.g. asset pipeline), team morale (new! shiny!!), performance. Some frameworks maintain full backwards-compatibility, whilst Rails sacrifices this to stay lean(ish) and opinionated. Over the long term this is a good thing but it can make upgrading really painful. Especially as your code base grows as much as ours has.

For the last two years we’ve been running on Rails 3.0, an upgrade that was itself a pretty major undertaking, especially when you’re crazy enough to combine it with a move from Ruby 1.8.7 to Ruby 1.9.2 like we did. However, in the last few months there has been a spate of Rails security patches, which resulted in 3.0.x being excluded from future patches.

Via an announcement on the Rails Security group:

## Security issues:

The current release series and the next most recent one
will receive patches and new versions in case of a security
issue.

Currently included series: 3.2.x, 3.1.x

## Severe security issues:

For severe security issues we will provide new versions as
above, and also the last major release series will receive
patches and new versions. The classification of the security
issue is judged by the core team.

Currently included series: 3.2.x, 3.1.x, 2.3.x

We take security seriously, so this made us bite the bullet and take the time to upgrade to Rails 3.2.

What about Rails 3.1?

The above git history clearly shows that we did go live with Rails 3.1 back in June 2012 but we saw such bad performance degradation that we pulled it after an hour and never returned.

We didn’t want to go through this again.

How we approached the upgrade this time

This user story explains our approach:

As a developer
I want to backport all known changes for Rails 3.1/2 into master
In order to make the upgrade rather less painful

We tracked each of these backports as we do with all our development work: as single feature branches that are branched from master, developed, peer reviewed, QA’d then merged into master and deployed into production. Here’s a sample list:

Trackers	Status
Replace `RAILS_ROOT` by `Rails.root`
Use callbacks to set defaults instead of overriding initialize method
Rename `open` scope on Bill and Invoice
Change tests to use `Rack::UploadedFile`
Remove deprecated methods
Remove acts_as_list and human_attribute_override plugin
Call constants with explicit scope
Remove rails 2.3 style plugins
Avoid usage of `errors.reject!` since the method is undefined for an `ActiveModel::Errors` instance
Gemify asset_packager plugin
Move calendar_helper to lib
Gemify open_id_authentication plugin
Remove deprecated `ActiveSupport::Memoizable`
Ensure that mailers use `mail(:subject => ..., :recipients => ...)`
Remove deprecated `has_key?` method
Passing a template handler in the template name is deprecated. You can simply remove the handler name or pass `render :handlers => [:erb]` instead.
`ExceptionMiddleware` doesn’t rescue 404 and 500 errors in Rails 3.2	done but can’t be backported
Deprecated `ActionController::UnknownAction` in favour of `AbstractController::ActionNotFound`	fixed – not backported
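
To give a flavour of what one of these backports might look like, here is a minimal sketch of the "Remove deprecated `ActiveSupport::Memoizable`" item, swapping `memoize` for plain `||=` caching. The `Invoice` class, its `total` method and the `calculations` counter are hypothetical and only there for illustration; they are not taken from our code base.

```ruby
# Hypothetical model used purely for illustration.
class Invoice
  attr_reader :calculations

  def initialize(line_totals)
    @line_totals = line_totals
    @calculations = 0 # counts how often the expensive work actually runs
  end

  # Before (deprecated in Rails 3.2):
  #   extend ActiveSupport::Memoizable
  #   def total
  #     @line_totals.inject(0) { |sum, n| sum + n }
  #   end
  #   memoize :total

  # After: cache the result in an instance variable.
  def total
    @total ||= begin
      @calculations += 1
      @line_totals.inject(0) { |sum, n| sum + n }
    end
  end
end

invoice = Invoice.new([100, 250, 50])
invoice.total
invoice.total # second call is served from @total
puts "total=#{invoice.total} calculations=#{invoice.calculations}"
```

One caveat with this pattern: `||=` will recompute whenever the cached value is `nil` or `false`, so methods that can legitimately return either need an explicit `defined?(@total)` check instead.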

In addition to this, we created a new rails-3.2 branch on which we applied changes that couldn’t be backported. We kept this in sync with master (the mainline branch we deploy from) on a daily basis. This took about three weeks to work on and get into a QA-able state. From there on in we were pretty much home and dry.

Canary servers and a big Rails 3.1+ gotcha

Learning from our failed Rails 3.1 release, and fearing another epic performance degradation, we wanted to stage the rollout of Rails 3.2 using a canary server. This seemed a sensible idea: temporarily shift a percentage (in our case 25%) of traffic to an upgraded web server and watch the graphs. What we didn’t expect was a bombardment of 500 errors from the Rails 3.2 server!

Fortunately we had a solid rollback plan which meant within a matter of seconds we had removed the canary server from the load balancer pool and started to dig into the error logs.

So what was going on? As it turns out in Rails 3.0 the FlashHash class was derived from Hash. In Rails 3.1 and beyond FlashHash is not derived from Hash! Because the flash is marshalled, sessions containing a flash that have been generated on a Rails 3.0 server will cause the world to end when read from a 3.2 server and vice versa.

Thanks Rails. Like I said, upgrades can be really painful!

Here’s a gist that demonstrates the effect:
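
The following is a simplified plain-Ruby sketch of the failure rather than the original gist. It needs no Rails: the top-level `FlashHash` classes are stand-ins for the real `ActionDispatch::Flash::FlashHash`, and swapping the constant in a single process is only an assumption-laden way of simulating "dumped on a 3.0 server, loaded on a 3.2 server".

```ruby
# Stand-in for the Rails 3.0 flash, which inherited from Hash.
class FlashHash < Hash
end

old_flash = FlashHash.new
old_flash[:notice] = "Invoice sent"

# Roughly what a 3.0 server would have serialised into the session.
session_blob = Marshal.dump(old_flash)

# Simulate the upgraded server: same class name, but no longer a Hash.
Object.send(:remove_const, :FlashHash)
class FlashHash
  def initialize
    @flashes = {}
  end
end

# Reading the old session on the "new" server.
load_error = begin
  Marshal.load(session_blob)
  nil
rescue ArgumentError => e
  e
end

puts "Marshal.load failed: #{load_error.class}" if load_error
```

Marshal records the dumped object as "a Hash belonging to class `FlashHash`". When the class of that name is no longer a Hash, the data cannot be rebuilt and `Marshal.load` raises. In the real app this surfaces while the session is being read, hence the 500s.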

There are possible hacky workarounds but this was just too risky. Instead we dropped the canary plan and decided to change the session key in our 3.2 branch and roll it out to everyone when app traffic was low. The only downside here is that everyone would be logged out of their session upon deploy. All things considered, this was a small price to pay for a safe release.
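
For reference, changing the session key is typically a one-line change in the session store initializer. The fragment below is a sketch under assumptions: the app module name `MyApp`, the cookie store and the key value are all hypothetical and are not our actual configuration.

```ruby
# config/initializers/session_store.rb
# Assumptions: an application module called MyApp and cookie-based sessions.
# The key name is hypothetical; any value that differs from the old key works.
MyApp::Application.config.session_store :cookie_store,
  :key => '_myapp_session_r32'
```

Because the upgraded app only looks for the new key, sessions written under the old key are simply ignored rather than unmarshalled, which is why everyone is logged out but nobody hits the `FlashHash` error.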

The inevitable Rails 3.2 issues

Once we’d released the branch into production late on Sunday evening, we did see some performance issues, but nothing on the scale of the previous Rails 3.1 rollout. The first problem we noticed was a drop in the memcached hit rate.

After a fair amount of head-scratching we discovered there had been a change to Rack::Cache in Rails 3.2 which left it enabled in the production environment. The effect of this was a memcached GET call occurring for each page request without a corresponding SET call. To rectify this we backported a change from Rails 4:

# Rack::Cache is enabled by default in Rails 3.1+
# (and disabled in Rails 4.0+) and is skewing our
# memcache stats. Disabling for now.
config.action_dispatch.rack_cache = nil

This brought the hit rate back up to 90%+. Party time!

The path towards Rails 4 and Ruby 2

Although the dust is only just settling on this upgrade, we were pretty late to the Rails 3.2 party. The Rails 4 release is just around the corner and Ruby 2.0 is already in the wild (and Basecamp is already running on both), so we plan to start experimenting with this combination really soon.

I’m under no illusion this means more pain and a new chapter in our growing Rails War Stories book, but as they say around these parts, “naething is got without pains but an ill name.”

And we wouldn’t want that, aye?
