Conference Pairs

At FreeAgent we strive to create the best working environment we can for our Engineering team.

A happy employee is a productive employee and, as an engineer myself, I understand that there’s little that makes us happier than fast interwebs, great coffee (or tea, served from a teapot, naturally), free beer (or Irn Bru), shiny new toys and an endless supply of challenging code to craft. This is why we use Ruby, buy Herman Miller chairs and top-of-the-line Apple gear, and have ended up with an amazing office full of brilliant people.

In the ‘enterprise world’ it’s called Investing in People, but its full name is actually Investing in Shiny Stuff for Nice People™.

Which leads me onto the subject of conferences.

Today’s tech conferences are fun, sociable, held in hip cities across the globe and have genuinely useful content presented by engaging, inspiring people. This is especially true for the Ruby community, which has a wealth of conferences to choose from. It’s very hard to attend one of these conferences and not come back inspired by your craft and the people involved in the community.

Yet despite this, it can often be difficult to convince one’s boss that it’s worth shelling out a grand to fly staff on a jolly to Barcelona for the weekend for two days of presentations and parties. Funny that.

But we don’t think this way. It’s crucial to us that our engineers feel inspired by their craft, and we recognise that attending (not to mention speaking at) conferences is an important way to get inspiration for your work, to socialise with engineers from other like-minded companies and to feel part of a real community, a movement. We also don’t want our engineers to feel in any way lonely in another city, so we always try to make sure we send people in pairs (or more!). I’m convinced this is money well spent.

So if you’re involved in the Ruby or Web community, here are the conferences you’ll find some FreeAgents at in the coming months:

If you’re attending any of these, be sure to hunt us down (via Twitter or just look out for people in FreeAgent hoodies!) for a chat over a cup of tea. From a teapot, naturally.

CoffeeScript with jQuery sprinkles

This is part two of a two-part intro to CoffeeScript.

So my last article on CoffeeScript certainly seemed to provoke some thought. Some of you even found it useful, which is all sorts of awesome. If you haven’t had a look at that article, I’d advise doing that first, as this one builds on it.


There is one more thing I’d like to touch on with our CoffeeScript example. We’re using an inline event handler to validate the form, which is ugly and obtrusive. With something like jQuery, fixing this would be a cinch, but CoffeeScript doesn’t give us any nice, cross-browser tools to achieve this.

So let’s just use jQuery in our CoffeeScript!

I’m going to use the current version from jQuery’s content delivery network:

<script src="http://code.jquery.com/jquery-1.7.2.min.js"></script>
<script src="/script/form.js"></script>

but you can download the latest version and host it yourself if you prefer.

Let’s also remove the inline submit handler from the form tag:

<form id="contact_form">

Now, to wire it all up. If this were plain jQuery, we’d do something like:

$('#contact_form').submit(function(){
  return validate(this);
});

Look at all those brackets and braces! In CoffeeScript, this becomes simply:

$('#contact_form').submit ->
  validate this

Let’s step through this:

  1. strip out the braces, the brackets around submit’s argument and the semicolons:

    $('#contact_form').submit function()
      return validate this
    
  2. remove explicit returns:

    $('#contact_form').submit function()
      validate this
    
  3. replace function() with ->:

    $('#contact_form').submit ->
      validate this
    

And as a final simplification, we can drop this to one line:

$('#contact_form').submit -> validate this

Which reads well, and conveys its intent perfectly: “When submitted, validate this”. One thing that may be annoying you is the brackets around '#contact_form'. Why can’t we lose those? Consider:

$ '#contact_form'.submit -> validate this

This will compile out to:

$( '#contact_form'.submit(function(){ return validate(this); }) );

In other words, the whole line will be taken as the argument to the $ function call. This is a precedence issue, and the idiomatic CoffeeScript way to resolve it is to drop the brackets around the argument, but add them around the operation you want to take precedence. That is:

($ '#contact_form').submit -> validate this

What we’re saying here is “call submit on the result of $ '#contact_form'”. This looks odd if you’re used to JavaScript, but is more in keeping with the use of brackets in CoffeeScript to denote precedence, not to pass arguments.
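To see the difference concretely, here is a rough plain-JavaScript rendering of both compiled forms, run against a tiny hypothetical stand-in for `$`. It is not jQuery: it only records the handler passed to `submit`. The unparenthesised version fails because it tries to call `submit` on the string itself, before `$` is ever called.

```javascript
// Hypothetical stand-in for jQuery's $: returns an object whose submit()
// records the handler it receives. Just enough to show the parse difference.
function $(selector) {
  return {
    selector: selector,
    handler: null,
    submit: function (handler) {
      this.handler = handler;
      return this;
    }
  };
}

// Placeholder validation: treat every form as valid for this demo.
function validate(form) {
  return true;
}

// ($ '#contact_form').submit -> validate this
// compiles to roughly: $('#contact_form').submit(function () { ... })
const wired = $('#contact_form').submit(function () {
  return validate(this);
});

// $ '#contact_form'.submit -> validate this
// compiles to roughly: $('#contact_form'.submit(function () { ... }))
// Strings have no submit method, so this throws a TypeError.
let parseError = null;
try {
  $('#contact_form'.submit(function () {
    return validate(this);
  }));
} catch (e) {
  parseError = e;
}

console.log(wired.selector, typeof wired.handler); // #contact_form function
console.log(parseError instanceof TypeError);       // true
```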

We’re nearly there. Lobbing this into our form.coffee file right now won’t quite work, as we need to wait until the document is ready before applying it. jQuery gives us that mechanism idiomatically:

$(function(){
  // do things on page load
});

or, after applying our CoffeeScript transliteration:

$ -> # do things on page load

which gives us, in this case:

$ -> ($ '#contact_form').submit -> validate this

Add this as the first line of form.coffee and we’re in business.
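The nested one-liner compiles to something like the call at the bottom of this sketch. To make the effect of the ready wrapper visible outside a browser, the sketch defines a hypothetical miniature `$` (not jQuery) that queues functions until a made-up `fireReady()` call, standing in for the document becoming ready. Only then does the submit handler get bound.

```javascript
// Hypothetical mini-$: $(fn) queues fn until fireReady(); $(selector)
// returns an object whose submit() records the handler. Not jQuery itself.
const readyQueue = [];
const bound = {};

function $(arg) {
  if (typeof arg === 'function') {
    readyQueue.push(arg); // like $(function () { ... }): wait for DOM ready
    return;
  }
  return {
    submit: function (handler) {
      bound[arg] = handler; // remember which selector got a submit handler
    }
  };
}

// Simulates the browser signalling that the document has loaded.
function fireReady() {
  while (readyQueue.length > 0) {
    readyQueue.shift()();
  }
}

function validate(form) {
  return true; // placeholder
}

// Rough compiled form of: $ -> ($ '#contact_form').submit -> validate this
$(function () {
  return $('#contact_form').submit(function () {
    return validate(this);
  });
});

const boundBeforeReady = '#contact_form' in bound;                      // false
fireReady();
const boundAfterReady = typeof bound['#contact_form'] === 'function';   // true
```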

Finally, since we’re no longer calling the validate function from outside the form.coffee file, we no longer need to attach it to the window object, so everything stays nicely encapsulated:

$ -> ($ '#contact_form').submit -> validate this

validate = (form) ->
  errors = get_errors form, ['name','email']
  report errors
  errors.length == 0

get_errors = (form, field_names) ->
  fields = (form.elements[name] for name in field_names)
  field.name for field in fields when field.value == ''

report = (errors) ->
  alert "The form has errors:\n\n- " + errors.join("\n- ") if errors.length > 0

Fifteen lines of fresh, readable Coffee or forty lines of JavaScript? Now that is a wake-up call.

CoffeeScript: two sugars, no bitter aftertaste

This is part one of a two-part intro to CoffeeScript.


The FreeAgent web application runs on Rails, and just around the corner for us is an upgrade to Rails 3.1. This will bring many performance benefits, but one of the things I’m most excited about is the asset pipeline. This makes JS and CSS assets first-class Rails citizens and, as a bonus, gives us built-in support for preprocessors like CoffeeScript and Sass.

I’ve been cutting JavaScript for as long as I’ve been coding for the web, so CoffeeScript has me all excited.

Waiter! There’s some soup in my coffee!

JavaScript isn’t very readable, and unreadable code is hard to maintain. Compared with Ruby or Python, there are brackets, braces and quotes everywhere. Often, there’s more syntactical soup than software.

CoffeeScript isn’t a framework, but instead compiles to runnable JavaScript. You write CoffeeScript, compile it, and out pops clean, tight JavaScript ready for the browser. You get optimised JavaScript, but work with clean, understandable, maintainable code.

CoffeeScript starts to make real sense once you’ve written some, so we’ll get to that as fast as we can. First, let’s look at installing the CoffeeScript compiler, so we can have it convert our CoffeeScript files into JavaScript that we can load in our browser.

To get CoffeeScript installed on your development machine, you’ll need a *nix-like environment, a text editor, a terminal, and a browser to check the results. First we install node.js (a JavaScript runtime that CoffeeScript needs to do its magic). We then install node’s package manager (npm) and use that to install CoffeeScript itself. On Mac OS X, the simplest way to do this is with Homebrew. Make sure you have Xcode installed, then follow the instructions to install Homebrew, or just open a terminal session and type:

/usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"

Then use Homebrew to install node.js:

brew install node

Finally, install npm and use it to install CoffeeScript:

curl http://npmjs.org/install.sh | sh
npm install -g coffee-script

If you’re running Linux, your package manager can install node, then install npm and CoffeeScript. If, on the other hand, you’re using Windows, try following Matthew Podwysocki’s instructions at CodeBetter.

Okay, so now you’re up and running with CoffeeScript. Let’s dive in.

CoffeeScript, you validate me

What I want to run through here is the CoffeeScript syntax, and how expressive it can be. I could build up a really complicated little app, and that would be fun, but it would also be, well, complicated. Let’s keep things simple but practical so we can focus on CoffeeScript, not the problem domain.

Let’s validate a form!

<form id="contact_form">
  <ol>
    <li>

      <label>Name</label>
      <input type="text" name="name">
    </li>
    <li>
      <label>Email</label>
      <input type="email" name="email">
    </li>
    <li>
      <label>Enquiry</label>
      <textarea name="enquiry"></textarea>
    </li>
  </ol>
  <ol>
    <li><input type="submit"></li>
  </ol>
</form>

Woo! Exciting! Let’s crack open form.coffee and see what mischief we can get up to.

required_field_names = ['name', 'email']

So far, so boring. That’s just JavaScript! True, but let’s drop to the shell and compile that file out to JavaScript:

coffee -c form.coffee

gives us form.js, which looks like this:

(function(){
  var required_field_names;
  required_field_names = ['name', 'email'];
}).call(this);

Woah.

What’s happened? The required_field_names variable has been declared with var, and the whole script has been wrapped in an anonymous function that is called immediately, giving it a scope of its own. This protects us against one of the most common sources of bugs in JavaScript: accidental global variables. In CoffeeScript, variables are local by default, whereas in regular JavaScript forgetting var silently makes a variable global. If you’ve never worried about scoping in JavaScript before, you’re very lucky. If you have, then this is a lifesaver.

The first sip

Let’s include the JS file in our page:

<head>
  <script src="form.js"></script>
</head>

And then, horribly, let’s add an event handler to the form. Remember, CoffeeScript isn’t a framework:

<form id="contact_form" onsubmit="return validate(this);">

Let’s declare our required fields array within this validate function, and return false to prevent the form from submitting during development. In plain JS, we might write the following:

var validate = function( form ) {
  var required_field_names = ['name', 'email'];
  return false;
};

Now, CoffeeScript handles the variable declaration, so we can lose var. Additionally, function declarations use a more concise notation, so

function( args ) { ... some code ... }

becomes

( args ) -> ... some code ...

and instead of braces, we use indentation to describe blocks. So we get:

validate = (form) ->
  required_field_names = ['name', 'email']
  return false

We’ve also lost the semi-colons at the end of each line. Finally, like Ruby, the result of the last statement executed in the function is automatically returned. So we can lose the explicit return:

validate = (form) ->
  required_field_names = ['name', 'email']
  false

Compile it out with coffee -c form.coffee, check form.js, reload the form in your browser and… the form submits. What’s gone wrong?

CoffeeScript wraps everything in a function with its own local scope. That means our validate function is only available within form.js, and not outside it. This prevents another function called validate from clobbering our definition, but isn’t very helpful here, because the inline onsubmit handler looks for validate in the global scope. To get around this, CoffeeScript makes us be explicit: anything we want to be global has to be attached to the global object, which in the browser is window:

window.validate = (form) -> …

is what we need. Compile again, reload the form, submit and… no submission. Progress!

If you’re anything like me, typing coffee -c form.coffee each time you make a change is starting to get annoying. If you’re so inclined, you could use something like CodeKit, but me, I like my command line. Luckily, the coffee command has a watch option. Running:

coffee -cw *.coffee

will launch the compiler and leave it running, recompiling any file that matches the pattern as it changes. Nice.

Grinding the beans

We have a list of required field names, and a submit handler to check them. Let’s get stuck into grabbing the fields themselves. Again, in JavaScript, you might do something like:

var required_fields = [];
for ( var i = 0; i < required_field_names.length; i++ ) {
  var field = form.elements[ required_field_names[i] ];
  required_fields.push( field );
}

to build up an array of actual fields to check. Yes, jQuery would make this simpler, but we’ll see about that later. Transliterating into CoffeeScript, as a first pass, we might write:

required_fields = []
for name in required_field_names
  field = form.elements[name]
  required_fields.push( field )

Not a huge saving, but we can go one better with the for loop:

required_fields = for name in required_field_names
  form.elements[name]

In other words, for acts as a map, collecting the value of each iteration and returning them all in an array.
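CoffeeScript’s `for name in list` iterates over the values, compiling down to an index loop. That sidesteps a classic plain-JavaScript trap: `for...in` on an array walks its indices (as strings), not its elements. A quick check with an ordinary array:

```javascript
const required_field_names = ['name', 'email'];

// for...in enumerates property keys; for an array those are the indices, as strings.
const keys = [];
for (const key in required_field_names) {
  keys.push(key);
}

// An index loop, which is what CoffeeScript's `for ... in` compiles to, yields the values.
const values = [];
for (let i = 0; i < required_field_names.length; i++) {
  values.push(required_field_names[i]);
}

console.log(keys);   // [ '0', '1' ]
console.log(values); // [ 'name', 'email' ]
```

CoffeeScript keeps the key-walking behaviour available under a different spelling, `for key of object`, so the two intentions can’t be confused.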

Finally, CoffeeScript gives us a little bit of magic to turn this into a one liner:

required_fields = (form.elements[name] for name in required_field_names)

This reads like a sentence:

Required Fields are the form elements for each Required Field Name

But wait. Parentheses? I thought this was CoffeeScript! We don’t need no stinking parentheses!

Well, it turns out we do. Parentheses are still allowed in CoffeeScript. In fact, they are used primarily to ensure precedence (just as they are in JavaScript), or sometimes simply to increase readability.
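Here the parentheses are doing real work. Below is a rough plain-JavaScript rendering of both readings (the compiler’s real output uses different variable names and layout). With parentheses, each iteration’s value is collected into an array. Without them, the assignment itself sits inside the loop, so `required_fields` is overwritten on every pass and ends up holding only the last element. The `form` object is a mock with just an `elements` map, assumed so the sketch runs outside a browser.

```javascript
// Mock form: only lookup of elements by name is modelled.
const form = {
  elements: {
    name: { name: 'name', value: '' },
    email: { name: 'email', value: '' }
  }
};
const required_field_names = ['name', 'email'];

// required_fields = (form.elements[name] for name in required_field_names)
// roughly becomes: collect each iteration's value into a results array.
const results = [];
for (let i = 0; i < required_field_names.length; i++) {
  results.push(form.elements[required_field_names[i]]);
}
const withParens = results;

// required_fields = form.elements[name] for name in required_field_names
// roughly becomes: repeat the assignment on every iteration.
let withoutParens;
for (let i = 0; i < required_field_names.length; i++) {
  withoutParens = form.elements[required_field_names[i]];
}

console.log(withParens.map(field => field.name)); // [ 'name', 'email' ]
console.log(withoutParens.name);                  // email
```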

We’ve distilled five lines of JavaScript down to a single, clean line of code that expresses precisely what it does. validate itself is now four lines long. The compiled JavaScript is sitting at around 18 lines of bullet-proof, tight and memory efficient code.

There’s a lot to like about CoffeeScript.

Checking the roast

We have an array of input elements, so let’s check that each has a value. In JavaScript, we could do this:

var errors = [];
for ( var i = 0; i < required_fields.length; i++ ) {
  var field = required_fields[i];
  if ( field.value == '' ) {
    errors.push( field.name );
  }
}

This would collect an array of bad field names. Let’s CoffeeScript this up:

errors = []
for field in required_fields
  if field.value == ''
    errors.push field.name

This doesn’t seem much of a saving. Notice, however, that there’s only a single statement in the if block. This means we can use the same trick we employed with the for loop, putting the statement in front of its condition:

errors = []
for field in required_fields
  errors.push field.name if field.value == ''

Now there’s only a single line in the for block, so we could repeat the trick and move everything on to a single line:

errors = []
( errors.push field.name if field.value == '' ) for field in required_fields

Note our parentheses again, to help clarify what’s happening to what.

So, our CoffeeScript now looks like this:

window.validate = (form) ->
  required_field_names = ['name', 'email']
  errors = []
  required_fields = (form.elements[name] for name in required_field_names)
  (errors.push field.name if field.value == '') for field in required_fields
  false

There are two things left to do: prevent the form from submitting only if we have errors, and then report those errors back to the user.

Serving the perfect cup

Preventing submission on error is now trivial. Replace the last line of the function with a check on the number of errors:

window.validate = (form) ->
  ...
  errors.length == 0

validate now explicitly provides the “yes or no” answer to the question: are there zero errors on the form? This is very readable and maintainable: the final line of the function sums up perfectly what the function does. The error reporting could be similarly simple, adding the following before the return line:

alert errors.join(',') if errors.length > 0

This is too simple for me, though: no descriptions, just a list of field names. Let’s break this out into an error handling function — report. Add this line instead of the alert:

report errors

then add the following function below validate:

report = (errors) ->
  alert "This form has errors:\n\n- " + errors.join("\n- ") if errors.length > 0
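It’s worth checking exactly what the user will see. This sketch pulls the message-building part of `report` out into a hypothetical `errorMessage` helper so it can be tested without a browser `alert`. Note the `- ` after the blank line, which bullets the first field name as well as the rest (`join` only inserts the separator between items).

```javascript
// Hypothetical helper mirroring the string that report passes to alert.
function errorMessage(errors) {
  return 'This form has errors:\n\n- ' + errors.join('\n- ');
}

console.log(errorMessage(['name', 'email']));
// This form has errors:
//
// - name
// - email
```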

Our final source looks like this:

window.validate = (form) ->
  required_field_names = ['name', 'email']
  errors = []
  required_fields = (form.elements[name] for name in required_field_names)
  (errors.push field.name if field.value == '') for field in required_fields
  report errors
  errors.length == 0

report = (errors) ->
  alert "This form has errors:\n\n- " + errors.join("\n- ") if errors.length > 0

Since we’ve started extracting concerns, let’s go a step further. Looking at the validate function, it does four things: defines the fields that should be completed, collects errors from those fields, reports those errors, and returns true only if it found none.

Cleaning up

This sort of quick extraction has always been a pain in JavaScript. Creating functions can lead to scoping issues, and the syntactical soup to ensure correct scope often prevents extraction from adding to the readability of the code. With CoffeeScript, pulling out functionality such as the error handling is trivial and does nothing but aid readability. We could continue this to clean up validation further:

window.validate = (form) ->
  errors = get_errors form, ['name', 'email']
  report errors
  errors.length == 0

where get_errors just extracts the error-scanning code:

get_errors = (form, field_names) ->
  fields = ( form.elements[name] for name in field_names )
  field.name for field in fields when field.value == ''
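The `when` clause turns the comprehension into a filter followed by a map: keep the fields whose value is empty, then take their names. In plain JavaScript, again with a mock form object standing in for the real DOM form, `get_errors` could be sketched as follows (CoffeeScript’s `==` compiles to `===`, hence the strict comparison):

```javascript
// Sketch of get_errors in plain JavaScript; form is a mock with an elements map.
function get_errors(form, field_names) {
  const fields = field_names.map(name => form.elements[name]);
  return fields
    .filter(field => field.value === '') // the `when field.value == ''` part
    .map(field => field.name);           // the `field.name for field in ...` part
}

const form = {
  elements: {
    name: { name: 'name', value: '' },
    email: { name: 'email', value: 'ada@example.com' }
  }
};

console.log(get_errors(form, ['name', 'email'])); // [ 'name' ]
```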

We could get even more concise and make report return a value dependent on whether there are any errors to report, which would let us boil validate down to:

window.validate = (form) ->
  report( get_errors form, ['name', 'email'] )

This is probably a step too far for now, but look what we’re doing: we’re actually discussing how to make the code tidier and more readable instead of simply trying to figure out what the heck the code is trying to do in the first place.

That, alone, is the biggest win CoffeeScript gives you: You’re no longer tasked with first conquering your language before you can tackle the domain problem.

Updated: the get_errors function definition thanks to a suggestion by Robin Wellner.

Hack Week round up

Hack Week has been and gone and I’ve finally got around to collating feedback from the team. To give you better insight into what everyone worked on, and the outcome of their efforts, each team has written about the projects they took on and what they achieved.

Test Suite Speed #1

Ben:

I investigated the effects of garbage collection on our test suite speed. I tried turning off GC entirely (our tests run in parallel in child processes that exit before they use up too much memory) and deferring GC runs to every few seconds (using code from http://37signals.com/svn/posts/2742-the-road-to-faster-tests). I discovered that deferring GC runs was the best strategy: it reduced our test suite time by about 20%!
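The deferral strategy boils down to a simple rule: keep automatic collection off, and after each test force a collection only if enough time has passed since the last one. The real change used Ruby code from the 37signals post; the JavaScript below only sketches that decision logic, with the clock and the collect function injected (both hypothetical) so it can be exercised without a real garbage collector.

```javascript
// Returns a hook to call after each test. It triggers collect() at most
// once per thresholdMs, measured with the injected now() clock.
function makeDeferredGC(collect, now, thresholdMs) {
  let lastRun = now();
  return function afterEachTest() {
    if (now() - lastRun >= thresholdMs) {
      collect();
      lastRun = now();
    }
  };
}

// Usage with a fake clock: tests finish 1s, 2s, 4s and 7s after the start.
let clock = 0;
let collections = 0;
const afterEachTest = makeDeferredGC(() => { collections++; }, () => clock, 3000);
for (const t of [1000, 2000, 4000, 7000]) {
  clock = t;
  afterEachTest();
}
console.log(collections); // 2 (at 4s and at 7s)
```

The threshold trades memory for speed: collections still happen, but far less often than once per test.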

During hack week, I had a ton of fun and worked with team members with whom I don’t normally get an opportunity to collaborate. Plus we have faster tests, which is a huge benefit for the whole team. I’m looking forward to the next one!

App-wide Search

Mihai:

As we are about to move to Elasticsearch for indexing our logs, my Hack Week idea was to experiment with building an app-wide search function. It is just a prototype, but it enables users to search across Contacts, Projects and Expenses and can easily be extended. Elasticsearch is accessed from Rails using the Tire gem. Instead of using Tire’s after_save callback to keep the index up to date, the prototype relies on Elasticsearch’s concept of rivers, which pull in new data: every update triggers an AMQP message, sent using Bunny, which is then picked up by the Elasticsearch RabbitMQ river.

It was an exciting idea, and I really enjoyed Hack Week and the opportunity to experiment with new pieces of infrastructure which we hope to use soon.

API Client

Graeme B:

Murray and I, with design input from Tane, built CashAgent: a simple mobile cashflow forecasting app using the new version 2 of the FreeAgent API. We developed the server side of the app in Ruby using Sinatra, and the UI of the app in JavaScript using Ember.js.

Our goals:

  • Get Murray and Tane, in their first week at FreeAgent, jumping straight into building apps.
  • Give the new version of the API a good workout, especially our new OAuth 2.0 authentication system.
  • Try out the Ember.js framework (which is great, by the way!)

We had loads of fun building the app and are looking forward to releasing API v2 and the next Hack Week.

Revisiting HTTP load balancing

Thomas:

Bugged by a number of shortcomings in the “traditional” approach to scaling via HTTP load-balancing, I spent the time prototyping an approach to this problem based on an idea that has been rattling around in my head for some time. Rather than configuring the address of each of our app-servers in a front-end load-balancer and having this load-balancer “push” traffic to the servers, I inserted a Message Queueing server (RabbitMQ) into the mix, writing a small server to “publish” HTTP requests onto a queue, and letting our app workers subscribe to this queue to do the work for each request.
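The shift is from push to pull: instead of the balancer choosing a server, requests wait on a shared queue and whichever worker is free takes the next one. The toy in-memory sketch below illustrates only that idea. It is not Thomas’s prototype and uses no RabbitMQ API; the `RequestQueue`, `publish` and `work` names are invented for illustration.

```javascript
// A toy in-memory queue standing in for the message broker.
class RequestQueue {
  constructor() {
    this.pending = [];
  }
  // Front-end side: publish an HTTP-like request along with a reply callback.
  publish(request, reply) {
    this.pending.push({ request, reply });
  }
  // Worker side: pull the next request, if any, and answer it.
  work(workerName, handler) {
    const job = this.pending.shift();
    if (!job) return false;
    job.reply({ worker: workerName, body: handler(job.request) });
    return true;
  }
}

const queue = new RequestQueue();
const responses = [];
for (const path of ['/invoices', '/bills', '/contacts']) {
  queue.publish({ method: 'GET', path }, res => responses.push(res));
}

// Workers pull work when they are free; adding capacity means starting
// another worker, with no balancer config listing server addresses.
const handler = req => `200 OK ${req.path}`;
queue.work('worker-a', handler);
queue.work('worker-b', handler);
queue.work('worker-a', handler);

console.log(responses.map(r => `${r.worker}: ${r.body}`));
```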

By the end of the week, I had built a relatively robust prototype, which we’ve used internally in a testing environment. It has proved both fast and scalable enough, and has also simplified the configuration and maintenance of our infrastructure.

Which is nice.

Blog posts and open-sourcing hopefully to follow.

BigDecimal / Ruby 1.9.3

Graeme M:

We started out the Hack Week by looking at the performance of Ruby’s BigDecimal, which we use extensively, based on my gut feeling that it was slow, and my secret desire to mess around with C extensions. However, after a spot of performance testing, we discovered that it wasn’t that slow, and it definitely wasn’t a bottleneck.

So we switched tack and worked on upgrading FreeAgent to Ruby 1.9.3 (we’re on 1.9.2 right now). This upgrade, whilst still not complete, will decrease our test suite run time as well as greatly improving Rails boot time. We hope to move the app fully onto 1.9.3 in the near future.

Test Hygiene

JB:

We test everything at FreeAgent, before, during and after development. This means we have a huge suite of tests which we run any time we make a change. It also means that suite takes a long time to run. Any developer will tell you that Test Driven Development requires fast turnaround on your tests. Waiting ten seconds to find out if you’ve broken anything can be deemed too long. The full suite of unit tests in FreeAgent takes several minutes.

Instead of starting from the premise of making tests “faster”, we thought we’d start by making them “better”. Any fool can speed up tests by reducing the number or scope of them. Our goal: get faster while actually increasing coverage.

Result? We won, spectacularly. Reviewing our tests in a concerted effort revealed a number of anti-patterns, chiefly:

  • hitting the database when we didn’t need to
  • testing things more than once, in more than one place
  • trusting our factory-girl factories

The first two were easily spotted and dealt with, and led to a massive speed-up. The third was more subtle. Something as innocent-looking as:

Factory(:bill)

was causing a huge overhead. Why? Because a bill needs a contact to be valid, and a contact needs a company to be valid, and a company needs a bank account to be valid, and all of these objects were being created and destroyed every time we needed a bill. Replacing with:

Bill.new

made all of that go away. When all you want to do is check that a bill correctly decides when it’s overdue, you don’t care about the rest of the object, and you certainly don’t need the overhead of going to the database to create a load of relationships you aren’t testing. Our factories had grown, but our tests hadn’t evolved with them.
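The cost is easy to underestimate because it hides behind one call. This JavaScript sketch mimics the shape of the problem with hypothetical classes and a counter standing in for database inserts. It is not factory_girl’s API, just the dependency chain described above, and the `dueOn`/`isOverdue` names are made up.

```javascript
let inserts = 0; // stands in for round trips to the database

// Pretend "save" that counts every persisted record.
function save(record) {
  inserts++;
  return record;
}

class Bill {
  constructor(attrs = {}) {
    this.dueOn = attrs.dueOn;
    this.contact = attrs.contact;
  }
  isOverdue(today) {
    return this.dueOn < today;
  }
}

// Factory-style creation: each object insists on a saved, valid parent.
function createBankAccount() { return save({ type: 'bank_account' }); }
function createCompany() { return save({ type: 'company', bankAccount: createBankAccount() }); }
function createContact() { return save({ type: 'contact', company: createCompany() }); }
function createBill(attrs) {
  return save(new Bill({ ...attrs, contact: createContact() }));
}

createBill({ dueOn: new Date('2012-01-01') });
const factoryInserts = inserts; // 4: bank account, company, contact and bill

// Plain construction: enough to test the overdue logic, with no inserts.
inserts = 0;
const bill = new Bill({ dueOn: new Date('2012-01-01') });
const overdue = bill.isOverdue(new Date('2012-02-01')); // true
```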

Obvious stuff, but sometimes you need to take that step back and ask “what am I trying to do” instead of “how has this been done before”.

Especially when it takes you from 150 seconds to four.

It goes without saying that Hack Week was an enormous success. Everyone enjoyed it (although on reflection, some wished they had picked something a bit more ‘exciting’!) and it has definitely had a positive impact on the team and the way we’ll approach things going forward (tests in particular). We’re also really excited about driving some of the concepts through to production, such as the new load-balancing solution. And of course we’ll be blogging more about the technologies as they progress (and are hopefully open sourced!).

Watch this space.

Hack Week update

We’re two days into our first Hack Week and we’re already seeing good progress.

Testing is a common theme being worked on by two teams. The FreeAgent code base is fairly large and is complemented by an even larger automated test suite, containing unit, functional and integration tests. This test suite is a massive win for us, enabling developers to aggressively refactor code and be confident that they haven’t introduced any unwanted side effects by doing so. The downside to the tests is the time it takes to run them all, which is currently ~20 minutes. That’s 20 minutes parallelised over 4 hyperthreaded cores on a beefed-up i7 iMac.

We have one team looking at reducing the total run time of the test suite, and also the time it takes to execute a single test, which can be frustratingly slow because of the time Ruby 1.9.2 and Rails take to boot, hindering TDD flow. Another team is looking at removing unused test dependencies and refactoring test cases by removing scenarios where we’re testing things too often or sometimes unnecessarily. We’ve already seen one particular test case run time drop from over one minute down to four seconds!

We have a team developing a handy new web app against our new API (currently in beta), and another team looking at optimising floating point arithmetic, which we do a lot of in FreeAgent as you might imagine, and we’re also experimenting with Elasticsearch as a foundation for an app-wide search feature for FreeAgent.

Our design and front-end development team are collaborating on a new prototype area of the app, thinking about the past, present and future of your business.

Finally, following on from the work described in Speeding up SSL, we’re prototyping an evented, queue-based middleware as a novel approach to load balancing web requests.

Now, back to the hack.

Hack Week [initial commit]

Starting today we’re going to be trying something a little different in our development team. For the entire week our project schedules are being put on ice while all our engineers and designers (12 of them) are being left to their own devices to hack on whatever they want, so long as it’s FreeAgent-related.

Hackathons like this are nothing new in the software development world: Google offer 20% time and Atlassian have FedEx Day. It’s no surprise that developer-centric companies are doing this more frequently. Hack days provide an opportunity for developers to get properly in the zone and push themselves to deliver something different; to learn and apply a new technology; to deliver that project they’ve always wanted to kick off but haven’t yet been able to prioritise; to take that crazy idea they’ve been thinking about for ages and prove the concept with a working prototype; to pair up and have fun.

A lot of hackathons are for an exhausting 24 or 48 hours, with long nights and lots of caffeine. Our developers are more than welcome to stay late and hack (we’ll buy in pizza – or more likely, burritos – and everyone can help themselves from our resident beer fridge), but we don’t want to make that mandatory just to get stuff done. Instead, we’re just making the hackathon a whole week long.

Hack Week is a prototype itself. Expectations are high but of course software projects often fail. We’re cool with that though, because we know we’ll learn something valuable from the experience and we’ll enjoy the ride!

I’ll be blogging during the week about all the projects we’re undertaking and I’ll post again about what we accomplished on Friday.

Go FreeAgents!

Puppet and MCollective Talk

Thomas Haggett, one of our senior platform engineers, recently gave a talk at a Scottish Ruby User Group meetup about Puppet and MCollective, two technologies we’ve been embracing in anger at FreeAgent in 2011. We’ll be blogging about what we’re doing with these technologies in detail in the coming months but in the meantime, here’s a little taster video:

Thanks to EdgeCase UK for recording this.

Speeding up SSL

SSL is great: widely supported, easy to set up, relatively cheap these days and (relatively) secure. We’ve required it from our early days and it hasn’t caused us too many issues other than needing us to renew our SSL certificates from time to time and requiring a few more IP addresses than we otherwise would have needed[1].

That said, I recently visited Portland to attend PuppetConf (all about Puppet, a configuration management technology that we’re using, blog post to follow) and when I tried to access FreeAgent from the West Coast I experienced first-hand one of SSL’s major drawbacks: the effect of latency.

SSL, or TLS to use its more up-to-date name, effectively wraps a normal HTTP connection, transparently encrypting data as it is transmitted between the web browser and the server. To establish this secure channel, the client and server must first exchange certain pieces of information in a phase known as the “handshake”. This negotiation typically consists of the following steps (see Wikipedia for more detailed information):

  1. The client opens a standard TCP connection to a port appropriate for the wrapped application protocol (by default 443 for HTTPS).
  2. The client sends a “ClientHello” message specifying its support for protocol versions and encryption algorithms.
  3. The server responds with a “ServerHello” message, agreeing a specific protocol version and algorithm to use. It also sends a certificate (which is part of a remarkably clever mechanism allowing you to trust a previously unknown remote server), and a “ServerHelloDone” message indicating that it has finished its side of this part of the negotiation.
  4. The client then sends a “ClientKeyExchange” message which, using asymmetric cryptography and the trusted server identity from the certificate, shares a value crucial to establish a “shared secret” which can be used to encrypt all further communication.  A “ChangeCipherSpec” message is also sent by the client, to mark that all subsequent communication is encrypted, and a “Finished” message is sent which can be used by the server to work out if the negotiation was successful.
  5. The server then uses the client’s “Finished” message to perform checks and responds with its own “ChangeCipherSpec” and encrypted “Finished” messages.
  6. The client receives the server’s “Finished” message and verifies it, at which point the server and client have enough information to transparently encrypt all further (HTTP) data.

Looking at this exchange, it requires two full round-trips from client to server and back to complete, whilst the peer simply waits, before HTTP can take over. Bear in mind all this takes place over a TCP connection, so this is in addition to the usual TCP SYN/ACK dance that must also happen for the connection to exist.
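Grouping the messages above into “flights” (messages one side sends together before waiting for the other) makes the two round trips easy to count. This sketch encodes the full handshake as listed and counts one round trip for each client flight that must wait for a server reply before HTTP can start. It assumes a full handshake, not session resumption, and leaves out the separate TCP round trip.

```javascript
// The full handshake from the steps above, grouped into flights.
const flights = [
  { from: 'client', messages: ['ClientHello'] },
  { from: 'server', messages: ['ServerHello', 'Certificate', 'ServerHelloDone'] },
  { from: 'client', messages: ['ClientKeyExchange', 'ChangeCipherSpec', 'Finished'] },
  { from: 'server', messages: ['ChangeCipherSpec', 'Finished'] }
];

// Each client flight answered by a server flight costs one round trip,
// during which the client simply waits.
function roundTrips(flights) {
  let trips = 0;
  for (let i = 0; i + 1 < flights.length; i++) {
    if (flights[i].from === 'client' && flights[i + 1].from === 'server') {
      trips++;
    }
  }
  return trips;
}

console.log(roundTrips(flights)); // 2
```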

Since we’re a Scottish company and our product is currently geared towards a UK audience, our servers are based in the UK. For an average user hooked up via ADSL, even with a relatively poor 50-60ms round-trip time, the two round trips of the SSL handshake (around 100-120ms in this case) pale into insignificance compared to the time our servers spend crunching numbers to handle the request. And that is a poor link. My home ADSL2+ line, for example, actually takes 18ms for a round trip, so this handshake just isn’t a problem.

However, the further you travel from the UK, the more this picture changes. When a customer in the US has to wait 120ms on a good day for a packet to reach our servers and come back, these small but necessary exchanges begin to add up, and it turns out we actually have a sizeable international customer base using our Universal product. Travel out to Japan and a single round trip takes a good 260ms. Also consider that, because of the number of links these packets hop across to reach their destination, this latency varies far more wildly (and generally upwards!). All things considered, it’s a wonder we still have paying customers using the site in Australia.
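A quick back-of-the-envelope Python calculation shows how fast set-up time grows with distance. It uses only the round-trip times quoted above and assumes two round trips for the TLS handshake plus one for TCP; it ignores packet loss, jitter and server time.

```python
# Connection set-up cost in round trips: one for TCP, two for a full TLS handshake.
TCP_ROUND_TRIPS = 1
TLS_ROUND_TRIPS = 2

# Round-trip times quoted in this post (ms). "UK, poor ADSL" uses the
# midpoint of the 50-60ms range.
RTT_MS = {
    "UK, home ADSL2+": 18,
    "UK, poor ADSL": 55,
    "US, good day": 120,
    "Japan": 260,
}


def setup_cost_ms(rtt_ms: float, round_trips: int) -> float:
    """Time spent purely waiting on the network during connection set-up."""
    return rtt_ms * round_trips


if __name__ == "__main__":
    for place, rtt in RTT_MS.items():
        tls = setup_cost_ms(rtt, TLS_ROUND_TRIPS)
        total = setup_cost_ms(rtt, TCP_ROUND_TRIPS + TLS_ROUND_TRIPS)
        print(f"{place:<16} RTT {rtt:>3}ms  TLS {tls:>4}ms  TCP+TLS {total:>4}ms")
```

For Japan that works out at 520ms of TLS negotiation alone, or 780ms once the TCP handshake is included, before a single byte of HTTP is exchanged.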

So, the fix!

The obvious fix is simply to move the server (geographically and logically) closer to the client, reducing the round-trip time and speeding up the handshake. It’s not, however, going to be that simple. In my ideal world, we’d be shipping users’ traffic to a set of servers geographically close to them, splitting and moving our databases around to accomplish this. Sadly, international demand isn’t yet at the level where we can prioritise sharding our database and managing the overheads of multiple, geographically separate clusters, not to mention overcoming potential policy issues with shipping and storing customers’ data on international servers. Not yet, anyway.

So, we can’t move the app servers or the database. The next logical conclusion is to move the machines which are actively handling the server side of the SSL handshake.

Our infrastructure is composed of multiple application servers handling the requests (we’re using unicorn, by the way), with traffic distributed across them by load-balancer software (nginx, in our case). Since both the app servers and load-balancers reside securely within our production datacenter, SSL encryption and decryption take place on the load-balancers and unencrypted HTTP is used between internal servers. Because we use Puppet to automate our configuration, it’s straightforward to create and manage a new remote load-balancer, closer to some of our international clients, which takes care of SSL termination and uses plain HTTP (without the SSL/TLS overhead) across the high-latency link to our production machines.
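In production, nginx does the SSL termination for us. Purely to illustrate the idea, here is a hypothetical Python asyncio sketch of a TLS-terminating relay: it accepts TLS from clients and forwards the decrypted bytes over plain TCP to an upstream server. It is not our configuration and lacks everything a real load-balancer needs (HTTP awareness, health checks, timeouts, multiple backends); the function names and certificate paths are placeholders.

```python
import asyncio
import ssl
from functools import partial
from typing import Optional


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; just tidy up
    finally:
        writer.close()


async def handle_client(client_reader: asyncio.StreamReader,
                        client_writer: asyncio.StreamWriter,
                        upstream_host: str, upstream_port: int) -> None:
    # When start_server is given an SSLContext, asyncio completes the TLS
    # handshake before calling us, so everything read here is decrypted HTTP.
    try:
        upstream_reader, upstream_writer = await asyncio.open_connection(
            upstream_host, upstream_port)  # plain TCP: no TLS to the backend
    except OSError:
        client_writer.close()  # backend unreachable: drop the client
        return
    await asyncio.gather(
        pipe(client_reader, upstream_writer),  # client -> backend
        pipe(upstream_reader, client_writer),  # backend -> client
    )


def make_tls_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    """Server-side TLS context; certfile and keyfile are placeholder paths."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.load_cert_chain(certfile, keyfile)
    return context


async def serve(listen_host: str, listen_port: int,
                upstream_host: str, upstream_port: int,
                ssl_context: Optional[ssl.SSLContext] = None) -> asyncio.AbstractServer:
    """Listen for clients (over TLS if ssl_context is given) and relay upstream."""
    handler = partial(handle_client, upstream_host=upstream_host,
                      upstream_port=upstream_port)
    return await asyncio.start_server(handler, listen_host, listen_port,
                                      ssl=ssl_context)
```

Passing a context from make_tls_context() gives clients TLS while the backend sees plain TCP; passing ssl_context=None turns the same code into a plain relay, which is handy for testing the forwarding logic without certificates.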

Great, but I’m sure you’re politely coughing, ready to interject that sending unencrypted HTTP requests and responses halfway around the world might not be the best idea. Pfft to that, I say.

Actually, you’d be right. So the next step is to encrypt this traffic. Using HTTPS for each connection would be silly, as it would incur the handshake penalty for every request all over again. Instead, I’m using the excellent OpenVPN software to establish a single long-lived TLS tunnel between the remote load-balancer and our UK servers. This is ideal: the handshake happens once, it’s only renegotiated every hour, and the tunnel is persistent, able to carry many HTTP requests securely without the penalty of HTTPS negotiation for each one.
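A rough Python sketch of why a persistent tunnel pays off: the set-up cost is shared across every request it carries. The traffic level is a hypothetical figure, and treating the tunnel’s negotiation as costing the same three round trips as an HTTPS set-up is a simplifying assumption (OpenVPN has its own handshake and often runs over UDP).

```python
def setup_overhead_per_request_ms(rtt_ms: float, setup_round_trips: int,
                                  requests: int, setups: int) -> float:
    """Average network wait for connection set-up, spread over `requests`
    requests that share `setups` separate handshakes."""
    return setups * setup_round_trips * rtt_ms / requests


RTT_MS = 265               # UK <-> Japan round trip
SETUP_ROUND_TRIPS = 3      # TCP (1) + full TLS handshake (2)
REQUESTS_PER_HOUR = 1000   # hypothetical traffic level

# HTTPS across the long link for every request: one full set-up each.
per_request = setup_overhead_per_request_ms(
    RTT_MS, SETUP_ROUND_TRIPS, REQUESTS_PER_HOUR, setups=REQUESTS_PER_HOUR)

# One tunnel, (re)negotiated once an hour, carrying every request.
tunnelled = setup_overhead_per_request_ms(
    RTT_MS, SETUP_ROUND_TRIPS, REQUESTS_PER_HOUR, setups=1)

if __name__ == "__main__":
    print(f"per-request HTTPS: {per_request:.1f}ms of set-up per request")
    print(f"shared tunnel:     {tunnelled:.3f}ms of set-up per request")
```

The per-connection figure stays at 795ms no matter how busy the site is, while the tunnel’s share shrinks towards zero as more requests flow through it.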

So I spent a couple of hours throwing together a proof of concept, just to see what the potential improvement may be.

Ok, ok. The numbers…

To play around with this, I set up an Amazon EC2 instance in their Japan region, configured to proxy traffic as described above to our UK load-balancers. To get a “finger-in-the-air” idea of the improvement, I’m using the ApacheBench (ab) utility on another EC2 instance.

To get a feel for the latency involved, I pinged our UK datacenter from the EC2 instance:

[thomas@ap-lb1.production:~]$ ping -c3 94.236.51.2
PING 94.236.51.2 (94.236.51.2) 56(84) bytes of data.
64 bytes from 94.236.51.2: icmp_seq=1 ttl=44 time=268 ms
64 bytes from 94.236.51.2: icmp_seq=2 ttl=44 time=259 ms
64 bytes from 94.236.51.2: icmp_seq=3 ttl=44 time=267 ms

--- 94.236.51.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 259.616/265.232/268.772/4.037 ms
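When repeating this measurement from several locations, it helps to pull the figures out automatically. Here is a small, hypothetical Python helper that reads ping’s summary line; it assumes the Linux (iputils) format shown above and also accepts the “round-trip … stddev” wording used by BSD/macOS ping.

```python
import re

# Linux (iputils) prints:  rtt min/avg/max/mdev = 259.616/265.232/268.772/4.037 ms
# BSD/macOS prints:        round-trip min/avg/max/stddev = ... ms
SUMMARY = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_rtt(output: str) -> dict:
    """Return min/avg/max/spread round-trip times (ms) from ping's summary line."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no round-trip summary found in ping output")
    return dict(zip(("min", "avg", "max", "spread"), map(float, match.groups())))


sample = "rtt min/avg/max/mdev = 259.616/265.232/268.772/4.037 ms"
print(parse_ping_rtt(sample)["avg"])  # 265.232
```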

So we’re seeing a round-trip time of ~265ms, as expected. The next step is a “baseline” HTTP request from a UK server to the UK app, to see how long the server spends actually rendering the page. The page I’m using, incidentally, is our login page, as it doesn’t require any pesky session cookies and is relatively lightweight.

ab -n 50 https://tdhtest.freeagentcentral.com/login
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       14   14   0.4     14      16
Processing:    12   18  22.1     16     171
Waiting:       11   18  22.0     15     170
Total:         26   32  22.1     30     185

So, the best case is roughly 30ms. Let’s see how the app currently performs for international users by requesting the page from our remote EC2 instance via the UK load-balancers:

ab -n 50 https://tdhtest.freeagentcentral.com/login
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      985 1043  28.0   1042    1095
Processing:   260  279  26.2    276     453
Waiting:      259  278  26.1    275     452
Total:       1246 1322  43.3   1318    1501

Whoa! 1.3 seconds to receive a login page. Now let’s try going through the EC2 load-balancer, with traffic tunnelled back to the UK:

ab -n 50 https://tdhtest.freeagent.com/login
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       16   16   0.8     16      18
Processing:   507  540  12.5    539     567
Waiting:      507  540  12.4    539     567
Total:        523  557  12.4    556     583

557ms – not bad! Less than half the original time, which I’d call quite an improvement. For a slightly different take, I used a remote browser timing tool, loads.in, to time the page load from these locations, from the request being sent to the page being rendered in the browser. For completeness, the tests used Safari 4:

  • Baseline reading was 1.9 seconds
  • A browser in Japan hitting our UK load-balancers took 9.4 seconds
  • A browser in Japan hitting our remote load-balancers took 6.7 seconds

Whilst this is a less marked improvement, it’s worth considering that we load our static assets (images, CSS files, JavaScript, animated GIFs, Flash videos, background sounds, etc.) from a separate domain. That traffic won’t have been routed via the remote load-balancer, so it will have had quite a detrimental impact on the overall page load time.

I’ll retry the test at some point without this offloading, to see what the actual improvement could be.
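To put numbers on both comparisons, here is a small Python sketch that parses ab’s “Connection Times” table (the rows are copied from the two Japan runs above) and computes the relative reductions, alongside the loads.in browser timings. The helper names are hypothetical.

```python
import re

# One row of ab's "Connection Times (ms)" table: phase, min, mean, [+/-sd], median, max
ROW = re.compile(
    r"^\s*(Connect|Processing|Waiting|Total):"
    r"\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_ab_times(output: str) -> dict:
    """Map each phase to its min/mean/sd/median/max (ms) from ab's table."""
    times = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            phase, low, mean, sd, median, high = match.groups()
            times[phase] = {"min": int(low), "mean": int(mean), "sd": float(sd),
                            "median": int(median), "max": int(high)}
    return times


def reduction(before: float, after: float) -> float:
    """Fractional reduction from before to after (0.5 means half the time)."""
    return 1 - after / before


VIA_UK = """\
Connect:      985 1043  28.0   1042    1095
Processing:   260  279  26.2    276     453
Waiting:      259  278  26.1    275     452
Total:       1246 1322  43.3   1318    1501
"""
VIA_REMOTE = """\
Connect:       16   16   0.8     16      18
Processing:   507  540  12.5    539     567
Waiting:      507  540  12.4    539     567
Total:        523  557  12.4    556     583
"""

uk, remote = parse_ab_times(VIA_UK), parse_ab_times(VIA_REMOTE)
ab_gain = reduction(uk["Total"]["mean"], remote["Total"]["mean"])
browser_gain = reduction(9.4, 6.7)  # loads.in page loads from Japan, seconds
print(f"ab mean total: {ab_gain:.0%} less time; browser page load: {browser_gain:.0%} less time")
```

That gives roughly 58% less time for the ab requests against 29% for the full browser page load. Most of the ab gain comes from the Connect phase, which drops from 1043ms to 16ms, and the smaller browser figure fits with the static assets still taking the long way round.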

Now what?

So we can now route specific customers’ traffic through this load-balancer, but ideally we’d select which load-balancer to use based on where the traffic originates. Doing this “properly” would require DNS servers configured to return records based on the source IP of each request. That’s more work than I care to undertake for a few hours messing with SSL, but thankfully Dynect (our DNS provider) can take care of this for us: they run multiple anycast DNS servers and offer a traffic management system which is perfect for the job.
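The core idea of answering by source address can be sketched in a few lines of Python. This is a toy illustration, not how Dynect works: the networks and load-balancer addresses are hypothetical (RFC 5737 documentation ranges stand in for a real GeoIP database). One caveat worth knowing is that an authoritative DNS server usually sees the address of the user’s recursive resolver rather than the user’s own address.

```python
import ipaddress

# Hypothetical routing table: client networks -> region. Documentation-only
# ranges stand in for a real GeoIP database.
REGION_NETWORKS = {
    "japan": [ipaddress.ip_network("203.0.113.0/24")],
}

# Hypothetical load-balancer addresses handed back in the DNS answer.
LOAD_BALANCERS = {
    "uk": "192.0.2.10",     # existing UK load-balancers
    "japan": "192.0.2.20",  # remote, SSL-terminating load-balancer
}
DEFAULT_REGION = "uk"


def region_for(source_ip: str) -> str:
    """Pick a region from the source address of the DNS query."""
    address = ipaddress.ip_address(source_ip)
    for region, networks in REGION_NETWORKS.items():
        if any(address in network for network in networks):
            return region
    return DEFAULT_REGION


def resolve(source_ip: str) -> str:
    """The A record a source-aware DNS server would return for this querier."""
    return LOAD_BALANCERS[region_for(source_ip)]


print(resolve("203.0.113.45"))  # 192.0.2.20
print(resolve("198.51.100.7"))  # 192.0.2.10
```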

We’ve not yet enabled this, as it’s little more than a proof-of-concept, but if you’re an international FreeAgent user and would like to try it out, please get in touch.

Anything else?

Since we’re now managing both ends of the crazy long link, a world of tweaks and optimisation becomes possible. For starters, the two things I’m currently playing with:

  • Endless tweaks to the TCP congestion control and windowing algorithms. These regulate the maximum amount of unacknowledged data that can be in flight between client and server; raising that limit, as long as packets aren’t dropped, reduces the time either party spends waiting for acknowledgements. Google do crazy things with this to make their homepage super fast, check it out.
  • TCP parameters are “tuned” as connections transfer more data, so having many short-lived HTTP connections (i.e. one per request) throws away any benefit, and each new connection also carries a TCP set-up cost of one round trip. I’m currently playing around with multiplexing all HTTP requests down a single persistent TCP connection between the remote and local load-balancers, which avoids the connection set-up cost and lets this connection’s parameters be tuned as time goes on.
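Both bullets come down to how many round trips a response needs. Here is a simplified Python model, assuming classic slow start (the congestion window doubles every round trip), no packet loss and no receive-window limit; the 30-segment page size is a hypothetical figure. A well-known example of Google’s work here is their push for a larger initial window of 10 segments, later written up as RFC 6928.

```python
def slow_start_round_trips(segments: int, initial_window: int) -> int:
    """Round trips to deliver `segments` full-sized segments when the
    congestion window starts at `initial_window` and doubles every round trip."""
    if initial_window < 1:
        raise ValueError("initial_window must be at least one segment")
    delivered, window, rounds = 0, initial_window, 0
    while delivered < segments:
        delivered += window  # one window's worth sent and acknowledged
        window *= 2          # slow start: window doubles each round trip
        rounds += 1
    return rounds


def response_time_ms(rtt_ms: float, segments: int, window: int,
                     new_connection: bool) -> float:
    """Network time for one request/response: an extra round trip for the TCP
    handshake on a new connection, then one round trip per window of data
    (the first of which also carries the request)."""
    handshake_rtts = 1 if new_connection else 0
    return (handshake_rtts + slow_start_round_trips(segments, window)) * rtt_ms


if __name__ == "__main__":
    RTT = 265   # UK <-> Japan
    PAGE = 30   # hypothetical response: 30 segments, ~43KB at a 1460-byte MSS
    print(response_time_ms(RTT, PAGE, 3, new_connection=True))    # 1325: fresh, IW3
    print(response_time_ms(RTT, PAGE, 10, new_connection=True))   # 795: fresh, IW10
    print(response_time_ms(RTT, PAGE, 32, new_connection=False))  # 265: warm, reused
```

On a link this long, each round trip saved is worth over a quarter of a second, which is why a warm, reused connection is so attractive. One catch: by default Linux resets the window of a connection that has sat idle (the net.ipv4.tcp_slow_start_after_idle sysctl), which matters for a long-lived connection between load-balancers.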

I’ll see how I get on, and perhaps even write a follow-up blog post if things go well.

Wow. You made it to the bottom. In that case, I should probably mention to you that we’re hiring!

Friday Link Party 11-11-11

Totally forgot to post here last week, so this week it’s a special, bumper, ‘rollover’ edition of link goodness. Here we go…

And on that note… laters.