Going Underground

Posted on February 7, 2013

Back in the hazy days of last summer we kicked off a project to improve the infrastructure behind FreeAgent, to prepare ourselves for an order of magnitude (or two) of growth in the coming years and to greatly bolster our disaster recovery (DR) capability. Deciding on a hosting strategy is not simple. Unlike when we first started FreeAgent, today there is a multitude of options: full cloud hosting, fully-managed dedicated hosting, private clouds, hybrid clouds, managed co-lo, full co-lo.

For the past two years we’ve been running a hybrid cloud at Rackspace, UK. This was a mix of managed, dedicated hardware (for us, Dell R710s) – some servers running on bare metal, some virtualised with VMware – as well as on-demand VMs in the Rackspace Cloud. This has worked pretty well for us. Performance, network and physical device reliability have been extremely good. The cloud side, however, whilst reasonably flexible, hasn’t met our expectations in terms of performance and reliability. Furthermore, our in-house operations experience has increased dramatically, and with it so have our demands and expectations of control at every level of the stack.

One critical focus for a financial service such as FreeAgent is security and resilience. We need the physical security that comes with a Tier 3 data centre, we need to run a VPN to provide network security, and we need fault-tolerance at every level from the web stack (load-balanced app servers), to the database tier (clustered MySQL instances, backups) to multiple peering arrangements at the data centre (DC). At every level we have worked to remove single points of failure, but ultimately the DC is itself a single point of failure (admittedly with a far higher MTBF). With this in mind, we decided to go for DC-level redundancy and operate FreeAgent out of two sites.

We looked at a number of options with the following requirements:

  • Multiple geographically isolated sites
  • At least Tier 3 facilities (power, cooling, etc)
  • Low-latency, dedicated interconnect between sites
  • Outside London (so we’re not constrained by power in the future)
  • Military-spec EMI protection and TEMPEST RFI shielding*
  • Ability to withstand a 22-kiloton thermonuclear bomb*

Our research (and strong recommendations from existing customers) led us to our chosen hosting provider, The Bunker. Without further ado, we acquired two 42U racks, one at each site (Greenham Common and Ash, Kent), and started filling them with the required hardware. What followed was several months of infrastructure enhancements, network reconfigurations and a lot of testing (we’ve been running our heavily-used integration environments out of the Bunker sites for months), all while providing ongoing support for our existing infrastructure. Finally, after an epic amount of work, our Ops team made the final switch over to our new home on Tuesday this week.

cue applause

Moving an extremely popular web app between data centres is an enormous challenge. Our integration, staging and production stacks now consist of over 180 virtual machines across 11 VLANs, all of which needed to be migrated from one physical data centre to two new ones. It would have been reasonable to schedule a significant, several-hour outage for FreeAgent in order to do this, but doing so would have significantly inconvenienced our customers, not to mention drastically affected our historically high uptime stats, of which we’re immensely proud! So, we cooked up some special magic sauce and performed the move in an incredible 9 minutes. In reality, 8 minutes and 55 seconds of this time was spent double-, triple- and quadruple-checking that everything had switched without issue, which indeed it had. Either way, this was an astonishing achievement by our Ops team and a great live test of our DR procedure.

As FreeAgent hums along happily in her new home, we’re now busy monitoring, measuring, tweaking and optimising. The first stage was to make the move with the existing architecture. Check. The next stage is to build on this, improving performance, building tools and automating everything we touch.

Even though we’re now 30 metres underground, the future is extremely bright.

* Optional, but obviously desirable

p.s. If FreeAgent sounds like your kind of company, I should point out that we’re hiring! We’re looking for talented web and operations engineers of all levels, from graduates to team leaders. Get in touch if you’re interested!