How we built the FreeAgent Furlough Calculator in a week

Posted on May 28, 2020

In light of the coronavirus pandemic, the UK Government put in place several measures to help businesses during these uncertain times.

At FreeAgent, we adapted quickly to help businesses during this crisis. Our goal is to keep you up to date with the government support available.

HMRC’s Job Retention Scheme Announcement – allowing employers to furlough employees and claim 80% of their wages plus any National Insurance and pension contributions – definitely caught our attention.

After an email from HMRC about how the claim calculator may work, followed up by a phone call to get more information, we, the Tax Team at FreeAgent, decided to tackle the work. This was a great opportunity for us to help our customers relax about tax. There was one caveat though: We only had a week to do the work! So, how did we do it? Let’s find out!

Our process 👩🏻‍🍳 [app]

1. Communication 🎤

When you’re building a feature in one week, communication is key. The first thing we did was create a Slack channel for our calculator team and add everyone who was interested and could provide valuable input. Slack is a great, empowering tool that lets us build a conversation and find answers much more quickly than email.

2. Build as we discover 🕵🏻‍♀️

So what was the most helpful solution that we could build in a week? We decided to integrate a claim calculator into our existing payroll system and, for businesses not using our payroll, to build a separate calculator on our website.

We started with skeletons of our engineering work even as the designs were being built, so that we could “hit the ground running” when the final version was ready. From there, we took an iterative approach, adapting as each of our questions was answered.

3. Focus on quality 

Testing is crucial to a great product. We use software to test our products, but we know the importance of having real people involved in the process. We go through several stages of what we call PPT – Pre-Production Testing. This is a priority for us, and the domain experts on our team made an extra effort to be available and to have this completed in time for the release on the following Monday.

Great! We are ready to deliver! 🌊🏄🏻‍♀️

Or are we? It was Friday, 5:30pm and we were all set to deliver, celebrating with a couple of beers, until an hour later HMRC announced some changes in the claim calculations.

We weren’t going to let this news take us down. After a well-deserved weekend, we tackled the work again early on Monday, implemented the changes, tested everything again and voilà: our claim calculator was ready.

Our process 👩🏻‍🍳 [website]

We initially saw an opportunity for the app but quickly realised that we could translate that to the website at the same time. So we had two different calculator projects going on in tandem.

1. Communication 🎤

For anything on our website, we typically need to engage more than one team to produce new content. In this case, we on the Website team needed to engage with Tax Engineering to provide the calculations we needed to build, three teams from our Comms department to write/design/promote this calculator, as well as our Support team to make sure it had all come together accurately and clearly. 

With so many stakeholders and supporting teams, it was key for us to make our work as visible and transparent as possible. We already had a channel for our COVID-19 communications, so we used that to post updates and questions and to solicit feedback, which we would cross-post to the in-app calculator channel when relevant.

Another key part of smooth communication was having one person coordinate support within each team where we needed several people to help out. Our Comms and Support teams are large, with different remits and expertise, but by leaving it to each team’s designated representative to organise their help, we reduced the number of distractions in our public communication channels (e.g. Slack) for the folks producing the calculator.

Finally, we’d prioritised any work to support our users during COVID-19 above everything else, and had already made some decisions to deprioritise previously planned work. This meant everyone involved had the same priority and level of investment to get this out, which is not always easy to get with cross-team projects like this.

2. Build as we discover 🕵🏻‍♀️

The kicker for the website is that we found out about the impact on our product a few days after the Tax Engineering team. We started building in earnest on a Wednesday to have a live calculator for the same deadline – the next Monday. Usually we take about two weeks to turn around simpler calculators!

We try to follow patterns as much as possible on our website, so we had the bones of the design ready to go, which meant our Website engineers could dive straight in and have half of the calculator working by the end of that first day.

We had a designer and a copywriter refining the calculator UI while the engineer improved the backend, using feedback from Tax Engineering as they progressed with the in-app calculator.

In order to keep the development going smoothly, we had short check-ins with key stakeholders (the folks producing the calculator plus supporting team representatives) 2-3 times per day until it went live to address questions, adjust the scope (we originally tried to tackle weekly payroll AND monthly – way too complicated for the timeframe!) and line up any additional support required.

3. Focus on quality 

The most important aspect of this project for the website calculator was certainly delivery. If we had gone live even a few days later, we might have completely missed the peak of users needing this kind of help.

Despite that, there’s no point in putting out a calculator if it’s not providing correct calculations and/or isn’t easy to understand. This is where our Superhero Support team came in.

When we were about 24 hours from having a fully functional calculator, the Website team got our Support rep to line up as many folks as they could spare to cast their knowledgeable eyeballs upon our calculator. Because we were working so quickly, we had to have them on standby, but keeping that rep in the loop through our check-ins and frequent Slack messages meant there was no hold-up when we were ready for them to test. Their input led to updates to our copy as well as additional verification of our calculations.

Like our Tax Engineering team, we also faced last-minute updates to our calculator which required additional engineering work and testing… but we still managed to go live on Monday, 20th April 🎉 ! 

Responding to feedback

Even when you’re producing quickly, you’ve got to keep an eye on what users are saying. We had no time for user testing before going live, but we reacted quickly to comments we received afterwards. Over the first few days, we got some feedback that users didn’t understand how we were calculating and the way we determined the pay period. We quickly made changes to the copy and tweaked how the calculator worked to help clarify how we made calculations, which seemed to do the trick.

We had two projects being completed by two teams, with several more supporting closely on both to get them out on time and promoted well. We couldn’t have done all of this without excellent communication and collaboration.

Working from Home: The desks of FreeAgent Engineering

Posted on May 18, 2020

FreeAgent has always been a remote-friendly company. When the co-founders started building the company over a decade ago, they were in different parts of the UK. In ordinary times, roughly half of our Engineering team is remote and everyone else works from our lovely, though currently empty, office in Edinburgh.

Four years ago we posted a handful of pictures of people’s engineering desks, and as the company has grown substantially and everyone is currently working from home, we thought it was time for a wee update.

James's home desk setup
James / Support Engineering
Matt's home desk setup
Matt / Tax Engineering
Patrick's home desk
Patrick / Accounting Group
Kevin's home desk
Kevin / Tax Engineering
Colin's home desk
Colin / Workflow
Simon / Tax Engineering
Simon's "desk" working at a table in an empty park
Simon / Tax Engineering (before the lockdown edition)
Iain's home desk
Iain / Radar
Thiago's home desk
Thiago / Workflow
Olly's home desk
Olly / CTO
Mags's home desk
Mags / Corporate IT
Anup's home desk, with an iPad set up for a 5 year old, along with books for younger readers
Anup / Mobile and 5 Year Old co-worker / School
Hamish's home desk
Hamish / Corporate IT
David's home desk
David / Banking
Nathan's home desk
Nathan / Ops
Stu's home desk
Stu / UI Engineering
Diogo's home desk
Diogo / Practices
Ioan's home desk, with 3 flexible screens - iPad, monitor, laptop.
Ioan / Mobile
Anda's desk
Anda / Website
Diego's desk
Diego / Corporate IT
Steve's home desk
Steve / Ops
John / Platform Engineering
Three images of Matt's home desk setup, including a standing desk.
Matt / Architecture Group
3 images of the same mechanical keyboard
Matt / Architecture Group (full mechanical keyboard)
Dave's home desk
Dave / Workflow
Scott's home desk
Scott / Ops

How to Measure Pointless Things?

Posted on May 13, 2020

Here at FreeAgent we, like so many other workplaces around the world, have been adjusting to a fully remote setup over the past few weeks. Whilst a significant number of our company’s employees are permanently home-based, only one of our seven Analytics & Data Science team members is usually based away from our Edinburgh office. It has felt strange.

We decided fairly quickly that we needed to create new opportunities for the team to ‘convene’ in lieu of the facetime we’d usually get in the office. We scheduled a short daily catch-up after lunch: its primary purpose was to make sure everybody was doing okay and perhaps to discuss the weather or our snacking levels.

Sometime during the first week, I suggested a quick round of Pointless during one of these catch-ups. I’d been gifted a Pointless quiz book a couple of years before, and I thought it’d tap into the competitive nature of many members of our team. I didn’t know the half of it…

The Game

For the uninitiated, the concept of Pointless is pretty simple. Behind the scenes, 100 members of the UK public were quizzed on a range of weird and wonderful questions. For example: in 100 seconds, name as many capital cities of European countries as you can. 98 might have said London, 92 might have said Paris, 6 might have said Zagreb. The contestants on the show are then shown the same question (name a capital city of a European country) but they have to give the most pointless answer that they can think of – that is, the answer that the fewest of the 100 people gave. Each answer scores the number of people who gave it, so the lowest score wins. In this case, London or Paris would have been bad answers, whereas Zagreb would’ve been a good one. The person who gives the best answer is the winner.

There were four players in our first game. As any good data analyst would do, I tracked players’ responses and scores in a spreadsheet. Ipek, one of our BI analysts, won the first game. We applauded her, discussed the possibility of playing again the following day, and went about our afternoons.

| Player | Answer | Score ↓ |
|---|---|---|
| Ipek | Ljubljana | 1 |
| Lana | Reykjavik | 4 |
| Rob | Bucharest | 4 |
| Dave | Riga | 8 |
Q: Name a capital city of a European country

The next day, we had our entire team – six players and me – for game two. Dave, our team lead, won the game with a pointless answer (an answer that none of the 100 people surveyed gave). I tracked the scores again. There was some controversy: apparently, when given a short amount of time to name as many words ending in ‘erry’ as they could, 60 out of 100 members of the UK public responded with loganberry – compared to just 28 for cranberry. Who knew the loganberry was so widely appreciated?

| Player | Answer | Score ↓ |
|---|---|---|
| Dave | Huckleberry | 0 |
| Rob | Lingonberry | 2 |
| David | Cloudberry | 5 |
| Owen | Loganberry | 60 |
| Lana | Blackberry | 84 |
| Ipek | Strawberry | 92 |
Q: Name a word ending in _erry

After our third game, two things became clear. Firstly, we were all enjoying spending 10 minutes of our afternoons doing something a bit silly and getting competitive. Secondly – and more importantly – if this were to continue, we’d need some kind of long-term scoring metric.

Measuring Pointless Things

So, who should be crowned our Pointless champion? To date, we’ve played 16 games. I’m going to explore the five metrics that we came up with to answer this question.

Metric #1: Games Won

To begin with, we started referring to the person who had won the most games as ‘the person who was doing best’. It’s a nice, simple way to measure things. After 16 games, Ipek leads the pack on Games Won. Our top 3 looks like this:

| Player | Games Won ↑ |
|---|---|
| Ipek | 4 |
| Rob | 3 |
| Dave | 3 |
Metric #1: Games Won

Metric #2: Win Rate

Sadly, not every team member is able to play every day! This means that some players have played more games than others. Games Won doesn’t account for this, and so it doesn’t reward players who have won lots of games despite not having played many. I decided to calculate a Win Rate for each player: the proportion of the games they played that they won. After 16 games, Dave knocks Ipek off the top spot, having won 38% of the games he played:

| Player | Games Played | Games Won | Win Rate ↑ |
|---|---|---|---|
| Dave | 8 | 3 | 38% |
| Ipek | 12 | 4 | 33% |
| Rob | 10 | 3 | 30% |
Metric #2: Win Rate
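As a minimal sketch, the Win Rate calculation can be written in a few lines of Python. The figures come from the table above; the `records` dictionary and the `win_rate` helper are illustrative names rather than anything from our actual spreadsheet.

```python
# Games played and games won for the top three, copied from the table above.
records = {"Dave": (8, 3), "Ipek": (12, 4), "Rob": (10, 3)}


def win_rate(played, won):
    """Return the proportion of games played that were won."""
    return won / played if played else 0.0


rates = {name: win_rate(played, won) for name, (played, won) in records.items()}

# Highest Win Rate first.
leaderboard = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for name, rate in leaderboard:
    print(f"{name}: {rate:.0%}")
```

Dividing by games played is what lets Dave overtake Ipek here: three wins from eight games beats four wins from twelve.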

Metric #3: Mean Rank Score

The trouble with both win-based measures is that the person coming second receives no recognition. You could come second out of six in every game and you’d still end up at the bottom of the table. To try to capture the overall performance, we gave each player a Rank Score for each game. The player with the best answer is given a Rank Score of 0, the player with the worst answer is given a Rank Score of 100, and the players in between are spaced evenly between those two according to the number of players in that game: with six players, the Rank Scores are 0, 20, 40, 60, 80 and 100. Tied players share the average of the Rank Scores they would otherwise occupy. Over a number of games, we can then calculate the average of each player’s Rank Scores to see where they tend to end up in the rankings, from 0-100.

Let’s take the first 2 games described above as an example. In the first game Dave has the worst answer and is given a Rank Score of 100. However, in the second game he has the best answer, so he’s given a Rank Score of 0. His Mean Rank Score, after 2 games, is 50:

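Game 1: European capitals · Game 2: _erry words

| Player | Game 1 Answer | Game 1 Score | Game 1 Rank Score | Game 2 Answer | Game 2 Score | Game 2 Rank Score | Mean Rank Score ↓ |
|---|---|---|---|---|---|---|---|
| Rob | Bucharest | 4 | 50 | Lingonberry | 2 | 20 | 35 |
| David | | | | Cloudberry | 5 | 40 | 40 |
| Dave | Riga | 8 | 100 | Huckleberry | 0 | 0 | 50 |
| Ipek | Ljubljana | 1 | 0 | Strawberry | 92 | 100 | 50 |
| Owen | | | | Loganberry | 60 | 60 | 60 |
| Lana | Reykjavik | 4 | 50 | Blackberry | 84 | 80 | 65 |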
Mean Rank Score calculation after 2 games
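The Rank Score calculation can be sketched in Python as follows. This is an illustration rather than the spreadsheet formula we actually used: the function names are made up, and the tie handling (tied players share the average of the positions they occupy) is inferred from the table above, where Rob and Lana both receive 50 in game 1.

```python
def rank_scores(scores):
    """Map each player's raw score to a Rank Score from 0 (best) to 100 (worst).

    Positions are spaced evenly across the players in the game, and tied
    players share the average of the positions they occupy (an assumption
    inferred from the worked example).
    """
    n = len(scores)
    if n < 2:
        return {player: 0.0 for player in scores}
    ordered = sorted(scores.values())  # lowest (best) score first
    result = {}
    for player, score in scores.items():
        first = ordered.index(score)              # first 0-based position with this score
        last = first + ordered.count(score) - 1   # last 0-based position with this score
        result[player] = 100 * ((first + last) / 2) / (n - 1)
    return result


def mean_rank_scores(games):
    """Average each player's Rank Scores over the games they actually played."""
    per_player = {}
    for game in games:
        for player, rank_score in rank_scores(game).items():
            per_player.setdefault(player, []).append(rank_score)
    return {player: sum(values) / len(values) for player, values in per_player.items()}


game_1 = {"Ipek": 1, "Lana": 4, "Rob": 4, "Dave": 8}
game_2 = {"Dave": 0, "Rob": 2, "David": 5, "Owen": 60, "Lana": 84, "Ipek": 92}

for player, score in sorted(mean_rank_scores([game_1, game_2]).items(), key=lambda item: item[1]):
    print(f"{player}: {score:.0f}")
```

Run on the first two games, this gives the same Mean Rank Scores as the table. Players who miss a game are simply averaged over fewer games.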

Taking this approach for all 16 games played so far starts to build a nice Mean Rank Score picture in the table below:

| Player | Games Played | Mean Rank Score ↓ |
|---|---|---|
| Rob | 10 | 34 |
| David | 11 | 36 |
| Dave | 8 | 48 |
Metric #3: Mean Rank Score

Let’s try to extract some meaning from these numbers. Remembering that the median player in each game would be given a Rank Score of 50, Dave’s Mean Rank Score of 48 indicates that, on average, he just about makes the top half each game. Rob, who leads on this metric, tends to finish around the border of the top third of players each game.

Metric #4: Mean Standardised Score

Our next metric was probably the most over-the-top yet crucial metric. Taking a look at the results for game 2 – the ‘erry’ words – we can see that there were two ‘clusters’ of scores. The best answers scored 0, 2 and 5, while the worst answers scored 60, 84 and 92. However, our Rank Score approach considers each of these players to be evenly distributed – that is, it thinks a score of 5 (David’s cloudberry) sits halfway between a score of 2 (lingonberry) and a score of 60 (loganberry). But, if you take a look at the scores, it’s easy to see that cloudberry’s 5 deserves more credit than that.

Why not just take a player’s mean raw score across all their games? Well, different games have different magnitudes of scores. If somebody missed a game in which most of the possible answers were low scorers, or vice versa, they’d be immediately disadvantaged. For example, Owen and David didn’t take part in game 1, which had loads of good answers available, and therefore didn’t have the opportunity to get as low a mean score as the others. We want an approach that doesn’t consider each player to be evenly distributed, but does account for the magnitude of scores in each game. Enter Mean Standardised Score.

In a given game we want to know, in essence, how well or badly each player performed relative to the other players – we want to standardise each player’s score. This is a two-step process. Firstly, how far above or below the mean score were they? Secondly, how did that compare to how far away everyone else was? If a score was 20 above the mean but on the whole scores were close together then we want to penalise that, whereas if a score was 20 above the mean and on the whole scores were spread out then we don’t want to penalise that as heavily.

In game 1, the mean score was 4.3 and the standard deviation of those scores (how spread out they were) was 2.9. Using Dave’s score of 8 as an example, we can firstly see how far from the mean score he was by subtracting the mean from his score (8 – 4.3 = 3.7). We can then factor in how spread out the scores were on the whole by dividing by the standard deviation (3.7 / 2.9). This gives us Dave’s Standardised Score (for game 1) as 1.28, which indicates that his score was 1.28 standard deviations higher than the mean. Ipek’s Standardised Score, -1.14, indicates that her score was 1.14 standard deviations lower than the mean (‘better than average’).
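Written as a formula, with x a player's score, μ the mean score in that game and σ the standard deviation, the calculation above is the familiar z-score. The quoted σ = 2.9 for game 1 appears to be the sample standard deviation (dividing by n − 1); the population version of the same four scores would give roughly 2.5 instead, so the n − 1 form below is an inference from the numbers rather than something stated in the text.

```latex
\[
  z = \frac{x - \mu}{\sigma},
  \qquad
  \sigma = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \mu)^2}
\]
\[
  z_{\text{Dave}} = \frac{8 - 4.3}{2.9} \approx 1.28,
  \qquad
  z_{\text{Ipek}} = \frac{1 - 4.3}{2.9} \approx -1.14
\]
```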

Over a number of games, we can then calculate the mean of each player’s Standardised Scores to see how their answers tend to perform relative to the mean score in each game. A Mean Standardised Score of 0 would indicate that the player tends to come close to the mean score in each game (or does badly about as often as they do well) and a negative Mean Standardised Score indicates that the player tends to outperform the average. Let’s take the first 2 games described above as an example. (Note: there has been some deliberate premature rounding to simplify the example here!)

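Game 1: European capitals (μ = 4.3, σ = 2.9) · Game 2: _erry words (μ = 40.5, σ = 43.1)

| Player | Game 1 Answer | Game 1 Score | Game 1 Standardised Score | Game 2 Answer | Game 2 Score | Game 2 Standardised Score | Mean Standardised Score ↓ |
|---|---|---|---|---|---|---|---|
| David | | | | Cloudberry | 5 | -0.82 | -0.82 |
| Rob | Bucharest | 4 | -0.10 | Lingonberry | 2 | -0.89 | -0.50 |
| Ipek | Ljubljana | 1 | -1.14 | Strawberry | 92 | 1.19 | 0.03 |
| Dave | Riga | 8 | 1.28 | Huckleberry | 0 | -0.94 | 0.17 |
| Owen | | | | Loganberry | 60 | 0.45 | 0.45 |
| Lana | Reykjavik | 4 | -0.10 | Blackberry | 84 | 1.01 | 0.46 |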
Mean Standardised Score calculation after 2 games
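Here is a rough Python sketch of the same calculation using the standard library's `statistics` module. The function names are illustrative, and the sketch assumes the sample standard deviation (`statistics.stdev`), which is what reproduces the σ values quoted above. Because it does not round μ and σ before dividing, individual values differ slightly from the deliberately rounded table (Dave's game 1 score comes out at about 1.31 rather than 1.28), but the ordering of the players is the same.

```python
from statistics import mean, stdev


def standardised_scores(scores):
    """Return each player's score as a number of standard deviations from the game mean."""
    values = list(scores.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two players
    if sigma == 0:
        # Everyone scored the same, so nobody is above or below average.
        return {player: 0.0 for player in scores}
    return {player: (score - mu) / sigma for player, score in scores.items()}


def mean_standardised_scores(games):
    """Average each player's Standardised Scores over the games they played."""
    per_player = {}
    for game in games:
        for player, z in standardised_scores(game).items():
            per_player.setdefault(player, []).append(z)
    return {player: mean(zs) for player, zs in per_player.items()}


game_1 = {"Ipek": 1, "Lana": 4, "Rob": 4, "Dave": 8}
game_2 = {"Dave": 0, "Rob": 2, "David": 5, "Owen": 60, "Lana": 84, "Ipek": 92}

results = mean_standardised_scores([game_1, game_2])
for player, z in sorted(results.items(), key=lambda item: item[1]):
    print(f"{player}: {z:+.2f}")
```

Lower is better, so sorting in ascending order puts the strongest player first.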

We can see that the Standardised Score given to David’s cloudberry is much better than that given to Owen’s loganberry, solving the issue with the Mean Rank Score.

Applying this to all 16 games played so far gives us yet another top 3 in the table below. David tops the board with a Mean Standardised Score of -0.43: on average, he scores a little under half a standard deviation below the mean each game.

| Player | Games Played | Mean Standardised Score ↓ |
|---|---|---|
| David | 11 | -0.43 |
| Rob | 10 | -0.28 |
| Owen | 13 | -0.16 |
Metric #4: Mean Standardised Score

One thing to note about the Mean Standardised Score is that it really punishes incorrect answers – which score 100 – as typically 100 will be several standard deviations higher than the mean. On the other hand, it rewards particularly good answers: if you win by a comfortable margin, it will reflect that better than the Mean Rank Score would have done.

Metric #5: Points

But, as the founder of the modern Olympics once said, the most important thing is not winning but taking part. We wanted to reward participation too, and so a points-based approach was suggested. For each game they play, a player receives a number of points. The number of points is determined by their rank in that game: if they got the best score, they receive 5 points; if they got the second best score, they receive 4 points, and so on. We then add up each player’s points from all the games they’ve played, which captures both performance and participation. Our Points approach puts Lana, our web analyst, top of the leaderboard:

| Player | Points ↑ |
|---|---|
| Lana | 43 |
| David | 42 |
| Owen | 40 |
Metric #5: Points
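A small Python sketch of the Points approach follows. Two details are assumptions made purely for illustration, since the description above only says "and so on": a player finishing sixth or lower receives 0 points, and tied players each receive the points for the best position they share. The function names are also made up.

```python
def points_for_game(scores, top_points=5):
    """Award 5 points for the best score, 4 for the second best, and so on.

    Assumptions: points never drop below 0, and tied players each get the
    points for the best position they share.
    """
    ordered = sorted(scores.values())  # lowest (best) score first
    return {
        player: max(top_points - ordered.index(score), 0)
        for player, score in scores.items()
    }


def total_points(games):
    """Add up each player's points across every game they played."""
    totals = {}
    for game in games:
        for player, points in points_for_game(game).items():
            totals[player] = totals.get(player, 0) + points
    return totals


game_1 = {"Ipek": 1, "Lana": 4, "Rob": 4, "Dave": 8}
game_2 = {"Dave": 0, "Rob": 2, "David": 5, "Owen": 60, "Lana": 84, "Ipek": 92}

totals = total_points([game_1, game_2])
for player, points in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{player}: {points}")
```

Because points are summed rather than averaged, a player who turns up to every game keeps accumulating them, which is how this metric rewards participation as well as performance.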

Rules of Measurement

So who should be crowned our Pointless champion? Well, (almost) everyone, depending on which way you look at it:

| Metric | Champion |
|---|---|
| Games Won | Ipek |
| Win Rate | Dave |
| Mean Rank Score | Rob |
| Mean Standardised Score | David |
| Points | Lana |
Our Pointless champion(s)

There are many other things that we haven’t considered, such as controlling for the advantage (or disadvantage) that the order of responding may have, or calculating each player’s rank amongst all the possible correct answers.

But this isn’t intended to be the definitive guide to long-term scoring in remote games of Pointless. Whilst it has been a lot of silly fun, there are a couple of important takeaways that are immediately apparent:

  1. Deciding how to measure something – whether that be success, failure, or some scale thereof – is not always a simple task. Some would say that you need to decide how you’ll measure the success of an activity before you undertake it, as you won’t be able to measure objectively if you wait until the activity has started. To some extent, this is true. However, as shown here, the intricacies and complications around measurement may not become apparent until you start. You can try to anticipate these complications but it won’t always be possible. It’s important to have an unbiased eye, somebody with no agenda, involved in defining those measurements.
  2. With enough data, and enough motives, you can often spin your numbers to tell any story you want to tell (in fact, I decided to start writing this blog when we hit a point at which our five metrics each showed a different ‘champion’ – very meta). With careful selection of the data above, five of the six players could draw the conclusion that they ‘won’ and shout it from the rooftops. If the people receiving this conclusion didn’t do so with a critical eye, it would inevitably (in the majority of cases) lead to suboptimal decision-making somewhere down the line. It is crucial that recipients of data a) understand exactly how conclusions have been drawn and b) challenge those conclusions or methodologies appropriately.

So take care when you’re measuring things, you’re being given a measurement, or you’re being told about a measurement somebody else has received. And if your team is finding remote work lonely, get yourself a copy of a quiz book!