August 2018 – Grinding Gears

My Summer @ FreeAgent

Posted by Struan Robertson on 31 August 2018

Hi, I’m Struan. Nice to meet you. I’ve been working at FreeAgent over the summer as an Engineering Intern within the Workflow team and to round off my time here, I’ve written an Engineering Blog post about my internship, covering the main projects that I’ve been working on over the past three months.

The Beginning

My time at FreeAgent started off with meeting the Workflow team, my team for the summer, and then going to the first of many inductions. This first induction was probably one of the most important that I had that week, mainly because it was my IT induction and it gave me access to all the resources that I needed on a daily basis. I then immediately moved into my first ever retrospective and sprint planning session.

During my team’s planning session, I was assigned my first task: creating and sending bounced email notifications. More about this later. Before I could start any of the fun stuff, I first needed to get myself set up: getting added to FreeAgent repositories on GitHub, and setting up the FreeAgent app in my development environment. I also took some time in my first weeks to shadow our superhero support team for a morning to find out first hand what our customers regularly need support with.

Bouncing Emails!

Now back to those bounced email notifications. This was a new feature in FreeAgent based on information that we already stored about the invoice emails that we send on behalf of our customers. Its purpose was to make it easier to work out when an email had actually bounced, and let our customers know so they can follow up and make sure they still get paid. This being my first foray into the FreeAgent code base, I took some time exploring how all our email systems work together, and how and when we get notified that an email had not been delivered. I then got to work adding the feature and just 33 comments and a test later, the code was ready to ship out to the world.

Now as we all know, all companies have their own slightly quirky traditions. FreeAgent is no different: with your first Pull Request merge you get the Ship-It Squirrel (with lots of hats on!).

Unfortunately knowing all the code instantly is hard, so being a newbie I overlooked a couple of use cases for my first ever feature. So, I went back to what I had just finished working on and made some edits to make sure that it was not causing any unexpected behaviour. One of the awesome things was being able to get feedback about this new feature pretty much immediately. Once it was launched, it was great to see our customers saying “This is great news”, “Very happy to see this update!” and “That’s good to know! I missed out on a payment because a client didn’t receive an invoice last year”.

The Case of the Foreign Exchange

Now onto my next project: getting into the scary world of foreign exchange rates and updating the XE.com API that we use to automatically convert customers’ invoices, expenses and bank account entries to/from a foreign currency. We’ve been doing this from an older version of XE.com’s API for a while now. So, Jonathan (the other Engineering Intern) and I were tasked with upgrading this to the newest, fanciest API from XE.com. The new API had quite a few improvements over the old one, and helped to ensure that we were able to continue to provide exchange rates to our customers.

This sounds simple, but this hard-working code had been working in a mostly unmodified state for a while. So, to complement the upgrade work that we completed, we also did a good old refactor and closed a five-year-old Pull Request! Additionally, to help us know when something goes wrong with the API, we added a new feature to our Dev Bot (our Engineering Slack Bot) to check we had received the currencies that we were expecting from XE.com, and to highlight any issues with them. To let everyone know about all the changes we’d made, we rounded it off with a talk to the Engineering team at our Engineering Forum.

How Much Did I Spend?!?

The final major project for me was the Spending Categories Report and Insight. This is a new Insight and an accompanying Report that shows FreeAgent users how much they have spent across their various accounting categories over time. This information has always been in FreeAgent, but just like bounced email notifications, it wasn’t the easiest to get hold of! So to help our users get an idea of what they are spending their money on, we created a new Insight to highlight this to our users. The Insight provides a snapshot of the spending, but what if our customers wanted to see where they’d spent their money over the last year? To complement the new Insight, we created a new Spending Categories report as well. This project involved going a bit deeper into the accounting code, finding out how we stored transactions, how to access them, and then group and sum them together.

This was probably one of the most interdependent of all the projects that I’ve done so far: I had to coordinate with the product, design, communications and support departments to ensure that it looked right, sounded right, and that everyone knew it was coming. We also took some time at our weekly Town Hall to demo the new Report and Insight to the whole company.

I’ve really enjoyed my time as an Intern at FreeAgent. I’ve learnt so much about how they work and had a chance to work with some really awesome people on some cool new features. I hope this post gives you a bit of an insight into what I’ve been up to over the last couple of months.

DevBots, squirrels and bouncing emails — just another intern’s summer

Posted by Jonathan Coates on

Over the last three months, I have had the wonderful opportunity to join FreeAgent’s Workflow team as an Engineering Intern. As the internship draws to an all-too-soon close, I’ll look back at some of the things I’ve been able to participate in this summer.

Getting going

When I entered the office on the first Monday morning, I was introduced to the other members of Workflow, and we were promptly whisked off to a retrospective on the previous two-week sprint. Having never really been exposed to these before, it was all a little bit overwhelming. That said, it was incredibly useful to see how the review and planning process operated and, as the weeks went on, I was able to contribute more to these.

My first task was to look into sending emails whenever a user performed various security-related actions on their account, such as changing password or email address. This was my first dive into FreeAgent’s codebase, and so there were inevitably a few stumbling blocks, but people were always happy to explain things!

While developing these emails, I was also given my first proper exposure to the rest of the “process” outside of just writing code, both in terms of how the spec developed as we explored how things should function in more detail, and how we ensured we shipped something of quality. While various contributions I’ve made elsewhere had exposed me to code review and quality assurance, it’s been really beneficial receiving a much more thorough critique of my code. Over the time I’ve been here, I’ve also become more comfortable giving useful (I hope at least!) feedback on what other people have produced.

Over the internship, I’ve been able to work on all sorts of different parts of the app. An earlier project Struan¹ and I tackled was updating our exchange rate importer to use the latest version of XE’s API. This was a relatively untouched² part of the app, and so it was a fantastic opportunity to both update our XE integration, and rethink how the importer was designed.

One of the other projects I worked on was adding the ability to disable the notifications we send when an email bounces. This inevitably involved modifying the database’s schema, and so I was dunked into the ever-complex world of online migrations. As someone who’s never had to worry about uptime, it was an illuminating process.

I think one of my biggest takeaways this summer is being able to see all sorts of technologies being used in practice, rather than just in my own toy projects. I’d fiddled with Rails and React before starting the internship, but one’s small programs don’t really compare with a project with more than 10 years’ worth of work behind it. It’s always a little bit of a surprise when you can no longer hold the whole architecture within your head.

I’ve spent a little bit of time working on optimising a couple of bits of the app and, while I can’t say I’ve made much impact on those areas, it was a really enjoyable process trying to find various bottlenecks and reduce them. As part of that, it was interesting trying to find solutions to problems which were both somewhat elegant and efficient. I’ve definitely felt at times like I’m “fighting” ActiveRecord/ActiveModel, so there’s the fun challenge of balancing the dynamic between clean abstraction and performance.

Engaging with Engineering

One part of the week I’ve really enjoyed is the “Engineering Forum”. This is an opportunity for people to talk about projects they’ve been developing over the recent weeks and months, as well as discussing interesting bugs, or just showing off personal projects.

This is great, not only as a way for the various engineering teams to keep in contact, but also in allowing you to see technologies or domains you otherwise would not. I can’t say I’m an expert on data centre flips, but it was fascinating to see how a system could be architected to facilitate them.

Company-wide collaboration

Another thing I’ve really appreciated is how engineering is well connected with the rest of the company. Throughout the first few weeks Struan and I met with the head of each department, which meant we had a wider understanding of what the various teams did. Similarly, the weekly town hall was a great way both to know what other teams were up to and to see the problems that people outside of engineering were tackling.

One of the later projects I worked on with Struan was developing the spending categories insight and report. Not only was this an interesting engineering project, it required all sorts of coordination with other departments, from establishing what would be useful to accountants, to communicating the changes to our user base. It was a delight to be privy to this process and I must say a massive thank you to our product manager, Ruth, for wrangling everyone and keeping up the momentum as the project drew to a close.

All in all, it has been a real privilege to be part of FreeAgent these past few months, and I am insanely grateful to everyone both in Workflow and the wider company for helping me feel welcome and putting together a fantastic internship.

Footnotes

[1] The other engineering intern in workflow.
[2] XE’s API hadn’t really been changed since the library was first written, and so much of the code was 10 years old. It was actually really interesting going through the test data, and seeing currencies no longer in circulation.

Summer in the city: my data science internship at FreeAgent

Posted by Hannah Tribe on 30 August 2018

During a wet and windy January afternoon, I was indulging in fantasies about summer, hot weather and holidays. Pulling me back to reality was the realisation that I’m not some jet-setting socialite but a university student lacking in work experience and fast approaching the end of their academic career!

I only had one year before I’d be forced to crawl from under the comfort blanket that is student loans, friendly lecturers and campus life. I knew that in the increasingly competitive job market, real-world job experience within my degree field of mathematics would be highly beneficial.

Reluctantly, I changed my internet search from beach holidays to internships in Scotland and soon after I stumbled upon an ad for a 13-week data science internship at FreeAgent. Here’s what happened next!

Discovering FreeAgent

The more I read about the position and the company, the more attractive the prospect of spending my summer working became. I could tell there was something different about FreeAgent and the job itself seemed like a perfect fit.

During university I have become acquainted with a number of (predominantly mathematics-based) programming languages, and found that I both enjoyed programming and was quite competent at it. There was, however, significant room for improvement and the job description detailed several languages in which I’d had no experience. I therefore knew the application process alone would be a challenge, but there’s nothing I love more than a steep learning curve and the chance to test my own abilities! So, after swiftly typing up a cover letter, I clicked “apply”.

As I expected, the interview process was exacting; it included a programming task, a presentation of my results during a telephone interview and a final interview a few weeks later. I was delighted when my position was confirmed by my soon-to-be team lead, Dr. Dave Evans, in April. I confirmed my start date as June 4th and spent the months between wishing away exam season and daydreaming about my summer in Edinburgh, beach holidays long forgotten.

Living in Edinburgh

When June 4th arrived, I awoke bright and very early; at that point I hadn’t found accommodation in Edinburgh, so I embarked on the commute from Dundee, which continued for the subsequent three weeks. This was probably the biggest challenge I faced, but once I’d found suitable accommodation I was able to fully engage with life in Edinburgh. Living alone gave me a lot of time to explore the beautiful Union Canal, next to FreeAgent’s offices, and discover hidden gems tucked away in Edinburgh’s winding streets.

By week 10 of my internship the Edinburgh Fringe Festival was in full swing. I’d never been to the Fringe before so if, like I was, you’re unaware of the scale of the festival I’ll give you some figures. In 2017 there were 53,232 performances of 3,398 shows, with over 2.6 million tickets issued. With it being the world’s largest arts festival, the entire city takes on a new energy, which was amazing to experience.

Working for FreeAgent

I had an inkling that FreeAgent was a bit different from other companies when I applied, but it took working there to appreciate just how special the company is. I arrived on my first day into a bright office, spanning two floors and boasting fantastic views of Edinburgh and the surrounding areas. My desk, situated behind the sprawling balcony, has a perfect view of Edinburgh Castle and Arthur’s Seat.

My first week was spent acclimatising to the working environment within the office. I was pleasantly surprised by how friendly everyone was; there’s an atmosphere at FreeAgent that encourages you to interact with everyone, even those you might not get the opportunity to work with. I especially enjoyed integrating into the data science team and getting to know Dr. Dave and our other colleagues, Dr. David (it’s as confusing as it sounds!) and Charlotte, a fellow intern. I also witnessed the interview process, from the other side of the desk, when our newest team member Owen was recruited as another permanent data scientist towards the end of my internship.

There are many traditions that contribute to FreeAgent being such a special place to work. Many of them are exclusive to our team, and the vast majority are food related! The data science team breaks up the working week with trips to ‘The Counter’, a canal boat serving fantastic coffee and bagels, on a Thursday morning. One of my favourite things about FreeAgent, however, is the company-wide ‘town hall’ talks held every Friday afternoon. The whole company sits down with a beverage (alcoholic if you so please!), and listens to presentations from other members of staff about a wide range of fascinating topics.

FreeAgent also holds many social activities outside of work that are not to be missed! July’s scavenger hunt was a personal favourite: teams of five spent two hours running around Edinburgh completing tasks and taking photos to raise money for charity. It was fabulous fun and just another thing that helps cement FreeAgent’s wonderful company culture!

My intern experience

Life as an intern is fascinating: you come into your new place of work with a lot to learn in just a handful of weeks, and there’s a desire to leave with the feeling you’ve made a significant contribution to your place of work. I found a benefit of such a short work placement is that I had a focus.

I was tasked with producing a dashboard that would help FreeAgent staff to better understand our accountancy practice clients and enable our marketing and sales departments to offer more targeted and meaningful advice. The goals, context and impact of the project were outlined at the start of my internship so I always knew what I was working towards. This, along with FreeAgent’s agile approach to working and some amazing productivity tools such as Trello, all helped me stay focused on the task at hand.

Another benefit of being an intern is how much you’re taught in such a short period of time. I was determined to learn as much as possible, not only about data science, but also how to conduct myself within a professional environment. I made a conscious effort to contribute and ask as many questions as possible. This allowed me to have an input in meetings with important accountancy practice customers within my first week, and led to me standing up at town hall presenting my project to the entire company by my third.

The only downside of an internship (bar the fact I can’t stay here forever!!) is the time constraint! I found the more I got into my project, the more invested other people became and the more I wanted to develop it. I had to learn to prioritise requests and be upfront and realistic about what I could complete. Even though the time constraints were often inhibiting, they were also beneficial in helping refine my time management and organisational skills.

What I’ve achieved

As I approach the end of my internship, I’m starting to reflect on what I’ve achieved in my time at FreeAgent. I’m delighted to say that I’ve received a lot of positive feedback on my project and it’s extremely rewarding to know that my work will make an impact on the business. I came to FreeAgent with basic knowledge of programming but it wasn’t the strongest part of my application and there was significant room for improvement. I learnt the fundamental parts of SQL within my first week and after 13 weeks I’m confident in my ability to code in a previously unknown language.

I’d used Python for mathematics-based problems during my university studies; however, during this project I’ve become comfortable applying Python to a diverse range of situations. To create visualisations of the data for my project, for example, I used a Python library named Bokeh. This library focuses on interactivity, which is an increasingly common feature of data science tools, and I enjoyed this element of Bokeh. The minor drawbacks of Bokeh, predominantly the inability to customise, also meant I had the opportunity to work briefly with CSS, something else I’d never done before.

My final project is a dashboard filled with graphs and tables, detailing a specific practice’s interactions with FreeAgent and how these compare to the patterns of behaviour in general. I’ve worked closely with all departments in FreeAgent in order to produce something that will be both usable and beneficial. The dashboard has been updated and refined with each person’s input and I feel the final product reflects the amount of hard work gone into it.

What I’ve learned

The insecurities I felt prior to my first day at FreeAgent have been entirely replaced with a newfound assuredness in my own ability. I have learnt so much: not only am I more confident in data science and programming, but I’ve had a full induction into the industry I wish to pursue a career in. FreeAgent accommodated me wherever possible to get exactly what I wanted out of my internship and I’m proud of the work that I’ve produced and the feedback I’ve received. This experience as a whole has reaffirmed my degree and career choices, as well as opening my eyes to paths I’d not previously considered.

Working at FreeAgent over the past 13 weeks has been an experience I’ll never forget and I’d like to take this opportunity to thank every member of staff here, but especially the data science team, for teaching and supporting me, giving me constructive feedback and most importantly making my time here so enjoyable!

Dealing with dirty data: useful functions for data cleaning in R

Posted by Charlotte Wooley on 15 August 2018

In this blog post, I’ll explain how to use some simple R-based data cleaning solutions (mostly in the ‘tidyverse’ package¹) to address the most common dataset errors with the help of my favourite analogy: the untidy kitchen!

NB: There are a plethora of valuable data cleaning tools in other software and even within R there are many different tools available. While the approach that I describe here is not necessarily ‘the best’ way of doing things, I’ve found that it’s what works for me.

Setting the scene

The kitchen areas at FreeAgent are usually very clean but in this scenario, let’s imagine that we have been very messy recently! We haven’t been putting our mugs in the dishwasher or generally keeping the kitchen clean so we’ve designed a daily rota to make sure that one person is responsible for giving the kitchen a five-minute blitz after lunchtime.

The data science team are interested in analysing the data to find out why people became so messy in the first place, so we asked everyone to keep details of their cleaning duty in a shared spreadsheet. We asked them to record:

their name
the date when they last cleaned the kitchen
whether the dishwasher was full at the time
the number of mugs they found on the side
whether they wiped the sides or sink

We made the ‘number of mugs’ question compulsory because we thought people might be lazy and not want to count them. We also allowed people to write any notes in a separate column. We were excited to see the data after the first two weeks of the rota system but we found that it was very difficult to interpret. Although we had greatly reduced the amount of mess in the kitchen, we now had a new chore: data cleaning!

Tidying tools

If you would like to follow along with this tutorial, the data we collected can be downloaded here.
Let’s take a look at the data we collected:

# Import the libraries needed

library(tidyverse)
library(lubridate)

# Read in the csv with the "read.csv" function

dat <- read.csv("cleaning_data.csv")

We can see that many of the common errors I identified in my most recent blog post are present in this dataset:

Removing NA misclassifications and white space

First of all, it looks like some ‘NAs’ (blank spaces, ‘N/A’, ‘NA’) were not recognised when the data was read in. Also, we suspect there might be some leading and trailing white space because there were free text boxes in the survey.

# Read in the csv with the alternative "read_csv" function and the
# following options to remove NA misclassifications and white space

dat <- read_csv("cleaning_data.csv", na = c("", "N/A", "NA"), trim_ws = TRUE)
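
To check that these options behave as intended, a small self-contained sketch can help. The file below is made up for illustration (it is not the real survey data), and every column is read as text purely so that the result is easy to inspect.

```r
library(readr)

# Hypothetical miniature of the survey file, written to a temporary path
demo_path <- tempfile(fileext = ".csv")
writeLines(c("name,no_of_mugs,notes",
             " Dave ,3,N/A",
             "Davina,NA,",
             "Davie,2,Ran out of cloths"),
           demo_path)

# Same "na" and "trim_ws" options as above; all columns kept as character
demo <- read_csv(demo_path,
                 na = c("", "N/A", "NA"),
                 trim_ws = TRUE,
                 col_types = cols(.default = col_character()))

# Count the NAs in each column to see what was recognised as missing
colSums(is.na(demo))
```

If a column still contains strings such as "n/a" or "-" after this step, they can be added to the "na" vector and the file read in again.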

Removing duplicate data entries

We can also see the final row is duplicated, where “Davie” accidentally copied and pasted a row!

# The %>% function pipes (transfers along) the data to the next
# function
# The "distinct" function removes duplicated rows

dat <- dat %>%
      distinct()

# NOTE: Sometimes duplications might not be fixable with "distinct".
# Imagine if Davie had copied the row but then changed the value
# in the "no_of_mugs" column to 5. This would mean that the data
# was similar rather than identical and the distinct function
# would no longer be effective
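
For the near-duplicate case described in the note, one possible approach is to tell "distinct" which columns identify an entry. This is only a sketch with made-up rows, and it assumes that name plus date is enough to identify a cleaning session, which may not hold for every dataset.

```r
library(dplyr)

# Made-up rows: Davie's second entry differs only in "no_of_mugs"
demo <- tibble(
  name         = c("Dave", "Davie", "Davie"),
  date_cleaned = c("23/07/2018", "24/07/2018", "24/07/2018"),
  no_of_mugs   = c(3, 2, 5)
)

# Comparing whole rows keeps all three, because no two rows are identical
nrow(distinct(demo))

# Treating name + date as the identifying columns keeps the first row
# for each combination; .keep_all = TRUE retains the other columns
deduped <- demo %>%
  distinct(name, date_cleaned, .keep_all = TRUE)
```

Note that this keeps whichever row comes first, so it is worth looking at the conflicting rows before deciding which value to trust.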

Re-classifying dates in different time formats

The ‘date_cleaned’ column was recorded in a free text box so is classed as a character string rather than a date and has been inputted in lots of different time formats, making date-based calculations impossible.

# The "mutate" function is used to make new columns
# The "parse_date_time" function converts dates written in different
# formats as strings into a UTC format. The "dmy" option tells R
# that the day comes first, month second and year last

dat <- dat %>%
      mutate(date_cleaned = parse_date_time(date_cleaned, "dmy"))
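
Free-text dates can also contain entries that cannot be interpreted at all. The sketch below uses made-up values (and assumes an English locale for the month name) to show how such entries end up as NA and can be listed for a manual check.

```r
library(lubridate)

# Made-up examples of the same kind of free-text dates
raw_dates <- c("23/07/2018", "24-07-2018", "25 July 2018", "last Tuesday")

# quiet = TRUE suppresses the warning about entries that fail to parse
parsed <- parse_date_time(raw_dates, orders = "dmy", quiet = TRUE)

# Entries that could not be interpreted become NA and can be reviewed
raw_dates[is.na(parsed)]
```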

Separating information that has been merged together in one column

The ‘sides_and_sink’ column is difficult to interpret because it contains information about whether both the sides and the sink were cleaned, which would be easier to analyse if it was in two separate columns.

# The "case_when" function allows values to be chosen based
# on conditional boolean arguments (if something is true/false:
# do something, if not: do something else)
# The "str_detect" function allows chosen words to be detected
# within a string
# The "is.na" function identifies if a value is NA

dat <- dat %>%
      mutate(clean_sides = case_when(str_detect(sides_and_sink, "side|both") ~ TRUE,
                                     is.na(sides_and_sink) ~ NA,
                                     TRUE ~ FALSE),
             clean_sink = case_when(str_detect(sides_and_sink, "sink|both") ~ TRUE,
                                    is.na(sides_and_sink) ~ NA,
                                    TRUE ~ FALSE))

# NOTE: Take care when using "str_detect" because of language
# misinterpretation. If someone had written "Cleaned the sides
# but not the sink", this would have led to a positive value
# for cleaning the sink although it should have been negative.
# Similarly if we had searched for "sides" rather than "side",
# then we wouldn't have detected all of the instances of when
# the sides had been cleaned. There are other tools to deal with
# truncated words which are not covered in this tutorial.
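
The pitfall in the note can be reproduced with a few made-up answers. The review step at the end is only one possible safeguard and is an assumption rather than part of the approach above: it lists answers containing the word "not" so that they can be checked by hand.

```r
library(stringr)

# Made-up answers, including a negated one and a missing one
answers <- c("sides", "Cleaned the sides but not the sink", "both", NA)

# The pattern matches any answer that mentions "sink" or "both",
# so the negated second answer is flagged as TRUE
sink_mentioned <- str_detect(answers, "sink|both")

# List answers containing "not" as a whole word for manual review;
# "%in% TRUE" treats NA matches as FALSE
has_negation <- str_detect(answers, regex("\\bnot\\b", ignore_case = TRUE))
needs_review <- answers[has_negation %in% TRUE]
```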

Changing alphabetic case and removing special characters

The ‘dishwasher_full’ column is difficult to analyse because the character cases are inconsistent and the column contains a special character (‘!’).

# The "toupper" function converts a string to upper case
# The "str_replace_all" function with the option "[[:punct:]]"
# removes special characters including "!"

dat <- dat %>%
      mutate(dishwasher_full = str_replace_all(toupper(dishwasher_full), "[[:punct:]]", ""))
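
After a step like this it is worth confirming that only the expected values remain. The sketch below applies the same two functions to a made-up vector and tabulates the result; the conversion to TRUE/FALSE at the end is an optional extra, not something the original column requires.

```r
library(stringr)

# Made-up answers with inconsistent case and punctuation
dishwasher_full <- c("yes", "Yes!", "NO", "no", "YES")

cleaned <- str_replace_all(toupper(dishwasher_full), "[[:punct:]]", "")

# Tabulate the results to confirm only the expected values remain
table(cleaned)

# Optionally convert to a logical vector once the values are consistent
is_full <- cleaned == "YES"
```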

Re-coding erroneous answers and inconsistent data recording

In the ‘notes’ column, we can see that “Davida” records that she filled in the ‘no_of_mugs’ column with ‘0’ because the question was compulsory but the value should be ‘NA’ because everyone was at a company conference that day. We can also see that the ‘no_of_mugs’ column contains text in brackets, which we need to remove so we can analyse the numbers. Additionally, some of the numbers given are a range of numbers instead of an exact number. We ideally want the mean of the range of numbers to make analysis easier.

# The function "gsub" with the regular expression "\\(.*" and the
# replacement "" converts values to character strings and removes
# everything from the first opening bracket onwards
# The function "as.Date" converts a character string or a
# date-time to a date (so the two sides can be compared)
# The "separate" function with the "-" option splits a column
# into two separate columns, based on the character "-"

dat <- dat %>%
      mutate(no_of_mugs = case_when(as.Date(date_cleaned) == as.Date("2018-07-23") ~ NA_character_,
              TRUE ~ gsub("\\(.*","",no_of_mugs)))  %>%
      separate(no_of_mugs, c("lower", "upper"), "-") %>%
      mutate(no_of_mugs = case_when(!is.na(upper) ~ (as.numeric(lower)+as.numeric(upper))/2,
              TRUE ~ as.numeric(lower)))

# NOTE: "NA_character_" is given as the value rather than "NA"
# to ensure that it is accepted by the function (the class of "NA"
# needs to be the same as the rest of the data in that column)
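
The bracket-stripping and range-averaging logic can also be tried out on its own. The helper below is a hypothetical base R sketch ("mug_midpoint" is not part of the pipeline above) that applies the same idea to a plain character vector.

```r
# Hypothetical helper: strip bracketed text, then average any "a-b" range
mug_midpoint <- function(x) {
  # Remove everything from the first opening bracket onwards
  stripped <- trimws(gsub("\\(.*", "", x))
  # Split ranges such as "2-4" into their two ends
  parts <- strsplit(stripped, "-", fixed = TRUE)
  # A single number is its own mean; a range gives its midpoint
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

# Made-up values in the same style as the "no_of_mugs" column
mugs <- mug_midpoint(c("3", "2-4", "5 (approx)", NA))
```

Working on a small vector like this makes it easier to check each kind of entry (exact number, range, bracketed note, missing value) before running the full pipeline.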

Removing and re-ordering columns

The final step is to remove and re-order any columns that we generated or rearranged during the cleaning process or that we no longer need.

# The "select" function chooses specific columns and puts them
# in a given order

dat <- dat %>%
      select(date_cleaned, name, clean_sides, clean_sink, dishwasher_full, no_of_mugs, notes)

Purifying the preparations

So there you have it: a few useful R-based data cleaning techniques that can help you deal with dirty data after it’s been recorded. But what if we could actually reduce the amount of cleaning we had to do in the first place? In my next post I’ll look at how to reduce common errors and bias by improving survey design.

References

[1] Wickham, H. 2017. tidyverse: Easily Install and Load the “Tidyverse”. R package version 1.2.1. Available from: https://CRAN.R-project.org/package=tidyverse

Grinding Gears

Tales of code crunching from the FreeAgent Engineering team