Six years of data science and analytics interns at FreeAgent

Posted on January 7, 2021

It’s hard to believe we’ve been running internships in our data teams for six years now, and we’re about to start recruitment for the seventh time. Things have changed a little since our first intern started: last year the wider team had more than four times as many staff, and we ran our first remote internship during the coronavirus pandemic.

I’ve always tended to think of our internships as a chance to ask what cool project a new teammate can deliver given access to our data and tools for three months. So just what have our interns achieved and what have we learned in the process?


Growing the team by adding an intern

Six years ago I’d been working as a data scientist in the comms team for about a year and a half, and we’d started to think about how we could expand the team. Running a three-month summer internship seemed like a low-risk way of exploring what we could do with a second data scientist.

Our recruitment process involved an initial application, a phone screen, a short task and a final interview for the top candidates – as is still the case today. We were talking about doubling the size of the team so it was a serious business!

There was a project running in the UX team to create customer personas and they were keen to get some more insight from our customer behavioural data. We were using a third-party customer success management tool at the time and the built-in reports weren’t compelling. I’d read about the non-negative matrix factorisation method used by Netflix a few years before and wondered if we could apply the same technique to reduce our data into a useful set of latent behaviours.
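To give a flavour of the idea, here is a minimal sketch of non-negative matrix factorisation in plain Python, using the standard multiplicative update rules. It is an illustration only, not the code from the intern project: the `usage` matrix, its four feature columns and the choice of two latent behaviours are all made-up assumptions, and in practice a library implementation would be the more sensible choice.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factorise a non-negative matrix v (customers x features) into
    w (customers x rank) and h (rank x features) so that w.h approximates v."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Start from strictly positive random factors.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Sum of squared differences between v and its approximation w.h."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical monthly usage counts per customer. The columns might be
# invoices sent, expenses logged, bank imports and payroll runs.
usage = [
    [9, 8, 0, 0],
    [7, 6, 0, 1],
    [0, 0, 8, 9],
    [1, 0, 6, 7],
    [5, 4, 4, 5],
]

# Two latent behaviours is an arbitrary choice for this toy example.
w, h = nmf(usage, rank=2)
error = reconstruction_error(usage, w, h)
```

Each row of `h` describes one latent behaviour as a non-negative mix of the original features, and each row of `w` says how strongly a customer exhibits each behaviour. Because nothing can be negative, the factors tend to read as additive "parts" of behaviour, which is what makes them a candidate for feeding into persona work.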

This seemed like it would make a great intern project. Our first intern was so successful that we offered her a full-time job after the internship. Fiona went on to contribute to several of our most important analyses relating to customer lifetime value, conversion and churn.


Exploring machine learning in production

Fast forward a year and a few things had changed. Now we were a team of two data scientists and we’d created our first data warehouse running on our own infrastructure, which had started to provide a view of our customers across multiple data sources.

This year we had two software engineering interns working on an application to serve predictions from a machine learning model to classify customer bank transactions, and one dedicated data science intern working on a related project to predict which customers were at risk of churn by using a boosted decision tree. These were our first attempts at running machine learning away from our local machines and our first foray into cloud computing with Amazon ECS.

Both projects demonstrated that we could successfully run machine learning in the cloud, and in fact you might recognise the bank transaction classification project as a precursor to our first customer-facing machine learning driven feature that we launched in summer 2020.


Predicting customer churn with event data

By now our data warehouse had become so well established that a project was already under way to replace it with a cloud-based alternative using Amazon Redshift. Copying the data from our now “legacy” data warehouse into a more suitable schema in Redshift allowed us to introduce our first business intelligence tool, re:dash. Now it was easy for anyone in the business to run their own reports and SQL-savvy users could even self-serve their own stats.

We wanted to advance the churn prediction work from the previous year to take into account event data that could now be easily queried with Redshift. Neural networks seemed like a hot topic and we’d read about an approach to build a time-to-churn model based on an RNN. Could we combine two buzzwords in one project?

The answer sadly was no, but discovering something doesn’t work as expected didn’t mean we learned less. The customer behavioural data we were using wasn’t detailed enough to be able to make a good prediction so we set about augmenting it. This year I delegated the day-to-day project supervision to another data scientist in the team. Intern projects make a great opportunity for others in the team to get some mentoring or project supervision experience!


Delivering data science insights to the business

Re:dash had really taken off, with much of the comms, sales and finance teams’ monthly reporting coming from a single and consistent source of truth. With me, two full-time data scientists and two interns, we were starting to feel like a pretty substantial team making a big impact on the business.

That became the focus for our intern projects this year. What more could we do with our data to influence business decision-making? We picked two projects to focus on supporting our growing sales team. Would it be possible to introduce a lead scoring tool into the sales process, and how could we help our account managers share insights based on client behaviour with our accountancy practice partners?

Hannah’s prototype practice insights dashboard set the ball rolling on several years of future intern projects working with the sales team and some really terrific engagement from our accountancy practice partners. Charlotte’s work on creating a lead scoring tool was used as part of the sales process and proved the appetite for more data-informed decision-making.


Business intelligence and data science combine!

Now with more than 200 staff in the wider team and some big ambitions for the future, we had recruited a further two dedicated business intelligence analysts to build out and support our next-generation business intelligence platform based on Looker. Re:dash had proved the appetite for business users to self-serve, but writing SQL queries wasn’t for everyone. Looker presented a great solution to the problem by allowing users to build their own reports while ensuring common definitions could be put in place through its internal data modelling layer.

This year I left specifying the projects and all the day-to-day supervision to Owen, another data scientist in the team. Supervising two interns for a summer is a full-time job so we had to plan around that, but it makes a great personal development opportunity for the team.

Hannah’s prototype practice insights dashboard from the year before was based on a Bokeh app with a lot of custom Python code and we wanted to know if we could create and serve the insights more scalably by taking advantage of new functionality available to us via Looker. Lea, one of our 2019 interns, took the lead on this.

Meanwhile we wanted to push our other long-running theme further – could we predict future customer behaviour based on event data? By now we’d spent a couple of years collecting the more detailed data that was missing in 2017. Could we use it to predict which customers would engage with the application after trialling the software?

This year the answer to both questions was yes. We implemented a customer engagement model and, now armed with more advanced business intelligence tools, the comms team were able to use the results in their day-to-day work for the first time. The practice insights project too was such a success that it would be developed further the next year.


Our first remote internships

Things were looking promising at the start of 2020. After a little more hiring we’d grown to two full-time data scientists, three business intelligence analysts and one web analytics specialist. We were now responsible for supporting the business with Looker reporting and had started to unify our back-end data with the front-end data collected in our Google Analytics. The data science part of the team was focused full-time on shipping our first-ever machine learning driven feature to help our customers classify their bank transactions. I’d delegated running the recruitment to Owen with the intention that he would manage our two planned interns as well.

Then in March, with the coronavirus pandemic in full swing, FreeAgent switched to fully ‘work from home’ mode just before the official national lockdown in the UK. With the exception of David, one of our two data scientists, the rest of the team had always been office-based and we loved the collaborative environment that enabled. After a few serious conversations we decided that we would do an experiment and go ahead with the summer internship working remotely, but would restrict it to a single intern rather than two.

And so we started our first-ever remote internship. Our intern Mikey would join the data science part of the team and help us investigate how we could enhance the machine learning model used to classify customer bank transactions. Despite the lack of the usual in-person social activities we managed a few remote team games-and-takeaway nights and Jack’s games of Pointless kept our spirits up during the lockdown. With the launch of our first machine learning model to our full customer base in the summer there was still plenty to be enthusiastic about.

In fact, our experience of running the remote internship was so good that when our plans turned again to practice insights we were able to hire Lea for a second three-month internship. Lea had the experience of working with the team both in the office and remotely now, and she shared her thoughts in a blog post. Due to an unanticipated change in the team at the end of the year we were glad to be able to invite Lea to join us as a full-time business intelligence analyst – our second intern to progress to a permanent position.

Summary thoughts

It’s amazing to see how the data science and analytics team has grown from one to seven full-time staff since we ran our first internship. That first project, working locally on CSV reports passed around by hand, is quite a contrast with the situation today. Now we’re running our first machine learning model in production, serving over 100,000 customers, and we have up to 70 business users self-serving their own insights from our business intelligence platform each week.

Every year our internships have offered us a chance to reflect on where the team is and what we can do with our technologies, as well as giving the interns themselves a chance to gain some valuable work experience.

So what makes a good data intern project?

  • Do have a definite end goal in mind that can be achieved in three months, allowing time for general onboarding and getting up to speed with tools and technologies.
  • Do make sure the end goal is of interest to the business. We expect that our interns will present their work to the rest of the company at one or more of our weekly town hall meetings, and blog about their experiences. So we should make sure they have something interesting to talk about.
  • Don’t select projects on the critical path. No matter how urgent the work seems to be, there’s no point dumping too much responsibility on a new junior team member who’s only going to be there for three months.
  • Don’t focus on business as usual. Day-to-day requests take almost as much time to get up to speed with as a new project, and there will be less to show for it at the end.

For us, projects to investigate a new tool or technique that almost made the cut in our usual team prioritisation have worked very well. That way it’s something of interest, and a bonus to the business that we couldn’t have delivered otherwise. More than that, our internships have provided a great chance for our own team growth and development as well as for the interns themselves.

But what about from the interns’ perspective? Feedback has often highlighted opportunities to learn new skills, enthusiasm for working with the team and the opportunity to make a really meaningful contribution to the business. We always encourage our interns to blog about their experiences, so you don’t have to take my word for it:

And, more recently, some thoughts on what it was like working remotely during the coronavirus pandemic last year:

Watch this space for a follow-up on what our former interns did next, but for now I’m looking forward to seeing what this year’s batch will achieve.


About the author…

Dave Evans was the first data scientist at FreeAgent and currently leads the data science and analytics teams. Before joining FreeAgent, Dave worked at the University of California, San Diego and CERN.
