The Mobile Apps and the Tester

Posted on January 19, 2021

The replatforming of our hybrid mobile app to separate iOS and Android native apps was already well under way when I arrived at FreeAgent as a test engineer for the mobile team. Since then we have carved out processes that the whole team can contribute to, giving us confidence that for each release our apps are in good shape. Here are a few things we are doing to ensure this.

Automation with Appium

All end-to-end tests were being executed manually when I joined the team. As the replatforming work continued to pick up pace, manually regression testing different scenarios and their permutations became more and more time-consuming. Defects were creeping into core user flows as the complexity and size of the apps’ codebases grew. We needed something to help reassure us we weren’t breaking core functionality and to cut down the time spent on repetitive testing tasks. That’s where Appium comes in handy.

Our suite of automated end-to-end tests uses Appium and is written in Ruby.

Why Appium? The tests can be executed on both our native iOS and Android apps with the same test code, which means we only need to maintain a single codebase. We use the XCUITest and UIAutomator2 drivers for iOS and Android respectively.
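As a rough illustration of how one test codebase can target both platforms, the sketch below picks the driver through Appium capabilities. The `capabilities_for` helper and the app paths are hypothetical and not taken from our suite; only the `platformName`, `appium:automationName` and `appium:app` capability names come from Appium itself.

```ruby
# Capabilities that differ per platform. The driver is chosen with
# "appium:automationName"; everything else in the tests can stay shared.
DRIVERS = {
  ios:     { "platformName" => "iOS",     "appium:automationName" => "XCUITest" },
  android: { "platformName" => "Android", "appium:automationName" => "UiAutomator2" }
}.freeze

# Hypothetical helper: returns the capabilities for one platform.
# app_path is a placeholder for wherever the built app lives.
def capabilities_for(platform, app_path:)
  base = DRIVERS.fetch(platform) do
    raise ArgumentError, "unsupported platform: #{platform}"
  end
  base.merge("appium:app" => app_path)
end

ios_caps     = capabilities_for(:ios, app_path: "/path/to/Example.app")
android_caps = capabilities_for(:android, app_path: "/path/to/example.apk")
```

The resulting hash would be handed to whichever Appium client the suite uses when it starts a session, so the platform becomes a single switch rather than a second set of tests.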

Why Ruby? FreeAgent is written with love and Rails! Choosing a familiar and common programming language made sense for us as our Rails engineers can bring their Rails-fu and experience to this test codebase.

The tests are written following the Page Object Model (POM) design pattern as this helps keep the clutter of selectors, wait methods, element properties and navigation helpers away from the actual tests. The screens in our app have consistent designs throughout, so POM-ing makes elements and functions re-usable when we write tests across different areas of the app.
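Here is a minimal sketch of the pattern, assuming a hypothetical login screen. `FakeDriver` and `FakeElement` are stand-ins so the example runs without a device, and the selector names are made up; a real page object would receive the Appium driver instead.

```ruby
# Stand-in for an Appium driver: records interactions instead of
# locating real elements on a screen.
class FakeDriver
  attr_reader :actions

  def initialize
    @actions = []
  end

  def find_element(strategy, value)
    FakeElement.new(self, "#{strategy}=#{value}")
  end
end

class FakeElement
  def initialize(driver, locator)
    @driver = driver
    @locator = locator
  end

  def send_keys(text)
    @driver.actions << [:type, @locator, text]
  end

  def click
    @driver.actions << [:tap, @locator]
  end
end

# Page object: selectors and interactions live here, not in the tests.
class LoginScreen
  EMAIL_FIELD    = [:accessibility_id, "email"].freeze
  PASSWORD_FIELD = [:accessibility_id, "password"].freeze
  LOG_IN_BUTTON  = [:accessibility_id, "log_in"].freeze

  def initialize(driver)
    @driver = driver
  end

  def log_in(email, password)
    @driver.find_element(*EMAIL_FIELD).send_keys(email)
    @driver.find_element(*PASSWORD_FIELD).send_keys(password)
    @driver.find_element(*LOG_IN_BUTTON).click
    self
  end
end
```

A test then reads as `LoginScreen.new(driver).log_in(email, password)`, and a changed selector only needs updating in one place.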

As you would expect, these tests don’t cover everything as they take a fair bit of time to maintain and run – such is the life of end-to-end tests. Their main responsibility is to ensure the happiest user paths are working as expected. Given this test suite can take a while to execute, it is run on a nightly schedule.

Now that the replatforming has been completed, these tests are an integral part of our release process. Our sprints run for two weeks and we usually release at the end of each sprint. These tests help ensure our main user flows haven’t broken, with new features and fixes being merged as the sprint progresses.

Manual Testing

FreeAgent uses a shift-left testing strategy to give our test engineers time and opportunities to devise and manage processes for preventing bugs before code changes reach their testing phase. This may sound like “let’s get developers to do all the testing”. Well, sort of. Developers will take on manual testing with support and guidance from the test engineer. Test scripts are written by the developers as they are developing the feature/change. The test engineer or other developers may make suggestions to improve the test script. Once the change is ready for code review, either the code reviewer, another developer (this should never be the developer who made the code change) or test engineer will execute the test script. A second set of testing eyes is always welcome if the change feels like it is high risk.

Unit Testing

We’re taking a bit of an uphill journey with unit tests but it’s a journey well worth going on. Does code coverage mean everything, though? No! But it certainly does help to gauge which parts of the app are at a higher risk of defects being introduced. It’s sometimes very easy to start treating code coverage as a numbers game. We should avoid writing test code for the sole purpose of upping the coverage percentage. As we continue to work on increasing our code coverage, our general approach is to fill first the gaps where a lack of unit tests carries the greatest risk, e.g. core functionality and complex code. Gaps in code coverage are tracked as technical debt so that we don’t forget to tackle them later.

Measuring Quality

Trying to measure something that has a different meaning to different parts of the business can be a bit tricky. A question that I’m often asking myself is “are these test processes actually working?”. We need a bit more than app ratings to tell us how well we’re really doing. Alongside app ratings, we combine a few metrics like customer issues, internally raised issues, crash-free users, open crashes and code coverage to generate an app quality score. We then compare the score month on month. It’s not an exact science but it is helpful to have a rough indicator that provides the team with a benchmark to reach for and maintain.
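To make the idea of a combined score concrete, here is a purely illustrative sketch. The weights, the metric keys and the 0 to 100 scale are all assumptions made for the example and are not the formula we actually use; each input is assumed to have been normalised beforehand so that 1.0 is the best possible value.

```ruby
# Hypothetical weights: how much each metric contributes to the score.
# They are chosen for illustration only and sum to 1.0.
WEIGHTS = {
  app_rating:       0.25,
  crash_free_users: 0.25,
  code_coverage:    0.2,
  customer_issues:  0.1,
  internal_issues:  0.1,
  open_crashes:     0.1
}.freeze

# Expects every metric normalised to 0.0..1.0 with higher meaning better
# (so "fewer issues" has to be converted before it is passed in).
# Returns a score between 0.0 and 100.0.
def quality_score(metrics)
  missing = WEIGHTS.keys - metrics.keys
  raise ArgumentError, "missing metrics: #{missing.join(', ')}" unless missing.empty?

  total = WEIGHTS.sum { |name, weight| weight * metrics.fetch(name) }
  (total * 100).round(1)
end
```

Whatever the exact formula, keeping it fixed from month to month matters more than the choice of weights, because the comparison is what the team acts on.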

As we continue to boost the native mobile apps with more great features, the way we test will evolve, and I’m very much looking forward to it.

Six years of data science and analytics interns at FreeAgent

Posted on January 7, 2021

It’s hard to believe we’ve been running internships in our data teams for six years now, and we’re about to start recruitment for the seventh time. Things have changed a little since our first intern started, as last year saw more than four times as many staff in the wider team and our first remote internship during the coronavirus pandemic.

I’ve always tended to think of our internships as a chance to ask what cool project a new teammate can deliver given access to our data and tools for three months. So just what have our interns achieved and what have we learned in the process?


Growing the team by adding an intern

Six years ago I’d been working as a data scientist in the comms team for about a year and a half, and we’d started to think about how we could expand the team. Running a three-month summer internship seemed like a low-risk way of exploring what we could do with a second data scientist.

Our recruitment process involved an initial application, a phone screen, a short task and a final interview for the top candidates – as is still the case today. We were talking about doubling the size of the team so it was a serious business!

There was a project running in the UX team to create customer personas and they were keen to get some more insight from our customer behavioural data. We were using a third-party customer success management tool at the time and the built-in reports weren’t compelling. I’d read about the non-negative matrix factorisation method used by Netflix a few years before and wondered if we could apply the same technique to reduce our data into a useful set of latent behaviours.

This seemed like it would make a great intern project. Our first intern was so successful that we offered her a full-time job after the internship. Fiona went on to contribute to several of our most important analyses relating to customer lifetime value, conversion and churn.


Exploring machine learning in production

Fast forward a year and a few things had changed. Now we were a team of two data scientists and we’d created our first data warehouse running on our own infrastructure, which had started to provide a view of our customers across multiple data sources.

This year we had two software engineering interns working on an application to serve predictions from a machine learning model to classify customer bank transactions, and one dedicated data science intern working on a related project to predict which customers were at risk of churn by using a boosted decision tree. These were our first attempts at running machine learning away from our local machines and our first foray into cloud computing with Amazon ECS.

Both projects demonstrated that we could successfully run machine learning in the cloud, and in fact you might recognise the bank transaction classification project as a precursor to our first customer-facing machine learning driven feature that we launched in summer 2020.


Predicting customer churn with event data

By now our data warehouse had become so well established that a project was already under way to replace it with a cloud-based alternative using Amazon Redshift. Copying the data from our now “legacy” data warehouse into a more suitable schema in Redshift allowed us to introduce our first business intelligence tool, re:dash. Now it was easy for anyone in the business to run their own reports and SQL-savvy users could even self-serve their own stats.

We wanted to advance the churn prediction work from the previous year to take into account event data that could now be easily queried with Redshift. Neural networks seemed like a hot topic and we’d read about an approach to build a time-to-churn model based on an RNN. Could we combine two buzzwords in one project?

The answer sadly was no, but discovering something doesn’t work as expected didn’t mean we learned less. The customer behavioural data we were using wasn’t detailed enough to be able to make a good prediction so we set about augmenting it. This year I delegated the day-to-day project supervision to another data scientist in the team. Intern projects make a great opportunity for others in the team to get some mentoring or project supervision experience!


Delivering data science insights to the business

Re:dash had really taken off, with much of the comms, sales and finance teams’ monthly reporting coming from a single and consistent source of truth. With myself, two full-time data scientists and two interns we were starting to feel like a pretty substantial team making a big impact on the business.

That became the focus for our intern projects this year. What more could we do with our data to influence business decision-making? We picked two projects to focus on supporting our growing sales team. Would it be possible to introduce a lead scoring tool into the sales process, and how could we help our account managers share insights based on client behaviour with our accountancy practice partners?

Hannah’s prototype practice insights dashboard set the ball rolling on several years of future intern projects working with the sales team and some really terrific engagement from our accountancy practice partners. Charlotte’s work on creating a lead scoring tool was used as part of the sales process and proved the appetite for more data-informed decision-making.


Business intelligence and data science combine!

Now with more than 200 staff in the wider team and some big ambitions for the future, we had recruited a further two dedicated business intelligence analysts to build out and support our next-generation business intelligence platform based on Looker. Re:dash had proved the appetite for business users to self-serve, but writing SQL queries wasn’t for everyone. Looker presented a great solution to the problem by allowing users to build their own reports while ensuring common definitions could be put in place through its internal data modelling layer.

This year I left specifying the projects and all the day-to-day supervision to Owen, another data scientist in the team. Supervising two interns for a summer is a full-time job so we had to plan around that, but it makes a great personal development opportunity for the team.

Hannah’s prototype practice insights dashboard from the year before was based on a Bokeh app with a lot of custom Python code and we wanted to know if we could create and serve the insights more scalably by taking advantage of new functionality available to us via Looker. Lea, one of our 2019 interns, took the lead on this.

Meanwhile we wanted to push our other long-running theme further – could we predict future customer behaviour based on event data? By now we’d spent a couple of years collecting the more detailed data that had been missing in 2017. Could we use it to predict which customers would engage with the application after trialling the software?

This year the answer to both questions was yes. We implemented a customer engagement model and now armed with more advanced business intelligence tools the comms team were able to use the results in their day-to-day work for the first time. The practice insights project too was such a success that it would be developed further the next year.


Our first remote internships

Things were looking promising at the start of 2020. After a little more hiring we’d grown to two full-time data scientists, three business intelligence analysts and one web analytics specialist. We were now responsible for supporting the business with Looker reporting and had started to unify our back-end data with the front-end data collected in our Google Analytics. The data science part of the team was focused full-time on shipping our first-ever machine learning driven feature to help our customers classify their bank transactions. I’d delegated running the recruitment to Owen with the intention that he would manage our two planned interns as well.

Then in March, with the coronavirus pandemic in full swing, FreeAgent switched to fully ‘work from home’ mode just before the official national lockdown in the UK. With the exception of David, one of our two data scientists, the rest of the team had always been office-based and we loved the collaborative environment that enabled. After a few serious conversations we decided that we would do an experiment and go ahead with the summer internship working remotely, but would restrict it to a single intern rather than two.

And so we started our first-ever remote internship. Our intern Mikey would join the data science part of the team and help us investigate how we could enhance the machine learning model used to classify customer bank transactions. Despite the lack of the usual in-person social activities we managed a few remote team games-and-takeaway nights and Jack’s games of Pointless kept our spirits up during the lockdown. With the launch of our first machine learning model to our full customer base in the summer there was still plenty to be enthusiastic about.

In fact, our experience of running the remote internship was so good that when our plans turned again to practice insights we were able to hire Lea for a second three-month internship. Lea had the experience of working with the team both in the office and remotely now, and she shared her thoughts in a blog post. Due to an unanticipated change in the team at the end of the year we were glad to be able to invite Lea to join us as a full-time business intelligence analyst – our second intern to progress to a permanent position.

Summary thoughts

It’s amazing to see how the data science and analytics team has grown from one to seven full-time staff since we ran our first internship. Comparing that first project working locally on CSV reports passed around by hand with the situation today is quite a contrast. Now we’re running our first machine learning model in production, serving over 100,000 customers, and we have up to 70 business users self-serving their own insights from our business intelligence platform each week.

Every year our internships have offered us a chance to reflect on where the team is and what we can do with our technologies, as well as a chance to gain some valuable work experience for the interns themselves.

So what makes a good data intern project?

  • Do have a definite end goal in mind that can be achieved in three months, allowing time for general onboarding and getting up to speed with tools and technologies.
  • Do make sure the end goal is of interest to the business. We expect that our interns will present their work to the rest of the company at one or more of our weekly town hall meetings, and blog about their experiences. So we should make sure they have something interesting to talk about.
  • Don’t select projects on the critical path. No matter how urgent the work seems to be there’s no point dumping too much responsibility on a new junior team member who’s only going to be there for three months.
  • Don’t focus on business as usual. Day-to-day requests take almost as much time to get up to speed with as a new project, and there will be less to show for it at the end.

For us, projects to investigate a new tool or technique that almost made the cut in our usual team prioritisation have worked very well. That way it’s something of interest, and a bonus to the business that we couldn’t have delivered otherwise. More than that, our internships have provided a great chance for our own team growth and development as well as for the interns themselves.

But what about from the interns’ perspective? Feedback has often highlighted opportunities to learn new skills, enthusiasm for working with the team and the opportunity to make a really meaningful contribution to the business. We always encourage our interns to blog about their experiences, so you don’t have to take my word for it.

Watch this space for a follow-up on what our former interns did next, but for now I’m looking forward to seeing what this year’s batch will achieve.

Testing Child Processes in Ruby

Posted on January 6, 2021

I was recently writing a piece of code that we wanted to act as a supervisor of child processes. We wanted to be able to ask this supervisor: “Hello there, would you mind running this task in a child process? Thanks!”. From here the supervisor would create a process, keep track of it so we can stop it if necessary, and run the given piece of code in it. This supervisor could run any number of tasks in child processes and would keep track of each one.

It took some time to figure out the best way to test it, but in the end the solution felt quite nice, and so I’ve written up what we came up with and the steps we took to get there.

Here’s a stripped down version of the class I was working with:

class Supervisor
  def fork_child_process
    # Run the given block in a new child process; returns the child's PID.
    fork { yield }
  end
end

You can see there’s a fork_child_process method that wraps the fork method. In here we could add in extra logic to keep track of the process ID, run before/after fork hooks, etc. but for now all that’s important is that we’re calling fork. This will create a child process and run the passed in block inside it.

An example of using it could be:

supervisor = Supervisor.new
supervisor.fork_child_process { p "Running in the child process!" }
=> 65538 # (this is the child process id)
"Running in the child process!"
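For a feel of the extra logic mentioned above, here is a hedged sketch of a supervisor that records process IDs and can stop a child. `TrackingSupervisor`, `stop` and `wait_all` are hypothetical names for illustration and are not the class we shipped.

```ruby
# Hypothetical extension of the stripped-down Supervisor.
class TrackingSupervisor
  attr_reader :pids

  def initialize
    @pids = []
  end

  # Forks a child to run the task, remembers its PID and returns it.
  def fork_child_process(&task)
    pid = fork(&task)
    @pids << pid
    pid
  end

  # Asks one child to terminate, then reaps it so it does not
  # linger as a zombie process.
  def stop(pid)
    Process.kill("TERM", pid)
    Process.wait(pid)
    @pids.delete(pid)
  end

  # Blocks until every tracked child has exited.
  def wait_all
    @pids.each { |pid| Process.wait(pid) }
    @pids.clear
  end
end
```

Keeping the PIDs in one place is what makes it possible to stop or wait on a specific task later, rather than firing off a child and forgetting about it.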

Testing attempt 1

I’m using RSpec syntax in the following examples, but the principles aren’t specific to any testing framework and so could be written in minitest or test-unit.

To test that we run the block in the child process I initially came up with the following:

it "runs the block in the child process" do
  execution_check = nil
  supervisor.fork_child_process { execution_check = "✅" }
  expect(execution_check).to eq "✅"
end

The approach here felt fairly solid to me:

  • set a variable in the child process
  • assert that it was set

Sadly the test failed! 💔

Failure/Error: expect(execution_check).to eq "✅"

  expected: "✅"
       got: nil

  (compared using ==)

It failed for a couple of reasons:

  1. The child process gets its own version of execution_check

The fork method wraps a system call, which makes use of something called Copy-on-Write (CoW 🐄) for managing child process memory. A child process uses the same memory[1] as the parent process until one of them modifies what’s stored in it. When this happens they get their own version. This is super handy because it means child processes aren’t unnecessarily holding duplicate data, but it’s also the main reason this test ain’t gonna work. Our child process is creating a private copy of the data when it modifies the variable and this isn’t available in the parent process (the test).

  2. There’s a timing issue

Even if the parent and child processes were both looking at the same execution_check object, there’s a chance the assertion could run before the child process has set the value of execution_check to “✅”. I really wanted to avoid having a sleep in the test, so I changed my approach.
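The first reason can be seen in isolation with a small sketch outside RSpec. The `value_seen_by_parent` method is a hypothetical name for this example; it waits for the child to finish, which takes the timing issue out of the picture, and the parent still sees the original value.

```ruby
# Returns what the parent process sees after a child has assigned
# to the same local variable.
def value_seen_by_parent
  execution_check = nil

  pid = fork do
    # This assignment only changes the child's private copy.
    execution_check = "set in child"
    # exit! skips at_exit handlers, so the child exits without
    # running any hooks it inherited from the parent.
    exit!(0)
  end

  # Wait for the child to finish, so timing cannot be the explanation.
  Process.wait(pid)

  execution_check
end
```

Calling `value_seen_by_parent` returns `nil`: waiting fixes the race, but not the fact that the two processes no longer share the variable.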

Testing attempt 2

At this point I stubbed the fork method in the test so it would immediately run what we pass into it:

it "runs the block in the child process" do
  allow(supervisor).to receive(:fork).and_yield

  execution_check = nil
  supervisor.fork_child_process { execution_check = "✅" }

  expect(execution_check).to eq "✅"
end

This test asserts that whatever block we pass into the fork_child_process method is executed. It passes!

Run options: include {:locations=>{"./supervisor_spec.rb"=>[14]}}

Finished in 0.00775 seconds (files took 0.07863 seconds to load)
1 example, 0 failures

I could have left it there, but mocking the fork method didn’t give me enough confidence that my class was doing the right thing. I wanted the test to actually create a child process and assert that our code was run inside it.
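To show why the stubbed version proves less, here is roughly what the stub amounts to, written without RSpec. It repeats the stripped-down `Supervisor` so the block runs on its own; `run_with_stubbed_fork` is a hypothetical helper, and `define_singleton_method` is used as a hand-rolled stand-in for `and_yield`.

```ruby
class Supervisor
  def fork_child_process
    fork { yield }
  end
end

# Replaces fork on a single instance with a method that runs the block
# straight away, in the current process, and reports what happened.
def run_with_stubbed_fork
  supervisor = Supervisor.new
  supervisor.define_singleton_method(:fork) { |&block| block.call }

  execution_check = nil
  pid_inside_block = nil
  supervisor.fork_child_process do
    execution_check = "ran"
    pid_inside_block = Process.pid
  end

  { value: execution_check, same_process: pid_inside_block == Process.pid }
end
```

The block runs and the variable is set, but `same_process` is true: no child process is ever created, so the test only shows that the block is handed to `fork`.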

Testing attempt 3

Using IO.pipe we can create a channel the parent and child processes can use to chat.

Here’s a little helper class we came up with:

class ChildProcessMessage
  def initialize
    @read, @write = IO.pipe
  end

  # sets a value in the child process to be communicated
  # back to the waiting parent process.
  def set(value)
    @write.write value
    @write.close
  end

  # to be called in the parent process, waiting for the child
  # to set a value
  def wait
    # close the parent's copy of the write end so that read sees EOF
    @write.close
    @read.read
  end
end

Here’s what our test looked like using it:

it "runs the block in the child process" do
  execution_check = ChildProcessMessage.new
  supervisor.fork_child_process { execution_check.set("✅") }

  expect(execution_check.wait).to eq "✅"
end

This does the following:

  • The child process writes “✅” to our pipe and then closes it
  • The parent process (the test) waits for the child process to write and close the pipe
  • The parent process asserts that the pipe received the expected data

The file descriptors are copied across the fork, and so each end of the pipe is opened twice: once in the parent and once in the child. This is why we’re closing @write in both the set and wait methods. I haven’t closed @read in these methods for brevity, but it would make sense to do so.

Because the child never reassigns execution_check (unlike in attempt 1), the copies held by the parent and the child both refer to the same underlying pipe, so whatever the child writes can be read by the parent.
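For completeness, here is a sketch of a variant of the helper that also closes the read end, as suggested above. It is an illustration rather than the code we used, and `message_from_child` is a hypothetical wrapper added so the example runs without RSpec.

```ruby
class ChildProcessMessage
  def initialize
    @read, @write = IO.pipe
  end

  # Call in the child: drop the unused read end, send the value,
  # then close the write end so the parent sees EOF.
  def set(value)
    @read.close
    @write.write(value)
    @write.close
  end

  # Call in the parent: close our copy of the write end (otherwise
  # read would never see EOF), then read everything the child wrote.
  def wait
    @write.close
    @read.read
  ensure
    @read.close
  end
end

# Forks a child that sends text back through the pipe and returns
# what the parent received.
def message_from_child(text)
  message = ChildProcessMessage.new
  pid = fork do
    message.set(text)
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  received = message.wait
  Process.wait(pid) # reap the child
  received
end
```

Because `wait` blocks until the child closes its end of the pipe, there is no need for a sleep in the test.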

The solution worked well for us. It did add some complexity, but in doing so we gained a higher level of confidence that our supervisor was working correctly. There are some performance implications you’d want to keep in mind between the two approaches, but that’s beyond the scope of this post. Overall I’m happy with the result, and in creating these tests I learned a little bit about interprocess communication, which I think is a good thing.

If you find yourself needing to unit test code that creates child processes, stubbing the fork method might be just what you’re after. You can gain a decent level of confidence by testing the child process code separately and using the stubbed fork method to make sure it’s called with the right code. If, however, you need something a bit more concrete, then using the ChildProcessMessage class could help. By actually creating child processes in the tests, you’re able to find issues like zombie processes much earlier.
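On the zombie process point, a test that really forks can also check that the child was waited on. The sketch below is one hedged way to do that; `run_and_reap` and `reaped?` are hypothetical helpers, not part of our supervisor.

```ruby
# Forks a child to run the task, waits for it and returns its PID and
# exit code. Waiting is what reaps the child: a child that has exited
# but has not been waited on stays around as a zombie.
def run_and_reap(&task)
  pid = fork do
    task.call
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  _, status = Process.wait2(pid)
  { pid: pid, exit_code: status.exitstatus }
end

# True when the given PID is no longer a child of this process that
# still needs waiting on, i.e. it has already been reaped.
def reaped?(pid)
  Process.wait(pid, Process::WNOHANG)
  false
rescue Errno::ECHILD
  true
end
```

A supervisor that never calls `Process.wait` (or `Process.detach`) on its children would fail a `reaped?` check, which is the sort of issue a stubbed `fork` cannot reveal.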

[1] Interestingly, this is one of the differences between a process’s VSZ (virtual size) and RSS (resident set size) as reported by tools such as ps and top. The virtual size is the address space available to the process, so it includes areas that have been inherited from the parent process, whereas the resident set size is the actual RAM occupied by that process, so it excludes the inherited memory.