A company faces some unavoidably arduous tasks when taking control of their finances. One such task, which currently takes up a lot of time for our users, is explaining bank transactions. This is the process of assigning an accounting category to transactions, which is important both for internal reports generated by FreeAgent and for external submissions, for example to HMRC. At the end of June FreeAgent launched a suite of new automation features. As part of this release we have begun using a machine learning model to automatically explain our users’ bank transactions.
I previously wrote about starting a remote internship at FreeAgent this summer and how I’d be helping the data science team with developments to the above model.
The machine learning model has explained almost 100,000 transactions in the first two months after launching, marking them as ‘for approval’ in the Banking section of the app so that users get a chance to check them over. Over half of the explained transactions have gone through the approval stage with greater than 95% of those left unaltered. We will refer to this estimator of model performance as the precision: when the model makes a prediction, how often is it correct?
This performance is encouraging so far, but there is room for improvement in the volume of bank transactions the model attempts to explain. In this post I will give an overview of how the model works and how I’ve helped to increase its impact.
The process of assigning one out of a given set of accounting categories to a bank transaction is a classification task, which we’ve chosen to tackle using a supervised-learning approach whereby we learn a function which maps from some input features to an output (accounting category) using example input-output pairs. The current model, referred to internally as ‘Banquo’, is a support vector machine (SVM) that takes example bank transaction descriptions and amounts as inputs, alongside the associated target accounting category we hope to learn how to predict. This information is used to find optimal decision boundaries for associating transactions with accounting categories.
To simplify the problem, given the high likelihood of overlap between some of the 60+ standard accounting categories, the Banquo model in our initial launch to customers only attempts to explain transactions in four thematically distinct accounting categories: Accommodation and Meals, Bank/Finance charges, Insurance and Travel.
As mentioned above, the model inputs are the transaction description and amount for each bank transaction. To be able to feed these into the SVM, we apply preprocessing steps to normalise and extract the sign of the transaction amount and to represent the textual transaction description in a numerical form. This latter step draws on techniques from Natural Language Processing (NLP). We make use of the HashingVectorizer from scikit-learn to efficiently construct a token occurrence matrix from the input transaction descriptions.
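As a rough sketch of this preprocessing, assuming a few hypothetical transactions (the feature dimensionality and the amount normalisation here are illustrative choices, not the production configuration):

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical transactions; not real FreeAgent data
descriptions = ["TFL TRAVEL CHARGE", "PREMIER INN LEEDS", "MONTHLY ACCOUNT FEE"]
amounts = np.array([-5.60, -89.00, -12.50])

# Turn each description into a row of a sparse token-occurrence matrix
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
X_text = vectorizer.transform(descriptions)

# Separate the sign of each amount from its magnitude,
# using a log transform as one plausible normalisation
signs = np.sign(amounts)
magnitudes = np.log1p(np.abs(amounts))

print(X_text.shape)  # (3, 262144)
```

The hashing trick avoids holding a vocabulary in memory, which matters when vectorising tens of millions of transaction descriptions.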
The next stage of the pipeline is to train the SVM using the preprocessed inputs and associated accounting categories, which we have access to for 10s of millions of historical transactions explained by our users. We train an independent binary SVM for each of the four initial accounting categories mentioned above, whereby a boundary is positioned to separate as many transactions belonging to that category from the rest of the transactions as possible. This is known as the one-vs-rest approach to training a multiclass SVM.
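A minimal sketch of the one-vs-rest setup with scikit-learn, using a toy dataset (the category names come from the post; the example transactions and parameters are illustrative):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy (description, accounting category) pairs; not real FreeAgent data
data = [
    ("TFL TRAVEL CHARGE", "Travel"),
    ("TRAINLINE TICKETS", "Travel"),
    ("PREMIER INN LEEDS", "Accommodation and Meals"),
    ("HILTON HOTEL YORK", "Accommodation and Meals"),
    ("MONTHLY ACCOUNT FEE", "Bank/Finance charges"),
    ("OVERDRAFT INTEREST", "Bank/Finance charges"),
    ("AXA BUSINESS COVER", "Insurance"),
    ("DIRECT LINE PREMIUM", "Insurance"),
]
descriptions, categories = zip(*data)

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
X = vectorizer.transform(descriptions)

# One independent binary SVM is fitted per category,
# each separating that category from all the rest
model = OneVsRestClassifier(LinearSVC())
model.fit(X, categories)
```

On this tiny, linearly separable dataset the fitted model reproduces the training labels exactly; at real scale the boundaries are soft and some transactions fall close to them.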
The output of the model is a signed score for each of the categories, with the sign indicating the side of the boundary the transaction lies on and the magnitude of the score indicating the distance from the boundary for each respective category. We compare the maximum score with a fixed confidence threshold; if the threshold is exceeded then the transaction is labelled as the corresponding category.
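The thresholding step might look like the following sketch, where the threshold value is illustrative rather than the one used in production:

```python
import numpy as np

CATEGORIES = ["Accommodation and Meals", "Bank/Finance charges", "Insurance", "Travel"]
THRESHOLD = 0.5  # illustrative; not the production confidence threshold

def choose_category(scores):
    """Return the highest-scoring category if its signed score clears the
    threshold, otherwise None (the transaction is left unexplained)."""
    best = int(np.argmax(scores))
    if scores[best] > THRESHOLD:
        return CATEGORIES[best]
    return None

print(choose_category(np.array([-1.2, 0.9, -0.4, -2.0])))  # Bank/Finance charges
print(choose_category(np.array([0.1, -0.2, 0.3, -0.5])))   # None: best score too low
```

Leaving low-confidence transactions unexplained is what keeps the precision high at the cost of coverage, which is the trade-off the rest of this post works against.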
Adding New Categories
My internship this summer has been centred around improving the performance of Banquo. Given the high precision of the model since launch to customers, the main area for improvement is the volume of transactions Banquo attempts to classify. One of the simplest ways to increase the coverage of the model is to add to the set of four accounting categories Banquo has been trained to make predictions for.
One of my biggest concerns when considering categories to add was introducing a category that could potentially surface conflicting scores with the existing categories – thus negatively impacting the current precision. To study the behaviour of the model when adding codes I split the historic transactions used to train the production model into a training and validation set; the validation set was formed of transactions from March and April 2020 and the training set was formed of transactions between January 2019 and February 2020.
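The split itself is a simple date filter; a sketch with pandas, assuming a hypothetical `dated_on` column on the transaction records:

```python
import pandas as pd

# Hypothetical transaction records; not real FreeAgent data
df = pd.DataFrame({
    "description": ["TFL TRAVEL CHARGE", "PREMIER INN LEEDS", "MONTHLY ACCOUNT FEE"],
    "dated_on": pd.to_datetime(["2019-06-01", "2020-02-15", "2020-03-20"]),
})

# Training set: January 2019 to February 2020 (inclusive)
train = df[(df.dated_on >= "2019-01-01") & (df.dated_on <= "2020-02-29")]
# Validation set: March and April 2020
valid = df[(df.dated_on >= "2020-03-01") & (df.dated_on <= "2020-04-30")]
```

Splitting by time rather than at random matters here, because it mimics the production setting of predicting future transactions from past ones.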
Banquo is only sent transactions that have not already been explained by our hard-coded Guess rules. In order to target the most impactful categories, I checked which transactions in my validation set had already been explained by Guess. The following bar chart shows the top 15 categories which were left unexplained after Guess in March and April 2020.
The goal of my study was to identify some categories which, when added, increase the total number of explained transactions without reducing the overall precision of the model. Some of the categories commonly left unexplained by Guess raised some alarm bells. In particular, Motor Expenses includes transactions with very similar descriptions to Travel, and possibly even Insurance. From the perspective of an end user adding this category would potentially lead to the undesirable scenario that the automation features could appear to be getting worse: “Why is this transaction no longer being explained correctly? It used to work fine!”
I trained new models for a handful of seemingly promising categories, each predicting the current four accounting categories plus one additional candidate category. One of the candidate categories that looked really promising on the validation data was Accountancy Fees: the number of transactions correctly explained rose by about 8% when we added the category, without the precision dropping.
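The two quantities being traded off can be computed from validation predictions like so (a sketch; `None` marks transactions the model declined to explain):

```python
def coverage_and_precision(predictions, actuals):
    """Coverage: fraction of transactions the model attempts to explain.
    Precision: fraction of attempted explanations that are correct."""
    attempted = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    coverage = len(attempted) / len(predictions)
    precision = sum(p == a for p, a in attempted) / len(attempted)
    return coverage, precision

# Illustrative predictions against ground-truth categories
preds = ["Travel", None, "Accountancy Fees", "Insurance", None]
truth = ["Travel", "Motor Expenses", "Accountancy Fees", "Travel", "Travel"]
print(coverage_and_precision(preds, truth))  # coverage 0.6, precision 2/3
```

A new category is worth adding when it raises coverage while leaving precision essentially unchanged, which is what the Accountancy Fees experiment showed.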
This was exactly the kind of result I was looking for. Before adding this code to the production model, it was important to monitor the candidate model's performance on incoming FreeAgent transactions. The current production model is served on an AWS SageMaker endpoint which is invoked by the FreeAgent app when transactions are imported by the user. We set up a candidate endpoint and send it the same transactions as the live model in parallel; the candidate's predictions are stored for monitoring rather than surfaced to users of the app.
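This shadow setup might look something like the sketch below, using the `boto3` SageMaker runtime client. The endpoint names and payload shape are hypothetical; only the `invoke_endpoint` call is the real AWS API.

```python
import json
# import boto3  # needed to create the real runtime client

LIVE_ENDPOINT = "banquo-live"            # hypothetical endpoint name
CANDIDATE_ENDPOINT = "banquo-candidate"  # hypothetical endpoint name

def build_payload(description, amount):
    """Serialise one transaction into the JSON body both endpoints receive.
    (Payload shape is an assumption for illustration.)"""
    return json.dumps({"description": description, "amount": amount})

def shadow_invoke(runtime, description, amount):
    """Send the same transaction to the live and the candidate endpoint.
    Only the live prediction is surfaced to users; the candidate's response
    would be logged for offline monitoring."""
    body = build_payload(description, amount)
    live = runtime.invoke_endpoint(
        EndpointName=LIVE_ENDPOINT, ContentType="application/json", Body=body)
    candidate = runtime.invoke_endpoint(
        EndpointName=CANDIDATE_ENDPOINT, ContentType="application/json", Body=body)
    return live, candidate

# Usage (requires AWS credentials):
# runtime = boto3.client("sagemaker-runtime")
# shadow_invoke(runtime, "TFL TRAVEL CHARGE", -5.60)
```

Because both endpoints receive identical traffic, any difference in their predictions is attributable to the model change rather than to the data.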
We want the candidate model and the current live model to make the same predictions for the original four codes. In addition, the candidate model should make precise predictions for transactions belonging to the Accountancy Fees category. Provided the candidate model fulfils both criteria, we will look to promote it to the live endpoint in the coming weeks, which will be an exciting and impactful addition to have made.
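One way to express the first criterion as a number, as a sketch over the stored shadow predictions:

```python
ORIGINAL_CODES = {"Accommodation and Meals", "Bank/Finance charges", "Insurance", "Travel"}

def agreement_on_original_codes(live_preds, candidate_preds):
    """Fraction of transactions where, whenever either model predicts one of
    the original four codes, the two models agree. None = left unexplained."""
    relevant = [(l, c) for l, c in zip(live_preds, candidate_preds)
                if l in ORIGINAL_CODES or c in ORIGINAL_CODES]
    if not relevant:
        return 1.0
    return sum(l == c for l, c in relevant) / len(relevant)

# Illustrative shadow-monitoring data: the second transaction is only
# explained by the candidate (Accountancy Fees), so it doesn't count against agreement
live = ["Travel", None, "Insurance", None]
cand = ["Travel", "Accountancy Fees", "Insurance", None]
print(agreement_on_original_codes(live, cand))  # 1.0
```

An agreement close to 1.0 on the original codes, together with high precision on the new category, is the evidence needed before promoting the candidate.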