November 2023 – Grinding Gears

One of the ways we use machine learning at FreeAgent is to help automate data entry. Keeping on top of your accounts can involve slightly tedious manual tasks like categorising bank transactions or managing your expenses. Machine learning can help here by automating aspects of these tasks so our users can nail their daily admin and focus on bigger things.

Personalisation with rules

In 2020 we launched our first operational machine learning system for categorising bank transactions. The model powering the system uses patterns learned from historical data to suggest one of over 50 different categories for a transaction. Before machine learning entered the picture we had a set of rules, referred to as Guess, that categorised a business’s transactions using that business’s own historical data. Our machine learning did not replace the rules-based approach. Instead, we combine the rules-based system with machine learning to get the best of both worlds. First, the Guess rules attempt to categorise an incoming transaction; if they can’t, the transaction is passed to our machine learning model.
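As a rough illustration of that ordering, here is a minimal sketch of a rules-first pipeline with a model fallback. The function names, the dict-based transaction and the toy rule and model are all assumptions made for the example; they are not FreeAgent’s actual code.

```python
from typing import Callable, Optional, Tuple

# Hypothetical types: a transaction is just a dict with a "description" key.
Transaction = dict
RuleFn = Callable[[Transaction], Optional[str]]
ModelFn = Callable[[Transaction], str]


def categorise(transaction: Transaction, rules: RuleFn, model: ModelFn) -> Tuple[str, str]:
    """Return (category, source), trying the rules before the model."""
    category = rules(transaction)
    if category is not None:
        return category, "rules"
    return model(transaction), "model"


def toy_rules(transaction: Transaction) -> Optional[str]:
    # Stand-in for business-specific rules: knows a single description.
    known = {"post office counter": "Cost of Sales"}
    return known.get(transaction["description"].lower())


def toy_model(transaction: Transaction) -> str:
    # Stand-in for a trained model: always returns a fixed category.
    return "Postage"
```

Returning the source alongside the category is a design choice for the sketch: it makes it easy to see, and to measure, how often each half of the system is doing the work.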

While most businesses categorise common transactions similarly, some will use different categories depending on what is right for them. As an example, a bank transaction for postage could be assigned to a specific ‘postage’ category or to the more general ‘cost of sales’, which can cover “postage of finished goods.” This ambiguity places a limit on how accurate¹ a machine learning model can be by learning general patterns from previous transactions. Two identical transactions might be categorised in different ways for perfectly good reasons.
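To put a number on that limit, here is a small sketch using made-up figures. Suppose ten businesses have categorised an identical postage transaction, seven as ‘Postage’ and three as ‘Cost of Sales’. A model that only sees the transaction details has to give the same answer every time, so the best it can do is pick the most common label.

```python
from collections import Counter
from typing import Sequence


def accuracy_ceiling(labels: Sequence[str]) -> float:
    """Best accuracy achievable when identical inputs must get one prediction."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical labels for ten otherwise identical transactions.
labels = ["Postage"] * 7 + ["Cost of Sales"] * 3
ceiling = accuracy_ceiling(labels)
print(ceiling)  # 0.7
```

However good the model is, it cannot beat 70% on these transactions without extra information about the business.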

One effective way to address this ambiguity is with rules-based personalisation. For example, our Guess system looks for similar transactions within a business’s historical data to help categorise new transactions. This captures both user preference and any specific categorisation requirements of a business. Most transactions from a stationery shop might be categorised as ‘stationery’, but if I buy paper there for crafts that I sell then ‘materials’ or ‘cost of sales’ is a better match. In this situation our machine learning model would suggest ‘stationery’ as the best category, with confidence determined by the transaction details. Importantly, the machine learning model is both right and wrong here. It’s right that, averaging over historical data, ‘stationery’ is the most probable category. It’s just that for my situation it’s wrong, as there are more appropriate choices. We could add more information to our model to get closer to my right answer, but rules-based systems can be an easier option.
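One simple way to look for similar transactions in a business’s history is sketched below. It assumes ‘similar’ means an exact match on a normalised description, with a majority vote over the matches. That matching rule, the function names and the shop name are all illustrative assumptions, not a description of how Guess is implemented.

```python
from collections import Counter
from typing import List, Optional, Tuple


def normalise(description: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(description.lower().split())


def guess_from_history(description: str, history: List[Tuple[str, str]]) -> Optional[str]:
    """Return the category this business used most often for matching descriptions."""
    target = normalise(description)
    matches = [category for past, category in history if normalise(past) == target]
    if not matches:
        return None  # No match: a fallback such as a model would take over here.
    return Counter(matches).most_common(1)[0][0]


# Hypothetical history for a business that sells paper crafts.
history = [
    ("PAPER WORLD  Edinburgh", "Materials"),
    ("paper world edinburgh", "Materials"),
    ("Paper World Edinburgh", "Stationery"),
]
```

With this history, a new ‘Paper World Edinburgh’ transaction would be suggested as ‘Materials’, reflecting how this particular business usually categorises it rather than what is most common overall.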

Building on our success with bank transactions, we’ve recently extended this approach to categorising expenses. Our new Smart Capture feature automates extracting data from receipts and we provide suggested categories when a receipt is converted into an expense. To provide these categories we look for matches to previous expenses from that business alongside using a general machine learning model trained on historical expenses across our customer base. As with bank transaction categorisation, we’ve found this combination of rules-based personalisation and machine learning is better than either alone. A machine learning model is great for businesses without many expenses or with one-off expenses that aren’t repeated; here we can take advantage of the typical behaviour to suggest a category.

Following our work on bank transactions, the rules-based personalisation takes precedence, with the model being used if no match is found.

Another benefit of using rules-based personalisation is that we avoid making the same mistakes for a business. A model could keep getting a certain transaction wrong but with rules-based personalisation we can quickly learn from customer feedback. As businesses build up expenses over time we can provide a more personalised experience tailored to how they categorise things.
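The sketch below shows the idea of learning from a correction, under the assumption that a rule is simply a stored mapping from a business and description to the category the user last chose. The class and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class PersonalRules:
    """Hypothetical per-business store of description -> category rules."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, str], str] = {}

    def lookup(self, business_id: str, description: str) -> Optional[str]:
        return self._rules.get((business_id, description.lower()))

    def record_feedback(self, business_id: str, description: str, category: str) -> None:
        # The user's most recent choice replaces any earlier rule.
        self._rules[(business_id, description.lower())] = category


def model_suggestion(description: str) -> str:
    # Stand-in for a general model that keeps making the same suggestion.
    return "Stationery"


def suggest(rules: PersonalRules, business_id: str, description: str) -> str:
    return rules.lookup(business_id, description) or model_suggestion(description)


rules = PersonalRules()
first = suggest(rules, "biz-1", "Paper World")   # no rule yet, so the model answers
rules.record_feedback("biz-1", "Paper World", "Materials")  # the user corrects it
second = suggest(rules, "biz-1", "Paper World")  # the stored rule now answers
other = suggest(rules, "biz-2", "Paper World")   # another business is unaffected
```

A single correction changes the next suggestion for that business immediately, with no retraining, and leaves every other business on the model’s default.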

Another area where rules-based systems can be powerful is where consistency and explainability are important. Machine learning models are becoming increasingly complex, making understanding their outputs challenging, particularly with the rise of large proprietary models. Stochastic and opaque model outputs are not appropriate in some business contexts where you want the same thing to always happen in a given situation. You can put the exact same prompt into a generative AI tool only to get a different response back with little understanding of why. Here the simplicity of rules is a real boon. A rule can be created to take an action in a situation whatever your fancy model might think. To me, it’s a fool’s errand to think probabilistic models can replace rules-based systems entirely. Sometimes we just want to do or not do something based on a set of conditions we can easily talk and reason about. The rise of guardrails in even the most cutting-edge AI systems is a reflection of this.
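As a toy example of a rule that holds whatever the model might think, the sketch below wraps a deliberately random stand-in model with a single condition. The keyword, category names and functions are invented for illustration only.

```python
import random
from typing import Optional

# Hypothetical rule: descriptions containing this word are never auto-categorised.
BLOCKED_KEYWORD = "refund"


def stochastic_model(description: str, rng: random.Random) -> str:
    """Stand-in for a non-deterministic model: may answer differently each call."""
    return rng.choice(["Travel", "Accommodation and Meals"])


def suggest(description: str, rng: random.Random) -> Optional[str]:
    """Apply the rule first; only consult the model when the rule does not fire."""
    if BLOCKED_KEYWORD in description.lower():
        return None  # Always left for manual review, whatever the model thinks.
    return stochastic_model(description, rng)


# The rule's outcome is identical across 100 differently seeded runs.
outcomes = {suggest("Hotel refund", random.Random(seed)) for seed in range(100)}
print(outcomes)  # {None}
```

The condition is a single line that anyone on the team can read and reason about, and it gives the same result on every run.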

Summary

We’ve found that starting with rules-based systems is a great way to get something reliable and explainable up and running. If a set of rules can get you part of the way there at a fraction of the cost of fancy machine learning techniques, why not give it a go? After all, there’s a long history of using rules in AI that those of us living in the machine learning era shouldn’t forget.

Thanks to Dave Evans and Owen Turner for their comments and suggestions on this post.

¹ There’s an interesting question about what ‘accurate’ means in this context. A decent ML model will give a prediction that is appropriate in most cases (i.e. most probable) without necessarily being what that business would have picked.


Combining machine learning with rules-based personalisation
