Combining text with numerical and categorical features for classification
Classification with transformer models: A common approach for classification tasks with text data is to fine-tune a pre-trained transformer model on domain-specific data. At FreeAgent we apply this approach to automatically categorise bank transactions, using raw inputs that are a mixture of text, numerical and categorical data types. The current approach is to concatenate the input features for each transaction into a single string before passing it to the model. For… Continue reading
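As a rough illustration of the concatenation step described in this excerpt, here is a minimal Python sketch. The field names (`description`, `amount`, `transaction_type`), the separator and the example values are all assumptions made for illustration; the excerpt does not say which features FreeAgent uses or how they are formatted.

```python
def transaction_to_text(description: str, amount: float,
                        transaction_type: str, sep: str = " | ") -> str:
    """Join one transaction's mixed-type features into a single string.

    All field names and the separator are hypothetical; they only
    illustrate the idea of serialising text, numerical and categorical
    inputs into one piece of text for a transformer model.
    """
    parts = [
        description.strip(),                 # free-text feature
        f"amount: {amount:.2f}",             # numerical feature rendered as text
        f"type: {transaction_type}",         # categorical feature rendered as text
    ]
    return sep.join(parts)


# Made-up example transaction.
example = transaction_to_text("COFFEE SHOP LONDON", -23.5, "card payment")
print(example)
```

The resulting string could then be tokenised like any other text input, which is what makes this approach simple to apply with an off-the-shelf pre-trained model.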
Restructuring our analytics team
In late 2022, we restructured our analytics team by aligning each analyst to a different area of the business. In this blog post I’ll talk about what we changed, why we changed it, and how we feel the changes have gone so far. If you’ve been through a similar process (or even the opposite process!), are considering it, or have any other thoughts, we’d love to chat! Please drop us… Continue reading
Using API Gateway, Lambda, SageMaker and DynamoDB to build a categorisation service in AWS
I’ve talked previously about the value of combining rules-based and machine learning approaches to categorisation. In short, rules-based approaches make it easy to do customer-level personalisation that complements a machine learning model trained to find patterns across customers. In this post I’ll talk about how we used AWS to build an expense categorisation service that combines machine learning with a rules-based approach. This service forms part of the Smart Capture… Continue reading
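One way to picture the combination of rules and a model described in this excerpt is sketched below. It assumes that a customer-level rule, when one exists, takes precedence over the model prediction, and that rules are exact matches on a lower-cased description. Both are assumptions for illustration, as are the names `categorise`, `rules` and `model_predict`; the excerpt does not describe the actual service logic.

```python
from typing import Callable, Dict, Tuple

# Hypothetical rule store: (customer_id, normalised description) -> category.
Rules = Dict[Tuple[str, str], str]


def categorise(customer_id: str, description: str, rules: Rules,
               model_predict: Callable[[str], str]) -> str:
    """Return a category, preferring a customer-specific rule if one exists.

    Assumption: rules override the model. The model is passed in as a
    plain callable so the sketch stays independent of any ML library.
    """
    key = (customer_id, description.strip().lower())
    if key in rules:
        return rules[key]              # customer-level personalisation
    return model_predict(description)  # pattern learned across customers


# Made-up rule and a stub standing in for a trained model.
rules = {("cust-1", "train ticket"): "Travel"}


def stub_model(text: str) -> str:
    return "General Purchases"


print(categorise("cust-1", "Train Ticket", rules, stub_model))
print(categorise("cust-2", "Train Ticket", rules, stub_model))
```

In this sketch the first customer gets their own rule applied, while the second falls back to the shared model.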
Combining machine learning with rules-based personalisation
One of the ways we use machine learning at FreeAgent is to help automate data entry. Keeping on top of your accounts can involve slightly tedious manual tasks like categorising bank transactions or managing your expenses. Machine learning can help here by automating aspects of these tasks so our users can nail their daily admin and focus on bigger things. Personalisation with rules: In 2020 we launched our first operational… Continue reading
Combining data from different sources with SageMaker pipelines
Generating datasets for machine learning: Preparing data and generating datasets is a crucial step in training a machine learning model. If you are lucky, your data might come from a single .csv file. However, in most cases pulling together the input features to train your machine learning model will require combining datasets from different sources. Combining data from different sources manually can be a time-consuming and error-prone process. … Continue reading
Generative AI: Programmable Reasoning Machines of the Future
These days Generative AI is being employed for everything from interpretation and summarisation of text to problem solving with a conversational natural language interface. What sort of conceptual model should we have in mind when thinking about LLM systems? Continue reading
The Data Science Internship Chronicles: A Starfleet-worthy Tale of Numeric Exploration
In the vast expanse of the universe, I, a humble data science intern, set out on a mission to improve a classification model. As I delved deeper into the data, I encountered anomalies and outliers that threatened to disrupt my analysis. But with the guidance of my mentors and the help of advanced data tools, I navigated through the stars and uncovered the hidden patterns that led to breakthrough insights.… Continue reading
Mindfulness with GitHub
I was a researcher in chemistry in my previous career, so I have a habit of labelling everything. It is important in chemistry to be organised; you don’t want to mix unknown liquids in unlabelled beakers. Can you guess why? BOOM! I apply this habit in every area of my life. Now, everything has a place and is clearly labelled. I have a place for vertically striped socks and a… Continue reading
My experience as an Analytics Intern
I was really excited to begin interning at FreeAgent, and after 9 weeks in the Data Analytics team I feel I’ve learnt a lot about working in a team inside a company, and about the culture here. I thought I would write a bit about what it was like getting set up, working on my project and communicating my findings to the rest of the company. Onboarding/set-up: There were a lot… Continue reading
The 4 SQL queries you need to debug Redshift performance
This blog post provides four useful SQL snippets you can use to debug poor Redshift performance. Continue reading