All posts tagged with 'AWS'
Combining text with numerical and categorical features for classification
Classification with transformer models: A common approach for classification tasks with text data is to fine-tune a pre-trained transformer model on domain-specific data. At FreeAgent we apply this approach to automatically categorise bank transactions, using raw inputs that are a mixture of text, numerical and categorical data types. The current approach is to concatenate the input features for each transaction into a single string before passing it to the model. For…
Using API Gateway, Lambda, SageMaker and DynamoDB to build a categorisation service in AWS
I’ve talked previously about the value of combining rules-based and machine learning approaches to categorisation. In short, rules-based approaches make it easy to do customer-level personalisation that complements a machine learning model trained to find patterns across customers. In this post I’ll talk about how we used AWS to build an expense categorisation service that combines machine learning with a rules-based approach. This service forms part of the Smart Capture…
Combining data from different sources with SageMaker pipelines
Generating datasets for machine learning: Preparing data and generating datasets is a crucial step in training a machine learning model. If you are lucky, your data might come from a single .csv file. However, in most cases pulling together the input features to train your machine learning model will require combining datasets from different sources. Combining data from different sources manually can be a time-consuming, error-prone process. …
A brief introduction to ‘the cloud’ and managing infrastructure with code
Over the last decade the ‘cloud’ has become increasingly prevalent. A cloud-based system allows a company to flexibly buy servers, storage, networking and various other services that are hosted externally rather than on-site, typically with a programmatic interface to allow large-scale use. According to a 2019 report, 94% of companies were using the cloud in one way or another. The market for cloud providers was valued at $200…
A 12-step guide to AWS cost optimisation
This article outlines the pragmatic approach that we’ve followed here at FreeAgent in our first 18 months of using AWS to increase our cost efficiency. Using this approach, we’ve already cut our AWS spend by 50%, and we estimate we can save another 30% a year by implementing further efficiencies. Here are 12 things we’ve learned along the way. Our strategy: 1. Don’t optimise for cost too early. We fully…
Fine-Tuning BERT for multiclass categorisation with Amazon SageMaker
This post describes our approach to fine-tuning a BERT model for multiclass categorisation with Hugging Face and Amazon SageMaker.
Bank Transaction Entity Detection with AWS Comprehend
Introduction: For the past year, FreeAgent has been running a machine learning model in production that categorises customer bank transactions. This model takes transaction descriptions and transaction amounts as inputs, and attempts to predict the corresponding accounting category. This summer, I joined the data science team with the more specific goal of increasing model generalisation, which would allow it to make predictions for a larger fraction of incoming transactions. One…
Unpacking Amazon ECS
The following post is a high-level description of Amazon Web Services’ Elastic Container Service (ECS), the relationship between its components, and how they can be used to deploy a web application. ECS is a fully managed container service. It is akin to Kubernetes, but ultimately it is simpler and has fewer moving parts. ECS offers the security, reliability and scalability that is customary with AWS. The Fargate engine provides a…
Head In The Clouds
Seven years ago we started planning our first major infrastructure migration. Nine months later we made the move, taking FreeAgent from our first home in Rackspace London to a new, co-located home in two data centres (DCs) run by The Bunker. FreeAgent has been happily humming along in Ash and Greenham Common ever since. Co-locating has been a terrific win for us over the years, providing us with a cost-effective,…