All posts tagged with 'data science'
Combining text with numerical and categorical features for classification
Classification with transformer models. A common approach for classification tasks with text data is to fine-tune a pre-trained transformer model on domain-specific data. At FreeAgent we apply this approach to automatically categorise bank transactions, using raw inputs that are a mixture of text, numerical and categorical data types. The current approach is to concatenate the input features for each transaction into a single string before passing it to the model. For… Continue reading
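The concatenation step described above can be illustrated with a minimal sketch. The field names (`description`, `amount`, `account_type`), the `"name: value"` layout and the `" | "` separator are all hypothetical choices for illustration, not FreeAgent's actual input format. The resulting string would then be tokenised like any other text input to the transformer.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields: one text, one numerical and one categorical feature.
    description: str
    amount: float
    account_type: str


def to_model_input(txn: Transaction) -> str:
    """Concatenate mixed-type features into a single string for a text model.

    Assumption: a "name: value" layout joined with " | " is an illustrative
    format only; any consistent serialisation would work the same way.
    """
    parts = [
        f"description: {txn.description.strip()}",
        f"amount: {txn.amount:.2f}",  # numbers are rendered as text
        f"account type: {txn.account_type}",  # categories are rendered as text
    ]
    return " | ".join(parts)


if __name__ == "__main__":
    txn = Transaction("TESCO STORES 1234", -23.5, "current")
    print(to_model_input(txn))
    # description: TESCO STORES 1234 | amount: -23.50 | account type: current
```

Labelling each value with its feature name gives the model some context for what a bare number or category token means, at the cost of slightly longer inputs.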
Using API Gateway, Lambda, SageMaker and DynamoDB to build a categorisation service in AWS
I’ve talked previously about the value of combining rules-based and machine learning approaches to categorisation. In short, rules-based approaches make it easy to do customer-level personalisation that complements a machine learning model trained to find patterns across customers. In this post I’ll talk about how we used AWS to build an expense categorisation service that combines machine learning with a rules-based approach. This service forms part of the Smart Capture… Continue reading
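The idea of customer-level rules complementing a model trained across customers can be sketched in plain Python. This is a minimal sketch under stated assumptions: a dict stands in for a rules store (such as a database table), a dummy function stands in for a deployed model endpoint, and the precedence (a matching customer rule overrides the model) is an assumption. The names `categorise`, `learn_rule` and `normalise` are hypothetical, and none of the AWS services named above are modelled here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical rule store: (customer_id, normalised merchant) -> category.
RuleStore = Dict[Tuple[str, str], str]


def normalise(merchant: str) -> str:
    """Lower-case and collapse whitespace so rule lookups are less brittle."""
    return " ".join(merchant.lower().split())


def categorise(
    customer_id: str,
    merchant: str,
    rules: RuleStore,
    model_predict: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (category, source), preferring a customer-specific rule.

    Assumption: a matching customer rule takes precedence over the model.
    """
    key = (customer_id, normalise(merchant))
    if key in rules:
        return rules[key], "rule"
    return model_predict(merchant), "model"


def learn_rule(rules: RuleStore, customer_id: str, merchant: str, category: str) -> None:
    """Record a customer's correction so their future matching expenses use it."""
    rules[(customer_id, normalise(merchant))] = category


if __name__ == "__main__":
    # Stand-in for a trained model: always predicts the same category.
    def dummy_model(merchant: str) -> str:
        return "Travel"

    rules: RuleStore = {}
    print(categorise("c1", "Costa Coffee", rules, dummy_model))  # ('Travel', 'model')
    learn_rule(rules, "c1", "COSTA  coffee", "Subsistence")
    print(categorise("c1", "Costa Coffee", rules, dummy_model))  # ('Subsistence', 'rule')
    print(categorise("c2", "Costa Coffee", rules, dummy_model))  # ('Travel', 'model')
```

Keying rules on the customer ID keeps one customer's corrections from leaking into another's predictions, while every customer still benefits from the shared model.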
Combining machine learning with rules-based personalisation
One of the ways we use machine learning at FreeAgent is to help automate data entry. Keeping on top of your accounts can involve slightly tedious manual tasks like categorising bank transactions or managing your expenses. Machine learning can help here by automating aspects of these tasks so our users can nail their daily admin and focus on bigger things. Personalisation with rules. In 2020 we launched our first operational… Continue reading
Combining data from different sources with SageMaker pipelines
Generating datasets for machine learning. Preparing data and generating datasets is a crucial step in training a machine learning model. If you are lucky, your data might come from a single .csv file. However, in most cases pulling together the input features to train your machine learning model will require combining datasets from different sources. Combining data from different sources manually can be a time-consuming and error-prone process. … Continue reading
The Data Science Internship Chronicles: A Starfleet-worthy Tale of Numeric Exploration
In the vast expanse of the universe, I, a humble data science intern, set out on a mission to improve a classification model. As I delved deeper into the data, I encountered anomalies and outliers that threatened to disrupt my analysis. But with the guidance of my mentors and the help of advanced data tools, I navigated through the stars and uncovered the hidden patterns that led to breakthrough insights.… Continue reading
Mindfulness with GitHub
I was a researcher in chemistry in my previous career, so I have a habit of labelling everything. It is important in chemistry to be organised; you don’t want to mix unknown liquids in unlabelled beakers. Can you guess why? BOOM! I apply this habit in every area of my life. Now, everything has a place and is clearly labelled. I have a place for vertically striped socks and a… Continue reading
What a data science degree doesn’t teach you
When I enrolled on my data science master’s degree, I had limited statistical and coding knowledge. The course was designed to teach these skills from the bottom up. Having now worked as a software engineering intern, I have come to realise that it missed a lot. Moving beyond ‘if it works… it works!’ Learning to code can seem very daunting. There are so many resources and even languages. Where… Continue reading
Getting started with Jupyter Notebook
Jupyter Notebook is a development environment that runs in your web browser and can be used with several languages, including R and Python. In this blog post, we’ll look at some of the benefits of using Jupyter Notebook and how to start using it with Python. Benefits of Jupyter Notebook. Chunking code into cells. Instead of having to write code in large flat files, developers can use Jupyter Notebook to… Continue reading
How we structure our data teams at FreeAgent
Since joining FreeAgent back in April, I’ve been both impressed by and interested in how the Data organisation is structured. I’ve come from an enterprise world where you have lots of Data Engineers, a team of dedicated Data Architects and a separate Business Intelligence org. A few things that immediately struck me at FreeAgent were: no one has the title ‘Data Engineer’; Data Analytics are part of the Engineering org; no one has… Continue reading
Trading the lab coat for the computer – my journey to data science
I became a data scientist just over two years ago. It’s not that long since I traded my lab coat for a computer job, and a few people have asked me how I made the transition, if I could help someone get into data or if I could just answer some questions about what it’s like to work in data. So I figured I would put it all together in… Continue reading