Data & ML – Grinding Gears

Creating re-usable descriptions in dbt with Jinja docs

Posted by Rob Brown on 11 August 2025

If you're working with dbt and find yourself copying the same column descriptions across multiple models, this post is for you. We'll show you how to eliminate that repetition using the Jinja doc function! Continue reading

➼ Read other posts about analytics engineering or dbt or documentation or etl

Decoding Data Orchestration Tools: Comparing Prefect, Dagster, Airflow, and Mage

Posted by Paul Barber on 29 May 2025

Introduction: Data is exploding, and so are the tools to manage it. From generating and collecting, to cleaning and analyzing, these tools help create valuable products for customers and give stakeholders decisive insights. As Data Engineering at FreeAgent continues to evolve, we're focusing on providing more reliability and quality in our data products. For data pipeline building, we've started to move from a no-code approach toward a software engineering focused… Continue reading

➼ Read other posts about data or data platform

The 5 rules for migrating data pipelines successfully

Posted by Rob Brown on 22 April 2025

This blog will help you to discover the 5 essential rules to navigate your large-scale data tooling transition smoothly and with minimum disruption. Continue reading

➼ Read other posts about analytics engineering or data or etl or migration

Introducing Analytics Engineering

Posted by Dave Evans on 20 March 2025

Over the last few years we’ve evolved the way our analytics team works to enable easy access to accurate and reliable data for faster, better decision-making. Recently we made one more change—our Business Intelligence Analysts are now Analytics Engineers! Continue reading

➼ Read other posts about analytics engineering

Combining text with numerical and categorical features for classification

Posted by Delphine Rabiller on 17 May 2024

Classification with transformer models: A common approach for classification tasks with text data is to fine-tune a pre-trained transformer model on domain-specific data. At FreeAgent we apply this approach to automatically categorise bank transactions, using raw inputs that are a mixture of text, numerical and categorical data types. The current approach is to concatenate the input features for each transaction into a single string before passing to the model. For… Continue reading

➼ Read other posts about AWS or BERT or data science or fine-tuning or hugging face or machine learning or NLP

Restructuring our analytics team

Posted by Jack Gladas on 30 January 2024

In late 2022, we restructured our analytics team by aligning each analyst to a different area of the business. In this blog post I’ll talk about what we changed, why we changed it, and how we feel the changes have gone so far. If you’ve been through a similar process (or even the opposite process!), are considering it, or have any other thoughts, we’d love to chat! Please drop us… Continue reading

➼ Read other posts about analytics or bi

Using API Gateway, Lambda, SageMaker and DynamoDB to build a categorisation service in AWS

Posted by Ed Berry on 19 January 2024

I’ve talked previously about the value of combining rules-based and machine learning approaches to categorisation. In short, rules-based approaches make it easy to do customer-level personalisation that complements a machine learning model trained to find patterns across customers. In this post I’ll talk about how we used AWS to build an expense categorisation service that combines machine learning with a rules-based approach. This service forms part of the Smart Capture… Continue reading

➼ Read other posts about analytics or AWS or data or data science or machine learning

Combining machine learning with rules-based personalisation

Posted by Ed Berry on 27 November 2023

One of the ways we use machine learning at FreeAgent is to help automate data entry. Keeping on top of your accounts can involve slightly tedious manual tasks like categorising bank transactions or managing your expenses. Machine learning can help here by automating aspects of these tasks so our users can nail their daily admin and focus on bigger things. Personalisation with rules: In 2020 we launched our first operational… Continue reading

➼ Read other posts about data science or machine learning

Combining data from different sources with SageMaker pipelines

Posted by Delphine Rabiller on 2 August 2023

Generating datasets for machine learning: Preparing data and generating datasets is a crucial step in training a machine learning model. If you are lucky, your data might come from a single .csv file. However, in most cases, pulling together the input features to train your machine learning model will require combining datasets from different sources. Combining data from different sources manually can be a time-consuming, error-prone process. … Continue reading

➼ Read other posts about athena or AWS or data or data science or redshift or SageMaker

Generative AI: Programmable Reasoning Machines of the Future

Posted by Dave Evans on 13 July 2023

These days Generative AI is being employed for everything from interpretation and summarisation of text to problem solving with a conversational natural language interface. What sort of conceptual model should we have in mind when thinking about LLM systems? Continue reading

➼ Read other posts about AI or LLM or TuringFest

Grinding Gears

Tales of code crunching from the FreeAgent Engineering team