Data & ML – Grinding Gears

Fine-tuning a DistilBERT classifier with numerical and text inputs

Posted by Paloma Jol on 10 April 2026

Text classification is often done by fine-tuning a pretrained foundation model on domain-specific data. At FreeAgent we use transformer-based models to automatically classify incoming bank transactions. Specifically, we use a DistilBERT model that is fine-tuned on hundreds of millions of bank transactions with customer-labelled accounting categories. The model inputs are currently text-based, built from a combination of bank transaction descriptions and amounts. In this post we describe an approach to fine-tuning the DistilBERT model and training the classifier, with the numerical amount feature included, as a single network. Continue reading

➼ Read other posts about AI or AWS or data science or fine-tuning or hugging face or LLMs or machine learning or NLP

Structured outputs with Pydantic AI

Posted by Ed Berry on 24 March 2026

One of the challenges of working with LLMs is getting them to respond with a consistent format, such as a given JSON schema. Anyone who has tried to solve this issue with prompt engineering knows how frustrating it can be. You add a ‘MUST’ here and an ‘always return JSON’ there, but still the output doesn't reliably parse. Maybe you're about to add a try-except block to handle parsing errors… Continue reading

➼ Read other posts about AI or data or data science or GenAI or LLMs

How we Use Dagster Automations in our Data Pipeline

Posted by Delphine Rabiller on 10 December 2025

Introduction: At the heart of a reliable data platform are robust, automated data pipelines. As our team migrates our data pipelines to Dagster, re-architecting our automation logic is a crucial task. Dagster offers condition-based approaches to creating or updating a data asset (table or file), moving us toward a modern, asset-centric view of data. This post details the three primary automation strategies we have considered and implemented (so far) in Dagster: Schedules,… Continue reading

➼ Read other posts about analytics engineering or dagster

Streamlining DBT Macro Testing: A Unit Test Approach with Pytest and Jinja

Posted by Paul Barber on 30 September 2025

Introduction: Data Engineering at FreeAgent has a mission to ensure our colleagues and customers have reliable, accurate, and secure access to the data they need. Our migration to Dagster, DBT, and DLT is a key part of that. However, it has raised numerous questions, including how we test DBT Macros. This post dives into how we're tackling this by leveraging pytest and Jinja for more efficient unit testing of DBT… Continue reading

➼ Read other posts about dagster or data or data platform or testing

Creating re-usable descriptions in dbt with Jinja docs

Posted by Rob Brown on 11 August 2025

If you're working with dbt and find yourself copying the same column descriptions across multiple models, this post is for you. We'll show you how to eliminate that repetition using the Jinja doc function! Continue reading

➼ Read other posts about analytics engineering or dbt or documentation or etl

Decoding Data Orchestration Tools: Comparing Prefect, Dagster, Airflow, and Mage

Posted by Paul Barber on 29 May 2025

Introduction: Data is exploding, and so are the tools to manage it. From generating and collecting to cleaning and analyzing, these tools help create valuable products for customers and give stakeholders decisive insights. As Data Engineering at FreeAgent continues to evolve, we're focusing on providing more reliability and quality in our data products. For data pipeline building, we've started to move from a no-code approach toward a software engineering focused… Continue reading

➼ Read other posts about dagster or data or data migration or data platform

The 5 rules for migrating data pipelines successfully

Posted by Rob Brown on 22 April 2025

This post sets out the 5 essential rules for navigating a large-scale data tooling transition smoothly and with minimum disruption. Continue reading

➼ Read other posts about analytics engineering or data or etl or migration

Introducing Analytics Engineering

Posted by Dave Evans on 20 March 2025

Over the last few years we’ve evolved the way our analytics team works to enable easy access to accurate and reliable data for faster, better decision-making. Recently we made one more change—our Business Intelligence Analysts are now Analytics Engineers! Continue reading

➼ Read other posts about analytics engineering or dagster or dbt or dlt

Combining text with numerical and categorical features for classification

Posted by Delphine Rabiller on 17 May 2024

Classification with transformer models: A common approach for classification tasks with text data is to fine-tune a pre-trained transformer model on domain-specific data. At FreeAgent we apply this approach to automatically categorise bank transactions, using raw inputs that are a mixture of text, numerical and categorical data types. The current approach is to concatenate the input features for each transaction into a single string before passing to the model. For… Continue reading

➼ Read other posts about AWS or BERT or data science or fine-tuning or hugging face or machine learning or NLP

Restructuring our analytics team

Posted by Jack Gladas on 30 January 2024

In late 2022, we restructured our analytics team by aligning each analyst to a different area of the business. In this blog post I’ll talk about what we changed, why we changed it, and how we feel the changes have gone so far. If you’ve been through a similar process (or even the opposite process!), are considering it, or have any other thoughts, we’d love to chat! Please drop us… Continue reading

➼ Read other posts about analytics or bi

Grinding Gears

Tales of code crunching from the FreeAgent Engineering team