Since joining FreeAgent back in April I’ve been both impressed by and interested in how the Data organisation is structured. I’ve come from an enterprise world where you have lots of Data Engineers, a team of dedicated Data Architects and a separate Business Intelligence org. A few things that immediately struck me at FreeAgent were:
- No one has the title ‘Data Engineer’
- Data Analytics are part of the Engineering org
- No one has the title ‘Data/Platform/Solutions Architect’
What I want to talk about in this post is why, for an organisation like FreeAgent, these are all great features of a Data team. One quick disclaimer is that this post represents the current state at the time of writing (June 2022). Things may well change in the future!
To set the scene, it’s worth introducing our current Engineering organisation. Product & Engineering at FreeAgent is split into Product Engineering and Platform Engineering. The Data teams sit within Platform Engineering: Data Science and Analytics are grouped together, while a separate Data Platform team sits within the Architecture group. A rough diagram of this structure is shown below.
This post will focus on the three Data teams: Data Science, Analytics and Data Platform.
Given the absence of Data Engineers at FreeAgent, you might wonder who does the Data Engineering? Who extracts data, transforms it, and loads it where it needs to be? For us, this work is shared between the Data Science, Analytics and Data Platform teams, as I’ll describe.
There are several data sources we want to ingest, the most important being the FreeAgent app. Ingesting these data is owned by Data Platform, with raw files landing in our S3 Data Lake. From here a Glue Crawler populates the Data Catalog with information about the data in S3, making it queryable with Athena.
Our Analytics team uses Matillion to access the Data Lake and transform raw data into a more analytics-ready form in Redshift. We then use Looker to visualise these data in Redshift. Looker also provides its own tools for transformation as part of LookML.
Data Science also loads data into the Data Lake when extracting training data for models from the FreeAgent app, as well as using Matillion and Looker.
We can then see that the extraction of data is handled by our Data Platform team. Data Science and, primarily, Analytics then build the transformations in the middle and load data into Looker. By making use of tools like Matillion, Redshift and Looker, we are able to carry out all this ETL work without dedicated Data Engineers.
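To make the extract, transform and load split concrete, here is a minimal sketch in Python. It is an illustration only and not our actual pipeline: the in-memory `RAW_TRANSACTIONS` list is a hypothetical stand-in for raw files in the Data Lake, SQLite stands in for Redshift, and the function names are made up for this example. In practice these steps are built with tools like Matillion rather than hand-written code.

```python
import sqlite3

# Hypothetical raw records, standing in for files landed in the Data Lake.
RAW_TRANSACTIONS = [
    {"id": "1", "amount": "12.50", "description": " Coffee "},
    {"id": "2", "amount": "not-a-number", "description": "Train"},
    {"id": "3", "amount": "99.99", "description": "Software"},
]


def extract():
    """Extract step: return raw records as they arrived (all strings)."""
    return list(RAW_TRANSACTIONS)


def transform(raw_rows):
    """Transform step: cast types, tidy text, drop rows that fail to parse."""
    clean = []
    for row in raw_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed amounts rather than load bad data
        clean.append((int(row["id"]), amount, row["description"].strip()))
    return clean


def load(conn, rows):
    """Load step: write analytics-ready rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(id INTEGER PRIMARY KEY, amount REAL, description TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run all three steps and return (row count, total amount) as a check."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT COUNT(*), SUM(amount) FROM transactions"
    ).fetchone()
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` runs the three steps against a throwaway database. Keeping each step as its own function mirrors the ownership split described above: one team can own extraction while another owns the transformations.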
One aspect of our Data org that supports this approach is including Analytics within Engineering. This makes sense given we expect our Analytics team to have Data Engineering skills. It also reflects the importance of engineering to Analytics, as demonstrated by the emergence of Analytics Engineering in recent years.
Engineering a Data Platform is great, but it’s not an end in itself. We need to do something with the data. This is where Analytics and Data Science enter the mix. As well as building data pipelines, our Data Analysts also analyse data using Looker, R, and Python.
Looker is also where business users are able to self-serve insights. Product teams can maintain their own dashboards as well as using dashboards maintained by Analytics.
One thing that really impressed me early on at FreeAgent was seeing a dashboard a product team had created. I was tasked with building a model using some data that the team produces and they already had a dashboard showing the distributions, volume and quality of the data.
As I’ve already noted, we don’t have dedicated Architects in our Data org. Instead, all our Data teams have a stake in our platform architecture. As the name suggests, our Data Platform team are the architects of the platform other teams build upon. Data Platform decides how our Data Lake should be set up or the best way for Matillion to access it. The modelling of Data within our Warehouse is owned by Analytics, as part of their engineering work. Data Science is also empowered to work out how we want to do Machine Learning.
While establishing ownership is important, all these teams ultimately work closely together to decide how to set things up, with input from the broader Engineering community. For example, Data Platform might be responsible for building our Data Lake, but Analytics and Data Science, as two of its biggest users, also contribute to how we should build it. As Analytics are responsible for building a Data Warehouse downstream of the Data Lake, they have a stake in how the Data Lake is built.
A post on Data wouldn’t be complete without mentioning machine learning. We recently hired a new Data Scientist (👋) and have talked in a previous post about what we look for in a Data Scientist.
In Data Science, we work on building customer-facing machine learning models. Our primary focus recently has been our model to categorise customers’ bank transactions. Data Science owns the whole process, from the initial analysis to understand a problem through to building the production model in AWS. We even make changes to the FreeAgent app to interact with our models!
This approach to doing machine learning, where the same people build a model and put it in production, has become popular in recent years under the heading Machine Learning Engineering, as I’ve discussed elsewhere.
Now that I’ve outlined who does what, I want to talk about why I think this is a great way to organise things.
One key benefit is that the same people design and build a given component of our data platform. This creates a sense of ownership as well as adding variety and challenge to people’s roles. Simply building or working with something you’ve had no part in designing becomes dull after a while. Hands-on experience building things also makes you a better designer.
The importance of breadth is even clearer for Data Analysts given their role as Data Evangelists within the business. Analysts work directly with the business to make us more data-driven. This gives them a unique insight into how we ought to build our data platform to maximise impact. Analytics teams who have to rely on another team, perhaps in a different org, for data transformations will either be perennially frustrated or will build a shadow data warehouse to serve their needs. Enabling analysts to do engineering, as we do, avoids these issues. This approach also empowers self-serve – if your Data Warehouse is set up with analytics in mind, people will find it a lot easier to do analytics themselves.
The final benefit of empowering data teams to do things for themselves is that you minimise hand-offs. The more end-to-end you allow a team to be, the less time you spend managing handovers and integrations between teams.
All these benefits sound great and the way we’ve structured our Data teams makes a lot of sense for FreeAgent. However, it’s worth considering the limitations of this approach and how they might be addressed.
Large enterprises with a wide range of products, data sources and technologies will likely find it difficult to do without dedicated architects. Our Analytics team looks after Data Modelling as part of doing analytics. This approach may become untenable for organisations with more complex data, requiring dedicated architects instead. Similarly, our Data Platform team design, build and maintain our data platform. Larger organisations with more complex data platforms may, again, need people dedicated to designing those platforms. However, we shouldn’t lose sight of the benefits that come from involving engineers and analysts in architectural decisions.
Another aspect of our approach that may not scale to larger organisations is the lack of dedicated Data Engineers. We are able to do our data transformations with tools accessible to Data Analysts and Data Scientists. If we had a larger volume of data in a wider range of formats we might need to use more specialised data engineering tools. It could be unreasonable to expect Data Analysts and Data Scientists to use these more specialised tools, which are better suited to dedicated Data Engineers. One interesting question to consider is how the latest generation of tools affects the point at which you need Data Engineers. Do cloud-native data warehouses like Redshift and BigQuery, combined with accessible transformation tools (Matillion, dbt), mean you can go a lot further without dedicated Data Engineers?
I feel like organisations of any size could include the Analytics Engineering and Machine Learning Engineering elements of our approach. Empowering analysts to do more engineering for themselves with tools that encourage software engineering best practices is always a plus. A quick search of ‘Analytics Engineering’ will show you the range of organisations adopting this approach. Equally, Machine Learning Engineering is here to stay as it just makes sense given the tooling and expectations for machine learning today.
This post has introduced Data at FreeAgent and the different teams we have. To recap, we have three teams:
- Analytics maintain our Data Warehouse and Looker instance, do analyses and help the business become more data-driven
- As the name suggests, Data Platform look after our data platform, the tools we use and how they all work together
- Data Science own the end-to-end production of Machine Learning models
As much as possible, teams own, design and build the tools that they use. The result is ‘T-shaped’ teams with breadth outside of their core expertise. For example, in Data Science we have core expertise in machine learning while also being involved in the data platform to the left of our models and the serving infrastructure to the right. Taking such an approach gives teams the opportunity to drive their own impact within the organisation as well as creating varied and interesting roles for individuals.