Control freaks: how we built our own version control for Matillion

Posted on April 27, 2021

We’ve all been there – it’s a couple of days or even the night before an important deadline and one of the following happens:

  • You lose your memory stick with all your work on it and scramble to recreate it but can’t get it back to the way it was.
  • You make some more ‘advancements’ to your work and suddenly everything is broken, and you can’t seem to get it back to the state it was working in before.

I was in my second year of university when the first scenario happened two days before I was due to hand in a piece of coursework that I was particularly proud of. I remember that horrible sinking feeling when I realised that I had no clue where my little blue memory stick that usually lived on my keys was, and therefore had no clue where my coursework source code was. I promptly wrote my lecturer an email pleading for another few days to recreate the work I had lost and then sat in the computing suites multitasking between rapidly refreshing my inbox awaiting her reply, biting my nails, and trying to remember what my previous version of the code looked like. 

After what felt like an eternity, but was more realistically about an hour and a half, a response came in – not the one I wanted, though. It read along the lines of:

“I’m sympathetic to your situation, Lea, but I cannot grant you any leniency here. After all, you are a second-year computing student and you really should be backing up your work at this point. Haven’t you heard of version control or GitHub?”

I had not. I can tell you one thing though: I have not created a project since which hasn’t lived on GitHub or been backed up somewhere else, and I will most likely never make that mistake again. 

What is version control?

Imagine you’re walking down a path and you come to a junction. You think that taking a right turn is going to make your walk much nicer. It looks sunnier and the grass is green, so you turn off to the right. After walking for a while down this new route, you hit a river and you realise you can’t progress and need to go back to the junction and make a different decision. However, the only way for you to get back to the junction is to retrace your entire journey step by step – sounds tricky, right? 

Similarly, when writing code, sometimes changes and advancements to your code can work out great, but sometimes you’ve overlooked where your changes could create a problem, and you need to go back. This is where the beauty of version control comes in, because the version control software tracks and stores each version of your code (each step you make). So we could easily jump back to the version where we’re standing at the junction, and then choose not to take a right, or rather a wrong, turn.

A popular version control system is Git, often used via GitHub, an internet-hosted service for storing Git repositories and tracking software development. We use GitHub at FreeAgent to track changes to most of our code-based projects.

What is Matillion?

Matillion is the ETL (extract, transform, load) tool we use at FreeAgent for loading our data into the database. We use Matillion to load and transform our data into the best structure for reporting, meaning that the state of Matillion at any given time needs to account for the structure of our database at that time. We are constantly making changes in Matillion to include more data in our database and make adjustments to data that already exists, so it makes sense for us to keep track of those changes.

Matillion has a great user interface for visualising components in the database.

This makes it quite easy to see the different steps happening in any job without having to deal with complicated SQL. The downside is that there is no readable ‘code version’ of the state of Matillion at any given time, which makes it difficult to track changes or to search through code to see where a particular component or table is used across different jobs. That’s the first reason a global, code-level search of Matillion is useful: we can search through the code to find exactly what we’re looking for. The other main reason relates to the story I told at the beginning: it’s always good to have a backup of code in case of an error. Before the parser was implemented, if something broke due to a change in Matillion, we would have to rack our brains to figure out everything we had changed the day before and what could have caused the problem.

Matillion actually does have a Git integration, which the Analytics team set up when we started using the tool, but unfortunately it outputs a single very long line for each file, which is almost unreadable and makes it really hard to see what’s going on.

Our solution? The Matillion parser.

How does the Matillion parser work?

The parser is a Python3 script that uses the Matillion and GitHub APIs to pull data from Matillion, transform it into a readable format and push it to GitHub. Below is an overview of the steps in the process:

  1. First, the script calls the Matillion API to get a list of the names of all our orchestration and transformation jobs. Then, for each of those jobs, it takes the JSON content from the response and formats it in a readable way using Python’s json.dumps() function with the indent argument.
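A minimal sketch of that formatting step (the raw_job value is invented for illustration):

```python
import json

# `raw_job` stands in for the JSON string the Matillion API returns
# for a single job - the value here is invented for illustration
raw_job = '{"name": "load_invoices", "components": [{"id": 1, "type": "sql"}]}'

parsed = json.loads(raw_job)

# indent=4 pretty-prints the JSON with one key per line, which is what
# makes the committed files readable and diff-friendly on GitHub
readable = json.dumps(parsed, indent=4)
print(readable)
```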

The data is then encoded in a format that can be read and compared with content from the GitHub API, which requires base64 encoding and a secure hash.

For the content to match correctly when checking whether a file has changed, it also needs to be encoded and decoded as UTF-8, so the JSON from the Matillion API is UTF-8 encoded before it is compared with the content from the GitHub API.
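A minimal sketch of that encoding step, pairing the UTF-8 round trip with the base64 encoding and the hash GitHub uses for comparison (the function name is illustrative):

```python
import base64
import hashlib

def encode_for_github(readable_json: str) -> tuple[str, str]:
    # GitHub stores bytes, so the comparison only works if we apply
    # the same UTF-8 encoding on our side first
    content_bytes = readable_json.encode("utf-8")

    # Git's blob SHA is SHA-1 over "blob <size>\0<content>"; comparing it
    # with the `sha` GitHub reports for the existing file tells us
    # whether the content has changed
    header = f"blob {len(content_bytes)}\0".encode("utf-8")
    blob_sha = hashlib.sha1(header + content_bytes).hexdigest()

    # the GitHub contents API expects file content base64 encoded
    b64_content = base64.b64encode(content_bytes).decode("utf-8")
    return b64_content, blob_sha
```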

  2. The parser then makes a call to GitHub to check whether the corresponding file for that job already exists and, if it does, whether it needs to be updated.
  3. If a file needs to be created or updated, the parser makes a PUT call to the GitHub API to create or update it as required. Finally, the script matches the most recent list of jobs in Matillion against the current list of corresponding files in GitHub; if any files in GitHub no longer appear in the Matillion output, it makes a call to delete them. This keeps the repository up to date rather than cluttered with files for jobs we no longer use. (A sketch of these GitHub calls follows below.)
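A minimal sketch of steps 2 and 3, assuming the GitHub contents API (OWNER, REPO and the authenticated session are placeholders, not our actual setup):

```python
import requests

# OWNER/REPO are placeholders; `session` is assumed to be a
# requests.Session with an Authorization header already set
API = "https://api.github.com/repos/OWNER/REPO/contents"

def sync_file(session: requests.Session, path: str,
              b64_content: str, blob_sha: str) -> None:
    """Create or update one job file if its content differs from GitHub's copy."""
    existing = session.get(f"{API}/{path}")
    body = {"message": f"Update {path}", "content": b64_content}

    if existing.status_code == 200:
        if existing.json()["sha"] == blob_sha:
            return  # content unchanged - nothing to push
        body["sha"] = existing.json()["sha"]  # GitHub requires the old SHA for updates

    session.put(f"{API}/{path}", json=body).raise_for_status()

def delete_stale_file(session: requests.Session, path: str, sha: str) -> None:
    """Delete a file whose job no longer exists in Matillion."""
    session.delete(
        f"{API}/{path}",
        json={"message": f"Delete {path}", "sha": sha},
    ).raise_for_status()
```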

How we use the parser to improve our time with Matillion

So how does any of this help us? There are two main reasons: 

  1. We can track all of the changes we have made to Matillion and see when they were made. 

This reason, which luckily we haven’t needed to use the parser for yet, relates to my initial story – needing a backup. Say that one day we turn on our laptops and see alerts that something in our database is broken, and that it relates to changes made in Matillion. Now we can quickly and easily check the GitHub commit history to see all of the changes that were made the previous day.

  This means that we will hopefully be able to find the problem quickly, with less investigation and less reliance on memory, and make the appropriate adjustments to the database to fix it.

  2. We can treat our Matillion setup as a searchable codebase.

This reason, which helps prevent the need for the first, is the ability we now have to search the GitHub repository. Now we can easily search for the name of a component or table and get a list of every single place it is used. In other words, we are far less likely to accidentally break something, because we know with certainty which other jobs will be affected by a change we are about to make and can preemptively make the appropriate adjustments. This has already saved us time, as we no longer have to manually hunt for everywhere a component is used and hope that we don’t miss anything!
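In practice this is just the search box in the GitHub web interface, but the same lookup can be done programmatically. A hypothetical sketch using GitHub’s code search API (OWNER/REPO and the authenticated session are placeholders):

```python
import requests

def find_component_usages(session: requests.Session, component: str) -> list[str]:
    """Return the paths of every job file that mentions `component`."""
    response = session.get(
        "https://api.github.com/search/code",
        # the repo: qualifier restricts the search to the parser's repository
        params={"q": f'"{component}" repo:OWNER/REPO'},
    )
    response.raise_for_status()
    # each matching file corresponds to one Matillion job
    return [item["path"] for item in response.json()["items"]]
```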

Control freaks? Or just common sense?

So maybe my initial title was a bit harsh. We aren’t control freaks – we just wanted a better solution to monitoring what was changing in our database ETL tool so we don’t end up like poor second-year Lea, staying up till the early hours of the morning having a near-breakdown over a misplaced semicolon. Now – thanks to the Matillion parser – we won’t.  

A day in the life of a Web Analyst

Posted on April 9, 2021

Hi, I’m Lana and I’m a Web Analyst within the Analytics team here at FreeAgent. I primarily look after our Google platform products such as Google Analytics 360, Google Optimize, Google Data Studio and Google Tag Manager. When I tell people my job title, usually the next question I get is ‘So what is it you actually do then?’ To help answer this question, I decided to spend a day recording everything I did…

9am – 10am: Team retrospective

The first meeting of the day is usually our team’s daily stand-up meeting at 9.10am. However, today is slightly different – we have our retro team meeting where we look back over the past few weeks to see what’s been working and what could be improved. We like to use Google Jamboard for these meetings as it’s very interactive and works well remotely.  We spend around 10-15 minutes adding post-it notes to the board about anything and everything we want to discuss or mention. The remaining time is spent talking through each card, spotting common themes and making notes of any learnings we want to apply in future. Some key things that are raised during this retro include looking at different ways we can be more productive when working from home and how we can continue to optimise our current work processes. Retros are usually a fun group session for us and today’s session is very positive – we all come away agreeing that we love working with each other. 

10am – 12pm: Matillion and Looker work

This week I’m working on something totally different that’s a bit outside my comfort zone. To broaden my skill set and learn new things I’ve been discovering how to move data around using Matillion (our ETL tool) and then present that data in our business intelligence tool, Looker.

Each morning for the past week I’ve set aside a few hours to work with my line manager, Rob, and we focus solely on this project. Rob recently wrote a blog about the similarities between Looker and supermarkets and I’ve found this to be a useful metaphor to help me understand how the platform works. Overall it’s been quite intense and I feel like I’ve been chucked in at the deep end a bit, as Rob’s encouraged me to do most of the work rather than just sit and watch, but I really feel like I’ve started to get to grips with how the process works and how data gets from one location (or several) to another.  

Loading data into staging table

12pm – 12.15pm: Meet McTagger

Straight from Matillion and Looker work I have a quick meeting booked in to show a colleague in the Support team how to use our in-house UTM builder, which we have lovingly named McTagger. I worked with our website team last year to build this when we realised there was internal demand for a tool that could help to standardise our UTM tracking while also reducing common, easy-to-make mistakes such as inconsistent capitalisation (differently capitalised values appear separately within GA and can ultimately dilute your data). McTagger is a simple solution (essentially it’s just a very fancy spreadsheet with some scripts), but it has been really effective. Since we launched it last year it’s helped to generate over 1,200 tracked links and is used throughout the business by several teams.
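McTagger itself lives in a spreadsheet rather than in code, but the core idea can be sketched in a few lines of Python – the function below is hypothetical and just illustrates the normalisation:

```python
from urllib.parse import urlencode

def build_tagged_link(url: str, source: str, medium: str, campaign: str) -> str:
    """Build a UTM-tagged link with consistently lowercased parameters."""
    params = {
        # lowercasing matters: "Newsletter" and "newsletter" would show
        # up as separate rows in Google Analytics and dilute the data
        "utm_source": source.strip().lower(),
        "utm_medium": medium.strip().lower(),
        "utm_campaign": campaign.strip().lower(),
    }
    return f"{url}?{urlencode(params)}"

print(build_tagged_link("https://www.freeagent.com", "Newsletter", "Email", "Spring-2021"))
```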

12.15pm – 12.30pm: Slack check-in

As I’ve been on calls most of the morning, I’ve not had much chance to reply to any Slack messages so I have a quick browse. One message is asking for advice around adding Google Analytics tracking in our Knowledge Base section of the website so I have a quick look and play around to check the current tracking and pass on my comments and suggestions.

12.30pm – 1pm: Build transformation jobs

Towards the end of my Matillion/Looker session with Rob earlier, he challenged me to build three more transformation jobs in Matillion all on my own and, since I have some free time now before lunch, I have a go at building these. It’s slightly daunting as I’ve only got a few hours of Matillion experience behind me, and while I do feel like I’m getting to grips with it, it feels a bit like when you pass your driving test and you go on that first drive on your own and there’s no one else in the car telling you what to do. But Rob assures me that I can’t break anything so I give it my best shot and build what I hope are the correct transformation jobs. 

1pm – 2pm: Lunch

(I had beans on toast, if you were wondering.) 

2pm – 2.30pm: 1-2-1 with line manager

As I’m the only Web Analyst at FreeAgent I spend most of my time working across different teams, in particular with our Comms and Website teams. During our weekly 1-2-1 I normally catch Rob up on everything I’ve been working on, but this week we joke that we have nothing to talk about as we’ve spent every day working together – he knows exactly what I’ve been up to! We eventually find other things, both work and non-work related, to talk about before deciding we should head to our team’s daily catch-up call.

One tradition we do have, though, is that at the end of each 1-2-1 we share a recipe with each other, and today’s winner is a homemade sage and nutmeg sausage roll (it’s a Jamie Oliver recipe).

2.30pm – 3pm: Wider team catch-up 

Since the company has gone fully remote we’ve made a real effort to ensure we still have time for some general socialising between the Analytics and Data Science teams. For the past year we’ve been working through several game shows such as Pointless, Family Fortunes and Who Wants to Be a Millionaire? Being a data-driven group of people, we started to become quite competitive with these and maybe took deciding who would win each game a bit too seriously. We’ve now reached the point where we’ve run out of game shows and have resorted to Sporcle quizzes, so today we play a logic quiz on countries. It’s actually harder than we anticipated and we aren’t able to complete it, so we screenshot our current progress and agree to finish it during tomorrow’s catch-up.

3pm – 3.15pm: Unplanned chat 

At the end of catch-up, Rob asks Lea and me for a quick chat about an upcoming project. He’s just come from another meeting where he has some feedback from other users within the business, so we make a quick note for when we begin work on this in a few weeks.

3.15pm – 3.30pm: Optimize check-in

We are currently running a redirect A/B test on our website via Optimize and the test is approaching its halfway point, so I just give the data a quick health check to ensure everything looks fine and the traffic split still looks even, which it does. Once the test has completed, we’ll do our in-depth analysis to see how well the variation has done and present the findings to the main stakeholders (and ultimately the rest of the business). 
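For a redirect test, the ‘traffic split still looks even’ check can be sketched as a simple chi-squared test against a 50/50 split – the visitor counts below are invented for illustration:

```python
from scipy.stats import chisquare

# invented counts for the original page and the variation
visitors = [10_240, 10_102]
expected = [sum(visitors) / 2] * 2  # what a perfect 50/50 split would give

stat, p_value = chisquare(f_obs=visitors, f_exp=expected)

# a very small p-value would suggest the redirect isn't splitting
# traffic evenly and the test setup needs investigating
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
```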

3.30pm – 4pm: Comms meeting

As I work quite closely with the Comms team, I join one of their regular meetings so I can be aware of what they’ve been up to and any upcoming work that they have planned. There’s always so much going on so I like this meeting as it brings it all together in one place. This week however I have to duck out early as I have some interview prep to do. 

4pm – 4.15pm: Interview prep

We’ve been recruiting for the past few months, looking for someone for the summer internship that we run each year, and we’re at the final stage of interviewing our top candidates. Part of the hiring process involved sending out a two-part task to the candidates where we tested them on analysing data and presenting their findings. We had a lot of fun this year creating the test and we specifically left one part of the task open to the candidates’ interpretation, so it’s been interesting to see the variety of responses that have come back.

While I have been heavily involved in the intern hiring process this year and have already reviewed all the candidates’ applications, I still use this time to review the interviewee’s application, CV and task submission so that it’s fresh in my mind and I can make notes of any specific questions that I want to ask during the interview. 

4.15pm – 5pm: Intern interview

During the interview we ask the candidate some questions around their task submission, alongside some more general questions about why they are interested in the internship. Our previous interns have done some really cool projects which are still being used today, and we’ve been pleased with the quality of the candidates this year, so we’re excited to see what our successful candidate can achieve.

5pm: Home time

And that’s how my day went – although I should admit that as a Web Analyst I’ve never worked the same day twice. One day I could spend the whole day working on just one project whereas the next I could be working on multiple different things – it just depends on what the demands of the business are at that time. But that’s what keeps it fun and interesting!