Sourcing a suitable sample: understanding selection bias in survey data
During my time at FreeAgent, I have been analysing attitudinal customer survey data to predict customers' behaviour. Getting to the bottom of exactly how this data was collected has helped me to understand it and has given me a few ideas about how it could be collected in the future. This blog post focuses on how we choose who is selected to take part in surveys: a process is… Continue reading
Dealing with dirty data: useful functions for data cleaning in R
In this blog post, I’ll explain how to use some simple R-based data cleaning solutions (mostly in the ‘tidyverse’ package) to address the most common dataset errors with the help of my favourite analogy: the untidy kitchen! NB: There is a plethora of valuable data cleaning tools in other software, and even within R there are many different tools available. While the approach that I describe here is not necessarily… Continue reading
Clean house: clear mind. Clean data: clear findings.
Soon after settling in at FreeAgent and getting to grips with my role as a data science intern, I got the opportunity to present some of the data that I had been working on at a ‘town hall’, a company-wide weekly meeting where everyone gets together to present their work, share news and pitch ideas. The data I presented was attitudinal survey data from accountancy practices that had contracts with… Continue reading
Querying the past
I’ve been learning to love the ActiveRecord query interface over the past few months. Whilst I find it infuriating when I’m battling it to do what I actually want, I also relish the power and convenience it gives me for many simple queries. So, when it came to designing a query language for historical data in our systems, ActiveRecord was a natural choice. We can now do queries like: FA::Subscriptions.… Continue reading