Deriving and verifying the uncertainty on conversion rate predictions
For the past few weeks I’ve been working on building a machine learning model that can estimate the probability that a customer will convert from the free trial to a paid subscriber. In practice, I combine the predictions from this model for cohorts of companies, which are defined by their acquisition channel and acquisition month, and so a method is required for calculating the conversion rate uncertainties for each cohort.… Continue reading
How to count what counts
At FreeAgent we’re building a new platform to allow our teams to explore their data and glean new insights from it. The platform is built using Looker on top of Amazon Redshift, and so far it’s been enthusiastically received by the teams that use it. However, the process of building up the platform and driving adoption hasn’t been entirely straightforward. There has been a recurring issue that we’ve had… Continue reading
Micro-batching Event Data Into Amazon Redshift
Data is at the heart of our business. We use data to make business-critical decisions on a daily basis. It is important that this data is not only accurate but also available when required. Traditionally, reports would be generated on a set schedule, which made it difficult to decide on next steps in a timely fashion. New technologies like Amazon Kinesis Data Streams enable us to generate these reports… Continue reading
Separating job applicants in multiple dimensions
The team I work in at FreeAgent is achieving great things - from rolling out a new Business Intelligence tool, to working on machine learning models to improve our product. With so many ideas but not enough time to action them, we recently advertised a number of roles to expand our team. FreeAgent is a superb place to work, and the roles are a real opportunity for someone to achieve… Continue reading
Accurately ascertaining attitudes: designing unbiased survey questions
It would be great if we knew exactly what our customers thought so that we could adapt our products and improve our performance. However, it isn’t possible to read the minds of our customers, so surveys are a popular alternative to understand their attitudes. Yet, if the answers recorded on surveys don’t truly reflect the attitudes of the customers, how can we really know how to improve? This blog post… Continue reading
Sourcing a suitable sample: understanding selection bias in survey data
During my time at FreeAgent, I have been analysing attitudinal customer survey data to predict customers’ behaviour. Getting to the bottom of how exactly this data was collected has helped me to understand the data and has given me a few ideas about how the data could be collected in the future. This blog focuses on how we choose who is selected to take part in surveys: a process is… Continue reading
Dealing with dirty data: useful functions for data cleaning in R
In this blog post, I’ll explain how to use some simple R-based data cleaning solutions (mostly in the ‘tidyverse’ package) to address the most common dataset errors with the help of my favourite analogy: the untidy kitchen! NB: There are a plethora of valuable data cleaning tools in other software and even within R there are many different tools available. While the approach that I describe here is not necessarily… Continue reading
Clean house: clear mind. Clean data: clear findings.
Soon after settling in at FreeAgent and getting to grips with my role as a data science intern, I got the opportunity to present some of the data that I had been working on at a ‘town hall’, a company-wide weekly meeting where everyone gets together to present their work, share news and pitch ideas. The data I presented was attitudinal survey data from accountancy practices that had contracts with… Continue reading
Querying the past
I’ve been learning to love the ActiveRecord query interface over the past few months. Whilst I find it infuriating when I’m battling it to do what I actually want, I also relish the power and convenience it gives me for many simple queries. So, when it came to designing a query language for historical data in our systems, ActiveRecord was a natural choice. We can now do queries like: FA::Subscriptions.… Continue reading