Down the Ruby Mine, Part III: Splat and splat again

Posted on August 29, 2019

Hello and welcome to another Down the Ruby Mine. I’m Sam, one of the Engineering Interns working at FreeAgent over the summer, and I’m here to shed some light on a Ruby language feature. If you’re out of the loop, you may have missed my previous posts, which can be found here and here. Today we’ll be diving into the most questionably named Ruby feature out there: splat.

Functions are a developer’s best friend. They allow you to make your code more modular and readable without significant efficiency costs. A common roadblock that developers come up against is allowing functions to take a variable number of arguments. Fortunately, Ruby provides two useful mechanisms for doing just that.

The * or splat operator converts an array into a list of arguments and vice versa. If we want to feed arguments from an array into a function we can do the following:

def show_arguments(arg1, arg2, arg3)
  puts arg1
  puts arg2
  puts arg3
end

argument_array = ["Arg1", "Arg2", "Arg3"]

show_arguments(*argument_array)
# Arg1
# Arg2
# Arg3

Notice that by using the splat operator we can feed in each element of the array as a separate argument. Splat does not have to be used alone in a function call; it can be used alongside other arguments:

def show_arguments(arg1, arg2, arg3, arg4)
  puts arg1
  puts arg2
  puts arg3
  puts arg4
end

argument_array = ["Arg2", "Arg3"]

show_arguments("Arg1", *argument_array, "Arg4")
# Arg1
# Arg2
# Arg3
# Arg4

Approaching from a different angle, the splat operator can be included in the function declaration. This allows a variable number of arguments to be gathered into an array that is accessible from within the function:

def show_arguments(*arg_array)
  arg_array.each do |arg|
    puts arg
  end
end

show_arguments("Arg1", "Arg2", "Arg3")
# Arg1
# Arg2
# Arg3

argument_array = ["Arg1", "Arg2", "Arg3"]
show_arguments(*argument_array)
# Arg1
# Arg2
# Arg3

Splat works well for positional arguments but does not provide any benefits if your function takes in keyword arguments. Luckily we can use the ** or double splat to resolve this. (Yes, Ruby devs have really thought of everything!)

In the same way that splat allows arrays to be decomposed into arguments, double splat extends this to hashes. Below we can see that a double-splatted hash can be passed to a function that takes keyword arguments:

def show_arguments(first: "first", second: "second")
  puts first
  puts second
end

show_arguments
# first
# second

argument_hash = {first: "new_first", second: "new_second"}
show_arguments(**argument_hash)
# new_first
# new_second 

Similarly, if you add the double splat in the function declaration, keyword arguments are gathered into a hash which can be used inside the function:

def show_arguments(**arg_hash)
  arg_hash.each do |key, value|
    puts key
    puts value
  end
end

show_arguments
# 

show_arguments(first: "first_arg", second: "second_arg")
# first
# first_arg
# second
# second_arg

If you really want to impress your friends, you can combine both splat and double splat into what is not-so-commonly known as a multi-splat function. This is useful when you want to receive a variable number of optional positional and keyword arguments:

def show_arguments(main_argument, *additional_positional, **additional_keyword)
  puts main_argument

  additional_positional.each do |positional_arg|
    puts "Positional: #{positional_arg}"
  end

  additional_keyword.each do |keyword_arg_key, keyword_arg_value|
    puts "Keyword: key -- #{keyword_arg_key}, value -- #{keyword_arg_value}"
  end
end

show_arguments("main_arg")
# main_arg

show_arguments("main_arg", first: "first_arg", second: "second_arg")
# main_arg
# Keyword: key -- first, value -- first_arg
# Keyword: key -- second, value -- second_arg


show_arguments("main_arg", "other_arg", first: "first_arg", second: "second_arg")
# main_arg
# Positional: other_arg
# Keyword: key -- first, value -- first_arg
# Keyword: key -- second, value -- second_arg

When calling a function, the splat operators can be used to turn an array or hash into a list of arguments. Additionally, they allow a variable number of arguments to be gathered up into an array or hash within a function. They are two important tools in any Ruby developer’s toolbox, so go out there and use them without hesitation!
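Here’s one final illustrative sketch (the argument names are made up) showing both behaviours at once: splatting pre-built collections into a call, and gathering them back up inside the function:

def show_arguments(first, *rest, **options)
  puts first
  rest.each { |arg| puts arg }
  options.each { |key, value| puts "#{key}: #{value}" }
end

positional = ["Arg2", "Arg3"]
keyword = { fourth: "Arg4" }

show_arguments("Arg1", *positional, **keyword)
# Arg1
# Arg2
# Arg3
# fourth: Arg4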

I hope that this trio of Ruby tidbits has sated your thirst for language features. As a software engineering intern at FreeAgent, I have been developing an understanding of Ruby and a multitude of cool Rails features throughout my time here. Everyone has been friendly and helpful, happy to answer questions and provide constructive criticism. One thing that has continually stood out to me during my time at FreeAgent is the constant cross-team communication. As cliché as it sounds working here is like being part of a family, one with technically-literate, modern parents and supportive siblings.

Down the Ruby Mine, Part II: Ruby’s seemingly illogical logical operators

Posted on August 23, 2019

Hello there, my name’s Sam and I’m one of the Summer 2019 Engineering Interns at FreeAgent. As part of my time here I’m writing a series of blog posts on Ruby language features. If you’re a first-time Down the Ruby Mine reader then don’t fret, because the posts aren’t dependent on each other. However, if you are interested, you can find the first post here. Today we’ll be exploring the wondrously strange world of Ruby’s logical operators.

In a language like Java, logical operators are straightforward, uninteresting constructs. They are typically binary operators that take two boolean operands and return a boolean (a true or false value). Below is an example in Java:

boolean this_is_true = true;
boolean this_is_false = false;

System.out.println(this_is_true && this_is_false);
// false

System.out.println(this_is_true || this_is_false);
// true

On first inspection Ruby behaves the same way:

this_is_true = true
this_is_false = false

this_is_true && this_is_false
# false

this_is_true || this_is_false
# true

However, the behaviour of the boolean operators changes when we move away from true or false values. Ruby groups all objects into two boolean categories, truthy and falsy, and uses this property to evaluate boolean expressions. Everything in Ruby is considered truthy except nil and false. So in Ruby, 0 is treated as truthy in a boolean expression. What a crazy world we live in!

this_is_true = 5
this_is_also_true = true
this_is_false = nil

this_is_true && this_is_false
# nil

this_is_true || this_is_false
# 5

this_is_true && this_is_also_true
# true

Ruby’s boolean operators behave spookily with non-boolean arguments

Ruby’s boolean operators do not always return boolean values. As shown above, they return the last argument evaluated in the expression. In the case where an expression is short-circuited (the second argument does not need to be evaluated to know the outcome), the first argument is returned. This can be seen in the second example above: when the first argument of a logical OR is truthy, the whole expression must be truthy, so the first argument (5) is returned.
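Here’s a quick illustrative sketch of that short-circuiting behaviour (the method name is made up):

def truthy_with_output
  puts "I was evaluated"
  true
end

false && truthy_with_output
# false (the method is never called, so nothing is printed)

nil || truthy_with_output
# I was evaluated
# true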

Now that you are a master of Ruby’s boolean expressions we can discuss the hip assignment operators ||= and &&=.

||= is a widely loved assignment operator, used to assign an object to a variable if that variable doesn’t already hold one. To be more specific, it checks whether the value on the left hand side is falsy and, if it is, assigns the object on the right hand side to it. Under the hood it is performing the logical OR operation:

a ||= b

# is equivalent to

a = (a || b)

If object a is falsy then it gets assigned b. If a is truthy then it gets assigned itself (it doesn’t change).

Thus, it is commonly used for conditionally assigning to a variable when it evaluates to nil. Be careful though: with great power comes great responsibility. Remember that the other falsy value in Ruby is false, so this operator also triggers an assignment when the first argument is false.
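For example, here’s a small illustrative snippet (the variable name is made up) where a deliberate false gets overwritten:

send_reminders = false
send_reminders ||= true
send_reminders
# true

The intentional false has been silently replaced, which is probably not what you wanted.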

The &&= operator gets less love and attention from the community, yet it still performs its operation dutifully:

a &&= b

# is equivalent to

a = (a && b)

If object a is truthy then it gets assigned b. If a is falsy then it gets assigned itself (it doesn’t change).

The inverse of ||=, the &&= operator can be used to assign a variable a new object when it already points to an existing (truthy) object. The most common use of &&= is assigning a variable the result of a method call on its current value. By using this operator you can ensure that the variable will only be reassigned if it pointed to something in the first place. In general, it is harder to find applications for the &&= operator, but that makes it all the more special when you do!
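Here’s a small illustrative sketch of that method-call pattern (the variable name is made up):

username = nil
username &&= username.strip
# nil (strip is never called, so no NoMethodError is raised)

username = "  sam  "
username &&= username.strip
# "sam"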

Congratulations for making it to the end. With your new-found logical knowledge you can ponder some of humanity’s oldest questions that have troubled philosophers for millennia. I’ll be back with another post in a week or so, so you can tell me what answers you’ve found then!

Delivering Practice Insights: My internship at FreeAgent

Posted on August 22, 2019

The phrase ‘practice insights’ was something that I heard very often in my first couple of weeks at FreeAgent, between wrapping my head around the data model and trying hard to remember everyone’s names. The reason I mention it in the context of the first couple of weeks is that at that point I didn’t fully know what it meant. Of course I knew what the words meant, and I knew that this phrase was the basis for my project, but delivering these ‘practice insights’ seemed like the answer to a question that I hadn’t yet figured out.

After just the first couple of meetings I began to realise that, with my project, there was potential to have a real impact on the accountancy practices we work with by providing them with information about how their clients use FreeAgent. The question, for me, felt like ‘how can we use our data to understand and empower our accountancy practice partners?’. This aligns with one of our company-level objectives: to delight the accountancy practices we work with. So I set out to try and answer this question to the best of my ability.

Project summary

My project has two main stages which will hopefully link together towards the end of my internship. The first stage was the creation of insights dashboards to show details of accountancy practices and their activity, which are now being trialled in the company. The second stage consists of segmentation of customer behaviour to measure engagement. In this post I will be talking about stage 1: the dashboards.

Getting information at a glance: dashboards

At this point you might be wondering what I mean by ‘dashboards’. In this context a dashboard is a grid of summary plots that provides accountancy practices with high-level information about their FreeAgent usage, benchmarked against their peers. Throughout this section I’ll provide some examples of the visualisations we share with FreeAgent’s practice partners. The tool we use at FreeAgent to do this is Looker, a business intelligence tool. Having never worked with Looker before, I found it daunting in the beginning, but after a few weeks of running queries to get a feel for it, exploring our data and getting the results I needed in minutes became second nature.

Figure 1: Practice Specific Dashboard

There are three dashboards, the first shows aggregated overall insights into the accountancy practices, the second shows insights for a specific accountancy practice (Figure 1), and the third shows insights for a specific company. The idea is that they each link to each other and can be used in conjunction to get a better overall view of the accountancy practice partners and their clients. I’ll talk through a couple of the plots from the practice specific dashboard and why they could be of interest, which will hopefully give you a good overview of the sorts of things they can be used for without going into too much detail.

Figure 2: Percentage time spent on the mobile app by end-users

The plot in the top left corner of the dashboard (closeup in Figure 2) displays the percentage of total time spent on the FreeAgent application that is accessed via the mobile app, averaged across the clients of this specific practice by month. The green line shows the average across all clients of the specific practice and the blue line shows the average across clients of all practices. This visualisation quickly summarises adoption of the mobile app by accountancy practice clients and provides an opportunity to start a conversation with a given practice on whether they are aware of the different mobile app training that we offer.

Figure 3: Accountant usage ratio by month
Figure 4: Accountant usage ratio by tenure day groupings

A few of the plots refer to the ‘account manager usage ratio’ (Figure 3, Figure 4). This is defined as the fraction of a client’s FreeAgent usage that can be attributed to an accountant assuming the client’s FreeAgent account. We would expect this ratio to decrease over time if an accountancy practice is managing to onboard their clients efficiently. The bar chart shown in Figure 4 allows us to explore how practices stack up against this expectation by plotting the account manager usage ratio for different FreeAgent tenure groupings. In the example shown, we can see that this specific practice has a lower account manager usage ratio at the beginning of their clients’ time with FreeAgent, which then increases over time. This information can be used to frame a discussion with the practice about their onboarding journey and the time they spend continually maintaining their clients’ records.

Figure 5: Accountant vs. end-user usage in minutes over the last 6 months

Another useful visualisation is the scatter plot in the bottom right hand corner (Figure 5). This plot displays end-user usage in minutes versus accountant usage in minutes, with each blue dot representing a company. Companies with high accountant or end-user usage are easily spotted in this visualisation, allowing especially time-intensive companies to be examined in more detail. To do this, each of the dots in the scatter plot provides an option to drill into the company specific dashboard by clicking on it.

Company specific dashboard

Figure 6: Weekly activity in minutes for accountants and end-users
Figure 7: Percentage of time accessing FreeAgent from each data source

The company specific dashboard contains some panels which provide general information regarding the company’s name, subscription status, type, last known active date, number of days since activation and number of active bank feeds. It also shows a breakdown of usage between the accountant and end-user by week (Figure 6), a list of the actions they frequently perform whilst using FreeAgent and a bar chart showing the media used to access the FreeAgent application (Figure 7).

Day-to-day dashboard usage

These dashboards were created to help teams within FreeAgent understand the accountancy practices, as well as to potentially inform the accountancy practices themselves about their data. Collecting and warehousing information on the activity and usage of both companies and accountancy practices has allowed us to delve deeper into FreeAgent usage patterns and to develop a new tool to aid our communication with our practice partners. I’m glad to say that these dashboards have been really well received by everyone involved, and are being used to drive conversations with practices already!

This project has been really exciting for me, especially due to the opportunity to create something that could actually be adopted into the day-to-day work of other people within the company. I didn’t expect to be so involved across multiple areas of the business during my internship, but the positivity and support that I have received throughout this project has been really outstanding. I can’t believe that there are only a couple of weeks left now; the summer really has flown by.

Down the Ruby Mine, Part I: The code insertion trinity

Posted on August 16, 2019

Hey there, my name’s Sam and I am one of four software engineering interns working at FreeAgent over the summer. This is my first time writing in Ruby and I’ve had a great time exploring the language. As developers, I believe it’s important to develop a fundamental understanding of the core of a language, even when it’s supplemented by a feature-rich framework like Rails.

Over the next few weeks I’ll be releasing a series of blog posts exploring an eclectic bunch of language features that I’ve encountered during my time at FreeAgent. The function of these features was not clear to me upon first inspection, and I hope that by sharing their inner workings with you, we will all end up closer to enlightenment. Time to take a good ol’ fashioned spelunk into the magical, awe-inspiring depths of Ruby.

One thing I first noticed navigating the FreeAgent codebase was the ubiquitous use of the include statement. When it first caught my eye I froze and a cold feeling permeated my body: the fear of the unknown. However, once I had recovered, I realised it looked similar to constructs in languages I was familiar with. It was reminiscent of the import keyword in Python or Java, used to get code from one file into another file – a quick Google search confirmed this. Venturing on with renewed courage I soon encountered extend and reeled out of my chair in fright. To prevent further office disruptions, I decided it was time to flesh out these keywords once and for all.

Include and extend are closely related to a third keyword, prepend, which I’ll explain for completeness. All three keywords are used to add functionality from one module into a class or another module. The differences between them lie in the way they interact with the class/module being added to.

Before we go into how they work, it’s important to understand Ruby’s class hierarchy. Ruby’s object oriented design results in each class having a list of ancestors that form a hierarchy of inheritance. The top-most ancestor of all objects is the BasicObject class (since Ruby 1.9), while the bottom-most is (almost always) the class itself. Let’s take a look at the ancestors of the friendly Array class:

The Array class is a descendant of many other Ruby classes and modules, from which it inherits functionality
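If you run this in irb, the output looks something like the following (the exact list can vary depending on your Ruby version and which libraries you have loaded):

Array.ancestors
# [Array, Enumerable, Object, Kernel, BasicObject]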

If you don’t specify a parent class when creating a class in Ruby, the class will implicitly inherit from Object. Object includes the Kernel module and inherits from BasicObject, so don’t be alarmed when you find Object, Kernel and BasicObject lurking in the ancestor hierarchy of most Ruby classes.

With the ancestor hierarchy in mind we can begin to dive into the include keyword. Include inserts the included module into a class or module’s ancestor chain as a parent. The included module’s methods are then accessible as instance methods. When inserted into a class this means that objects of that class can call the included module’s methods. The code below illustrates how this works:

module IncludedModule
  def was_included
    puts "Hello, I am from the IncludedModule"
  end
end
class MainClass
  include IncludedModule
end
MainClass.ancestors
# [MainClass, IncludedModule, Object, Kernel, BasicObject]

mc = MainClass.new
mc.was_included
# Hello, I am from the IncludedModule

We can see that the IncludedModule has been inserted as a parent to MainClass. IncludedModule’s methods are accessible to MainClass objects, meaning they are now instance methods of MainClass.

Note that as modules cannot be instantiated, including a module in another module does not have an obvious effect (besides altering the module’s ancestors). However, the included module’s methods are still added as instance methods. So when a class includes this module, objects of that class can access both the module’s own methods and the methods of any module it has included. If at this point you feel sick of the word “module”, let me apologise and lay it out visually:

If B includes A and C includes B then an object d, of class C, can use methods from A and B
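Here’s a minimal sketch of that diagram (the module and class names are made up):

module A
  def from_a
    puts "Hello, I am from A"
  end
end

module B
  include A

  def from_b
    puts "Hello, I am from B"
  end
end

class C
  include B
end

C.ancestors
# [C, B, A, Object, Kernel, BasicObject]

d = C.new
d.from_a
# Hello, I am from A
d.from_b
# Hello, I am from B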

Extend works like include but instead of adding included methods as instance methods, it adds them as class or module methods. This means that they can be accessed from the class or module but not from an instance:

module ExtendedModule
  def was_extended
    puts "Hello, I am from the ExtendedModule"
  end
end
class MainClass
  extend ExtendedModule
end
MainClass.ancestors
# [MainClass, Object, Kernel, BasicObject]

MainClass::was_extended
# Hello, I am from the ExtendedModule

mc = MainClass.new
mc.was_extended
# NoMethodError

Extend is unique in that it does not change the inheritance hierarchy

module MainModule
  extend ExtendedModule
end
MainModule::was_extended
# Hello, I am from the ExtendedModule

As a bonus it allows modules to share each other’s methods

Last but not least is prepend. Prepend works in a similar way to include but instead of the inserted module being added as a parent in the inheritance hierarchy, it is added as a child. This means that if there are overlapping methods between the prepended module and the main class/module, the prepended methods override the others:

module PrependedModule
  def was_prepended
    puts "Hello, I am from the PrependedModule"
  end
end
class MainClass
  prepend PrependedModule

  def was_prepended
    puts "Hello, I am from the MainClass
  end
end
MainClass.ancestors
# [PrependedModule, MainClass, Object, Kernel, BasicObject]

mc = MainClass.new
mc.was_prepended
# Hello, I am from the PrependedModule

As the prepended module sits at the bottom of the hierarchy, its method gets called instead of the method defined in the class
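One handy consequence of this ordering is that the class’s own method is still in the chain, so the prepended method can call super to wrap it. Here’s a small illustrative sketch (the class and module names are made up):

module Logging
  def save
    puts "before save"
    super
    puts "after save"
  end
end

class Record
  prepend Logging

  def save
    puts "saving"
  end
end

Record.new.save
# before save
# saving
# after save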

Hopefully this has taught you something new about Ruby’s code insertion keywords. How code is shared between modules and classes is often overlooked, but knowing exactly the right keyword for your use case could prevent a few headaches down the line. Stay tuned for another post next week; I’ve lined up something that will have you on the edge of your seat: logical operators.

Deriving and verifying the uncertainty on conversion rate predictions

Posted on August 12, 2019

For the past few weeks I’ve been working on building a machine learning model that can estimate the probability that a customer will convert from a free trial to a paid subscription. In practice, I combine the predictions from this model for cohorts of companies, which are defined by their acquisition channel and acquisition month, and so a method is required for calculating the conversion rate uncertainties for each cohort.

Uncertainty matters

The uncertainty on an observed value quantifies how confident we are that the observation wasn’t a fluke [1]. In this context, providing the value of a conversion rate without also providing the uncertainty on that value is dangerous and open to misinterpretation. For example, an acquisition channel with a predicted conversion rate of 30% and an uncertainty of 2% may be more successful than one with a predicted conversion rate of 90% but an uncertainty of 80%! If uncertainty values were missing we would be misled by the prediction values and come to the conclusion that the second acquisition channel is much better than the first.

The sample of probabilities of subscription is an example of the occurrence of the little-known Poisson-Binomial distribution [2] in a real-life process. In this post I will cover how I derived the uncertainty value for the conversion rate analytically, using the Poisson-Binomial distribution, and how I verified its accuracy numerically using a Monte Carlo simulation [3].

Bernoulli Distribution

For each company the model outputs a probability of subscription p_i, which corresponds to a random variable X_i. This is an example of a Bernoulli random variable, which has only two possible outcomes – success (X_i = 1) or failure (X_i = 0). In this case, success is subscription, which has probability p_i.

The expected value of the random variable X_i is:

    \[E(X_i) = 0 \times Pr(X_i = 0) + 1 \times Pr(X_i = 1) = p_i\]

and the variance:

    \[Var(X_i)=E({X_i}^{2})-[E(X_i)]^2\]

    \[=0^2 \times Pr(X_i=0) + 1^2 \times Pr(X_i=1) -[E(X_i)]^2\]

    \[=p_i-{p_i}^2=p_i (1-p_i)\]

Poisson-Binomial Distribution

For a cohort of companies, we have a sequence of not necessarily identically distributed Bernoulli random variables, each with a probability of success p_i. To predict the conversion rate of a cohort we need to estimate the number of companies that are expected to subscribe. Let this be denoted by Y, which is the sum of all the Bernoulli random variables

    \[Y = X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i\]

where n is the number of companies in the cohort.

Therefore, the expected number of companies to subscribe in the cohort is:

    \[E(Y) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p_i\]

Since we assume each company subscribes independently of each other, the Bernoulli random variables are independent and we can simply sum their variances to calculate the variance of Y:

    \[Var(Y) = Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i) = \sum_{i=1}^{n} p_i(1-p_i)\]

The sum of independent non-identical Bernoulli trials follows the Poisson-Binomial Distribution. The more well-known Binomial Distribution is a special case of the Poisson-Binomial Distribution where all Bernoulli random variables have equal probabilities of success.

Let C denote the conversion rate. For a cohort of n companies, we have C=Y/n. Therefore:

    \[E(C) = E(Y/n) = \frac{1}{n}E(Y) = \frac{1}{n} \sum_{i=1}^{n} p_i\]

This is just the arithmetic mean of the probabilities.

The uncertainty on the conversion rate C is its standard deviation, which is the square root of the variance.

    \[SD(C) = \sqrt{Var(C)} = \sqrt{Var(Y/n)}\]

    \[= \sqrt{\frac{1}{n^2}Var(Y)} = \frac{1}{n}\sqrt{\sum_{i=1}^{n} p_i(1-p_i)}\]

Calculating the conversion rate and uncertainty using Python

For this we are using numpy:

import numpy as np

For the purposes of this example we are going to create a one-dimensional array of 1,000 fake, randomly generated probabilities of subscription.

np.random.seed(0)
n = 1000
probs = np.random.rand(n)

We can calculate the expected conversion rate along with the relevant uncertainty (standard deviation) based on the formulae derived above:

conversion_rate = np.mean(probs)
uncertainty = np.sqrt(np.sum(probs*(1-probs))) / n

For this example the conversion rate is found to be 0.49592 and the uncertainty 0.01287.

Verifying the results using a Monte Carlo simulation

The general idea of a Monte Carlo simulation is to repeat random sampling many times to (in this case) estimate the conversion rate and its uncertainty. For each company I draw a random number and, if the company’s probability of subscription is greater than that number, I consider the company subscribed; otherwise it is not subscribed. After this process is completed for all companies, I calculate the conversion rate and store it in a list. I repeat this 10,000 times, then take the mean of all the conversion rates as the estimated conversion rate. The uncertainty is the standard deviation of this set of conversion rates.

MC_conversion_rates = []
for i in range(10000):
    # One simulated cohort: each company subscribes if its probability beats a uniform random draw
    thresholds = np.random.rand(n)
    subscribed = probs > thresholds
    MC_conversion_rates.append(sum(subscribed) / n)

# The mean of the simulated conversion rates estimates the conversion rate;
# their spread (standard deviation) estimates its uncertainty
MC_conversion_rate = np.mean(MC_conversion_rates)
MC_uncertainty = np.std(MC_conversion_rates)

The estimated conversion rate from the Monte Carlo simulation is 0.49592 and the uncertainty of this is 0.01289, which is consistent with the result derived above.

The next steps for my project are to integrate the subscription probability predictions into a cloud-based machine learning pipeline. This pipeline will generate new predictions each day, allowing us to surface the predictions and uncertainties to stakeholders in the business, to help optimise our marketing efforts in different acquisition channels.

References

  1. Holdgraf, C. (2014). The importance of Uncertainty. [Blog] Berkeley Science Review. Available from: https://berkeleysciencereview.com/importance-uncertainty/ [Accessed 8th August 2019]
  2. Wang, W. H. (1993). On the number of successes in independent trials. Statistica Sinica. 3: 295-312. Available from: http://www3.stat.sinica.edu.tw/statistica/oldpdf/A3n23.pdf [Accessed 8th August 2019]
  3. Pease, C. (2018). An overview of Monte Carlo methods. [Blog] Towards Data Science. Available from: https://towardsdatascience.com/an-overview-of-monte-carlo-methods-675384eb1694 [Accessed 8th August 2019]