Here at FreeAgent we, like so many other workplaces around the world, have been adjusting to a fully remote setup over the past few weeks. Whilst a significant number of our company’s employees are permanently home-based, only one of our seven Analytics & Data Science team members is usually based away from our Edinburgh office. It has felt strange.

We decided fairly quickly that we needed to create new opportunities for the team to ‘convene’ in lieu of the facetime we’d usually get in the office. We scheduled a short daily catch-up after lunch: its primary purpose was to make sure everybody was doing okay and perhaps to discuss the weather or our snacking levels.

Sometime during the first week, I suggested a quick round of Pointless during one of these catch-ups. I’d been gifted a Pointless quiz book a couple of years before, and I thought it’d tap into the competitive nature of many members of our team. I didn’t know the half of it…

## The Game

For the uninitiated, the concept of Pointless is pretty simple. Behind the scenes, 100 members of the UK public were quizzed on a range of weird and wonderful questions. For example: *in 100 seconds, name as many capital cities of European countries as you can*. 98 might have said *London*, 92 might have said *Paris*, 6 might have said *Zagreb*. The contestants on the show are then shown the same question (*name a capital city of a European country*) and have to give the most *pointless* answer they can think of – that is, the answer that the fewest of the 100 gave. In this case, *London* or *Paris* would be bad answers, whereas *Zagreb* would be a good one. The person who gives the best answer is the winner.

There were four players in our first game. As any good data analyst would do, I tracked players’ responses and scores in a spreadsheet. Ipek, one of our BI analysts, won the first game. We applauded her, discussed the possibility of playing again the following day, and went about our afternoons.

Player | Answer | Score ↓ |
---|---|---|
Ipek | Ljubljana | 1 |
Lana | Reykjavik | 4 |
Rob | Bucharest | 4 |
Dave | Riga | 8 |

The next day, we had our entire team – six players and me – for game two. Dave, our team lead, won the game with a *pointless* answer (an answer that *none* of the 100 people surveyed gave). I tracked the scores again. There was some controversy: apparently, when given a short amount of time to name as many words ending in ‘erry’ as they could, 60 out of 100 members of the UK public responded with *loganberry* – compared to just 28 for *cranberry*. Who knew the loganberry was so widely appreciated?

Player | Answer | Score ↓ |
---|---|---|
Dave | Huckleberry | 0 |
Rob | Lingonberry | 2 |
David | Cloudberry | 5 |
Owen | Loganberry | 60 |
Lana | Blackberry | 84 |
Ipek | Strawberry | 92 |

After our third game, two things became clear. Firstly, we were all enjoying spending 10 minutes of our afternoons doing something a bit silly and getting competitive. Secondly – and more importantly – if this were to continue, we’d need some kind of long-term scoring metric.

## Measuring Pointless Things

So, *who should be crowned our Pointless champion?* To date, we’ve played 16 games. I’m going to explore the five metrics that we came up with to answer this question.

#### Metric #1: Games Won

To begin with, we started referring to the person who had won the most games as ‘the person who was doing best’. It’s a nice, simple way to measure things. After 16 games, Ipek leads the pack on *Games Won*. Our top 3 looks like this:

Player | Games Won ↑ |
---|---|
Ipek | 4 |
Rob | 3 |
Dave | 3 |

#### Metric #2: Win Rate

Sadly, not every team member is able to play every day! This means that some players have played more games than others. *Games Won* doesn’t account for this, and so it doesn’t reward players who have won lots of games despite not having played many. I decided to calculate a *Win Rate* for each player: the proportion of the games they played that they won. After 16 games, Dave knocks Ipek off the top spot, having won 38% of the games he played:

Player | Games Played | Games Won | Win Rate ↑ |
---|---|---|---|
Dave | 8 | 3 | 38% |
Ipek | 12 | 4 | 33% |
Rob | 10 | 3 | 30% |

#### Metric #3: Mean Rank Score

The trouble with both win-based measures is that the person coming second receives no recognition. You could come second out of six in every game and you’d still end up at the bottom of the table. To try to capture the overall performance, we gave each player a *Rank Score* for each game. The player with the best answer is given a *Rank Score* of 0, the player with the worst answer is given a *Rank Score* of 100, and the players in between are given a *Rank Score* dependent on the total number of players in that game. Over a number of games, we can then calculate the average of each player’s *Rank Scores* to see where they tend to end up in the rankings, from 0-100.

Let’s take the first 2 games described above as an example. In the first game Dave has the worst answer and is given a *Rank Score* of 100. However, in the second game he has the best answer, so he’s given a *Rank Score* of 0. His *Mean Rank Score*, after 2 games, is 50:

Player | G1 Answer | G1 Score | G1 Rank Score | G2 Answer | G2 Score | G2 Rank Score | Mean Rank Score ↓ |
---|---|---|---|---|---|---|---|
Rob | Bucharest | 4 | 50 | Lingonberry | 2 | 20 | 35 |
David | – | – | – | Cloudberry | 5 | 40 | 40 |
Dave | Riga | 8 | 100 | Huckleberry | 0 | 0 | 50 |
Ipek | Ljubljana | 1 | 0 | Strawberry | 92 | 100 | 50 |
Owen | – | – | – | Loganberry | 60 | 60 | 60 |
Lana | Reykjavik | 4 | 50 | Blackberry | 84 | 80 | 65 |

*Mean Rank Score* calculation after 2 games (Game 1: European capitals; Game 2: ‘erry’ words)
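Here’s my reconstruction of the rank-score rule in Python – including the tie handling implied by game 1, where Rob and Lana’s matching scores of 4 earn them a shared rank score of 50:

```python
def rank_scores(scores):
    """Map raw Pointless scores (lower is better) to rank scores from
    0 (best answer) to 100 (worst), evenly spaced by rank.
    Tied raw scores share the mean of the positions they occupy."""
    n = len(scores)
    ordered = sorted(scores, key=scores.get)
    # Group the 0-based finishing positions by raw score, so that
    # tied players can be given the average of their positions.
    positions = {}
    for i, player in enumerate(ordered):
        positions.setdefault(scores[player], []).append(i)
    return {
        player: sum(positions[s]) / len(positions[s]) / (n - 1) * 100
        for player, s in scores.items()
    }

game_1 = {"Ipek": 1, "Lana": 4, "Rob": 4, "Dave": 8}
print(rank_scores(game_1))
# Ipek 0, Lana and Rob share 50, Dave 100 -- matching the table above
```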

Taking this approach for all 16 games played so far starts to build a nice *Mean Rank Score* picture in the table below:

Player | Games Played | Mean Rank Score ↓ |
---|---|---|

Rob | 10 | 34 |

David | 11 | 36 |

Dave | 8 | 48 |

Let’s try to extract some meaning from these numbers. Remembering that the median player in each game would be given a *Rank Score* of 50, Dave’s *Mean Rank Score* of 48 indicates that, on average, he just about makes the top half each game. Rob, who leads on this metric, tends to finish around the border of the top third of players each game.

#### Metric #4: Mean Standardised Score

Our next metric was probably the most over-the-top yet crucial metric. Taking a look at the results for game 2 – the ‘erry’ words – we can see that there were two ‘clusters’ of scores. The best answers scored 0, 2 and 5, while the worst answers scored 60, 84 and 92. However, our *Rank Score* approach considers each of these players to be evenly distributed – that is, it thinks a score of 5 (David’s *cloudberry*) sits halfway between a score of 2 (*lingonberry*) and a score of 60 (*loganberry*). But, if you take a look at the scores, it’s easy to see that *cloudberry*’s 5 deserves more credit than that.

Why not just take a player’s mean raw score across all their games? Well, different games have different magnitudes of scores. If somebody missed a game in which most of the possible answers were low scorers, or vice versa, they’d be immediately disadvantaged. For example, Owen and David didn’t take part in game 1, which had loads of good answers available, and therefore didn’t have the opportunity to get as low a mean score as the others. We want an approach that *doesn’t* consider each player to be evenly distributed, but *does* account for the magnitude of scores in each game. Enter *Mean Standardised Score*.

In a given game we want to know, in essence, how well or badly each player performed relative to the other players – we want to *standardise* each player’s score. This is a two-step process. Firstly, how far above or below the mean score were they? Secondly, how did that compare to how far away everyone else was? If a score was 20 above the mean but on the whole scores were close together then we want to penalise that, whereas if a score was 20 above the mean and on the whole scores were spread out then we don’t want to penalise that as heavily.

In game 1, the mean score was 4.3 and the standard deviation of those scores (how spread out they were) was 2.9. Using Dave’s score of 8 as an example, we can firstly see how far from the mean score he was by subtracting the mean from his score (8 – 4.3 = 3.7). We can then factor in how spread out the scores were on the whole by dividing by the standard deviation (3.7 / 2.9). This gives us Dave’s *Standardised Score* (for game 1) as 1.28, which indicates that his score was 1.28 standard deviations higher than the mean. Ipek’s *Standardised Score*, -1.14, indicates that her score was 1.14 standard deviations lower than the mean (‘better than average’).
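In Python, using the sample standard deviation from the `statistics` module, the calculation looks like this (note that without the premature rounding, the exact z-scores come out slightly differently – about 1.31 rather than 1.28 for Dave):

```python
from statistics import mean, stdev

def standardised_scores(scores):
    """Express each player's raw score as the number of (sample)
    standard deviations it sits above (+) or below (-) the game's
    mean score. As with raw scores, lower is better."""
    mu = mean(scores.values())
    sigma = stdev(scores.values())
    return {player: (s - mu) / sigma for player, s in scores.items()}

game_1 = {"Ipek": 1, "Lana": 4, "Rob": 4, "Dave": 8}
z = standardised_scores(game_1)
# Dave's 8 sits roughly 1.31 sample standard deviations above the
# exact mean of 4.25; Ipek's 1 sits roughly 1.13 below it.
```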

Over a number of games, we can then calculate the mean of each player’s *Standardised Scores* to see how their answers tend to perform relative to the mean score in each game. A *Mean Standardised Score* of 0 would indicate that the player tends to come close to the mean score in each game (or does as badly as they do well), and a negative *Mean Standardised Score* indicates that the player tends to outperform the average. Let’s take the first 2 games described above as an example. (Note: there has been some deliberate premature rounding to simplify the example here!)

Player | G1 Answer | G1 Score | G1 Standardised Score | G2 Answer | G2 Score | G2 Standardised Score | Mean Standardised Score ↓ |
---|---|---|---|---|---|---|---|
David | – | – | – | Cloudberry | 5 | -0.82 | -0.82 |
Rob | Bucharest | 4 | -0.10 | Lingonberry | 2 | -0.89 | -0.50 |
Ipek | Ljubljana | 1 | -1.14 | Strawberry | 92 | 1.19 | 0.05 |
Dave | Riga | 8 | 1.28 | Huckleberry | 0 | -0.94 | 0.17 |
Owen | – | – | – | Loganberry | 60 | 0.45 | 0.45 |
Lana | Reykjavik | 4 | -0.10 | Blackberry | 84 | 1.01 | 0.46 |

*Mean Standardised Score* calculation after 2 games (Game 1: European capitals, μ = 4.3, σ = 2.9; Game 2: ‘erry’ words, μ = 40.5, σ = 43.1)

We can see that the *Standardised Score* given to David’s *cloudberry* is much better than that given to Owen’s *loganberry*, solving the issue with the *Mean Rank Score*.

Applying this to all 16 games played so far gives us yet another top 3 in the table below. David tops the board with a *Mean Standardised Score* of -0.43: on average, his answers score just under half a standard deviation below the mean each game.

Player | Games Played | Mean Standardised Score ↓ |
---|---|---|
David | 11 | -0.43 |
Rob | 10 | -0.28 |
Owen | 13 | -0.16 |

One thing to note about the *Mean Standardised Score* is that it really punishes incorrect answers – which score 100 – as typically 100 will be several standard deviations higher than the mean. On the other hand, it rewards particularly good answers: if you win by a comfortable margin, it will reflect that better than the *Mean Rank Score* would have done.

#### Metric #5: Points

But, as the founder of the modern Olympics once said, *the most important thing is not winning but taking part*. We wanted to reward participation too, and so a points-based approach was suggested. For each game they play, a player receives a number of points. The number of points is determined by their rank in that game: if they got the best score, they receive 5 points; if they got the second best score, they receive 4 points, and so on. We then add up each player’s points from all the games they’ve played, which captures both performance and participation. Our *Points* approach puts Lana, our web analyst, top of the leaderboard:

Player | Points ↑ |
---|---|
Lana | 43 |
David | 42 |
Owen | 40 |
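Here’s a sketch of the points rule as I understand it – the post above doesn’t spell out how ties are handled, so the floor of 0 points and the arbitrary tie-breaking here are my assumptions:

```python
def points_for_game(scores):
    """Award 5 points to the best (lowest) raw score, 4 to the second
    best, and so on down the rankings. Assumes a floor of 0 points and
    breaks tied raw scores arbitrarily."""
    ordered = sorted(scores, key=scores.get)
    return {player: max(0, 5 - rank) for rank, player in enumerate(ordered)}

game_2 = {"Dave": 0, "Rob": 2, "David": 5, "Owen": 60, "Lana": 84, "Ipek": 92}
print(points_for_game(game_2))
# Dave 5, Rob 4, David 3, Owen 2, Lana 1, Ipek 0
```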

## Rules of Measurement

So who should be crowned our Pointless champion? Well, (almost) everyone, depending on which way you look at it:

Metric | Champion |
---|---|
Games Won | Ipek |
Win Rate | Dave |
Mean Rank Score | Rob |
Mean Standardised Score | David |
Points | Lana |

There are many other things that we haven’t considered, such as controlling for the advantage (or disadvantage) that the order of responding may have, or calculating each player’s rank amongst all the possible correct answers.

But this isn’t intended to be the *definitive* guide to long-term scoring in remote games of Pointless. Whilst it has been a lot of silly fun, there are a couple of important takeaways that are immediately apparent:

- Deciding how to measure something – whether that be success, failure, or some scale thereof – is *not* always a simple task. Some would say that you need to decide how you’ll measure the success of an activity *before* you undertake it, as you won’t be able to measure objectively if you wait until the activity has started. To some extent, this is true. However, as shown here, the intricacies and complications around measurement may not become apparent until you start. You can try to anticipate these complications, but it won’t always be possible. It’s important to have an unbiased eye – somebody with no agenda – involved in defining those measurements.
- With enough data, and enough motives, you can often spin your numbers to tell any story you want to tell (in fact, I decided to start writing this blog when we hit a point at which our five metrics each showed a different ‘champion’ – very meta). With careful selection of the data above, five of the six players could draw the conclusion that they ‘won’ and shout it from the rooftops. If the people receiving this conclusion didn’t do so with a critical eye, it would inevitably (in the majority of cases) lead to suboptimal decision-making somewhere down the line. It is crucial that recipients of data a) understand exactly how conclusions have been drawn and b) challenge those conclusions or methodologies appropriately.

So take care when you’re measuring things, you’re being given a measurement, or you’re being told about a measurement somebody else has received. And if your team is finding remote work lonely, get yourself a copy of a quiz book!

Update after 25 games: Lana has just won her first game on a music/sport dual-topic round. Owen has clinched the lead on the Points metric. David is now champion across the other 4 metrics, possibly bringing us the closest we’ve been yet to a definitive answer to our question…