Accurately ascertaining attitudes: designing unbiased survey questions

Posted on September 24, 2018

It would be great if we knew exactly what our customers thought so that we could adapt our products and improve our performance. However, we can’t read our customers’ minds, so surveys are a popular alternative for understanding their attitudes.

Yet if the answers recorded on surveys don’t truly reflect the attitudes of our customers, how can we really know how to improve? This blog post tackles that problem by focusing on the design of unbiased questions. It is the second half of a two-part series discussing the potential for bias in consumer surveys.

Querying the questions

One of the most important parts of questionnaire design is recognising exactly what we want to find out. It sounds obvious but if a specific answer is desired, a specific question needs to be asked.

Can we question how we ask questions?

Let’s return to the scenario from my previous blog post about bias. The data science team made some dried aubergines and we were interested in improving our recipe based on the views of people who tried them. We gave them out for people to try at a company-wide FreeAgent recruitment event and asked people who tried them if they would be willing to take part in a telephone survey. Once we had some willing participants, we had to figure out the best way of obtaining people’s opinions using a survey!

There are many types of biased question that could find their way into the survey, all of which we wanted to avoid. Imagine we start the survey with the following question:

How much did you enjoy the FreeAgent recruitment event and our aubergines?

This is an example of a double-barrelled question — one question that requires two separate answers¹,³. Imagine if people loved the FreeAgent event so much that they forgot to say anything about our aubergines! We could improve this question by asking a separate question about the FreeAgent event, and keeping this question simple:

How much did you enjoy our aubergines?

However… This is an example of a free-text question — a question that allows an open response, which can be problematic if we require an answer in a particular format or with a certain level of detail for analysis purposes¹,³. People might use different language to describe the same feelings — enthusiastic Eddie might say the aubergines were ‘exquisitely tasty’ while serious Sally might say they were ‘very good’, which makes their answers difficult to analyse. We could improve this question by deciding upon a set of scaled answers to give the participants, which allows us to obtain the level of detail that we require, on a standardised scale:

How much did you enjoy our aubergines?

  • Somewhat disliked
  • Neither liked nor disliked
  • Somewhat liked
  • Strongly liked

However… This is an example of a leading question with unbalanced answers — a question that encourages the participant to answer in a particular way, by presuming an attitude is true prior to asking the question and/or by not providing balanced answer options¹,³. Although we (obviously) love the taste of aubergines, we are presuming that everyone else does too! We could improve this question by removing the assumption that the participants enjoyed our aubergines and by allowing them to give answers at both extremes of the scale:

How did you feel about our aubergines?

  • Strongly disliked
  • Somewhat disliked
  • Neither liked nor disliked
  • Somewhat liked
  • Strongly liked

However… This is an example of an ambiguous question — a question that is not clear about the exact piece of information you require and could be interpreted in different ways¹,³. Although it is clear to us that we want to know if people enjoyed the taste of our aubergines, this question could have a multitude of different interpretations: are we asking if they thought the aubergines were well presented, if they were served with a smile, if making them was a waste of valuable company time or if they tasted nice? We could improve this question by making it explicit that we want to know about the taste of our aubergines:

Ambiguous questions can lead to confusing answers

How did you feel about the taste of our aubergines?

  • Strongly disliked
  • Somewhat disliked
  • Neither liked nor disliked
  • Somewhat liked
  • Strongly liked

However… This is an example of a non-exhaustive question — a question that can have answers outside the boundaries of the answers expected¹,³. People may not wish to answer this question, they might need to hang up the phone halfway through or they might not have had the chance to try the aubergines because they were so popular! We could improve this question by trying to consider answers that are ‘outside of the box’:

How did you feel about the taste of our aubergines?

  • Strongly disliked
  • Somewhat disliked
  • Neither liked nor disliked
  • Somewhat liked
  • Strongly liked
  • Did not answer
  • NA (Did not try the aubergines, e.g. they were eaten too fast!)

Imagine we also wanted to know how likely someone would be to tell their friends about our amazing aubergines. We have taken on board all of the pitfalls of problematic questions and come up with the following:

How unlikely would it be for you to not recommend our aubergines to a friend?

  • Very unlikely
  • Unlikely
  • Neutral
  • Likely
  • Very likely
  • Did not answer
  • NA (Did not try the aubergines, e.g. they were eaten too fast!)

However… This is an example of a double negative question — a question that uses two negative terms and can confuse the reader¹,³. People might answer that they were very likely to not recommend our aubergines when in fact they meant they were very likely to recommend them! We could improve this question by removing the negative aspects of it:

How likely would it be for you to recommend our aubergines to a friend?

  • Very unlikely
  • Unlikely
  • Neutral
  • Likely
  • Very likely
  • Did not answer
  • NA (Did not try the aubergines, e.g. they were eaten too fast!)

This example illustrates how easy it is to be tripped up when we are on our journey to good survey design. Considering the impact of our questions when we ask them is vital if we want to get answers that actually mean what we expect them to!

Analysing the answers

Now that we have well-formulated questions, it is good practice to validate the survey, which allows us to have a look at the types of answer people might give. Validation takes place before the actual survey begins and allows us to adjust the original survey design to minimise unforeseen misunderstandings, problems and biases. However, it is impossible to anticipate all the variation in people’s answers and, even after careful survey design, there is almost always a need for data cleaning before analysis. If you are struggling with data cleaning, you might find my previous two blog posts about common errors in survey data and techniques to deal with them useful.
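To make that cleaning step concrete, here is a minimal Python sketch that maps raw recorded answers onto the standardised scale we designed above. The variant spellings and the fallback rule are invented for illustration, not taken from a real survey:

```python
# A minimal sketch of standardising raw survey answers before analysis.
# The variant answers and fallback rule below are invented for illustration.

STANDARD_SCALE = [
    "Strongly disliked", "Somewhat disliked", "Neither liked nor disliked",
    "Somewhat liked", "Strongly liked", "Did not answer", "NA",
]

# Free-text variants we might discover during validation, mapped to the scale.
VARIANTS = {
    "exquisitely tasty": "Strongly liked",
    "very good": "Somewhat liked",
    "didn't try them": "NA",
}

def clean_answer(raw: str) -> str:
    """Return a standardised answer, falling back to 'Did not answer'."""
    answer = raw.strip()
    if answer in STANDARD_SCALE:
        return answer
    return VARIANTS.get(answer.lower(), "Did not answer")

print(clean_answer("Very good"))  # -> Somewhat liked
```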

References

  1. UCDenver. (Not dated). Examples Of Bad Questions & Suggestions Of How To Fix Them! UCDenver. Available from: http://www.ucdenver.edu/academics/colleges/SPA/FacultyStaff/Faculty/Documents/Callie%20Rennison%20Documents/example%20of%20bad%20survey%20questions.pdf. [Accessed 17 September 2018].
  2. Choi, B. C. K., & Pak, A. W. P. (2005). A Catalog of Biases in Questionnaires. Preventing Chronic Disease, 2(1), A13.
  3. Kalton, G., & Schuman, H. (1982). The Effect of the Question on Survey Responses: A Review. Journal of the Royal Statistical Society. Series A (General), 145(1), 42–73.

Sourcing a suitable sample: understanding selection bias in survey data

Posted on September 18, 2018

During my time at FreeAgent, I have been analysing attitudinal customer survey data to predict customer behaviour. Getting to the bottom of how exactly this data was collected has helped me to understand the data and has given me a few ideas about how it could be collected in the future. This blog post focuses on how we choose who is selected to take part in surveys: a process known as ‘sampling’.

Determining the definition

The Merriam-Webster dictionary defines the process of sampling as:

“…selecting a representative part of a population for the purpose of determining parameters or characteristics of the whole population¹”

Therefore, a ‘sample’, by definition, is never a perfect representation of the population that we are truly interested in. The only perfectly representative sample would be the entire population at a given time, which is not a ‘sample’ at all! During survey sampling, we are artificially sieving the population down to the individuals who are accessible and willing to give us information, which inevitably leads to bias. This virtual sieve has become finer in recent times due to increasingly stringent data protection laws and a reduction in the number of people that are contactable². It is more important than ever that we acknowledge that each survey has its own susceptibility to sampling bias. Knowing about this allows us to draw appropriate conclusions from our results and might influence how we design future surveys.

The many layers of the sampling process

Evaluating an example

We have discussed what sampling means but how does it work in real life? Let’s consider a food-based survey scenario (always my favourite type of survey!).

Recently, the FreeAgent data science team made the most of the fantastic hot weather by making dried aubergines in the sun on the balcony. We liked our aubergines so much that we were interested in getting people’s opinions about them to see how we can improve in future!

Here, our ‘target population’ is everyone who eats aubergines in the UK. However, we don’t have any money for our survey and only a very limited time before the British summer ends and we can no longer make our aubergines! We also need to make sure people get the chance to actually try our aubergines before they take the survey, which is a significant practical constraint. What sampling method shall we use to best suit our needs?

Our dried aubergines with mozzarella and tomato!

Mulling over the methods

The following sampling methods are an introduction to some of the most common ones used for surveys. This is not an exhaustive list and there are many different variations and combinations used, depending on time, legal, financial and practical constraints. Let’s find out if we could apply any of these methods to our aubergine survey!

Probability sampling methods involve selecting a sample at random, such as forming a team by pulling names out of a hat³. There are several types:

Simple random sampling — involves randomly selecting individuals for the sample from the target population so that they have an equal chance of being selected⁴.

Pros:

  • Easy
  • Generally fair representation of the target population

Cons:

  • Can lead to under-representation of certain subgroups (e.g. sex, race)
  • Not always practical

Example: Obtain a list of everyone who eats aubergines in the UK and use a computer to generate a random sample of the goal number of people.

Method evaluation: Not feasible due to practical, financial, time and probably legal constraints!
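As a rough illustration (and assuming, unrealistically, that we had such a list!), simple random sampling might look like this in Python:

```python
import random

# Hypothetical list of everyone who eats aubergines in the UK.
population = [f"aubergine_eater_{i}" for i in range(10_000)]
goal = 100

# random.sample gives every individual an equal chance of selection.
sample = random.sample(population, k=goal)
```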

Systematic random sampling — involves randomly ordering individuals and selecting them at equal intervals (e.g. choosing every 5th individual)⁴.

Pros

  • Easy
  • Generally fair representation of the target population
  • Can save time in certain sampling situations (e.g. seated individuals)

Cons

  • Can lead to under-representation of certain subgroups (e.g. sex, race)
  • Not always practical

Example: Obtain a list of everyone who eats aubergines in the UK, shuffle the list and choose the goal number of people at equal intervals along the list to sample.

Method evaluation: Same issues as simple random sampling.
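A sketch of the systematic variant, under the same imaginary list:

```python
import random

population = [f"aubergine_eater_{i}" for i in range(10_000)]
goal = 100

# Shuffle to randomise the order, then pick individuals at equal intervals.
random.shuffle(population)
interval = len(population) // goal
sample = population[::interval][:goal]
```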

Stratified random sampling — involves dividing the target population into groups and randomly selecting individuals from each group⁴.

Pros

  • Ensures equal representation of subgroups (e.g. sex, race)
  • Generally fair representation of the target population
  • May reduce variation in the sample

Cons

  • Difficult to decide how to divide into groups
  • Not always practical

Example: Obtain a list of everyone who eats aubergines in the UK, divide these into groups of people that we know perceive taste differently based on research (e.g. sex, ethnicity, age, smoking status) and use a computer to generate a random sample within each group, totalling the goal number of individuals.

Method evaluation: Same issues as simple random sampling, with the added complexity of obtaining a considerable amount of personal information.
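A minimal sketch of stratified sampling, with invented individuals tagged by a single grouping variable:

```python
import random
from collections import defaultdict

# Hypothetical individuals, each tagged with one grouping variable.
population = [
    {"name": f"person_{i}", "age_band": random.choice(["18-34", "35-54", "55+"])}
    for i in range(10_000)
]
per_group = 30  # how many individuals to draw from each stratum

# Divide the population into strata...
strata = defaultdict(list)
for person in population:
    strata[person["age_band"]].append(person)

# ...then sample randomly within each stratum.
sample = [
    person
    for group in strata.values()
    for person in random.sample(group, k=per_group)
]
```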

Clustered (AKA area) random sampling — involves dividing the target population into groups (usually by geographical area) and randomly selecting some groups (all individuals within these groups are included)⁴.

Pros

  • Can save time/money in certain situations

Cons

  • Can lead to under-representation of certain subgroups (e.g. sex, race)
  • Less fair representation of the target population
  • Not always practical

Example: Obtain a list of everyone who eats aubergines in the UK, divide these into groups of people within geographical locations (e.g. output area) and use a computer to randomly choose groups to sample.

Method evaluation: Same issues as simple random sampling, with the added complexity of obtaining geographical information.
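A sketch of the clustered version, with invented geographical areas; choosing a cluster means taking everyone in it:

```python
import random
from collections import defaultdict

# Hypothetical individuals, each tagged with a geographical area.
population = [
    {"name": f"person_{i}", "area": f"area_{random.randrange(50)}"}
    for i in range(10_000)
]

clusters = defaultdict(list)
for person in population:
    clusters[person["area"]].append(person)

# Randomly choose whole clusters; everyone within a chosen cluster is included.
chosen_areas = random.sample(list(clusters), k=5)
sample = [person for area in chosen_areas for person in clusters[area]]
```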

Non-probability sampling methods involve selecting a sample but not at random, such as for a specific purpose³. There are several types:

Convenience sampling — involves selecting individuals that are practically easy to reach⁴.

Pros

  • Easy
  • Practical and saves money

Cons

  • Unlikely to be a fair representation of the target population

Example: Ask everyone in the FreeAgent office if they eat aubergines and select everyone who does to be in our sample.

Method evaluation: A practical solution but would not represent the opinions of the UK population and we might not be able to obtain a large enough sample.

Quota sampling — involves dividing the target population into groups and aiming to hit a target (quota) for a number/proportion of individuals within those groups to be included⁴.

Pros

  • More likely to be a fair representation of the target population
  • More practical than stratified random sampling

Cons

  • Difficult to decide how to divide into groups and assign quotas
  • Time consuming

Example: Obtain a list of everyone who eats aubergines in the UK, divide these into groups of people that we know perceive taste differently based on research (e.g. sex, ethnicity, age, smoking status) and sample the same number of people within each group, until the goal number of individuals is reached.

Method evaluation: Same issues as stratified random sampling.
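To make the difference from stratified random sampling concrete, here is a sketch where we recruit whoever turns up until each group’s quota is full, rather than drawing at random from a complete list. The participant stream and groups are invented:

```python
import random

def willing_participants():
    """Hypothetical endless stream of people agreeing to be surveyed."""
    i = 0
    while True:
        yield {"name": f"person_{i}", "group": random.choice(["A", "B", "C"])}
        i += 1

quotas = {"A": 30, "B": 30, "C": 30}
sample = []
counts = {group: 0 for group in quotas}

# Accept participants until every quota is filled; surplus members of a
# full group are turned away.
for person in willing_participants():
    if counts[person["group"]] < quotas[person["group"]]:
        sample.append(person)
        counts[person["group"]] += 1
    if all(counts[group] >= quotas[group] for group in quotas):
        break
```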

Modal instance and heterogeneity sampling — opposite methods that involve selecting individuals that represent the majority or the extremes of the target population, respectively⁴.

Pros

  • Good when we are only interested in the majority or the extremes of the target population, respectively

Cons

  • Only valid in specific situations
  • Difficult to decide which individuals represent the majority or extremes
  • Each method fails to represent the variation or the average, respectively
  • Not a fair representation of the wider population

Example: Obtain a list of everyone who eats aubergines in the UK and a score of how much they like eating them, then survey the people that give the most common score (modal instance) or the lowest and highest scores (heterogeneity sampling).

Method evaluation: Same issues as simple random sampling, with the added complexity of obtaining aubergine scores.
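A toy sketch of both methods, assuming we somehow had a liking score for each (made-up) person:

```python
import random
from collections import Counter

# Hypothetical 1-10 scores of how much each person likes aubergines.
scores = {f"person_{i}": random.randint(1, 10) for i in range(1_000)}

# Modal instance: survey the people who gave the most common score.
modal_score = Counter(scores.values()).most_common(1)[0][0]
modal_sample = [name for name, score in scores.items() if score == modal_score]

# Heterogeneity: survey the people at the extremes of the scale.
low, high = min(scores.values()), max(scores.values())
extreme_sample = [name for name, score in scores.items() if score in (low, high)]
```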

Expert sampling — involves selecting a panel of individuals that are experts in a specific topic⁴.

Pros

  • Good when it is only the opinion of experts that we are interested in

Cons

  • Only valid in specific situations
  • Not a fair representation of the wider population

Example: Research aubergine-eating food critics in the UK and include them in the sample.

Method evaluation: Same issues as convenience sampling with the added complexities that it probably wouldn’t be practical and aubergine experts might expect financial compensation for giving us their opinions!

Snowball sampling — involves selecting individuals that fit the inclusion criteria for your study and asking them to recommend other people for inclusion into the study⁴.

Pros

  • Practical and saves money
  • Good for obtaining individuals from hard-to-reach groups (e.g. homeless people)

Cons

  • Unlikely to be a fair representation of the target population
  • Time consuming

Example: Find a few people that eat aubergines that we know and include them in our sample, then ask them if they know any other people who eat aubergines and include them in our sample, until we have reached the goal number of individuals.

Method evaluation: Same issues as convenience sampling with the added risk that it would take a considerable amount of time to recruit a large enough sample.
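Snowball sampling is essentially a traversal of a referral network. A minimal sketch, with a made-up ‘who knows whom’ graph:

```python
from collections import deque

# Hypothetical referral network: who knows which other aubergine eaters.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dana"],
    "carol": ["dana", "erin"],
    "dana": [],
    "erin": ["frank"],
    "frank": [],
}
goal = 5

# Start from people we already know, then follow referrals until we hit the goal.
sample, queue, seen = [], deque(["alice"]), {"alice"}
while queue and len(sample) < goal:
    person = queue.popleft()
    sample.append(person)
    for referral in knows[person]:
        if referral not in seen:
            seen.add(referral)
            queue.append(referral)
```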

Finally, multiple sampling methods involve a combination of two or more different types of sampling, usually in order to fit the practical requirements of the survey whilst aiming to obtain a more representative sample.

Being balanced about bias

Knowledge is key!

For our aubergine survey scenario, imagine we chose convenience sampling because it was the only practically feasible method. In an attempt to make our sample a little more diverse, we took advantage of an upcoming company-wide FreeAgent recruitment event! Part of the event sign-up process involved asking people who came whether they would agree to take part in a telephone survey about the aubergines. We offered our aubergines to everyone at the event and recorded who tried them. We then telephoned everyone who had agreed to participate and had tried our aubergines, and surveyed everyone who answered the phone. Let’s consider the different types of bias that our sample could be prone to:

Selection bias — the concept that individuals that are selected for a sample are not representative of the wider population⁵. E.g. were individuals that attended the FreeAgent recruitment event more likely to:

  • Be of a certain gender or sexuality?
  • Be of a particular nationality, culture or ethnic group?
  • Work in technology/accountancy rather than other employment sectors?
  • Have particular food preferences?

Volunteer bias — the concept that individuals that volunteer for surveys are different from those who do not⁵. E.g. were individuals that volunteered:

  • Of different personalities, nationalities, cultures, educational levels, social lives, backgrounds etc?
  • Likely to have an agenda that would influence their decision, such as potential job candidates?
  • People that we work closely with or friends?

Non-response bias — the concept that individuals who do not respond to a survey are different from those who do⁵. E.g. did individuals that did not answer the telephone:

  • Have busier lives because they are at a different life stage (career, family etc)?
  • Have jobs in sectors where they work out of standard office hours?
  • Not want to take part in the survey because they hated our aubergines?
  • Not want to take part in the survey because they had a bad time at the event?

At first glance, this seems like a whole lot of bias! Does this mean we have to scrap the entire survey and give up? Of course not! Being aware of these biases helps us draw balanced conclusions about the outcomes of the survey. For example, if we discovered that our sample mostly consisted of Scottish, white males who work in tech, it would not be possible to claim that our survey demonstrates the attitudes of all people in the UK. However, this survey could still help us understand how we could improve our aubergines for this particular audience, and we might choose to target a different population of people the next time we conduct the survey. In my next blog post, I will approach the subject of designing the surveys themselves and consider how we can avoid different types of bias introduced by the way we ask questions and record answers.

References

  1. Merriam-Webster. 2018. Sampling. Merriam-Webster. Available from: https://www.merriam-webster.com/dictionary/sampling. [Accessed 22 August 2018].
  2. GOV.UK. 2018. Data Protection Act 2018. Crown Copyright. Available from: https://www.gov.uk/government/collections/data-protection-act-2018. [Accessed 22 August 2018].
  3. Web Centre for Social Research Methods. 2006. Sampling. William M. K. Trochim. Available from: https://socialresearchmethods.net/kb/sampling.php. [Accessed 22 August 2018].
  4. Lavrakas, P. J. 2008. Encyclopedia of Survey Research Methods. SAGE Publications Ltd. London, UK.
  5. Sedgwick, P. 2013. Questionnaire surveys: sources of bias. BMJ; 347 :f5265.