Social data is a very rich source that our Intelligence team often use to uncover insights for our Strategy and Studio teams. However, as with any research methodology, there are sampling constraints to consider when designing our approach.
What is Research Sampling?
As per the Cambridge Dictionary, a sample is “a small amount of something that shows you what the rest is, or should be like”. It’s “a group of people or things that is chosen out of a larger number and is asked questions or tested in order to get information about the larger group”.
Let’s say we’re creating a campaign for the new Tesla and we want to know the percentage of people in London who own a car. In a perfect world, we would contact every person in London, ask them if they own a car, then grab our calculator and figure out the answer. However, the London population in 2019 was estimated to be 9,176,530 people, so this would be a tedious and expensive process. The general lesson here is that the audience we’re interested in researching is usually large, and it’s either impractical or impossible to gather data from everyone. Sampling is a way of reducing the burden of the analysis: it lets us analyse a fraction of the population and make inferences about the audience based on that fraction.
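To make this concrete, here is a minimal sketch in Python. The population figure comes from the estimate above, but the “true” ownership rate and the sample size are invented purely for illustration:

```python
# A toy simulation: survey a random sample of 1,000 Londoners instead of
# all ~9.2 million, then estimate the car-ownership rate with a 95%
# confidence interval. TRUE_OWNERSHIP is a made-up number we would never
# actually know in practice; that's the point of sampling.
import math
import random

random.seed(42)

TRUE_OWNERSHIP = 0.54   # hypothetical, unknown in reality
SAMPLE_SIZE = 1_000

# Simulate asking 1,000 randomly chosen people "do you own a car?"
responses = [random.random() < TRUE_OWNERSHIP for _ in range(SAMPLE_SIZE)]
p_hat = sum(responses) / SAMPLE_SIZE

# Normal-approximation margin of error at 95% confidence
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / SAMPLE_SIZE)

print(f"Estimated ownership: {p_hat:.1%} ± {margin:.1%}")
```

A sample of 1,000 pins the rate down to within roughly three percentage points, without contacting the other nine million residents.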
Assumptions are the Constraints of Sampling
As an aside, it’s worth noting that the data we get from social listening platforms is already a sample, as not all of the data is released through the API. We can’t do anything about this, so we have to assume that the API data is representative of the total posts.
Sometimes social sampling mirrors research sampling…
In some instances, social sampling is based on the same premise as outlined above for research sampling. That is, if the volume of data exceeds our listening platform limits, then our goal is to take a sample that represents an underlying population. Here social sampling is just research sampling with a novel data source.
However, even when their premises align, social differs from traditional research in that there are often multiple layers of sampling. Extra care is required when we write multiple queries, themes, topics, and so on, as we are then sampling a sample. This is because, as we apply more lenses, we change the definition of the underlying population that our sample represents.
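A toy sketch of this layering, using hypothetical posts and field names, might look like this:

```python
# Each query or filter layer narrows the population that the remaining
# posts can be said to represent. The posts and fields are invented.
posts = [
    {"text": "love my new tesla",  "location": "London", "gender": "f"},
    {"text": "tesla stock is up",  "location": "Leeds",  "gender": "m"},
    {"text": "charging my tesla",  "location": "London", "gender": "m"},
    {"text": "cycling to work",    "location": "London", "gender": "f"},
]

# Layer 1: keyword query -> a sample of "posts mentioning tesla"
layer1 = [p for p in posts if "tesla" in p["text"]]

# Layer 2: location filter -> now a sample of "Tesla posts from London",
# a different, narrower population than layer 1
layer2 = [p for p in layer1 if p["location"] == "London"]

# Layer 3: gender filter -> narrower still; inferences drawn here only
# generalise to this redefined population, not to "Tesla posts" at large
layer3 = [p for p in layer2 if p["gender"] == "f"]

for i, layer in enumerate([layer1, layer2, layer3], start=1):
    print(f"layer {i}: {len(layer)} posts")
```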
Sometimes it doesn’t…
In other instances, the population we’re interested in is a needle in a haystack. In such cases, we’re using ‘sampling’ not to make inferences, but to strip away all of the data we don’t care about. Now, it’s not about ensuring that our sample is representative of a broader population – it’s about writing effective queries, themes, topics, and filters to best capture the phenomenon being researched. Essentially, the sample is the population.
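As a rough illustration of what “effective queries” means here, with invented posts and a hypothetical topic: when the sample is the population, a query that misses variant spellings misses part of the population itself.

```python
# In the needle-in-a-haystack case the goal isn't representativeness,
# it's coverage: every relevant post we fail to match is lost for good.
import re

posts = [
    "just test drove the cybertruck!",
    "my cyber truck arrives next week",
    "anyone else preorder the cyber-truck?",
    "great pasta recipe for tonight",
]

# A literal keyword misses two of the three relevant posts...
naive = [p for p in posts if "cybertruck" in p]

# ...while a pattern tolerant of spacing and hyphenation catches them all
tolerant = [p for p in posts if re.search(r"cyber[\s-]?truck", p)]

print(f"naive: {len(naive)} posts, tolerant: {len(tolerant)} posts")
```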
Conclusion & Best Practices
In summary, how you should structure your research depends on the type of analysis you are looking to undertake.
If you are studying a broad topic with massive volumes of data, you need a clear process in mind for how you are going to sample the data and make inferences. Avoid layering boundaries unnecessarily (e.g. location, gender, subtopics), as it becomes increasingly difficult to establish sample validity.
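To see why, here is a rough back-of-the-envelope sketch; the retention rates after each filter are made up:

```python
# Each extra boundary shrinks the usable sample, and the worst-case
# margin of error on any proportion grows roughly with 1/sqrt(n).
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a proportion at sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 100_000  # posts matched by the initial query (hypothetical)
print(f"initial query: n={n:,}, ±{margin_of_error(n):.1%}")
for boundary, retention in [("location", 0.30), ("gender", 0.40), ("subtopic", 0.20)]:
    n = int(n * retention)
    print(f"after {boundary} filter: n={n:,}, ±{margin_of_error(n):.1%}")
```

Three reasonable-looking filters cut a 100,000-post sample down to 2,400, and the uncertainty on any estimate grows accordingly, before we even ask whether the filtered sample still represents the population we set out to study.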
If you have a naturally niche topic to study, the onus switches to making sure your queries, API calls, and filters capture your topic as completely as possible, so that your analysis runs on the most accurate sample of data available.
There are some really rare cases where more creative solutions to sampling social data are required, but we hope to cover these cases and the maths behind the method in later blog posts. In the meantime, please don’t hesitate to get in touch with us by email if you have any questions about social sampling in research.