# Political polls – why they work – or don’t

This is written in the week before the 2017 New Zealand General Election and it is an exciting time. Many New Zealanders are finding political polls fascinating right now. We wait with bated breath for each new announcement – is our team winning this time? If it goes the way we want, we accept the result with gratitude and joy. If not, then we conclude that the polling system was at fault.

Many wonder how on earth asking 1000 people can possibly give a reading of the views of all New Zealanders. This is not a silly question. I have only occasionally been polled, so how can I believe the polls reflect my view? As a statistical communicator, I have given some thought to this. If you are a statistician or a teacher of statistics, how would you explain how inference works?

Here is my take on it.

## A bowl of seeds

Imagine you have a bowl of seeds – mustard and rocket. All the seeds are about the same size, and have been mixed up. These seeds are TINY, so several million seeds only fill up a large bowl. We will call this bowl the population. Let’s say for now that the bowl contains exactly half and half mustard and rocket, and you suspect that to be the case, but you do not know for sure.

Say you take out 10 seeds. The most likely result is that you will get 4, 5 or 6 mustard seeds. There is a 65% chance that this is what will happen. If you got any of those results, you would think that the bowl might be about half and half. You would be surprised if they were all mustard seeds. But it is possible that all ten seeds are the same. The probability of getting all mustard seeds or all rocket seeds from a bowl of half and half is about 0.002, or one chance in five hundred.
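These figures come straight from the binomial distribution. A minimal check in Python (the function name and setup are my own; taking a handful from a bowl of millions of well-mixed seeds is modelled as independent 50/50 draws, which is a very good approximation):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k mustard seeds in n draws from a 50/50 bowl."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10
# Probability of drawing 4, 5 or 6 mustard seeds out of 10
p_middle = sum(binom_pmf(k, n) for k in range(4, 7))
# Probability of all mustard OR all rocket
p_extreme = binom_pmf(0, n) + binom_pmf(n, n)

print(f"P(4 to 6 mustard) = {p_middle:.3f}")  # 0.656, i.e. about 65%
print(f"P(all the same)   = {p_extreme:.4f}")  # 0.0020, about one in 500
```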

Now, if you draw out 1000 seeds, it is quite a different story. If all 1000 seeds drawn out were mustard, you would justifiably conclude that the bowl is not half and half, and may in fact contain no rocket seeds. But where do we draw the line? How likely is it to get 45% or less mustard from our 50/50 bowl? About one chance in a thousand. It is possible, but extremely unlikely – though not as unlikely as winning Lotto. We can see that the sample of 1000 seeds gives us a general idea of what is in the bowl, but we would never think it was an exact representation. If our sample was 51% mustard, we would not sensibly conclude that the seeds in the bowl were not half and half. In fact, there is only about a 49% chance that we will get a sample of seeds that is between 49% and 51% mustard.
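The same exact binomial calculation scales up to a sample of 1000 seeds. A sketch in Python (again a toy check of my own, treating the draws as independent 50/50 events):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k mustard seeds in n draws from a 50/50 bowl."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 1000
# Chance of drawing 45% or fewer mustard seeds (450 or fewer of 1000)
p_low = sum(binom_pmf(k, n) for k in range(0, 451))
# Chance the sample proportion lands between 49% and 51% (490 to 510 seeds)
p_close = sum(binom_pmf(k, n) for k in range(490, 511))

print(f"P(45% or fewer mustard) = {p_low:.5f}")  # roughly one in a thousand
print(f"P(between 49% and 51%)  = {p_close:.2f}")  # roughly 0.49 – under half the time
```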

## People are not seeds

Of course we know we are not little seeds, but people. In fact we like to think we are all special snowflakes. (The scene from “Life of Brian” springs to mind. Brian – “You are all individuals”, crowd – “We are all individuals”, single response – “I’m not!”)

But the truth is that as a group we do act in surprisingly consistent ways. Every year as a university lecturer I tried new things to help my students to learn. And every year the pass rate was disappointingly consistent. I later devised a course that anyone could pass if they put the work in. They could keep resitting the tests until they passed. And the pass rate stayed around the same.

People do tend to act in similar ways. So if one person changes their viewpoint, there is a pretty good chance that others will have also. So long as we are aware of the limitations in precision, samples are good indicators of the populations from which they are drawn.

I have described why polls generally work. The media tends to dwell on the times that they fail, so let’s look at why that may be.

## Sampling error

Sometimes the poll may just be the one that takes an unlikely sample. There is a one in a thousand chance that ten seeds from my bowl will all be mustard – and a one in a thousand chance that all will be rocket. It is not very likely, but it can happen. Similarly, there is a teeny chance that we will get a result of less than 45% or more than 55% when we take out 1000 seeds. Not likely, but possible. This is called sampling error, and that is what the margin of error is about. Political polls in NZ generally take a sample of 1000 people, which leads to a margin of error of about 3%. What the margin of error means is that we can make an interval of 3% either side of the estimate and be pretty sure that it encloses the real value from the population. So if a poll says 45% following for the Mustard Party, then we can be pretty sure that the actual following back in the population is between 42% and 48%. And what does “pretty sure” mean? It means that about one time in twenty we will get it wrong, and the actual following back in the population will be outside that range. The problem is we NEVER know if this is the right one or the wrong one. (Though I personally choose to decide that the polls that I don’t like are the wrong ones. ;))
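The 3% figure for a sample of 1000 can be worked out from the standard formula for a 95% margin of error on a proportion near 50%. A small sketch (the function name and the 45% example value are my own):

```python
from math import sqrt

def margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion near 50%:
    1.96 * sqrt(0.25 / n), often rounded to the rule of thumb 1 / sqrt(n)."""
    return 1.96 * sqrt(0.25 / n)

n = 1000
moe = margin_of_error(n)
estimate = 0.45  # e.g. a poll showing 45% support for the Mustard Party
low, high = estimate - moe, estimate + moe

print(f"Margin of error for n={n}: {moe:.1%}")  # about 3.1%
print(f"95% interval: {low:.1%} to {high:.1%}")  # about 41.9% to 48.1%
```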

## Non-sampling error and bias

There are other problems – known as non-sampling error. I wrote a short post on it previously.

And this is where the difference between seeds and people becomes important. Some issues are:

When we take a handful of seeds from a well-mixed-up bowl, every seed really does have an equal chance of being selected. But getting such a sample from the population of New Zealand is much more difficult. When landlines were in most homes, a phone poll could produce a pretty representative sample. However, these days many people have only mobile phones, which means they are less likely to be called. This would not be a problem if there were no political differences between landline holders and others. I think most people would see that younger people are less likely to be polled than older people if landlines are used, and younger people quite possibly have different political views. Good polling companies are aware of this and use quota sampling and other methods to try to mitigate it.

The wording of the question and the order of questions can affect what people say. You can usually find out what question has been asked in a particular poll, and it should be reported as part of the report.

Unlike seeds, people do not always show their true colours. If a person is answering a poll within earshot of another family member, they may give a different answer to what they actually tick on election day. Some people are undecided, and may change their mind in the booth. Undecided voters are difficult to account for in statistics, as an undecided voter swinging between two possible coalition partners will have a different impact from a person who has no opinion or may vacillate wildly.

## When the poll is held

In a volatile political environment like the one we are experiencing in New Zealand, people can change their mind from day to day as new leaders emerge, scandals are uncovered, and even in response to reporting of political polls. The results of a poll can be affected by the day and time that the questions were asked.

## Can you believe a poll?

On balance, polls are a blunt instrument that can give a vague idea about who people are likely to vote for. They do work, within their limitations, but the limitations are fairly substantial. We need to be sceptical of polls, and bear in mind that the margin of error only deals with sampling error, not all the other sources of error and bias.

And as they say – the only truly correct poll is the one on Election Day.

# Understanding Statistical Inference

Inference is THE big idea of statistics. This is where people come unstuck. Most people can accept the use of summary descriptive statistics and graphs. They can understand why data is needed. They can see that the way a sample is taken may affect how things turn out. They often understand the need for control groups. Most statistical concepts or ideas are readily explainable. But inference is a tricky, tricky idea. Well actually – it doesn’t need to be tricky, but the way it is generally taught makes it tricky.

## Procedural competence with zero understanding

I cast my mind back to my first encounter with confidence intervals and hypothesis tests. I learned how to calculate them (by hand – yes I am that old) but had not a clue what their point was. Not a single clue. I got an A in that course. This is a common occurrence. It is possible to remain blissfully unaware of what inference is all about, while answering procedural questions in exams correctly.

But, thanks to the research and thinking of a lot of really smart and dedicated statistics teachers, we are able to put a stop to that. And we must.

We need to explicitly teach what statistical inference is. Students do not learn to understand inference by doing calculations. We need to revisit the ideas behind inference frequently. The process of hypothesis testing is counter-intuitive, and so confusing that it spills its confusion over into the concept of inference. Confidence intervals are less confusing, and so are a better intermediate point for understanding statistical inference. But we need to start with the concept of inference.

# What is statistical inference?

The idea of inference is actually not that tricky if you unbundle the concept from the application or process.

The concept of statistical inference is this –

We want to know stuff about a large group of people or things (a population). We can’t ask or test them all so we take a sample. We use what we find out from the sample to draw conclusions about the population.

That is it. Now was that so hard?
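The whole concept fits in a few lines of code. A toy sketch, with a completely made-up population (the 45% figure and the population size are invented for illustration):

```python
import random

random.seed(1)  # reproducible draw for this illustration

# A hypothetical population of one million "voters", 45% of whom support party A
population = [1] * 450_000 + [0] * 550_000

# We can't ask them all, so take a simple random sample of 1000...
sample = random.sample(population, 1000)

# ...and use what we find in the sample to draw a conclusion about the population
estimate = sum(sample) / len(sample)
print(f"Sample estimate of support: {estimate:.1%} (true value: 45.0%)")
```

The estimate will not be exactly 45%, but it will almost always be close – which is the entire point of inference.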

# Developing understanding of statistical inference in children

I have found the paper by Makar and Rubin, presenting a “framework for thinking about informal statistical inference”, particularly helpful. In this paper they summarise studies done with children learning about inference. They suggest that “three key principles … appeared to be essential to informal statistical inference: (1) generalization, including predictions, parameter estimates, and conclusions, that extend beyond describing the given data; (2) the use of data as evidence for those generalizations; and (3) employment of probabilistic language in describing the generalization, including informal reference to levels of certainty about the conclusions drawn.” This can be summed up as Generalisation, Data as evidence, and Probabilistic Language.

We can lead into informal inference early on in the school curriculum. The key ideas in the NZ curriculum suggest that “teachers should be encouraging students to read beyond the data, e.g. ‘If a new student joined our class, how many children do you think would be in their family?’” In other words, though we don’t specifically use the terms population and sample, we can conversationally draw attention to what we learn from this set of data, and how that might relate to other sets of data.

When teaching adults we may use a more direct approach, explaining explicitly alongside experiential learning, to develop understanding of inference. We have just made a video: Understanding Inference. Within the video we present three basic ideas condensed from the Five Big Ideas in the very helpful book published by NCTM, “Developing Essential Understanding of Statistics, Grades 9–12”, by Peck, Gould, Miller and Zbiek.

## Ideas underlying inference

- A sample is likely to be a good representation of the population.
- There is an element of uncertainty as to how well the sample represents the population.
- The way the sample is taken matters.

These ideas help to provide a rationale for thinking about inference, and allow students to justify what has often been assumed or taught mathematically. In addition, several memorable examples involving apples, chocolate bars and opinion polls are provided. This is available for free use on YouTube. If you wish to have access to more of our videos than are available there, do email me at n.petty@statslc.com.
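The first two of these ideas can be experienced directly by simulation: repeated samples from the same population cluster around the true value, but no two samples agree exactly. A short sketch (the 52% support figure and sample count are invented for illustration):

```python
import random

random.seed(2)  # reproducible for this illustration

population_p = 0.52  # hypothetical true support in the population
n = 1000             # size of each sample

# Draw many samples and record each sample's estimate of support
estimates = []
for _ in range(1000):
    hits = sum(1 for _ in range(n) if random.random() < population_p)
    estimates.append(hits / n)

# Idea 1: the estimates cluster around the true 52%
# Idea 2: they vary from sample to sample - that variation IS the uncertainty
print(f"Estimates ranged from {min(estimates):.1%} to {max(estimates):.1%}")
```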