# Political polls – why do they work – or don’t

This is written in the week before the 2017 New Zealand General Election and it is an exciting time. Many New Zealanders are finding political polls fascinating right now. We wait with bated breath for each new announcement – is our team winning this time? If it goes the way we want, we accept the result with gratitude and joy. If not, then we conclude that the polling system was at fault.

Many wonder how on earth asking 1000 people can possibly give a reading of the views of all New Zealanders. This is not a silly question. I have only occasionally been polled, so how can I believe the polls reflect my view? As a statistical communicator, I have given some thought to this. If you are a statistician or a teacher of statistics, how would you explain that inference works?

Here is my take on it.

## A bowl of seeds

Imagine you have a bowl of seeds – mustard and rocket. All the seeds are about the same size, and have been mixed up. These seeds are TINY, so several million seeds only fill up a large bowl. We will call this bowl the population. Let’s say for now that the bowl contains exactly half and half mustard and rocket, and you suspect that to be the case, but you do not know for sure.

Say you take out 10 seeds. The most likely result is that you will get 4, 5 or 6 mustard seeds; there is about a 66% chance of that happening. If you got any of those results, you would think that the bowl might be about half and half. You would be surprised if they were all mustard seeds. But it is possible that all ten seeds are the same. The probability of getting all mustard seeds or all rocket seeds from a bowl of half and half is about 0.002, or roughly one chance in five hundred.
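For readers who want to check these figures, here is a minimal Python sketch. It assumes each seed drawn is independently mustard with probability exactly 1/2, which is a binomial model. Strictly, drawing from a finite bowl is sampling without replacement, but with millions of seeds the difference is negligible. Exact fractions are used so the results can be compared directly with the numbers in the paragraph.

```python
from fractions import Fraction
from math import comb

N_SEEDS = 10
P_MUSTARD = Fraction(1, 2)  # assumption: the bowl is exactly half mustard

def prob_mustard(k, n=N_SEEDS, p=P_MUSTARD):
    """Exact binomial probability of drawing k mustard seeds out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

four_to_six = sum(prob_mustard(k) for k in (4, 5, 6))
all_same = prob_mustard(0) + prob_mustard(N_SEEDS)  # all rocket or all mustard

print(four_to_six, float(four_to_six))  # 21/32 0.65625
print(all_same, float(all_same))        # 1/512 0.001953125
```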

Now, if you draw out 1000 seeds, it is quite a different story. If all the 1000 seeds drawn out were mustard, you would justifiably conclude that the bowl is not half and half, and may in fact have no rocket seeds. But where do we draw the line? How likely is it to get 44% or less mustard from our 50/50 bowl? Well, it is about one chance in 12,000. It is possible, but extremely unlikely, though not as unlikely as winning Lotto. We can see that the sample of 1000 seeds gives us a general idea of what is in the bowl, but we would never think it was an exact representation. If our sample was 51% mustard, we would not sensibly conclude that the seeds in the bowl were not half and half. In fact, there is only about a 47% chance that we will get a sample of seeds that is between 49% and 51% mustard.
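The same binomial model can be used for the 1000-seed figures. This sketch again assumes independent draws with probability 1/2, and adds up exact binomial probabilities using Python's whole-number arithmetic, so no normal approximation is involved. It reports the chance of 400 or fewer and 440 or fewer mustard seeds, and the chance of landing between 49% and 51%, both counting and not counting the endpoints. `prob_between` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

N = 1000

def prob_between(lo, hi, n=N):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, 1/2), using whole-number arithmetic."""
    favourable = sum(comb(n, k) for k in range(lo, hi + 1))
    return Fraction(favourable, 2 ** n)  # 2**n equally likely mustard/rocket sequences

p_40_or_less = prob_between(0, 400)
p_44_or_less = prob_between(0, 440)
p_49_to_51_incl = prob_between(490, 510)  # counting 49% and 51% themselves
p_49_to_51_excl = prob_between(491, 509)  # strictly between 49% and 51%

print(f"40% or less: {float(p_40_or_less):.1e}")
print(f"44% or less: about 1 in {round(1 / p_44_or_less):,}")
print(f"49% to 51%, inclusive: {float(p_49_to_51_incl):.3f}")
print(f"49% to 51%, exclusive: {float(p_49_to_51_excl):.3f}")
```

Under these assumptions, 440 or fewer comes out at roughly one chance in 12,000, while 400 or fewer is far rarer than a Lotto win. The 49% to 51% figure lands at about 45% if the endpoints are excluded and about 49% if they are included.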

## People are not seeds

Of course we know we are not little seeds, but people. In fact we like to think we are all special snowflakes.  (The scene from “Life of Brian” springs to mind. Brian – “You are all individuals”, crowd – “We are all individuals”, single response – “I’m not!”)

But the truth is that as a group we do act in surprisingly consistent ways. Every year as a university lecturer I tried new things to help my students to learn. And every year the pass rate was disappointingly consistent. I later devised a course that anyone could pass if they put the work in. They could keep resitting the tests until they passed. And the pass rate stayed around the same.

People do tend to act in similar ways. So if one person changes their viewpoint, there is a pretty good chance that others will have also. So long as we are aware of the limitations in precision, samples are good indicators of the populations from which they are drawn.

I have described why polls generally work. The media tends to dwell on the times that they fail, so let’s look at why that may be.

## Sampling error

Sometimes the poll may just be the one that takes an unlikely sample. There is about a one in a thousand chance that ten seeds from my bowl will all be mustard, and about a one in a thousand chance that all will be rocket. It is not very likely, but it can happen. Similarly, there is a teeny chance that we will get a result of less than 45% or more than 55% when we take out 1000 seeds. Not likely, but possible. This is called sampling error, and that is what the margin of error is about.

Political polls in NZ generally take a sample of 1000 people, which leads to a margin of error of about 3%. What margin of error means is that we can make an interval of 3% either side of the estimate and be pretty sure that it encloses the real value from the population. So if a poll says 45% following for the Mustard Party, then we can be pretty sure that the actual following back in the population is between 42% and 48%. And what does “pretty sure” mean? It means that about one time in twenty we will get it wrong, and the actual following back in the population is outside that range. The problem is we NEVER know if this is the right one or the wrong one. (Though I personally choose to decide that the polls that I don’t like are the wrong ones. ;))
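The 3% figure comes from the usual normal-approximation formula for a proportion, margin = 1.96 × √(p̂(1 − p̂)/n). The sketch below is a minimal illustration under that assumption; `margin_of_error` is a hypothetical helper name, and 1.96 is the standard normal value for 95% confidence. Using p̂ = 0.5 gives the largest, most cautious margin for a given sample size.

```python
from math import sqrt

Z_95 = 1.96  # standard normal value for a 95% confidence level

def margin_of_error(p_hat, n, z=Z_95):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

moe_worst = margin_of_error(0.5, 1000)     # p = 0.5 gives the largest margin
moe_mustard = margin_of_error(0.45, 1000)  # the 45% Mustard Party poll
low, high = 0.45 - moe_mustard, 0.45 + moe_mustard

print(f"worst-case margin for n=1000: {moe_worst:.3f}")
print(f"Mustard Party 45%: interval {low:.3f} to {high:.3f}")
```

For n = 1000 this gives a margin of about 0.031, and for the 45% Mustard Party result an interval of roughly 42% to 48%.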

## Non-sampling error and bias

There are other problems – known as non-sampling error. I wrote a short post on it previously.

And this is where the difference between seeds and people becomes important. Some issues are:

When we take a handful of seeds from a well-mixed bowl, every seed really does have an equal chance of being selected. But getting such a sample from the population of New Zealand is much more difficult. When landlines were in most homes, a phone poll could be a pretty representative sample. However, these days many people have only mobile phones, which means they are less likely to be called. This would not be a problem if there were no political differences between landline holders and others. But I think most people would see that if landlines are used, younger people are less likely to be polled than older people, and younger people quite possibly have different political views. Good polling companies are aware of this and use quota sampling and other methods to try to mitigate it.

The wording of the question and the order of questions can affect what people say. You can usually find out what question was asked in a particular poll, and it should be included in the report.

Unlike seeds, people do not always show their true colours. If a person is answering a poll within earshot of another family member, they may give a different answer from what they actually tick on election day. Some people are undecided, and may change their mind in the booth. Undecided voters are difficult to account for in statistics, as an undecided voter swinging between two possible coalition partners will have a different impact from a person who has no opinion or who may vacillate wildly.

## When the poll is held

In a volatile political environment like the one we are experiencing in New Zealand, people can change their minds from day to day as new leaders emerge, scandals are uncovered, and even in response to reporting of political polls. The results of a poll can be affected by the day and time that the questions were asked.

## Can you believe a poll?

On balance, polls are a blunt instrument that can give a vague idea about who people are likely to vote for. They do work, within their limitations, but the limitations are fairly substantial. We need to be sceptical of polls, and bear in mind that the margin of error only deals with sampling error, not all the other sources of error and bias.

And as they say – the only truly correct poll is the one on Election Day.

# Sampling error and non-sampling error

The subject of statistics is rife with misleading terms. I have written about this before in such posts as “Teaching Statistical Language” and “It is so random”. But the terms sampling error and non-sampling error win the Dr Nic prize for counter-intuitivity and confusion generation.

## Confusion abounds

To start with, the word error implies that a mistake has been made, so the term sampling error makes it sound as if we made a mistake while sampling. Well, this is wrong. And the term non-sampling error (why is this even a term?) sounds as if it is the error we make from not sampling. And that is wrong too. However, these terms are used extensively in the NZ statistics curriculum, so it is important that we clarify what they are about.

Fortunately, the Glossary has some excellent explanations:

## Sampling Error

“Sampling error is the error that arises in a data collection process as a result of taking a sample from a population rather than using the whole population.

Sampling error is one of two reasons for the difference between an estimate of a population parameter and the true, but unknown, value of the population parameter. The other reason is non-sampling error. Even if a sampling process has no non-sampling errors then estimates from different random samples (of the same size) will vary from sample to sample, and each estimate is likely to be different from the true value of the population parameter.

The sampling error for a given sample is unknown but when the sampling is random, for some estimates (for example, sample mean, sample proportion) theoretical methods may be used to measure the extent of the variation caused by sampling error.”

## Non-sampling error

“Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a sample.

Non-sampling errors have the potential to cause bias in polls, surveys or samples.

There are many different types of non-sampling errors and the names used to describe them are not consistent. Examples of non-sampling errors are generally more useful than using names to describe them.”

And it proceeds to give some helpful examples.

These are great definitions, and I thought about turning them into a diagram, so here it is:

Table summarising types of error.

And there are now two videos to go with the diagram, to help explain sampling error and non-sampling error. Here is a link to the first:

One of my earliest posts, Sampling Error Isn’t, introduced the idea of using variation due to sampling and other variation as a way to make sense of these ideas. The sampling video above is based on this approach.

Students need lots of practice identifying potential sources of error in their own work, and in critiquing reports. In addition, I have found True/False questions surprisingly effective in practising the correct use of the terms. Whatever engages the students for a time in consciously deciding which term to use is helpful in getting them to understand and be aware of the concept. Then the odd terminology will cease to have its original confusing connotations.

# The importance of being wrong

## We don’t like to think we are wrong

One of the key ideas in statistics is that sometimes we will be wrong. When we report a 95% confidence interval, we will be wrong 5% of the time. Or in other words, about 1 in 20 of 95% confidence intervals will not contain the population parameter we are attempting to estimate. That is how they are defined. The thing is, we always think we are part of the 95% rather than the 5%. Mostly we will be correct, but if we do enough statistical analysis, we will almost definitely be wrong at some point. However, human nature is such that we tend to think it will be someone else. There is also a feeling of blame associated with being wrong. The feeling is that if we have somehow missed the true value with our confidence interval, it must be because we have made a mistake. However, this is not true. In fact we MUST be wrong about 5% of the time, or our interval is too big, and not really a 95% confidence interval.

The term “margin of error” appears with increasing regularity as elections approach and polling companies are keen to make money out of soothsaying. The common meaning of the margin of error is half the width of a 95% confidence interval. So if we say the margin of error is 3%, then about one time in twenty, the true value of the proportion will actually be more than 3% away from the reported sample value.
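One way to make the “one time in twenty” idea concrete, at least in classroom world, is a simulation where we do know the population value. This sketch assumes a true support of 50%, polls of 1000 independent respondents, and intervals built as the estimate ± 1.96 standard errors; `simulate_polls` is a hypothetical name. It counts how often the interval misses the true value.

```python
import random
from math import sqrt

def simulate_polls(true_p=0.5, n=1000, n_polls=2000, seed=1):
    """Run many simulated polls and return the share whose 95% interval
    (estimate +/- margin of error) misses the true population value."""
    rng = random.Random(seed)  # fixed seed so the demonstration is repeatable
    misses = 0
    for _ in range(n_polls):
        # Each respondent independently supports the party with probability true_p.
        supporters = sum(rng.random() < true_p for _ in range(n))
        p_hat = supporters / n
        moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
        if abs(p_hat - true_p) > moe:  # the interval does not contain true_p
            misses += 1
    return misses / n_polls

miss_rate = simulate_polls()
print(f"share of intervals that missed: {miss_rate:.3f}")
```

The miss rate comes out close to 0.05, though not exactly, and nothing about any single simulated poll tells you whether it is one of the misses.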

What doesn’t help is that we seldom do know if we are correct or not. If we knew the real population value we wouldn’t be estimating it. We can contrive situations where we do know the population but pretend we don’t. If we do this in our teaching, we need to be very careful to point out that this doesn’t normally happen, but does in “classroom world” only. (Thanks to MD for this useful term.) General elections can give us an idea of being right or wrong after the event, but even then the problem of non-sampling error is conflated with sampling error. When opinion polls turn out to miss the mark, we tend to think of the cause as being due to poor sampling, or people changing their minds, or any number of imaginative explanations rather than simple, unavoidable sampling error.

So how do we teach this in such a way that it goes beyond school learning and is internalised for future use as efficient citizens?

## Teaching suggestions

I have two suggestions. The first is a series of True/False statements that can be used in a number of ways. I have them as part of on-line assessment, so that the students are challenged by them regularly. They could be well used in the classroom as part of a warm-up exercise at the start of a lesson. Students can write their answers down or vote using hands.

Here are some examples of True/False statements (some of which could lead to discussion):

1. You never know if your confidence interval contains the true population value.
2. If you make your confidence interval wide enough you can be sure that you contain the true population value.
3. A confidence interval tells us where we are pretty sure the sample statistic lies.
4. It is better to have a narrow confidence interval than a wide one, as it gives us more certain information, even though it is more likely to be wrong.
5. If your study involves twenty confidence intervals, then you know that exactly one of them will be wrong.
6. If a confidence interval doesn’t contain the true population value, it is because it is one of the 5% that was calculated incorrectly.

## Experiential exercise

The other teaching suggestion is for an experiential exercise. It requires a little set up time.

Make a set of cards for students with numbers on them that correspond to the point estimate of a proportion, or a score that will lead to that. (Specifications for a set of 35 cards representing the results from a proportion of 0.54 and 25 trials are given below.)

Introduce the exercise as follows:
“I have a computer game, and have set the proportion of games won at a certain value. Each of you has played 25 times, and the number of wins you have obtained will be on your card. It is really important that you don’t look at other people’s cards.”

Hand them out to the students. (If you have fewer than 35 in your class, it might be a good idea to make sure you include the cards with 8 and 19 in the set you use – sometimes it is ok to fudge slightly to teach a point.)
“Without getting information from anyone else, write down your best estimate of the true proportion of wins in the game. Do you think you are correct? How close do you think you are to the true value?”

They will need to divide the number of wins by 25, which should not lead to any computational errors! The point is that they really can’t know how close their estimate is to the true value – and what does “correct” mean?

Then work out the margin of error for a sample of size 25, which in this case is estimated at 20%. Get the students to calculate their 95% confidence intervals, and decide if they have the interval that contains the true population value. Get them to commit one way or the other.

Now they can talk to each other about the values they have.

There are several ways you can go from here. You can tell them what the population proportion was from which the numbers were drawn (0.54). They can then see that most of them had confidence intervals that included the true value, and some didn’t. Or you can leave them wondering, which is a better lesson about real life. Or you can do one exercise where you do tell them and one where you don’t.

This is an area where probability and statistics meet. You could make a nice little binomial distribution problem out of being correct in a number of confidence intervals. There are potential problems with independence, so you need to be a bit careful with the wording. For example: Fifteen students undertake separate statistical analyses on topics of their choice, and construct 95% confidence intervals. What is the probability that all the confidence intervals are correct, in that they do contain the population parameter being estimated? If X is the number of intervals that miss, this is well modelled by a binomial distribution with n = 15 and p = 0.05, and P(X = 0) = 0.46. And another interesting idea: what is the probability that two or more are incorrect? The answer is 0.17. So there is a 17% chance that more than one of the confidence intervals does not contain the population parameter of interest.
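The binomial calculation in this paragraph can be checked in a few lines. The sketch assumes, as the example does, that the fifteen analyses are independent and that each 95% interval misses with probability 0.05; `binom_pmf` is a small hypothetical helper built on Python's `math.comb`.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 15, 0.05  # 15 independent analyses; each interval misses with probability 0.05
p_none_miss = binom_pmf(0, n, p)
p_two_or_more = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

print(f"P(X = 0)  = {p_none_miss:.2f}")    # 0.46
print(f"P(X >= 2) = {p_two_or_more:.2f}")  # 0.17
```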

This is an area that needs careful teaching, and I suspect that some teachers have only a sketchy understanding of the idea of confidence intervals and margins of error. It is so important to know that statistical results are meant to be wrong some of the time.

Data for the 35 cards:

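| Number on card | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of cards | 1 | 1 | 2 | 3 | 5 | 5 | 6 | 5 | 3 | 2 | 1 | 1 |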
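For teachers preparing the cards, this sketch uses the card counts in the table above (25 games per card, true proportion 0.54) and the conservative 95% margin for n = 25, 1.96 × √(0.25/25) ≈ 0.196, which is the “20%” used in the exercise. It lists which cards give an interval that misses 0.54. The variable names are illustrative only.

```python
from math import sqrt

# Card value (number of wins out of 25) -> how many cards show that value.
cards = {8: 1, 9: 1, 10: 2, 11: 3, 12: 5, 13: 5, 14: 6, 15: 5, 16: 3, 17: 2, 18: 1, 19: 1}
N_GAMES = 25
TRUE_P = 0.54
MOE = 1.96 * sqrt(0.5 * 0.5 / N_GAMES)  # conservative margin, about 0.196

missed = []
for wins, count in cards.items():
    p_hat = wins / N_GAMES
    if not (p_hat - MOE <= TRUE_P <= p_hat + MOE):  # interval excludes the true value
        missed.extend([wins] * count)

total = sum(cards.values())
print(f"{total} cards, {len(missed)} intervals miss 0.54: {missed}")
# 35 cards, 2 intervals miss 0.54: [8, 19]
```

Under these assumptions only the 8 and 19 cards miss, 2 of 35 or just under 6%, which is why it is worth keeping those two cards when the class has fewer than 35 students.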

# Statistical Misconception Removal

A central city street soon after the February earthquake, before demolition

Our central city is being “deconstructed”. That’s the modern word for demolition. We live in Christchurch, New Zealand, where many of our buildings were badly damaged by a string of serious earthquakes over the last 18 months, beginning on 4 September 2010. Over a thousand buildings in the central city are being demolished, because they are no longer safe. The larger ones will take up to a year to bring down, and experts have come from other parts of the world to assist in the process. It is pretty sad, really, as we love our buildings. But we need to build new buildings, based on the knowledge we have gained about the geology of Christchurch. We cannot build new, strong buildings until the old ones are removed. And even some of the buildings which are structurally sound cannot yet be occupied because of danger from an adjacent unstable building.

They call the process deconstruction these days. The word deconstruction seems rather twee to me, and reminds me of feminist scholarship and words like “epistemology” and “discourse”. I presume it implies a surgical removal of parts of the building rather than whacking at it with a big metal ball. Either way, the bad stuff has to go.

When we are teaching, we endeavour to remove wrong ideas before we try to replace them with correct ideas. Interesting side thought – is there room for the wrecking-ball approach at times in teaching, or does that result in too much collateral damage? Let’s go for gentle removal for now.

Let’s take the example of Sampling Error. Anyone with half a brain can see that the words tell us that sampling error is a mistake you make when sampling. Or at least it is caused by a bad sample. Well unfortunately, as you can read in a previous post, Sampling Error Isn’t, the half-brained person is wrong. The term is misleading. But enduring. Unless the initial wrong idea is removed, like a broken building, the learner will not accommodate the new idea. Actually maybe wrong ideas are more like weeds that pop up again if you don’t get them out at the roots. Sometimes we need to repeatedly remove wrong ideas.

The constructivist view of learning proposes that we build knowledge on our prior experiences and knowledge. Sometimes construction, like in Christchurch, involves preparatory deconstruction.

We have discovered a quick and effective way to work with this idea. It has not been scientifically tested but passes the ISTW criterion. (It Seems To Work).

Using the Learning Management System we provide short true/false quizzes. These are more activities than tests. The questions proceed systematically through concepts, and address possible misconceptions as statements for the students to decide are true or false, before presenting correct ideas in the feedback. The feedback is rather like notes in a textbook, but the students engage with the idea through the question, before being presented with the correct concept. Part of the ISTW evidence is that students are not required to do these tests, but choose to take them, often three times or more, generally until they get 100% in them.

An example of questions in the short test involving sampling error.

We have transferred this method to our new app, AtMyPace: Statistics. The app has a video for each topic, then a pair of follow-up quizzes as described. The quizzes are parallel, to help reinforce the ideas. My son (who is a genetic anomaly, having zero mathematical aptitude despite being the offspring of an operations researcher and a land surveyor) happily worked away at the quizzes and commented that it was almost like a game! And he did better on the second quiz each time. (ISTW!)

Feedback after wrong answer to a question in the iPhone app, AtMyPace: Statistics.

We’d love some feedback on this approach. If you’d like a promo code for AtMyPace: Statistics, which is now available on iPhone, iPod touch and iPad, contact me through the AtMyPace Facebook page. Leave comments here or there! Or reach me at @Rogonic on Twitter.

# Sampling Error Isn’t

I hope you committed to a response in the box before reading this post.

This is an important topic. Recently I read an amusing blog regarding poor sampling technique. The tweet that led to the link called it “a humorous look at sample error”. I’m hoping the person who tweeted meant bad sampling, because the problem is, the story was not about sampling error.

And that is because sampling error isn’t. Isn’t what? It isn’t error. It doesn’t occur by mistake. It is not caused by bad procedures. There is nothing practical you can do when sampling to avoid sampling error. Sampling error exists because you are taking a sample. The only way to avoid sampling error is to test the entire population – in which case it isn’t a sample, it’s a census.

This is a vivid example of a word in common use being given a different, very specific meaning within a discipline, which then confuses the heck out of everyone.

It has been found that even students who get A grades in first-year statistics at university often have serious flaws and gaps in their understanding of statistics. I would predict that the idea of sampling error will be a cavernous hole of misunderstanding for most.

The problem is not sampling error, but bias. Take a perfect random sample, where each object in the population has an equal probability of selection. This will reduce, and perhaps even eliminate, bias. But sampling error will remain.

Because of natural variation, it is unlikely that all people send the same number of texts in a day.

So how do you teach this? I use the approach of talking about variation*. Variation is inherent in all natural, human and manufacturing processes. We then classify variation into four categories: Natural, Explainable, Sampling and Bias. The term “natural variation” describes the omnipresence of variation in real life. “Explainable variation” is what we are often looking for in statistical analysis – can we use the age of a car to help explain some of the variation in prices of cars, for instance. Sampling variation (also known as sampling error) occurs when we take a sample and use it to draw conclusions about the population. We would not expect two samples from the same population to yield exactly the same results. The fourth category is variation due to biased sampling.

This approach is not comprehensive, and can be a bit clunky in the terminology, jumping between variation and error. But it gives a framework for students to identify the difference between sampling error/variation and error due to biased sampling. We do classroom activities where students get different samples from the same population to illustrate sampling variation/error.
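The classroom activity can also be mimicked in code. In this sketch the population is made-up data (10,000 hypothetical daily text counts), and the biased method is a hypothetical one that can only reach people who send more than 50 texts a day. Random samples differ from each other and from the population mean even though nothing went wrong (sampling variation), while the biased samples are consistently too high (bias).

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the demonstration is repeatable

# Hypothetical population: daily text counts for 10,000 people (made-up data).
population = [rng.randint(0, 100) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Assumption for illustration: the biased method only reaches heavier texters.
heavy_texters = [x for x in population if x > 50]

def random_sample_mean(size):
    """Mean of a simple random sample: everyone is equally likely to be chosen."""
    return statistics.mean(rng.sample(population, size))

def biased_sample_mean(size):
    """Mean of a biased sample drawn only from the heavier texters."""
    return statistics.mean(rng.sample(heavy_texters, size))

random_means = [random_sample_mean(30) for _ in range(5)]
biased_means = [biased_sample_mean(30) for _ in range(5)]

print(f"population mean: {true_mean:.1f}")
print("random samples:", [round(m, 1) for m in random_means])
print("biased samples:", [round(m, 1) for m in biased_means])
```

Taking more random samples never makes the spread between them disappear; only a census does. Fixing the biased method, on the other hand, removes the systematic shift.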

This is important. It is important that people in general understand that samples are not going to represent the population exactly. They also need to understand that, through the use of theoretical probability models, statisticians and analysts do allow for that sampling error. Bias, however, is another story for another day.

You can see how we explain the different kinds of variation in this YouTube video:

By the way – the correct answer to the question at the start of the post is False. No sampling method, no matter how good it is, will eliminate sampling error.

Let’s see if you get it – here are some statements about variation. Classify each of the following as examples of natural variation, explainable variation, sampling variation or variation due to biased sampling. I’ll put the answers in the comments to this blog.

• When I bike to work, sometimes it takes me longer than other times.
• When I bike to work with a head wind, it generally takes me longer than with a tail wind.
• Two students each took random samples of ten students from their class and asked them how many friends they have on Facebook. They got different values for their means.
• Two students each asked eight of their friends how many friends they have on Facebook. They got different values for their means.

*Note: This approach is based on the thought-provoking work by Wild and Pfannkuch, reported in “Statistical Thinking in Empirical Enquiry” International Statistical Review (1999) p235.