The subject of statistics is rife with misleading terms. I have written about this before in posts such as “Teaching Statistical Language” and “It is so random”. But the terms sampling error and non-sampling error win the Dr Nic prize for counter-intuitivity and confusion generation.

# Confusion abounds

To start with, the word “error” implies that a mistake has been made, so the term “sampling error” makes it sound as if we made a mistake while sampling. Well, this is wrong. And the term “non-sampling error” (why is this even a term?) sounds as if it is the error we make from not sampling. That is wrong too. However, these terms are used extensively in the NZ statistics curriculum, so it is important that we clarify what they mean.

Fortunately, the Glossary has some excellent explanations:

## Sampling Error

“Sampling error is the error that arises in a data collection process as a result of taking a sample from a population rather than using the whole population.

Sampling error is one of two reasons for the difference between an estimate of a population parameter and the true, but unknown, value of the population parameter. The other reason is non-sampling error. Even if a sampling process has no non-sampling errors then estimates from different random samples (of the same size) will vary from sample to sample, and each estimate is likely to be different from the true value of the population parameter.

The sampling error for a given sample is unknown but when the sampling is random, for some estimates (for example, sample mean, sample proportion) theoretical methods may be used to measure the extent of the variation caused by sampling error.”
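To make the glossary's point concrete (estimates from different random samples of the same size vary, even with no non-sampling error), here is a minimal Python sketch. The population of 10,000 people with 30% answering “yes”, the sample size of 100 and the 1,000 repetitions are all made-up assumptions, and the function names are hypothetical. Everyone sampled answers truthfully, so the only reason an estimate differs from the true proportion is sampling error. The standard error formula √(p(1 − p)/n) stands in here as one example of the theoretical methods the glossary mentions for sample proportions.

```python
import math
import random
import statistics

# Hypothetical population: 10,000 people, 30% of whom answer "yes" (coded 1).
# These numbers are assumptions made up for illustration.
POPULATION = [1] * 3_000 + [0] * 7_000
TRUE_P = sum(POPULATION) / len(POPULATION)  # the population parameter, 0.3


def sample_proportion(population, n, rng):
    """Draw a simple random sample of size n (without replacement) and
    return the proportion of 'yes' answers. No non-sampling error is
    modelled: everyone sampled answers, and answers truthfully."""
    sample = rng.sample(population, n)
    return sum(sample) / n


def theoretical_se(p, n):
    """Standard error of a sample proportion, sqrt(p(1 - p) / n).
    This simple formula ignores the finite population correction."""
    return math.sqrt(p * (1 - p) / n)


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 100
estimates = [sample_proportion(POPULATION, n, rng) for _ in range(1_000)]
simulated_sd = statistics.stdev(estimates)

# Each estimate differs from TRUE_P purely because of sampling error.
print("first five estimates:", estimates[:5])
print("simulated spread (SD):", round(simulated_sd, 4))
print("theoretical SE:       ", round(theoretical_se(TRUE_P, n), 4))
```

The individual estimates scatter around 0.3, and the spread of the simulated estimates should come out close to the theoretical standard error. That agreement is why, for random samples, the size of sampling error can be described even though the error in any one sample is unknown.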

## Non-sampling Error

“Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a sample.

Non-sampling errors have the potential to cause bias in polls, surveys or samples.

There are many different types of non-sampling errors and the names used to describe them are not consistent. Examples of non-sampling errors are generally more useful than using names to describe them.”

And it proceeds to give some helpful examples.
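Non-response is one common example of non-sampling error, and a short simulation shows how it can cause bias. This is a minimal Python sketch under made-up assumptions: a hypothetical population of 10,000 people with 30% supporting a proposal, where supporters reply 90% of the time but non-supporters reply only 40% of the time. The function name and response rates are hypothetical, chosen only to illustrate the idea.

```python
import random

# Hypothetical population of 10,000 people, 30% of whom support a proposal.
# Assumed response rates (made up for illustration): supporters are keen
# and 90% reply, while only 40% of non-supporters bother to reply.
POPULATION = [1] * 3_000 + [0] * 7_000
TRUE_P = sum(POPULATION) / len(POPULATION)  # true proportion, 0.3
RESPONSE_RATE = {1: 0.9, 0: 0.4}


def survey_with_nonresponse(population, n, rng):
    """Randomly select n people, keep only those who respond, and return
    the proportion of supporters among the people who responded."""
    selected = rng.sample(population, n)
    responses = [x for x in selected if rng.random() < RESPONSE_RATE[x]]
    if not responses:
        raise ValueError("nobody responded")
    return sum(responses) / len(responses)


rng = random.Random(7)  # fixed seed so the run is reproducible
estimates = {}
# n = 10,000 selects the whole population: a census, with no sampling at all.
for n in (100, 1_000, 10_000):
    estimates[n] = survey_with_nonresponse(POPULATION, n, rng)
    print(f"n = {n:>6}: estimate = {estimates[n]:.3f} (true value {TRUE_P})")
```

With these assumed response rates, about 0.27 of the population are responding supporters and 0.28 are responding non-supporters, so the estimates sit near 0.27/0.55 ≈ 0.49 rather than 0.3. The gap does not shrink as n grows, even when the whole population is surveyed, because it comes from who responds rather than from taking a sample.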

These are great definitions, and I thought about turning them into a diagram, so here it is:

And there are now two videos to go with the diagram, to help explain sampling error and non-sampling error. Here is a link to the first:

One of my earliest posts, “Sampling Error Isn’t”, introduced the idea of using variation due to sampling and other variation as a way to make sense of these ideas. The sampling video above is based on this approach.

Students need lots of practice identifying potential sources of error in their own work, and in critiquing reports. In addition, I have found True/False questions surprisingly effective in practising the correct use of the terms. Whatever engages the students for a time in consciously deciding which term to use helps them understand and become aware of the concept. Then the odd terminology will cease to have its original confusing connotations.