The subject of statistics is rife with misleading terms. I have written about this before in such posts as Teaching Statistical Language and It is so random. But the terms sampling error and non-sampling error win the Dr Nic prize for counter-intuitivity and confusion generation.

# Confusion abounds

To start with, the word error implies that a mistake has been made, so the term sampling error makes it sound as if we made a mistake while sampling. Well, this is wrong. And the term non-sampling error (why is this even a term?) sounds as if it is the error we make from not sampling. And that is wrong too. However, these terms are used extensively in the NZ statistics curriculum, so it is important that we clarify what they are about.

Fortunately the Glossary has some excellent explanations:

## Sampling Error

“Sampling error is the error that arises in a data collection process as a result of taking a sample from a population rather than using the whole population.

Sampling error is one of two reasons for the difference between an estimate of a population parameter and the true, but unknown, value of the population parameter. The other reason is non-sampling error. Even if a sampling process has no non-sampling errors then estimates from different random samples (of the same size) will vary from sample to sample, and each estimate is likely to be different from the true value of the population parameter.

The sampling error for a given sample is unknown but when the sampling is random, for some estimates (for example, sample mean, sample proportion) theoretical methods may be used to measure the extent of the variation caused by sampling error.”
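
To make the Glossary's point concrete, here is a minimal simulation sketch. It assumes a made-up population of 10,000 weights (the mean of 55 and standard deviation of 8 are invented for illustration) and a sample size of 50. It draws many random samples, shows that the estimates vary from sample to sample, and compares that variation with the usual theoretical measure for a sample mean, the standard error.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 weights in kg, invented for illustration.
population = [random.gauss(55, 8) for _ in range(10_000)]
true_mean = statistics.fmean(population)

def sample_means(pop, n, repeats):
    """Return the mean of each of `repeats` simple random samples of size n."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(repeats)]

means = sample_means(population, n=50, repeats=2000)

# Variation in the estimates actually seen across the simulated samples ...
observed_spread = statistics.stdev(means)
# ... compared with the textbook standard error, sigma / sqrt(n).
theoretical_se = statistics.pstdev(population) / 50 ** 0.5

print(f"true mean: {true_mean:.2f}")
print(f"first five sample means: {[round(m, 2) for m in means[:5]]}")
print(f"observed spread: {observed_spread:.2f}, theoretical SE: {theoretical_se:.2f}")
```

No mistakes are made anywhere in this process, yet almost every sample mean differs from the true mean. That difference is all that "sampling error" refers to.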

## Non-sampling Error

“Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a sample.

Non-sampling errors have the potential to cause bias in polls, surveys or samples.

There are many different types of non-sampling errors and the names used to describe them are not consistent. Examples of non-sampling errors are generally more useful than using names to describe them.”

And it proceeds to give some helpful examples.

These are great definitions, and I thought about turning them into a diagram, so here it is:

And there are now two videos to go with the diagram, to help explain sampling error and non-sampling error. Here is a link to the first:

One of my earliest posts, Sampling Error Isn’t, introduced the idea of using variation due to sampling and other variation as a way to make sense of these ideas. The sampling video above is based on this approach.

Students need lots of practice identifying potential sources of error in their own work, and in critiquing reports. In addition, I have found True/False questions surprisingly effective for practising the correct use of the terms. Whatever engages the students for a time in consciously deciding which term to use is helpful in getting them to understand and be aware of the concept. Then the odd terminology will cease to have its original confusing connotations.

These concepts have been developed much further within the framework of total survey error. See the special issue of Public Opinion Quarterly on TSE: http://poq.oxfordjournals.org/content/74/5.toc, or at the very least the representation and measurement error branches of the TSE diagram, http://poq.oxfordjournals.org/content/74/5/849/F3.expansion.html.

Can you please explain more about sampling error, for instance by giving an example?

Hi

Another way of looking at it is to call it sampling variation. Say the true and unknown population mean weight of something is 55kg. We take a sample which happens to contain items that give a mean of 52kg. The sample may be representative and not have much non-sampling error at all, but there is sampling error.

Or another example could be Lotto balls. In NZ there are 40 Lotto balls, numbered from 1 to 40, so the mean of them is 20.5. When 6 balls are drawn randomly, there is no non-sampling error, as this is a gambling machine that requires a high level of attention to eliminating bias and other non-sampling error. However, there is a high likelihood that any sample taken will have a mean different from 20.5. This is sampling error.
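
The Lotto example is easy to try out. The sketch below is only an illustration: Python's random number generator stands in for the Lotto machine, and the 10,000 simulated draws and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(40)  # arbitrary seed so the sketch is repeatable

balls = range(1, 41)                        # balls numbered 1 to 40
population_mean = statistics.fmean(balls)   # 20.5

# Simulate many draws of 6 balls without replacement.
draws = [random.sample(balls, 6) for _ in range(10_000)]
draw_means = [statistics.fmean(d) for d in draws]

# Proportion of draws whose mean is exactly the population mean.
exactly_right = sum(m == population_mean for m in draw_means) / len(draw_means)

print(f"population mean: {population_mean}")
print(f"first five draw means: {[round(m, 2) for m in draw_means[:5]]}")
print(f"proportion of draws with mean exactly 20.5: {exactly_right:.3f}")
```

Only a small fraction of draws land exactly on 20.5, even though nothing has gone wrong with the sampling.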

I hope that helps

Nic

That’s a great way of teaching. Thanks.

Your work is great. I would however love to see specific examples of sampling errors. God bless you in Jesus name.

I’m happy you like the blog.

You can’t have examples of sampling error. Sampling error, or sampling variation (which is a better term for it), exists because you take a sample of the population. Any examples of errors you make while sampling are in fact non-sampling error.

Can you please explain more about the types of non-sampling errors, other than by giving examples?

Can you tell me which non-sampling errors arise during a research study?

Hi, can you please let me know – if my population size is 1000 items, out of which I select 100 items and do a quality check on the 100 items, and if I discover 6 errors, is the error percentage 6% (6/100) or 0.6% (6/1000)? I felt the 100 is representative of the 1000, so the errors discovered in the 100 items are taken as having been discovered from out of the 1000 items. Can you throw some light on this please? Thanks.

Hi. The error percentage is 6/100, which we can use as an estimate of the percentage of errors in the whole population.
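
As a rough sketch of the arithmetic, using the numbers from the question: the margin of error shown is an added assumption, based on the normal approximation with a finite population correction. With only 6 errors in the sample that approximation is crude, so treat the margin as indicative only.

```python
import math

population_size = 1000   # items in the whole batch
sample_size = 100        # items actually checked
errors_found = 6         # errors discovered in the sample

# The estimate uses the sample size as the denominator, not the population size.
p_hat = errors_found / sample_size              # 0.06, i.e. 6%
estimated_errors = p_hat * population_size      # projected errors in the batch

# Rough 95% margin of error (normal approximation, an assumption here),
# shrunk slightly because the sample is a tenth of a finite population.
se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
margin = 1.96 * se * fpc

print(f"estimated error rate: {p_hat:.1%}")
print(f"estimated errors in the population: {estimated_errors:.0f}")
print(f"rough margin of error: +/- {margin:.1%}")
```

The margin of error is a reminder that the 6% is an estimate, and a different random sample of 100 would probably give a somewhat different figure.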

Thank you. I had it all mixed up.

So glad it helped. It can be very confusing.

HEY, I JUST WANT THE SPECIFIC EXAMPLES OF BOTH TYPES OF ERRORS..

Hi Norbert

I think you might need to read the post again. Basically any kind of error you think of is likely to be a non-sampling error. Sampling error occurs because the sample is not the whole population.

Nic

Thanks for your help! Could you explain this problem with sampling error:

The percentage of a random sample of white respondents (N = 400) who say they have a favorable attitude toward the police is 53%. The percentage of a random sample of black respondents (N = 300) who say they have a favorable attitude toward the police is 45%.

You are asked if there is a real difference between the percentage of whites and blacks who have a positive attitude toward the police in the larger population, or is this sample difference likely to have occurred by random chance or sampling error.

How do you respond? Explain your answer.

Construct a 95% confidence interval for the proportion of Blacks in the population who have a favorable attitude toward the police.

Hi Leonardo

This looks a lot like a homework problem! You need to perform a test for the difference of two proportions to find the p-value. The p-value is the probability of getting a difference at least as large as this one through chance (sampling error) alone, if there were really no difference in the population. So really this question is not about sampling error.
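
For readers who want to see the mechanics, here is a minimal sketch of one standard approach: a pooled two-proportion z-test and a normal-approximation confidence interval. The function names are made up for this illustration, and the choice of the pooled z-test is an assumption, as the question does not say which method is expected.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled z-test for a difference between two sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area under the standard normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def proportion_ci_95(p, n):
    """Normal-approximation 95% confidence interval for one proportion."""
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Figures taken from the question: 53% of 400 and 45% of 300.
z, p_value = two_proportion_z_test(0.53, 400, 0.45, 300)
low, high = proportion_ci_95(0.45, 300)

print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
print(f"95% CI for the second group: {low:.3f} to {high:.3f}")
```

With these figures the z statistic comes out at about 2.1 and the p-value at a little under 0.04, and the interval runs from roughly 39% to 51%. Interpreting those numbers is still the homework part.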
