Sampling error and non-sampling error

The subject of statistics is rife with misleading terms. I have written about this before in such posts as Teaching Statistical Language and It is so random. But the terms sampling error and non-sampling error win the Dr Nic prize for counter-intuitivity and confusion generation.

Confusion abounds

To start with, the word error implies that a mistake has been made, so the term sampling error makes it sound as if we made a mistake while sampling. Well, this is wrong. And the term non-sampling error (why is this even a term?) sounds as if it is the error we make from not sampling. And that is wrong too. However, these terms are used extensively in the NZ statistics curriculum, so it is important that we clarify what they are about.

Fortunately the Glossary has some excellent explanations:

Sampling Error

“Sampling error is the error that arises in a data collection process as a result of taking a sample from a population rather than using the whole population.

Sampling error is one of two reasons for the difference between an estimate of a population parameter and the true, but unknown, value of the population parameter. The other reason is non-sampling error. Even if a sampling process has no non-sampling errors then estimates from different random samples (of the same size) will vary from sample to sample, and each estimate is likely to be different from the true value of the population parameter.

The sampling error for a given sample is unknown but when the sampling is random, for some estimates (for example, sample mean, sample proportion) theoretical methods may be used to measure the extent of the variation caused by sampling error.”
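The second paragraph of that definition can be demonstrated with a few lines of simulation. This sketch (in Python, using a made-up population) draws repeated random samples of the same size from one population and records how far each sample mean lands from the true mean:

```python
import random

random.seed(1)

# A made-up population of 10,000 values with a known mean.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = sum(population) / len(population)

# Draw several random samples of the same size. Each estimate differs
# from the true value purely because of sampling error - no mistakes,
# no non-sampling error, just sample-to-sample variation.
errors = []
for _ in range(5):
    sample = random.sample(population, 100)
    estimate = sum(sample) / len(sample)
    errors.append(estimate - true_mean)
```

Each run gives five different sampling errors, unknowable in practice because we never see the whole population; with random sampling, theory tells us roughly how large they tend to be.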

Non-sampling error

“Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a sample.

Non-sampling errors have the potential to cause bias in polls, surveys or samples.

There are many different types of non-sampling errors and the names used to describe them are not consistent. Examples of non-sampling errors are generally more useful than using names to describe them.”


And it proceeds to give some helpful examples.

These are great definitions, so I have turned them into a diagram:

Table summarising types of error.


And there are now two videos to go with the diagram, to help explain sampling error and non-sampling error.

Video about sampling error

Video about non-sampling error

One of my earliest posts, Sampling Error Isn’t, introduced the idea of using variation due to sampling and other variation as a way to make sense of these ideas. The sampling video above is based on this approach.

Students need lots of practice identifying potential sources of error in their own work, and in critiquing reports. In addition, I have found True/False questions surprisingly effective for practising the correct use of the terms. Whatever engages the students for a time in consciously deciding which term to use is helpful in getting them to understand and be aware of the concept. Then the odd terminology will cease to have its original confusing connotations.

Teaching random variables and distributions

Why do we teach about random variables, and why is it so difficult to understand?

Probability and statistics go together pretty well, and basic probability is included in most introductory statistics courses. Often maths teachers prefer the probability section as it is more mathematical than inference or exploratory data analysis. Both probability and statistics deal with the ideas of uncertainty and chance, statistics mostly being about what has happened, and probability about what might happen. Probability can be, and often is, reduced to fun little algebraic puzzles, with little link to reality. But a sound understanding of the concepts of probability and distribution is essential to H.G. Wells’s “efficient citizen”.

When I first started on our series of probability videos, I wrote about the worth of probability. Now we are going a step further into the probability topic abyss, with random variables. For an introductory statistics course, it is an interesting question whether to include random variables. Is it necessary for the future marketing managers of the world, the medical practitioners, the speech therapists, the primary school teachers and the lawyers to understand what a random variable is? Actually, I think it is. Maybe it is not as important as understanding concepts like risk and sampling error, but random variables are still important.

Random variables

Like many concepts in our area, once you get what a random variable is, it can be hard to explain. Now that I understand what a random variable is, it is difficult to remember what was difficult to understand about it. But I do remember feeling perplexed, trying to work out what exactly a random variable was. The lecturers use the term freely, but I remember (many decades ago) just not being able to pin down what a random variable is. And why it needed to exist.

To start with, the words “random variable” are difficult on their own. I have dedicated an entire post to the problems with “random”, and in the writing of it, discovered another inconsistency in the way that we use the word. When we are talking about a random sample, random implies equal likelihood. Yet when we talk about things happening randomly, they are not always equally likely. The word “variable” is also a problem. Surely all variables vary? Students may wonder what a non-random variable is – I know I did.

I like to introduce the idea of variables, as part of mathematical modelling. We can have a simple model:

Cost of event = hall hire + per capita charge × number of guests.

In this model, the hall hire and per capita charge are both constants, and the number of guests is a variable. The cost of the event is also a variable, and can be expressed as a function of the number of guests. And vice versa! Now if we know the number of guests, we can then calculate the cost of the event. But the number of guests may be uncertain – it could be something between 100 and 120. It is thus a random variable.
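As a sketch (with made-up figures for the hall hire and the per capita charge), the model and its uncertain input might look like this:

```python
import random

random.seed(42)

# Constants (made-up figures for illustration)
hall_hire = 500          # fixed cost of hiring the hall
per_capita_charge = 25   # cost per guest

def event_cost(guests):
    # The deterministic part of the model
    return hall_hire + per_capita_charge * guests

# The number of guests is uncertain - somewhere between 100 and 120 -
# so the cost of the event is itself a random variable.
guests = random.randint(100, 120)
cost = event_cost(guests)
```

The function itself is deterministic; it is the uncertain input that makes the cost a random variable.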

Another way to look at a random variable is to come from the other direction – start with the random part and add the variable part. When something random happens, sometimes the outcome is discrete and non-numerical, such as the sex of a baby, the colour of a tulip, or the type of fruit in a lunchbox. But when the random outcome is given a value, then it becomes a random variable.


Pictorial representation of different distributions


Then we come to distributions. I fear that too often distributions are taught in such a way that students believe that the normal or bell curve is a property guiding the universe, rather than a useful model that works in many different circumstances. (Rather like Adam Smith’s invisible hand that economists worship.) I’m pretty sure that is what I believed for many years, in my fog of disconnected statistical concepts. Somewhat telling is the tendency for examples to begin with the words, “The life expectancy of a particular brand of lightbulb is normally distributed with a mean of …” or similar. Worse still, some don’t even mention the normal distribution, and simply say “The mean income per household in a certain state is $9500 with a standard deviation of $1750. The middle 95% of incomes are between what two values?” Students are left to assume that the normal distribution will apply, which in the second case is a very poor approximation, as incomes are likely to be skewed. This sloppy question-writing perpetuates the idea of the normal distribution as the rule that guides the universe.

Take a look at the textbook you use, and see what language it uses when asking questions about the normal distribution. The two examples above are from a popular AP statistics test preparation text.

I thought I’d better take a look at what Khan Academy did to random variables. I started watching the first video and immediately got hit with the flipping coin and rolling dice. No, people – this is not the way to introduce random variables! No one cares how many coins are heads. And even worse he starts with a zero/one random variable because we are only flipping one coin. And THEN he says that he could define a head as 100 and tail as 703 and…. Sorry, I can’t take it anymore.

A good way to introduce random variables

After LOTS of thinking and explaining, and trying stuff out, I have come up with what I think is a revolutionary and fabulous way to introduce random variables and distributions. You can see it for yourself. To begin with we use a discrete empirical distribution to illustrate the idea of a random variable. The random variable models the number of ice creams per customer.

Then we use that discrete distribution to teach about expected value and standard deviation, and combining random variables.
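The calculations behind that second video follow directly from the definition of a discrete random variable. The probabilities below are made up for illustration (the video uses its own ice cream data):

```python
import math

# A hypothetical distribution for X = ice creams bought per customer
dist = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

# Expected value: E(X) = sum of x * P(X = x)
mean = sum(x * p for x, p in dist.items())

# Variance and standard deviation
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
sd = math.sqrt(variance)

# Combining independent random variables: for two independent
# customers, E(X1 + X2) = 2 * E(X) and Var(X1 + X2) = 2 * Var(X)
mean_two = 2 * mean
sd_two = math.sqrt(2 * variance)
```

Note that standard deviations do not add; variances do, which is why the combined standard deviation is √2 times the original rather than double.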

The third video introduces the idea of families of distributions, and shows how different distributions can be used to model the same random process.

Another unusual feature is the introduction of the triangular distribution, which is part of the New Zealand curriculum. You can read here about the benefits of teaching the triangular distribution.

I’m pretty excited about this approach to teaching random variables and distributions. I’d love some feedback about it!


It is so random! Or is it? The meaning of randomness

The concept of “random” is a tough one.

First there is the problem of lexical ambiguity. There are colloquial meanings for random that don’t totally tie in with the technical or domain-specific meanings for random.

Then there is the fact that people can’t actually be random.

Then there is the problem of equal chance vs displaying a long-term distribution.

And there is the problem that there are several conflicting ideas associated with the word “random”.

In this post I will look at these issues, and ask some questions about how we can better teach students about randomness and random sampling. This problem exists for many domain-specific terms that have colloquial meanings that hinder comprehension of the idea in question. You can read about more of these words, and some teaching ideas, in the post, Teaching Statistical Language.

Lexical ambiguity

First there is lexical ambiguity. Lexical ambiguity is a special term meaning that the word has more than one meaning. Kaplan, Rogness and Fisher write about this in their 2014 paper “Exploiting Lexical Ambiguity to help students understand the meaning of Random.” I recently studied this paper closely in order to present the ideas and findings to a group of high school teachers. I found the concept of leveraging lexical ambiguity very interesting. As a useful intervention, Kaplan et al introduced a picture of “random zebras” to represent the colloquial meaning of random, and a picture of a hat to represent the idea of taking a random sample. I think it is a great idea to have pictures representing the different meanings, and it might be good to get students to come up with their own.

Representations of the different meanings of the word, random.


So what are the different meanings for random? I consulted some on-line dictionaries.

Different meanings

Without method

The first meaning of random describes something happening without pattern, method or conscious decision. An example is “random violence”.
Example: She dressed in a rather random fashion, putting on whatever she laid her hands on in the dark.

Statistical meaning

Most on-line dictionaries also give a statistical definition, which includes that each item has an equal probability of being chosen.
Example: The students’ names were taken at random from a pile, to decide who would represent the school at the meeting.

Informal or colloquial

One meaning: Something random is either unknown, unidentified, or out of place.
Example: My father brought home some random strangers he found under a bridge.

Another colloquial meaning for random is odd and unpredictable in an amusing way.
Example: My social life is so random!

People cannot be random

There has been considerable research into why people cannot provide a sequence of random numbers that is like a truly randomly generated sequence. In our minds we like things to be shared out evenly and the series will generally have fewer runs of the same number.
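This is easy to demonstrate. The sketch below counts the longest run of identical outcomes in simulated coin flips; truly random sequences routinely contain runs far longer than most people are willing to write down when inventing a "random" sequence by hand:

```python
import random

random.seed(2)

def longest_run(seq):
    # Length of the longest run of identical consecutive outcomes
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

# 200 fair coin flips: the longest run is typically around 7 or 8,
# much longer than in human-invented sequences.
flips = [random.choice("HT") for _ in range(200)]
run = longest_run(flips)
```

A nice classroom exercise is to have half the class write down 200 "random" flips by hand and the other half flip real coins, then guess which list is which from the run lengths.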

Animals aren’t very random either, it seems. Yesterday I saw a whole lot of sheep in a paddock, and while they weren’t exactly lined up, there was a pretty similar distance between all the sheep.

Equal chance vs long-term distribution

In the paper quoted earlier, Kaplan et al used the following definition of random:

“We call a phenomenon random if individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions.” From Moore (2007) The Basic Practice of Statistics.

Now to me, that does not insist that each outcome be equally likely, which matches with my idea of randomness. In my mind, random implies chance, but not equal likelihood. When creating simulation models we would generate random variates following all sorts of distributions. The outcomes would be far from even, but in the long run they would display a distribution similar to the one being modelled.
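This is straightforward to show in code. A hypothetical sketch: generate random variates where the outcomes are deliberately far from equally likely, and watch the long-run proportions settle at the modelled distribution:

```python
import random

random.seed(7)

# Outcomes with deliberately unequal probabilities
outcomes = ["cheap", "mid", "expensive"]
weights = [0.7, 0.2, 0.1]

draws = random.choices(outcomes, weights=weights, k=10_000)
proportions = {o: draws.count(o) / len(draws) for o in outcomes}
# Each individual draw is uncertain, yet the long-run proportions sit
# close to 0.7, 0.2 and 0.1 - random, but nowhere near equally likely.
```

This fits Moore's definition exactly: individual outcomes uncertain, but a regular distribution of outcomes in a large number of repetitions.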

Yet the dictionaries, and the later parts of the Kaplan paper insist that randomness requires equal opportunity to be chosen. What’s a person to do?

I propose that the meaning of the adjective, “random” may depend on the noun that it is qualifying. There are random samples and random variables. There is also randomisation and randomness.

A random sample is a sample in which each object has an equal opportunity of being chosen, and each choice of object is by chance, and independent of the previous objects chosen. A random variable is one that can take a number of values, and will generally display a pattern of outcomes similar to a given distribution.

I wonder if the problem is that randomness is somehow equated with fairness. Our most familiar examples of true randomness come from gambling, with dice, cards, roulette wheels and lotto balls. In each case there is the requirement that each outcome be equally likely.

Bearing in mind the overwhelming evidence that the “statistical meaning” of randomness includes equality, I begin to think that it might not really matter if people equate randomness with equal opportunity.

However, if you think about medical or hazard risk, the story changes. Apart from known risk-increasing factors associated with lifestyle, whether a person succumbs to a disease appears to be random. But the likelihood of succumbing is not equal to the likelihood of not succumbing. Similarly, there is a clear random element in whether a future child has a disability known to be caused by an autosomal recessive gene. It is definitely random, in that there is an element of chance, and the effects on successive children are independent. But the probability of a disability is one in four. I suppose if you look at the outcomes as being which children are affected, there is an equal chance for each child.

But then think about a “lucky dip” containing many cheap prizes and a few expensive prizes. The choice of prize is random, but there is not an even chance of getting a cheap prize or an expensive prize.

I think I have mused enough. I’m interested to know what the readers think. Whatever the conclusion is, it is clear that we need to spend some time making clear to the students what is meant by randomness, and a random sample.


Teaching Confidence Intervals

If you want your students to understand just two things about confidence intervals, what would they be?

What and what order

When making up a teaching plan for anything, it is important to think about whom you are teaching, what it is you want them to learn, and what order will best achieve the most important desired outcomes. In my previous life as a university professor I mostly taught confidence intervals to business students, including MBAs. Currently I produce materials to help teach high school students. When teaching business students, I was aware that many of them had poor mathematics skills, and I did not wish that to get in the way of their understanding. High school students may well be more at home with formulas and calculations, but their understanding of the outside world is limited. Consequently, the approaches for these two different groups of students may differ.

Begin with the end in mind

I use the “all of the people, some of the time” principle when deciding on the approach to use in teaching a topic. Some of the students will understand most of the material, but most of the students will only really understand some of the material, at least the first time around. Statistics takes several attempts before you approach fluency. Generally the material students learn will be the material they get taught first, before they start to get lost. Therefore it is good to start with the important material. I wrote a post about this, suggesting starting at the very beginning is not always the best way to go. This is counter-intuitive to mathematics teachers who are often very logical and wish to take the students through from the beginning to the end.

At the start I asked this question – if you want your students to understand just two things about confidence intervals, what would they be?

To me the most important things to learn about confidence intervals are what they are and why they are needed. Learning about the formula is a long way down the list, especially in these days of computers.

The traditional approach to teaching confidence intervals

A traditional approach to teaching confidence intervals is to start with the concept of a sampling distribution, followed by calculating the confidence interval of a mean using the Z distribution. Then the t distribution is introduced. Many of the questions involve calculation by formula. Very little time is spent on what a confidence interval is and why we need them. This is the order used in many textbooks. The Khan Academy video that I reviewed in a previous post does just this.

A different approach to teaching confidence intervals

My approach is as follows:
Start with the idea of a sample and a population, and that we are using a sample to try to find out an unknown value from the population. Show our video about understanding a confidence interval. One comment on this video decried the lack of formulas. I’m not sure what formulas would satisfy the viewer, but as I was explaining what a confidence interval is, not how to get it, I had decided that formulas would not help.

The new New Zealand school curriculum follows a process to get to the use of formal confidence intervals. Previously the assessment was such that a student could pass the confidence interval section by putting values into formulas in a calculator. In the new approach, early high school students are given real data to play with, and are encouraged to suggest conclusions they might be able to draw about the population, based on the sample. Then in Year 12 they start to draw informal confidence intervals, based on the sample. This uses a simple formula for the confidence interval of a median and is shown in the following video:
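For readers unfamiliar with it, the informal interval is simple enough to compute by hand or in a few lines of code. This sketch assumes the usual median ± 1.5 × IQR/√n convention; check the curriculum materials for the exact form and quartile convention used in class:

```python
import statistics

def informal_median_ci(data):
    # Informal confidence interval for a population median:
    # median +/- 1.5 * IQR / sqrt(n)
    data = sorted(data)
    n = len(data)
    median = statistics.median(data)
    q1 = statistics.median(data[: n // 2])        # lower half
    q3 = statistics.median(data[(n + 1) // 2 :])  # upper half
    iqr = q3 - q1
    half_width = 1.5 * iqr / n ** 0.5
    return median - half_width, median + half_width
```

Quartile conventions vary between textbooks and software; this version excludes the overall median from each half when n is odd.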

Then in Year 13, we introduce bootstrapping as an intuitively appealing way to calculate confidence intervals. Students use existing data to draw a conclusion about two medians. This video goes through how this works and how to use iNZight to perform the calculations.
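The idea behind bootstrapping is compact enough to sketch. This is an illustration of the percentile bootstrap for a difference of medians, with made-up data; it shows the idea, not iNZight's exact implementation:

```python
import random
import statistics

random.seed(0)

def bootstrap_ci_diff_medians(a, b, reps=2000, level=0.95):
    # Percentile bootstrap CI for median(a) - median(b)
    diffs = []
    for _ in range(reps):
        ra = random.choices(a, k=len(a))  # resample with replacement
        rb = random.choices(b, k=len(b))
        diffs.append(statistics.median(ra) - statistics.median(rb))
    diffs.sort()
    alpha = (1 - level) / 2
    return diffs[int(alpha * reps)], diffs[int((1 - alpha) * reps) - 1]

# Made-up data for two groups
group_a = [8, 9, 10, 11, 12] * 10
group_b = [3, 4, 5, 6, 7] * 10
lo, hi = bootstrap_ci_diff_medians(group_a, group_b)
```

The intuitive appeal is visible in the code: there is no formula to memorise, just "resample the data you have, many times, and see how the statistic varies".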

In a more traditional course, you could instead use the normal-based formula for the confidence interval of a mean. We now have a video for that as well.
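For completeness, the normal-based interval is a one-liner once you have the mean and standard deviation. A sketch, using z = 1.96 for roughly 95% confidence (a t multiplier would be more appropriate for small samples):

```python
import statistics

def normal_ci_mean(sample, z=1.96):
    # x-bar +/- z * s / sqrt(n)
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean - z * se, mean + z * se
```
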

You could then examine the idea of the sampling distribution and the central limit theorem.

The point is that you start with getting an idea of what a confidence interval is, and then you find out how to find one, and then you start to find out the theory underpinning it. You can think of it as successive refinement. Sometimes when we see photos downloading onto a device, they start off blurry, and then gradually become clearer as we gain more information. This is a way to learn a complex idea, such as confidence intervals. We start with the big picture, and not much detail, and then gradually fill out the details of the how and how come of the calculations.

When do we teach the formulas?

Some teachers believe that the students need to know the formulas in order to understand what is going on. This is probably true for some students, but not all. There are many kinds of understanding, and I prefer conceptual and graphical approaches. If formulas are introduced at the end of the topic, then the students who like formulas are satisfied, and the others are not alienated. Sometimes it is best to leave the vegetables until last! (This is not a comment on the students!)

For more ideas about teaching confidence intervals see other posts:
Good, bad and wrong videos about confidence intervals
Confidence Intervals: informal, traditional, bootstrap
Why teach resampling

Deterministic and Probabilistic models and thinking

The way we understand and make sense of variation in the world affects decisions we make.

Part of understanding variation is understanding the difference between deterministic and probabilistic (stochastic) models. The NZ curriculum specifies the following learning outcome: “Selects and uses appropriate methods to investigate probability situations including experiments, simulations, and theoretical probability, distinguishing between deterministic and probabilistic models.” This is at level 8 of the curriculum, the highest level of secondary schooling. Deterministic and probabilistic models are not familiar to all teachers of mathematics and statistics, so I’m writing about it today.


The term, model, is itself challenging. There are many ways to use the word, two of which are particularly relevant for this discussion. The first meaning is “mathematical model, as a decision-making tool”. This is the one I am familiar with from years of teaching Operations Research. The second way is “way of thinking or representing an idea”. Or something like that. It seems to come from psychology.

When teaching mathematical models in entry level operations research/management science we would spend some time clarifying what we mean by a model. I have written about this in the post, “All models are wrong.”

In a simple, concrete incarnation, a model is a representation of another object. A simple example is that of a model car or a Lego model of a house. There are aspects of the model that are the same as the original, such as the shape and ability to move or not. But many aspects of the real-life object are missing in the model. The car does not have an internal combustion engine, and the house has no soft-furnishings. (And very bumpy floors). There is little purpose for either of these models, except entertainment and the joy of creation or ownership. (You might be interested in the following video of the Lego Parisian restaurant, which I am coveting. Funny way to say Parisian!)

Many models perform useful functions. My husband works as a land-surveyor, and his work involves making models, on paper or in the computer, of phenomena on the land, and making sure that specified marks on the model correspond to the marks placed in the ground. The purpose of the model relates to ownership and making sure the sewers run in the right direction. (As a result of several years of earthquakes in Christchurch, his models are less deterministic than they used to be, and unfortunately many of our sewers ended up running the wrong way.)

Our world is full of models:

  • A map is a model of a location, which can help us get from place to place.
  • Sheet music is a written model of the sounds that make up a song.
  • A bus timetable is a model of where buses should appear.
  • A company’s financial reports are a model of one aspect of the company.

Deterministic models

A deterministic model assumes certainty in all aspects. Examples of deterministic models include timetables, pricing structures, linear programming models, the economic order quantity model, maps, and accounting reports.
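The economic order quantity model is a nice concrete case: every input is assumed to be known with certainty, and the same inputs always give the same answer. A minimal sketch of the classic formula:

```python
import math

def eoq(annual_demand, order_cost, holding_cost):
    # Economic order quantity: Q* = sqrt(2 * D * S / H).
    # Deterministic: demand and both costs are assumed known exactly.
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)
```

In reality annual demand is a random variable, which is exactly the gap between deterministic and stochastic modelling discussed below.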

Probabilistic or stochastic models

Most models really should be stochastic or probabilistic rather than deterministic, but this is often too complicated to implement. Representing uncertainty is fraught. Some of the more common stochastic models are queueing models, Markov chains, and most simulations.

For example, when planning a school formal, there are some elements of the model that are deterministic and some that are probabilistic. The cost to hire the venue is deterministic, but the number of students who will come is probabilistic. A GPS unit uses a deterministic model to decide on the most suitable route and gives a predicted arrival time. However, we know that the actual arrival time is contingent upon all sorts of aspects including road, driver, traffic and weather conditions.
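The school formal example can be turned into a small simulation, which shows what the deterministic version of the model hides. All the figures here are made up:

```python
import random

random.seed(3)

venue_hire = 1200        # deterministic: fixed quote from the venue
ticket_price = 40        # deterministic
cost_per_student = 30    # deterministic catering cost per head

def simulated_profit():
    students = random.randint(80, 140)  # probabilistic: uncertain turnout
    return students * (ticket_price - cost_per_student) - venue_hire

profits = [simulated_profit() for _ in range(10_000)]
p_loss = sum(p < 0 for p in profits) / len(profits)
# A deterministic model using the average turnout gives one number;
# the simulation also reveals how likely the event is to lose money.
```

With these figures the formal loses money whenever fewer than 120 students turn up, a risk that a single deterministic answer never mentions.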

Model as a way of thinking about something

The term “model” is also used to describe the way that people make sense out of their world. Some people have a more deterministic world model than others, contributed to by age, culture, religion, life experience and education. People ascribe meaning to anything from star patterns, tea leaves and moon phases to ease in finding a parking spot and not being in a certain place when a coconut falls. This is a way of turning a probabilistic world into a more deterministic and more meaningful world. Some people are happy with a probabilistic world, where things really do have a high degree of randomness. But often we are less happy when the randomness goes against us. (I find it interesting that farmers hit with bad fortune such as a snowfall or drought are happy to ask for government help, yet when there is a bumper crop, I don’t see them offering to give back some of their windfall voluntarily.)

Let us say the All Blacks win a rugby game against Australia. There are several ways we can draw meaning from this. If we are of a deterministic frame of mind, we might say that the All Blacks won because they are the best rugby team in the world.  We have assigned cause and effect to the outcome. Or we could take a more probabilistic view of it, deciding that the probability that they would win was about 70%, and that on the day they were fortunate.  Or, if we were Australian, we might say that the Australian team was far better and it was just a 1 in 100 chance that the All Blacks would win.

I developed the following scenarios for discussion in a classroom. The students can put them in order or categories according to their own criteria. After discussing their results, we could then talk about a deterministic and a probabilistic meaning for each of the scenarios.

  1. The All Blacks won the Rugby World Cup.
  2. Eri did better on a test after getting tuition.
  3. Holly was diagnosed with cancer, had a religious experience and the cancer was gone.
  4. A pet was given a homeopathic remedy and got better.
  5. Bill won $20 million in Lotto.
  6. You got five out of five right in a true/false quiz.
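Of the scenarios above, the last is the only one where the probability is easy to calculate exactly, which makes it a good anchor for the discussion:

```python
import random

random.seed(11)

# If every answer in a five-question true/false quiz is a pure guess,
# P(all five correct) = (1/2) ** 5 = 1/32
p_all_right = 0.5 ** 5

# Simulated check: proportion of 10,000 guessers who score 5/5
trials = 10_000
perfect = sum(
    all(random.random() < 0.5 for _ in range(5)) for _ in range(trials)
)
p_simulated = perfect / trials
```

About a 3% chance: surprising, but well within what pure guessing produces, which is the probabilistic reading of the scenario.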

The regular mathematics teacher is now a long way from his or her comfort zone. The numbers have gone, along with the red tick, and there are no correct answers. This is an important aspect of understanding probability – that many things are the result of randomness. But with this idea we are pulling mathematics teachers into unfamiliar territory. Social studies, science and English teachers have had to deal with the murky area of feelings, values and ethics forever.  In terms of preparing students for a random world, I think it is territory worth spending some time in. And it might just help them find mathematics/statistics relevant!

Proving causation

Aeroplanes cause hot weather

In Christchurch we have a weather phenomenon known as the “Nor-wester”, which is a warm dry wind, preceding a cold southerly change. When the wind is from this direction, aeroplanes make their approach to the airport over the city. Our university is close to the airport in the direct flightpath, so we are very aware of the planes. A new colleague from South Africa drew the amusing conclusion that the unusual heat of the day was caused by all the planes flying overhead.

Statistics experts and educators spend a lot of time refuting claims of causation. “Correlation does not imply causation” has become a catch cry of people trying to avoid the common trap. This is a great advance in understanding that even journalists (notoriously math-phobic) seem to have caught onto. My own video on important statistical concepts ends with the causation issue. (You can jump to it at 3:51)

So we are aware that it is not easy to prove causation.

In order to prove causation we need a randomised experiment. We need to randomise any possible factor that could be associated with, and thus cause or contribute to, the effect. This next video, about experimental design, addresses this concept. It is possible to prove that one factor causes an effect by using a randomised design.

There is also the related problem of generalisability. If we have a randomised experiment, we can prove causation. But unless the sample is also a random, representative sample of the population in question, we cannot infer that the results will transfer to that wider population. This is nicely illustrated in this matrix from The Statistical Sleuth by Fred L. Ramsey and Daniel W. Schafer.

The relationship between the type of sample and study and the conclusions that may be drawn.


The top left-hand quadrant is the one in which we can draw causal inferences for the population.

Causal claims from observational studies

A student posed this question:  Is it possible to prove a causal link based on an observational study alone?

It would be very useful if we could. It is not always possible to use a randomised trial, particularly when people are involved. Before we became more aware of human rights, experiments were performed on unsuspecting human lab rats. A classic example is the Vipeholm experiments where patients at a mental hospital were the unknowing subjects. They were given large quantities of sweets in order to determine whether sugar caused cavities in teeth. This happened into the early 1950s. These days it would not be acceptable to randomly assign people to groups who are made to smoke or drink alcohol or consume large quantities of fat-laden pastries. We have to let people make those lifestyle choices for themselves. And observe. Hence observational studies!

There is a call for “evidence-based practice” in education to follow the philosophy in medicine. But getting educational experiments through ethics committee approval is very challenging, and it is difficult to use rats or fruit-flies to impersonate the higher learning processes of humans. The changing landscape of the human environment makes it even more difficult to perform educational experiments.

To find out the criteria for justifying causal claims in an observational study I turned to one of my favourite statistics textbooks, Chance Encounters by Wild and Seber (page 27). They cite the Surgeon General of the United States. The criteria for the establishment of a cause and effect relationship in an epidemiological study are the following:

  1. Strong relationship: for example, illness is four times as likely among people exposed to a possible cause as it is for those who are not exposed.
  2. Strong research design.
  3. Temporal relationship: the cause must precede the effect.
  4. Dose-response relationship: higher exposure leads to a higher proportion of people affected.
  5. Reversible association: removal of the cause reduces the incidence of the effect.
  6. Consistency: multiple studies in different locations produce similar effects.
  7. Biological plausibility: there is a supportable biological mechanism.
  8. Coherence with known facts.

Teaching about causation

In high school and entry-level statistics courses, the focus is often on statistical literacy. This concept of causation is pivotal to a correct understanding of what statistics can and cannot claim. It is worth spending some time in the classroom discussing what would constitute reasonable proof and what would not. In particular it is worthwhile to come up with alternative explanations for common fallacies, or even truths, in causation. Some examples for discussion might be drink-driving and accidents, smoking and cancer, gender and success in any number of areas, home-game advantage in sport, and the use of lucky charms, socks and undies. This also links nicely with probability theory, helping to tie the year’s curriculum together.

How to learn statistics (Part 2)

Some more help (preaching?) for students of statistics

Last week I outlined the first five principles to help people to learn and study statistics.

They focussed on how you need to practise in order to be good at statistics, and how you should not wait until you understand it completely before you start applying it. I sometimes call this suspending disbelief. Next I talked about the importance of context in a statistical investigation, which is one of the ways that statistics differs from pure mathematics. And finally I stressed the importance of technology as a tool, not only for doing the analysis, but for exploring ideas and gaining understanding.

Here are the next five principles (plus 2):

6. Terminology is important and at times inconsistent

There are several issues with regard to statistical terminology, and I have written a post with ideas for teachers on how to teach terminology.

One issue with terminology is that some words used in the study of statistics have meanings in everyday life that are not the same. A clear example is the word “significant”. In regular usage this can mean important or relevant, yet in statistics it means that there is evidence that an effect that shows up in the sample also exists in the population.

Another issue is that statistics is a relatively young science and there are inconsistencies in terminology. We just have to live with that. Depending on the discipline in which the statistical analysis is applied or studied, different terms can mean the same thing, or very close to it.

A third language problem is that, mixed in with the ambiguity of results and judgment calls, there are some things that are definitely wrong, and teachers and examiners can be extremely picky about them. In this case I would suggest memorising the correct or accepted terminology for confidence intervals and hypothesis tests. For example I am very fussy about the explanation for the R-squared value in regression. Too often I hear that it says how much of the dependent variable is explained by the independent variable. The word “variation” needs to be inserted in there to make it acceptable: how much of the variation in the dependent variable is explained by the independent variable. I encourage my students to memorise a format for writing up such things. This does not substitute for understanding, but the language required is precise, so having a specific way to write it is fine.
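To make the R-squared wording concrete, here is a small sketch in Python. The data (hours studied and exam scores) are invented purely for illustration:

```python
from statistics import mean

# Invented data: hours studied (x) and exam score (y)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 60, 68, 70, 75, 74]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * xi for xi in x]

ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                   # total variation in y
r_squared = 1 - ss_res / ss_tot

# Acceptable wording: "about 95% of the VARIATION in exam score is
# explained by hours studied" -- not "95% of the exam score".
print(f"R-squared = {r_squared:.2f}")
```

The point is in the final comment: the fit explains about 95% of the variation in the scores, not 95% of the scores themselves.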

This problem with terminology can be quite frustrating, but I think it helps to have it out in the open. Think of it as learning a new language, which is often the case in a new subject. Use glossaries to make sure you really do know what a term means.

7. Discussion is important

This is linked with the issue of language and vocabulary. One way to really learn something is to talk about it with someone else, and even to try to teach it to someone else. Most teachers realise that the reason they know something pretty well is because they have had to teach it. If your class does not include group work, set up your own study group. Talk about the principles as well as the analysis and context, and try to use the language of statistics. Working on assignments together is usually fine, so long as you write them up individually, or according to the assessment requirements.

8. Written communication skills are important

Mathematics has often been a subject of choice for students who are not fluent in English. They can perform well because there is little writing involved in a traditional mathematics course. Statistics is a different matter, though, as all students should be writing reports. This can be difficult at the start, but as students learn to follow a structure, it can be made more palatable. A statistics report is not a work of creative writing, and it is okay to use the same sentence structure more than once. Neither is a statistics report a narrative of what you did to get to the results. Generous use of headings makes a statistical report easier to read and to write. A long report is not better than a short report, if all the relevant details are there.

9. Statistics has an ethical and moral aspect

This principle is interesting, as many teachers of statistics come from a mathematical background, and so have not had exposure to the ethical aspects of research themselves. That is no excuse for students to park their ethics at the door of the classroom. I will be pushing for more consideration of ethical aspects of research as part of the curriculum in New Zealand. Students should not be doing experiments on human subjects that involve sensitive topics such as abuse or bullying. Their experiments should not involve alcohol or other harmful substances. Students should be aware of the potential to do harm, and make sure that any participants have been given full information and have given consent. This can be quite a hurdle, but it is part of being an ethical human being. It also helps students to be more aware when giving or withholding consent in medical and other studies.

10. The study of statistics can change the way you view the world

Sometimes when we learn something at school, it stays at school and has no impact on our everyday lives. This should not be the case with the study of statistics. As we learn about uncertainty and variation we start to see this in the world around us. When we learn about sampling and non-sampling errors, we become more critical of opinion polls and other research reported in the media. As we discover the power of statistical analysis and experimentation, we start to see the importance of evidence-based practice in medicine, social interventions and the like.

11. Statistics is an inherently interesting and relevant subject

And it can be so much fun. There is a real excitement in exploring data, and becoming a detective. If you aren’t having fun, you aren’t doing it right!

12. Resources from Statistics Learning Centre will help you learn

Of course!

Statistics is not beautiful (sniff)

Statistics is not really elegant or even fun in the way that a mathematics puzzle can be. But statistics is necessary, and enormously rewarding. I like to think that we use statistical methods and principles to extract truth from data.

This week many of the high school maths teachers in New Zealand were exhorted to take part in a Stanford MOOC about teaching mathematics. I am not a high school maths teacher, but I do try to provide worthwhile materials for them, so I thought I would take a look. It is also an opportunity to look at how people with an annual budget of more than 4 figures produce on-line learning materials. So I enrolled and did the first lesson, which is about people’s attitudes to math(s) and the successes or traumas that have led to those attitudes. I’m happy to say that none of this was new to me. I am rather unhappy that it would be new to anyone! Surely all maths teachers know by now that how we deal with students’ small successes and failures in mathematics will create future attitudes leading to further success or failure. If they don’t, they need to take this course. And that makes me happy – that there is such a course, on-line and free for all maths teachers. (As a side note, I loved that Jo, the teacher, switched between the American “math” and the British/Australian/NZ “maths”.)

I’ve only done the first lesson so far, and intend to do some more, but it seems to be much more about mathematics than statistics, and I am not sure how relevant it will be. And that makes me a bit sad again. (It was an emotional journey!)

Mathematics in its pure form is about thinking. It is problem solving and it can be elegant and so much fun. It is a language that transcends nationality. (Though I have always thought the Greeks get a rough deal as we steal all their letters for the scary stuff.) I was recently asked to present an enrichment lesson to a class of “gifted and talented” students. I found it very easy to think of something mathematical to do – we are going to work around our Rogo puzzle, which has some fantastic mathematical learning opportunities. But thinking up something short and engaging and realistic in the statistics realm is much harder. You can’t do real statistics quickly.

On my run this morning I thought a whole lot more about this mathematics/statistics divide. I have written about it before, but more in defence of statistics, warning the mathematics teachers to stay away or get with the programme. Understanding commonalities and differences can help us teach better. Mathematics is pure and elegant, and borders on art. It is the purest science. There is little beautiful about statistics. Even the graphs are ugly, with their scattered data and annoying outliers messing it all up. The only way we get symmetry is by assuming away all the badly behaved bits. Probability can be a bit more elegant, but with that we are creeping into the mathematical camp.

English Language and English literature

I like to liken. I’m going to liken maths and stats to English language and English literature. I was good at English at school, and loved the spelling and grammar aspects especially. I have in my library a very large book about the English language (The Cambridge Encyclopedia of the English Language, by David Crystal), and one day I hope to read it all. It talks about sounds and letters, words, grammar, syntax, origins, meanings. Even to dip into, it is fascinating. On the other hand I have recently finished reading “The End of Your Life Book Club” by Will Schwalbe, which is a biography of his amazing mother, set around the last two years of her life as she struggles with cancer. Will and his mother are avid readers, and use her time in treatment to talk about books. This book has been an epiphany for me. I had forgotten how books can change your way of thinking, and how important fiction is. At school I struggled with the literature side of English, as I wanted to know what the author meant, and could not see how it was right to take my own meaning from a book, poem or work of literature. I have since discovered post-modernism and am happy drawing my own meaning.

So what does this all have to do with maths and statistics? Well, I liken maths to English language. In order to be good at English you need to be able to read and write in a functional way. You need to know the mechanisms. You need to be able to DO, not just observe. In mathematics, you need to be able to approach a problem in a mathematical way. Conversely, to be proficient in literature, you do not need to be able to produce literature. You need to be able to read literature with a critical mind, and appreciate the ideas, the words, the structure. You do need to be able to write enough to express your critique, but that is a different matter from writing a novel. This, to me, is like being statistically literate – you can read a statistical report, and ask the right questions. You can make sense of it, and not be at the mercy of poorly executed or mendacious research. You can even write a summary or a critique of a statistical analysis. But you do not need to be able to perform the actual analysis yourself, nor do you need to know the exact mathematical theory underlying it.

Statistical Literacy?

Maybe there is a problem with the term “statistical literacy”. The traditional meaning of literacy includes being able to read and write – to consume and to produce – to take meaning and to create meaning. I’m not convinced that what is called statistical literacy is the same.

Where I’m heading with this is that statistics is a way to win back the mathematically disenfranchised. If I were teaching statistics to a high school class I would spend some time talking about what statistics involves and how it overlaps with, but is not, mathematics. I would explain that even people who have had difficulty with mathematics in the past can do well at statistics.

The following table outlines the different emphasis of the two disciplines.

Mathematics | Statistics
Proficiency with numbers is important | Proficiency with numbers is helpful
Abstract ideas are important | Concrete applications are important
Context is to be removed so that we can model the underlying ideas | Context is crucial to all statistical analysis
You don’t need to write very much | Written expression in English is important

Another idea related to this is that of “magic formulas” or the cookbook approach. I don’t have a problem with cookbooks and knitting patterns. They help me to make things I could not otherwise. However, the more I use recipes and patterns, the more I understand the principles on which they are based. But this is a thought for another day.

Conceptualising Probability

The problem with probability is that it doesn’t really exist. Certainly it never exists in the past.

Probability is an invention we use to communicate our thoughts about how likely something is to happen. We have collectively agreed that 1 is a certain event and 0 is impossible. 0.5 means that there is just as much chance of something happening as not. We have some shared perception that 0.9 means that something is much more likely to happen than to not happen. Probability is also useful for when we want to do some calculations about something that isn’t certain. Often it is too hard to incorporate all uncertainty, so we assume certainty and put in some allowance for error.

Sometimes probability is used for things that happen over and over again, and in that case we feel we can check whether our prediction about how likely something was to happen was correct. The problem here is that we actually need things to happen a very large number of times, under the same circumstances, in order to assess whether we were correct. But when we are talking about the probability of a single event, that either will or won’t happen, we can’t test whether we were right afterwards, because by that time it either did or didn’t happen. The probability no longer exists.
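A quick simulation illustrates just how large that "very large number of times" needs to be. Everything here is invented for illustration, including the assumed "true" probability of 0.25:

```python
import random

random.seed(1)
TRUE_P = 0.25  # an invented "true" probability, for illustration only

def observed_proportion(n_trials):
    """Proportion of successes in n_trials independent repetitions."""
    successes = sum(random.random() < TRUE_P for _ in range(n_trials))
    return successes / n_trials

# Small numbers of repetitions give wildly varying estimates;
# only very large numbers settle near the assumed value.
for n in (10, 100, 1000, 100_000):
    print(n, observed_proportion(n))
```

With ten repetitions the observed proportion can easily be nowhere near 0.25; only in the long run does the frequency settle down.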

Thus to say that there is a “true” probability somewhere in existence is rather contrived. The truth is that it either will happen or it won’t. The only way to know a true probability would be if this one event were to happen over and over and over, in the wonderful fiction of parallel universes. We could then count how many times it would turn out one way rather than another. At which point the universes would diverge!

However, in the interests of teaching about probability, there is the construct that there exists a “true probability” that something will happen.

Why think about probability?

What prompted these musings about probability was exploring the new NZ curriculum and its companion documents, including the Senior Secondary Guide.

In Level 8 (the last year of secondary school) of the Senior Secondary Guide it says, “Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the relationship between true probability (unknown and unique to the situation), model estimates (theoretical probability) and experimental estimates.”

And at NZC level 3 (years 5 and 6 at primary school!) in the Key ideas in Probability it talks about “Good Model, No Model and Poor Model”. This statement is referred to at all levels above level 3 as well.

I decided I needed to make sense of these two conceptual frameworks: true-model-experimental and good-poor-no, and tie it to my previous conceptual framework of classical-frequency-subjective.

Here goes!

Delicious Mandarins

Let’s make this a little more concrete with an example. We need a one-off event. What is the probability that the next mandarin I eat will be delicious? It is currently mandarin season in New Zealand, and there is nothing better than a good mandarin, with the desired combination of sweet and sour, and with plenty of juice and a good texture. But, being a natural product, there is a high level of variability in the quality of mandarins, especially when they may have parted company with the tree some time ago.

There are two possible outcomes for my future event. The mandarin will be delicious or it will not. I will decide when I eat it. Some may say that there is actually a continuum of deliciousness, but for now this is not the case. I have an internal idea of deliciousness and I will know. I think back to my previous experience with mandarins. I think about a quarter are horrible, a half are nice enough and about a quarter are delicious (using the Dr Nic scale of mandarin grading). If the mandarin I eat next belongs to the same population as the ones in my memory, then I can predict that there is a 25% probability that the mandarin will be delicious.

The NZ curriculum talks about “true” probability which implies that any value I give to the probability is only a model. It may be a model based on empirical or experimental evidence. It can be based on theoretical probabilities from vast amounts of evidence, which has given us the normal distribution. The value may be only a number dredged up from my soul, which expresses the inner feeling of how likely it is that the mandarin will be delicious, based on several decades of experience in mandarin consumption.

More examples

Let us look at some more examples:

What is the probability that:

  • I will hear a bird on the way to work?
  • the flight home will be safe?
  • it will be raining when I get to Christchurch?
  • I will get a raisin in my first spoonful of muesli?
  • I will get at least one raisin in half of my spoonfuls of muesli?
  • the shower in my hotel room will be enjoyable?
  • I will get a rare Lego ® minifigure next time I buy one?

All of these events are probabilistic and have varying degrees of certainty and varying degrees of ease of modelling.

Likelihood | Easy to model | Hard to model
Unlikely | Get a rare Lego ® minifigure | Raining in Christchurch
No idea | Raisin in half my spoonfuls | Enjoyable shower
Likely | Raisin in first spoonful | Bird, safe flight home

And as I construct this table I realise also that there are varying degrees of importance. Except for the flight home, none of those examples matter. I am hoping that a safe flight home has a probability extremely close to 1. I realise that there is a possibility of an incident. And it is difficult to model. But people have modelled air safety and the universal conclusion is that it is safer than driving. So I will take the probability and fly.

Conceptual Frameworks

How do we explain the different ways that probability has been described? I will now examine the three conceptual frameworks I introduced earlier, starting with the easiest.

Traditional categorisation

This is found in some form in many elementary college statistics textbooks. The traditional framework has three categories – classical or “a priori”, frequency or historical, and subjective.

Classical or “a priori” – I had thought of this as being “true” probability. To me, if there are three red and three white Lego® blocks in a bag and I take one out without looking, there is a 50% chance that I will get a red one. End of story. How could it be wrong? This definition is the mathematically interesting aspect of probability. It is elegant and has cool formulas and you can make up all sorts of fun examples using it. And it is the basis of gambling.

Frequency or historical – we draw on long term results of similar trials to gain information. For example we look at the rate of germination of a certain kind of seed by experiment, and that becomes a good approximation of the likelihood that any one future seed will germinate. And it also gives us a good estimate of what proportion of seeds in the future will germinate.
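These first two categories can be contrasted in a few lines of Python. The bag of blocks gives a classical probability by counting equally likely outcomes, while the germination example is mimicked by simulation (the 0.7 germination rate is invented for the sketch):

```python
import random
from fractions import Fraction

# Classical ("a priori"): 3 red and 3 white blocks in a bag.
# The probability comes from counting equally likely outcomes.
bag = ["red"] * 3 + ["white"] * 3
p_red = Fraction(bag.count("red"), len(bag))  # exactly 1/2, by counting

# Frequency (historical): estimate a germination rate from repeated trials.
# The 0.7 "true" rate here is invented purely for the simulation.
random.seed(0)
trials = 1000
germinated = sum(random.random() < 0.7 for _ in range(trials))
p_germinate = germinated / trials  # an estimate, not the "true" value

print(p_red, p_germinate)
```

Note the asymmetry: the classical value is exact given the model, while the frequency value is only ever an approximation that improves with more trials.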

Subjective – We guess! We draw on our experience of previous similar events and we take a stab at it. This is not seen as a particularly good way to come up with a probability, but when we are talking about one off events, it is impossible to assess in retrospect how good the subjective probability estimate was. There is considerable research in the field of psychology about the human ability or lack thereof to attribute subjective probabilities to events.

In teaching the three part categorisation of sources of probability I had problems with the probability of rain. Where does that fit in the three categories? It uses previous experimental data to build a model, and current data to put into the model, and then a probability is produced. I decided that there is a fourth category, that I called “modelled”. But really that isn’t correct, as they are all models.

NZ curriculum terminology

So where does this all fit in the New Zealand curriculum pronouncements about probability? There are two conceptual frameworks that are used in the document, each with three categories as follows:

True, modelled, experimental

In this framework we start with the supposition that there exists somewhere in the universe a true probability distribution. We cannot know this. Our expressions of probability are only guesses at what this might be. There are two approaches we can take to estimate this “truth”. These two approaches are not independent of each other, but often intertwined.

One is a model estimate, based on theory, such as that the probability of a single outcome is the number of equally likely ways that it can occur over the number of possible outcomes. This accounts for the probability of a red brick as opposed to a white brick, drawn at random. Another example of a modelled estimate is the use of distributions such as the binomial or normal.

In addition there is the category of experimental estimate, in which we use data to draw conclusions about what is likely to happen. This is equivalent to the frequency or historical category above. Often modelled distributions use data from an experiment as well, and experimental probability relies on models too. The main idea is that neither the modelled nor the experimental estimate of the “true” probability distribution is the true distribution; rather, each is a model of some sort.
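To see the distinction, here is a sketch contrasting a model estimate with an experimental estimate of the same event – the chance of at least one six in four rolls of a fair die (my own example, not one from the curriculum documents):

```python
import random

random.seed(7)

# Model estimate: theory gives the probability of at least one six
# in four rolls of a fair die.
model_p = 1 - (5 / 6) ** 4  # about 0.518

# Experimental estimate: simulate the same event many times.
def at_least_one_six():
    return any(random.randint(1, 6) == 6 for _ in range(4))

n = 50_000
experimental_p = sum(at_least_one_six() for _ in range(n)) / n

# Neither number is the "true" probability; both are estimates of it.
print(f"model {model_p:.4f}, experimental {experimental_p:.4f}")
```

The two numbers will be close but not identical, which is exactly the point: both are estimates of the unknowable “true” probability.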

Good model, poor model, no model

The other conceptual framework stated in the NZ curriculum is that of good model, poor model and no model, which relates to fitness for purpose. When it is important to have a “correct” estimate of a probability, such as for building safety, gambling machines, and life insurance, then we would put effort into getting as good a model as possible. Conversely, sometimes little effort is required. Classical models are very good models, often of trivial examples such as dice games and coin tossing. Frequency models, aka experimental models, may or may not be good models, depending on how many observations are included, and how much the future is similar to the past. For example, a model of sales of slide rules developed before the invention of the pocket calculator will be a poor model for current sales. The ground rules have changed. And a model built on data from five observations is unlikely to be a good model. A poor model is not fit for purpose and requires development, unless the stakes are so low that we don’t care, or the cost of a better fit is greater than the reward.

I have problems with the concept of “no model”. I presume that is the starting point, from which we develop a model or do not develop a model if it really doesn’t matter. In my examples above I include the probability that I will hear a bird on the way to work. This is not important, but rather an idle musing. I suspect I probably will hear a bird, so long as I walk and listen. But if it rains, I may not. As I am writing this in a hotel in an unfamiliar area I have no experience on which to draw. I think this comes pretty close to “no model”. I will take a guess and say the probability is 0.8. I’m pretty sure that I will hear a bird. Of course, now that I have said this, I will listen carefully, as I would feel vindicated if I heard a bird. But if I do not hear a bird, was my estimate of the probability wrong? No – I could assume that I just happened to be in the 0.2 area of my prediction. But coming back to the “no model” concept – there is now a model. I have allocated the probability of 0.8 to the likelihood of hearing a bird. This is a model. I don’t even know if it is a good model or a poor model. I will not be walking to work this way again, so I cannot even test it out in the future, and besides, my model was only for this one day, not for all days of walking to work.

So there you have it – my totally unscholarly musings on the different categorisations of probability.

What are the implications for teaching?

We need to try not to perpetuate the idea that probability is the truth. But at the same time we do not wish to make students think that probability is without merit. Probability is a very useful, and at times highly precise, way of modelling and understanding the vagaries of the universe. The more teachers can use language that implies modelling rather than rules, the better. It is common, but not strictly correct, to say, “This process follows a normal distribution”. As Einstein famously and enigmatically said, “God does not play dice”. Neither does God or nature use normal distribution values to determine the outcomes of natural processes. It is better to say, “This process is usefully modelled by the normal distribution.”

We can have learning experiences that help students to appreciate certainty and uncertainty, and the modelling of probabilities that are not equi-probable. Thanks to the overuse of dice and coins, it is too common for people to assess things as having equal probabilities. And students need to use experiments. First they need to appreciate that it can take a large number of observations before we can be happy that we have a “good” model. Secondly they need to use experiments to attempt to model an otherwise unknown probability distribution. What fun can be had in such a class!

But, oh mathematical ones, do not despair – the rules are still the same, it’s just the rigour with which we state them that has changed.

Comment away!

Post Script

In case anyone is interested, here are the outcomes which now have a probability of 1, as they have already occurred.

  • I will hear a bird on the way to work? Almost the minute I walked out the door!
  • the flight home will be safe? Inasmuch as I am in one piece, it was safe.
  • it will be raining when I get to Christchurch? No it wasn’t
  • I will get a raisin in my first spoonful of muesli? I did
  • I will get at least one raisin in half of my spoonfuls of muesli? I couldn’t be bothered counting.
  • the shower in my hotel room will be enjoyable? It was okay.
  • I will get a rare Lego minifigure next time I buy one? Still in the future!

Oh Ordinal data, what do we do with you?

What can you do with ordinal data? Or more to the point, what shouldn’t you do with ordinal data?

First of all, let’s look at what ordinal data is.

It is usual in statistics and other sciences to classify types of data in a number of ways. In 1946, Stanley Smith Stevens suggested a theory of levels of measurement, in which all measurements are classified into four categories: Nominal, Ordinal, Interval and Ratio. This categorisation is used extensively, and I have a popular video explaining them. (Though I group Interval and Ratio together, as there is not much difference in their behaviour for most statistical analysis.)

Nominal is pretty straightforward. This category includes any data that is put into groups, in which there is no inherent order. Examples of nominal data are country of origin, sex, type of cake, or sport. Similarly it is pretty easy to explain interval/ratio data. It is something that is measured, such as length, weight, time (duration), cost and similar. These two categorisations can also be given as qualitative and quantitative, or non-parametric and parametric.

Ordinal data

But then we come to the ordinal level of measurement. This is used to describe data that has a sense of order, but for which we cannot be sure that the distances between the consecutive values are equal. For example, level of qualification has a sense of order:

  • A postgraduate degree is higher than
  • a Bachelor’s degree, which is higher than
  • a high-school qualification, which is higher than
  • no qualification.

There are four steps on the scale, and it is clear that there is a logical sense of order. However, we cannot sensibly say that the difference between no qualification and a high-school qualification is equivalent to the difference between the high-school qualification and a bachelor’s degree, even though both of those are represented by one step up the scale.

Another example of the ordinal level of measurement is used extensively in psychological, educational and marketing research, known as a Likert scale. (Though I believe the correct term is actually Likert item – and according to Wikipedia, the pronunciation should be Lick it, not Like it, as I have used for some decades!) A statement is given, and the response is given as a value, often from 1 to 5, showing agreement with the statement. Often the words “Strongly agree, agree, neutral, disagree, strongly disagree” are used. There is clearly an order in the five possible responses. Sometimes a seven-point scale is used, and sometimes the “neutral” response is eliminated in an attempt to force the respondent to commit one way or the other.

The question at the start of this post has an ordinal response, which could be perceived as indicating how quantitative the respondent believes ordinal data to be.

What prompted this post was a question from Nancy under the YouTube video above, asking:

“Dr Nic could you please clarify which kinds of statistical techniques can be applied to ordinal data (e.g. Likert-scale). Is it true that only non-parametric statistics are possible to apply?”


As shown in the video, there are the purists, who are adamant that ordinal data is qualitative: there is no way that a mean should ever be calculated for ordinal data, and the most mathematical thing you can do with it is find the median. At the other pole are the practical types, who happily calculate means for any ordinal data, without any concern for the meaning (no pun intended).

There are differing views on finding the mean for ordinal data.

So the answer to Nancy would depend on what school of thought you belong to.

Here’s what I think:

Not all ordinal data is the same. There is a continuum of “ordinality”, if you like.

There are some instances of ordinal data which are pretty much nominal, with a little bit of order thrown in. These should be distinguished from nominal data only in that they should always be graphed as a bar chart (rather than a pie chart) because there is inherent order. The mode is probably the only sensible summary value other than frequencies. In the examples above, I would say that “level of qualification” is only barely ordinal. I would not support calculating a mean for the level of qualification. It is clear that the gaps are not equal, and additionally any non-integer result would have doubtful interpretation.

Then there are other instances of ordinal data for which it is reasonable to treat it as interval data and calculate the mean and median. It might even be supportable to use it in a correlation or regression. This should always be done with caution, and an awareness that the intervals are not equal.
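One way to explore association while respecting only the ordering (and so avoiding the equal-intervals assumption) is a rank-based measure such as Spearman's rank correlation. Here is a small self-contained sketch, with made-up data, that ranks both variables (averaging ranks for ties) and then correlates the ranks:

```python
def ranks(xs):
    """Return average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1      # mean of the tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical ordinal scores for two questions from the same respondents
q1 = [1, 2, 3, 4, 5]
q2 = [2, 3, 3, 4, 5]
print(round(spearman(q1, q2), 3))  # close to 1: strongly monotone association
```

Because only ranks enter the calculation, the result is unchanged if the ordinal codes are replaced by any other order-preserving values, which is precisely why rank methods sit comfortably with ordinal data.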

Here is an example for which I believe it is acceptable to use the mean of an ordinal scale. At the beginning and the end of a university statistics course, the class of 200 students is asked the following question: How useful do you think a knowledge of statistics will be to you in your future career? Very useful, useful, not useful.

Now this is not even a very good Likert question, as the positive and negative elements are not balanced. There are only three choices. There is no evidence that the gaps between the elements are equal. However, if we score the elements as 3, 2 and 1 respectively, and find that the mean for the 200 students is 1.5 before the course and 2.5 after the course, I would say that there is meaning in what we are reporting. There are specific tests to use for this – and we could also look at how many students changed their minds positively or negatively. But even without the specific test, we are treating this ordinal data as something more than qualitative. What also strengthens the evidence for doing this is that the test is performed on the same students, who will probably perceive the scale in the same way each time, making the comparison more valid.
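The scoring above can be sketched in a few lines of Python. The response counts here are hypothetical, chosen simply so that the before and after means come out at the 1.5 and 2.5 quoted in the example:

```python
# Map each ordinal response to a numeric score (3, 2, 1 as in the example)
scores = {"Very useful": 3, "Useful": 2, "Not useful": 1}

# Hypothetical response counts for the class of 200 students
before = ["Not useful"] * 120 + ["Useful"] * 60 + ["Very useful"] * 20
after  = ["Not useful"] * 20  + ["Useful"] * 60 + ["Very useful"] * 120

def mean_score(responses):
    """Mean of the numeric scores assigned to the ordinal responses."""
    return sum(scores[r] for r in responses) / len(responses)

print(mean_score(before))  # 1.5
print(mean_score(after))   # 2.5
```

Note that the mean only becomes meaningful once we commit to the scoring 3, 2, 1; a different order-preserving scoring (say 10, 2, 1) would give different means, which is exactly the caution raised above about unequal gaps.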

So what I’m saying is that it is wrong to make a blanket statement that ordinal data can or can’t be treated like interval data. It depends on the meaning and the number of elements in the scale.

What do we teach?

And again the answer is that it depends! I told my classes in business statistics that it depends. If you are teaching a mathematical statistics class, then a more hard-line approach is justified. However, at the same time as saying, “you should never calculate the mean of ordinal data”, it would be worthwhile to point out that it is done all the time! Similarly, if you teach that it is okay to find the mean of some ordinal data, I would also point out that there are issues with regard to interpretation and mathematical correctness.

Please comment!

Foot note on Pie charts

*Yes, I too eschew pie charts, but for two or three categories of nominal data, where there are marked differences in frequency, if you really insist, I guess you could possibly use them, so long as they are not 3D and definitely not exploding. But even then, a bar chart is better – perhaps a post for another day, though so many have written about this already.