The Central Limit Theorem – with Dragons

To quote Willy Wonka, “A little magic now and then is relished by the best of men [and women].” Any frequent reader of this blog will know that I am of a pragmatic nature when it comes to using statistics. For most people the Central Limit Theorem can remain in the realms of magic. I have never taught it, though at times I have waved my hands past it.

Sometimes you don’t need to know.

Students who want that sort of thing can read about it in their textbooks or look it up online. The New Zealand school curriculum does not include it, as I explained in 2012.

But – there are many curricula and introductory statistics courses that include the Central Limit Theorem, so I have chosen to blog about it, in preparation for making a video. In this post I will cover what the Central Limit Theorem does. Maybe my approach will give ideas to teachers on how they might teach it.

Sampling distribution of a mean

First let me explain what a sampling distribution is. (And let me add the term to Dr Nic’s long list of statistics terms that cause unnecessary confusion.) A sampling distribution of a mean is the distribution of the means of samples of the same size taken from the same population. The distribution of the means will be different from the distribution of values in the original population.  The Central Limit Theorem tells us useful things about the sampling distribution and its relationship to the distribution of the values in the population.

Example using dragons

We have a population of 720 dragons, and each dragon has a strength value of 1 to 8. The distribution of the strengths goes from 1 to 8 and has a population mean somewhere around 4.5. We take a sample of four dragons from the population. (Dragons are difficult to catch and measure so it will just be 4.)

We find the mean. Then we think about what other values we might have got for samples that size. In real life, that is all we can do. But to understand what is happening, we will take multiple samples using cards, and then a spreadsheet, to explore what happens.

Important aspects of the Central Limit Theorem

Aspect 1: The sampling distribution will be less spread than the population from which it is drawn.

Dragon example

What do you think is the largest value the mean strength of the four dragons will take? Theoretically you could have a sample of four dragons, each with strength of 8, giving us a sample mean of 8. But it isn’t very likely. The chances that all four values are greater than the mean are pretty small.  (It’s about a 6% chance). If there are equal numbers of dragons with each strength value, then the probability of getting all four dragons with strength 8 is 0.0002.
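For readers who like to check the arithmetic, here is a minimal sketch of the two probabilities quoted above. It assumes, as the text does, equal numbers of dragons at each strength, and it treats the four draws as independent, which is only approximately true when sampling from 720 dragons. The variable names are made up for illustration.

```python
from fractions import Fraction

# Assumption: strengths 1..8 are equally common and draws are independent.
strengths = range(1, 9)
sample_size = 4

# Chance that one dragon is above the population mean of 4.5 (strengths 5, 6, 7, 8).
population_mean = Fraction(sum(strengths), len(strengths))
p_above_mean = Fraction(sum(1 for s in strengths if s > population_mean), len(strengths))

# Chance that all four dragons are above the mean, and that all four have strength 8.
p_all_above = p_above_mean ** sample_size
p_all_eight = Fraction(1, len(strengths)) ** sample_size

print(float(p_all_above))   # 0.0625, i.e. about 6%
print(float(p_all_eight))   # 0.000244140625, i.e. about 0.0002
```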

So already we have worked out that the distribution of the sample means is going to be less spread than the distribution of the original population.

Aspect 2: The sampling distribution will be well-modelled by a normal distribution.

Now isn’t that amazing – and really useful! And even more amazing, it doesn’t even matter what the underlying population distribution is, the sampling distribution will still (in most cases) look like a normal distribution.

If you think about it, it does make sense. I like to see practical examples – so here is one!

Dragon example

We worked out that it was really unlikely to get a sample of four dragons with a mean strength of 8. Similarly it is really unlikely to get a sample of four dragons with a mean strength of 1.
Say we assumed that the strength of dragons was uniform – there are equal numbers of dragons with each of the strengths. Then we find out all the possible combinations of strengths from samples of 4 dragons. Bearing in mind there are eight different strengths, that gives us 8 to the power of 4 or 4096 possible combinations. We can use a spreadsheet to enumerate all these equally likely combinations. Then we find the mean strength and we get this distribution.
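The enumeration described above does not need a spreadsheet. Here is a minimal Python sketch, assuming equally likely strengths from 1 to 8, that lists all 4096 ordered combinations and prints a rough text histogram of their means. The names are illustrative only.

```python
from collections import Counter
from itertools import product

# Assumption: strengths 1..8 are equally likely, so every ordered combination
# of four strengths is equally likely too.
strengths = range(1, 9)
combinations = list(product(strengths, repeat=4))

# Tally how many combinations give each possible sample mean.
mean_counts = Counter(sum(combo) / 4 for combo in combinations)

print(len(combinations))  # 4096
for mean in sorted(mean_counts):
    count = mean_counts[mean]
    # One '#' per 10 combinations keeps the text histogram narrow.
    print(f"{mean:5.2f} {count:4d} {'#' * (count // 10)}")
```

Only one combination gives a mean of 1 and only one gives a mean of 8, while the middle value of 4.5 can be made in 344 different ways.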

Or we could take some samples of four dragons and see what happens. We can do this with our cards, or with a handy spreadsheet, and here is what we get.

Four samples of four dragons each

The sample mean values are 4.25, 5.25, 4.75 and 6. Even with really small samples we can see that the values of the means are clustering around some central point.

Here is what the means of 1000 samples of size 4 look like:

And hey presto – it resembles a normal distribution! By that I mean that the distribution is symmetric, with a bulge in the middle and tails in either direction. A normal distribution is useful for modelling just about anything that is the result of a large number of chance effects.
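A simulation along these lines can be sketched in Python instead of a spreadsheet. This is only an illustration: it assumes a made-up population of 720 dragons with 90 at each strength, which may not match the real card set, and the seed is arbitrary.

```python
import random
from collections import Counter
from statistics import mean

# Assumption: a made-up population of 720 dragons, 90 at each strength 1..8.
population = [strength for strength in range(1, 9) for _ in range(90)]

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Draw 1000 samples of four dragons (without replacement within a sample).
sample_means = [mean(rng.sample(population, 4)) for _ in range(1000)]

# Text histogram: one row per observed mean, one '#' per 5 samples.
counts = Counter(sample_means)
for value in sorted(counts):
    print(f"{value:5.2f} {'#' * (counts[value] // 5)}")

print("mean of the sample means:", round(mean(sample_means), 2))
```

The rows near the middle should be the longest, with short rows towards 1 and 8.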

The bigger the sample size, the more the distribution of the means (the sampling distribution) looks like a normal distribution, and the more samples we take, the more clearly we can see that shape. The Central Limit Theorem gives a mathematical explanation for this. I put this in the “magic” category unless you are planning to become a theoretical statistician.

Aspect 3: The spread of the sampling distribution is related to the spread of the population.

If you think about it, this also makes sense. If there is very little variation in the population, then the sample means will all be about the same.  On the other hand, if the population is really spread out, then the sample means will be more spread out too.

Dragon example

Say the strengths of the dragons occur equally from 1 to 5 instead of from 1 to 8. The means of teams of four dragons are going to range from 1 to 5 also, though most of the values will be near the middle.
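To put numbers on this comparison, here is a small sketch that works out the spread of the means of four dragons for both populations. It assumes equally likely strengths in each case, and the function name is invented for this example.

```python
from itertools import product
from statistics import pstdev

def sample_mean_spread(max_strength, sample_size=4):
    """Range and standard deviation of all equally likely sample means."""
    # Assumption: strengths 1..max_strength are equally likely.
    strengths = range(1, max_strength + 1)
    means = [sum(combo) / sample_size for combo in product(strengths, repeat=sample_size)]
    return min(means), max(means), pstdev(means)

for max_strength in (5, 8):
    low, high, spread = sample_mean_spread(max_strength)
    print(f"strengths 1 to {max_strength}: means run from {low} to {high}, sd = {spread:.2f}")
```

Under these assumptions the standard deviation of the means is about 0.71 for strengths 1 to 5 and about 1.15 for strengths 1 to 8, so the less spread-out population gives less spread-out means.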

Aspect 4: Bigger samples lead to a smaller spread in the sampling distribution.

As we increase the size of the sample, the means become less varied. We reduce the effect of one extreme value. Similarly the chance of getting all high values in our sample or all low values gets smaller and smaller. Consequently the spread of the sample means will decrease. However, the reduction is not linear. By that I mean that the effect achieved by adding one more to the sample decreases, depending on how big the sample is in the first place. Say you have a sample of size n = 4, and you increase it to n = 5: that is a 25% increase in information. If you have a sample of size n = 100 and increase it to n = 101, that is only a 1% increase in information.

Now here is the coolest thing! The spread of the sampling distribution is the standard deviation of the population, divided by the square root of the sample size. As we do not know the standard deviation of the population (σ), we use the standard deviation of the sample (s) to approximate it. The spread of the sampling distribution is usually called the standard error, or s.e.
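The formula is easy to try out. The sketch below assumes the population has equal numbers of dragons at each strength from 1 to 8, so that σ is known, and prints the standard error for a few sample sizes. The function name is just for illustration.

```python
from math import sqrt
from statistics import pstdev

# Assumption: strengths 1..8 are equally common in the population.
population = list(range(1, 9))
sigma = pstdev(population)  # population standard deviation, about 2.29

def standard_error(sd, n):
    """Spread of the sampling distribution of the mean for samples of size n."""
    return sd / sqrt(n)

for n in (4, 5, 16, 100, 101):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.3f}")
```

Under these assumptions the standard error drops from about 1.15 at n = 4 to about 1.02 at n = 5, but only from about 0.229 to about 0.228 between n = 100 and n = 101. Because of the square root, the sample has to be four times as big (n = 16 rather than n = 4) to halve the standard error.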

 

Implications of the Central Limit Theorem

The properties listed above underpin most traditional statistical inference. When we find a confidence interval of a mean, we use the standard error in the formula. If we used the sample standard deviation we would be finding the values between which most of the values in the sample lie. By using the standard error, we are finding the values between which most of the sample means lie.
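The difference between the two calculations can be seen side by side in a short sketch. The sample here is made up (36 random strengths with an arbitrary seed), and the interval uses the normal multiplier of about 1.96, so treat it as an illustration rather than a recipe.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

# Assumption: a made-up sample of 36 dragon strengths, purely for illustration.
rng = random.Random(7)
sample = [rng.randint(1, 8) for _ in range(36)]

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)             # sample standard deviation
standard_error = s / sqrt(n)  # estimated spread of the sample means

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

# Rough range for individual values in the sample (uses s).
# Strengths are not normally distributed, so this is only for contrast.
value_range = (x_bar - z * s, x_bar + z * s)
# Confidence interval for the population mean (uses the standard error).
confidence_interval = (x_bar - z * standard_error, x_bar + z * standard_error)

print("range of typical values:", value_range)
print("95% confidence interval:", confidence_interval)
```

With n = 36 the confidence interval is one sixth as wide as the range of typical values, because the standard error is s divided by the square root of 36.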

Sample size

The Central Limit Theorem applies best with large samples. A rule of thumb is that the sample size should be 30 or more. For smaller samples we need to use the t distribution rather than the normal distribution in our testing or confidence intervals. If the sample is very small, such as fewer than 15, then we can still use the t distribution if the underlying population has a normal shape. If the underlying population is not normal, and the sample is small, then other methods, such as resampling, should be used, as the normal approximation promised by the Central Limit Theorem cannot be relied on.

Reminder!

We do not take multiple samples of the same population in real life. This simulation is just that – a pretend example to show how the Central Limit Theorem plays out. When we do inferential statistics we have one sample, and we use what we know about it to make inferences about the population from which it is drawn.

Teaching suggestion

Data cards are extremely useful tools to help understand sampling and other aspects of inference. I would suggest getting the class to take multiple small samples (n = 4), using cards, and finding the means. Plot the means. Then take larger samples (n = 9) and similarly plot the means. Compare the shape and spread of the distributions of the means.

The Dragonistics data cards used in this post can be purchased at The StatsLC shop.


The Central Limit Theorem: To teach or not to teach

The question of whether to teach explicitly the Central Limit Theorem seems to divide instructors along philosophical lines. Let us look first at these lines.

There are at least three different areas of activity within the discipline of statistics. These are

  • Theory of statistics and research into statistics
  • Practice of statistics
  • Teaching statistics and related research

Theory and research in statistics

The theory of statistics is mathematical. It is taught and practised in Mathematics and Statistics Departments of Universities. It is possible to be an expert on the theory and mathematics of statistics while having little contact with real data. The theory provides underpinnings to the practice of statistics. It is vital that some people know this – but not most of us. One would hope that people employed as statisticians would have a sound understanding of both the theoretical and applied aspects of statistics. This relates strongly to the research into statistics, which seems to be very mathematical, from my perusal of journals. This research advances the theory and use of statistical methods and philosophy.

Practice of statistics

The practice of statistics occurs in many, many areas, particularly in universities. Most postgraduate courses require some proficiency in the application of statistical methods. Researchers in areas as diverse as psychology, genetics, market research, education, geography, speech therapy, physiotherapy, mechanics, management, economics and medicine all use statistical methods. Some researchers have a deep understanding of the theory of statistics, but most aim to be safe and competent practitioners. When they get to the tricky bits they know to ask a statistician, but most of the day-to-day data generation, collection and analysis is within their capability.

Teaching of statistics and related research

Then there is the teaching of statistics. The level of applicability and theory taught will depend on the context. An instructor in statistics (in a non-service course) in a Department of Mathematics would tend towards the mathematical aspects, as that is most appropriate to the audience. However in just about every other setting the emphasis will be on the practical aspects of data collection and inference. This treatment of statistics is explicable, accessible and interesting to just about anyone, whereas only the mathematically inclined are likely to get excited about the theory of statistics.

There is another growing area, which is the research into the teaching and learning of statistics. This informs and is informed by the other areas, as well as general educational research and cognitive psychology. Much of my thinking comes from this background. An overview of some of the material relating to college level can be found in this literature review. The general topic of How Students Learn Statistics is introduced in this early paper by Joan Garfield (1995), a leader in the field of statistics education research.

Statistics in the school curriculum

Statistics is gradually making its way into the school curriculum internationally, and in New Zealand has become a separate subject in the final year of schooling. There are philosophical issues arising as most of the teachers of statistics are mathematicians, and some tend towards the beauty and elegance of the formulas, proofs etc. The aim of the curriculum, however, is more towards statistical investigations and statistical literacy. There are fuzzy, dirty, ambiguous, context driven explorations with sometimes extensive write-ups. There is discussion and critique of statistical reports. There are experiments which may or may not produce usable results. Some of this is well into the realms of social science and well away from what mathematicians find appealing or even comfortable. In another life I can hear myself saying, “I didn’t become a maths teacher to mark essay questions!” There is a bit of a mismatch between the skill-set and attitudes of the teachers and the curriculum.

Teaching the Central Limit Theorem

One place where this is particularly evident is in the question of teaching the Central Limit Theorem. Mathematicians like the Central Limit Theorem and it seems that they like to teach it. One teacher states “The fact that the CLT is to be de-emphasised in Yr 13 is a major disappointment to me…” This statement prompted this post. I agree that the CLT is neat. It is really handy. And it makes confidence interval calculation almost trivial. There are cool little exercises you can do to illustrate it. It is the backbone of traditional statistical theory.

However, teaching and learning do not always go hand in hand. I wonder how many students really do internalise the Central Limit Theorem. Evidence says not many. Chance, Delmas and Garfield, in “The challenge of developing statistical literacy reasoning and thinking” (Ben Zvi and Garfield 2004) state: “Sampling distributions is a difficult topic for students to learn. A complete understanding of sampling distributions requires students to integrate and apply several concepts from different parts of a statistics course and to be able to reason about the hypothetical behavior of many samples – a distinct, intangible thought process for most students. The Central Limit Theorem provides a theoretical model of the behavior of sampling distributions, but students often have difficulty mapping this model to applied contexts. As a result students fail to develop a deep understanding of the concept of sampling distributions and therefore often develop only a mechanical knowledge of statistical inference. Students may learn how to compute confidence intervals and carry out tests of significance, but they are not able to understand and explain related concepts, such as interpreting a p-value.”

I have a confession to make. I didn’t teach the Central Limit Theorem. It never seemed as if it were going to help my students understand what was going on. For a few years I made them do a little simulation exercise which helped them to see why the square-root of n occurred in the denominator of the formula for the standard error. That was fun and seemed to help. But the words “Central Limit Theorem” seldom passed my lips in my twenty years of instruction.

What has helped immeasurably has been videos, beginning with “Understanding the p-value”, and plenty of different examples and exercises using confidence intervals and hypothesis tests. (Another confession – I taught traditional statistical inference, not resampling. My excuse was that I didn’t know any better, and I had to stay in parallel with the course provided by the maths department.) What I have found from my own experience as a learner and as a teacher is that students learn to understand statistics by DOING statistics.

Definition of the Central Limit Theorem

The Central Limit Theorem states that regardless of the shape of the population distribution, the distribution of sample means is approximately normal if the sample size is large. This was a really brilliant model for when simulation and resampling were impossible. The Central Limit Theorem makes it possible to calculate confidence intervals for population means from sample data. It is the reason why most statistical procedures either assume normality at some point, or take steps to correct for the lack thereof. (See the paper by Cobb I referred to extensively in last week’s post.)

In a curriculum that develops from informal inference to formal inference using resampling, there is no need to call on the Central Limit Theorem. With resampling we use the distribution of the sample as the best estimate of the distribution of the population. True, it is quicker to use the old method of plugging the values into the formula. However, it isn’t much quicker than using the free iNZight software for resampling.
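To give a flavour of what resampling involves, here is a minimal sketch of a percentile bootstrap interval for a mean. It is not how iNZight does it, just one common way of doing the idea by hand. The sample of twelve strengths is invented, and the function name, number of resamples and seed are all arbitrary choices for this illustration.

```python
import random
from statistics import mean

def bootstrap_interval(sample, resamples=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)
    # Resample with replacement from the sample itself, same size each time.
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(resamples)
    )
    tail = (1 - level) / 2
    lower = means[round(tail * resamples)]
    upper = means[round((1 - tail) * resamples) - 1]
    return lower, upper

# Hypothetical sample of twelve dragon strengths, made up for illustration.
dragon_strengths = [3, 5, 4, 8, 2, 6, 5, 7, 4, 1, 6, 5]
print(bootstrap_interval(dragon_strengths))
```

No formula for the standard error appears anywhere: the spread of the resampled means does that job.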

At high school level we want students to get an understanding of what inference is. (I would suggest my Pinkie Bar lesson as a good way of introducing the rejection part of Cobb’s mantra, Randomise, Repeat, Reject.) I’m not convinced that teaching the Central Limit Theorem and formula-based confidence intervals for means and proportions leads to understanding. Research suggests that it doesn’t. I agree that statistical theorists, educators and researchers should all understand the Central Limit Theorem. I just don’t think that it has a vital place in an innovative curriculum based on resampling.

Concern for students

I suspect that teachers fear that if their students are not taught the Central Limit Theorem and traditional confidence intervals at high school they will be at a disadvantage at university. I’d like to reassure them that it just isn’t true. All first year university statistics courses that I know of assume no prior knowledge of statistics. (The same is true of some second year courses as well!) The greatest gift a high school statistics teacher can give their students is an attitude of excitement and success, with a healthy helping of scepticism, and an idea of what inference is – that we can draw conclusions about a population from a sample. If my first year students had started from that point, half our work would have been done.