Confidence intervals are needed because there is variation in the world. Nearly all natural, human or technological processes result in outputs which vary to a greater or lesser extent. Examples of this are people’s heights, students’ scores in a well-written test and weights of loaves of bread. Sometimes our inability or lack of desire to measure something down to the last microgram will leave us thinking that there is no variation, but it is there. For example, if we check the weights of chocolate bars to the nearest gram, we may well find that there is no variation. However, if we were to weigh them to the nearest milligram, there would be variation. Drug doses have a much smaller range of variation, but it is there all the same.
You can see a video about some of the main sources of variation – natural, explainable, sampling and due to bias.
When we wish to find out about a phenomenon, the ideal would be to measure all instances. For example, we can find out the heights of all students in one class at a given time. However, it is impossible to find out the heights of all people in the world at a given time. It is even impossible to know how many people there are in the world at a given time. Whenever it is impossible, too expensive, too destructive or too dangerous to measure all instances in a population, we need to take a sample. Ideally we will take a sample that gives each object in the population an equal chance of being chosen.
You can see a video here about ways of taking a sample.
When we take a sample there will always be error. It is called sampling error. We may, by chance, get exactly the same value for our sample statistic as the “true” value that exists in the population. However, even if we do, we won’t know that we have.
The sample mean is the best estimate for the population mean, but we need to say how well it is estimating the population mean. For example, say we wish to know the mean (or average) weight of apples in an orchard. We take a sample and find that the mean weight of the apples in the sample is 153g. If we took only a few apples, this gives only a rough idea, and we might say we are pretty sure the mean weight of the apples in the orchard is between 143g and 163g. If someone else took a bigger sample, they might be able to say that they are pretty sure that the mean weight of apples in the orchard is between 158g and 166g. The second confidence interval gives us better information because it is narrower.
There are two things that affect the width of a confidence interval. The first is the sample size. If we take a really large sample we are getting a lot more information about the population, so our confidence interval will be more precise, that is, narrower. It is not a one-to-one relationship, but a square-root relationship: the width is proportional to one over the square root of the sample size. If we wish to halve the width of the confidence interval, we will need to increase our sample size by a factor of 4.
The second thing to affect the width of a confidence interval is the amount of variation in the population. If all the apples in the orchard are about the same weight, then we will be able to estimate that weight quite accurately. However, if the apples are all different sizes, then it will be harder to be sure that the sample represents the population, and we will have a larger confidence interval as a result.
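The two effects described above can be checked with a little arithmetic. The sketch below assumes the usual approximate 95% margin for a mean, 1.96 times the standard deviation divided by the square root of the sample size. The function name and the orchard numbers are made up for illustration.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Approximate margin of error for a mean (z = 1.96 assumes 95% confidence)."""
    return z * sd / math.sqrt(n)

# Hypothetical orchard: apple weights with a standard deviation of 20 g.
base = ci_half_width(sd=20, n=25)
bigger_sample = ci_half_width(sd=20, n=100)   # four times the sample size
less_variation = ci_half_width(sd=10, n=25)   # half the spread in the population

print(f"n=25,  sd=20: +/- {base:.2f} g")
print(f"n=100, sd=20: +/- {bigger_sample:.2f} g")
print(f"n=25,  sd=10: +/- {less_variation:.2f} g")
```

Quadrupling the sample size and halving the variation both cut the margin in half, which is the square-root relationship at work.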
Three ways to find confidence intervals
Traditional (old-fashioned?) Approach
The standard way of calculating confidence intervals is by using formulas developed on the assumptions of normality and the Central Limit Theorem. These formulas are used to calculate the confidence intervals of means, proportions and slopes, but not of medians or standard deviations, because there aren’t nice straightforward formulas for these. The formulas were developed when there were no computers, and analytical methods were needed in the absence of computational power.
In terms of teaching, these formulas are straightforward, and also include the concept of level of confidence, which is part of the paradigm. You can see a video teaching the traditional approach to confidence intervals, using Excel to calculate the confidence interval for a mean.
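For readers who would rather see the traditional formula as code than in a spreadsheet, here is a minimal Python sketch. It is an illustration only: the sample values are invented, and it uses the normal multiplier from the standard library, where a t multiplier would usually be preferred for a sample this small.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(sample, level=0.95):
    """Formula-based confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level is 0.95
    centre = mean(sample)
    margin = z * stdev(sample) / math.sqrt(len(sample))
    return centre - margin, centre + margin

# Made-up apple weights in grams; not data from the post.
apple_weights = [148, 161, 155, 139, 158, 150, 162, 147, 153, 157]

low, high = mean_ci(apple_weights)
print(f"95% interval for the mean: {low:.1f} g to {high:.1f} g")
```

Raising the level of confidence, for example `mean_ci(apple_weights, level=0.99)`, gives a wider interval from the same data.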
Rule of Thumb
In the New Zealand curriculum at year 12, students are introduced to the concept of inference using an informal method for calculating a confidence interval. The formula is the median ± 1.5 times the interquartile range divided by the square root of the sample size. There is a similar formula for proportions.
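The rule of thumb for the median translates directly into a few lines of Python. This is a sketch under assumptions: the function name is invented, and the quartiles come from the standard library's "inclusive" method, which may differ slightly from the quartile rule used in a particular classroom or software package.

```python
import math
from statistics import median, quantiles

def informal_interval(sample):
    """Median +/- 1.5 * IQR / sqrt(n), the informal rule described above."""
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")  # assumed quartile rule
    margin = 1.5 * (q3 - q1) / math.sqrt(len(sample))
    centre = median(sample)
    return centre - margin, centre + margin

# Made-up apple weights in grams; not data from the post.
apple_weights = [148, 161, 155, 139, 158, 150, 162, 147, 153, 157]

low, high = informal_interval(apple_weights)
print(f"Informal interval for the median: {low:.1f} g to {high:.1f} g")
```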
Bootstrapping
Bootstrapping is a very versatile way to find a confidence interval. It has three strengths:
- It can be used to calculate the confidence interval for a large range of different parameters.
- It uses ALL the information the sample gives us, rather than just the summary values.
- It has been found to aid in understanding the concepts of inference better than the traditional methods.
There are also some disadvantages:
- Old fogeys don’t like it. (Just kidding) What I mean is that teachers who have always taught using the traditional approach find it difficult to trust what seems like a hit-and-miss method without the familiar theoretical underpinning.
- Universities don’t teach bootstrapping as much as the traditional methods.
- The common software packages do not include bootstrap confidence intervals.
The idea behind a bootstrap confidence interval is that we make use of the whole sample to represent the population. We take lots and lots of samples of the same size from the original sample. Obviously we need to sample with replacement, or the samples would all be identical. Then we calculate the estimate from each of these repeated samples to get an idea of the distribution of the estimates of the population parameter. We chop the tails off at a given point (for a 95% interval, the lowest and highest 2.5% of the estimates), and what remains is the confidence interval. Voilà!
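The steps above can be sketched in a few lines of Python. This is a bare-bones percentile bootstrap, not the method used by any particular package; the function name, the number of resamples, the fixed seed and the sample values are all assumptions made for illustration.

```python
import random
from statistics import median

def bootstrap_ci(sample, statistic=median, level=0.95, resamples=10_000, seed=1):
    """Percentile bootstrap interval for any statistic of a single sample."""
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    n = len(sample)
    # Resample WITH replacement, same size as the original sample, many times,
    # and record the statistic from each resample.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(resamples)
    )
    # Chop an equal share off each tail of the sorted estimates.
    tail = (1 - level) / 2
    low = estimates[round(tail * resamples)]
    high = estimates[round((1 - tail) * resamples) - 1]
    return low, high

# Made-up apple weights in grams; not data from the post.
apple_weights = [148, 161, 155, 139, 158, 150, 162, 147, 153, 157]

low, high = bootstrap_ci(apple_weights)
print(f"Bootstrap 95% interval for the median: {low} g to {high} g")
```

Because the statistic is passed in as a function, the same code gives an interval for a mean, a standard deviation or anything else that can be calculated from a sample, which is the versatility mentioned in the list of strengths.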
Answers to the disadvantages (burn the straw man?)
- There is a sound theoretical underpinning for bootstrap confidence intervals. A good place to start is a previous blog about George Cobb’s work. Either that or – “Trust me, I’m a Doctor!” (This would also include trusting far more knowledgeable people such as Chris Wild and Maxine Pfannkuch, and the team of statistical educators led by Joan Garfield.)
- We have to start somewhere. Bootstrap methods aren’t used at universities because of inertia. As an academic of twenty years I can say that there is NO PAY OFF for teaching new stuff. It takes up valuable research time and you don’t get promoted, and sometimes you even get made redundant. If students understand what confidence intervals are, and the concept of inference, then learning to use the traditional formulas is trivial. Eventually the universities will shift. I am aware that the University of Auckland now teaches the bootstrap approach.
- There are ways to deal with the software package problem. There is a free software interface called “iNZight” that you can download. I believe Fathom also uses bootstrapping. There may be other software. Please let me know of any and I will add them to this post.
Confidence intervals involve the concepts of variation, sampling and inference. They are a great way to teach these really important concepts, and to help students be critical of single-value estimates. They can be taught informally, traditionally or using bootstrapping methods. Any of the approaches can lead to rote use of a formula or algorithm, and it is up to teachers to aim for understanding. I’m working on a set of videos around this topic. Watch this space.