# Why resampling is better than hypothesis tests and confidence intervals

The author, with kiwi, visiting George Cobb in his office at Mt Holyoke in April 2008. (Not very flattering for either of the humans.)

I love to read George Cobb’s writing. In person I found him a kind and intelligent man, and a generous host. But his writing is laugh-out-loud funny at times, provocative and inspiring. I suspect that he may be the indirect cause of near civil unrest among maths teachers in New Zealand. The link between George Cobb and mathematics teachers’ concerns is resampling. I will provide some background, then explain.

The New Zealand curriculum is divided into eight learning areas, one of which is called Mathematics and Statistics. The separate acknowledgement of Statistics, which I believe occurred in 2007, is indicative of the status which statistics is now afforded in the curriculum. It also sends the message that the subject of statistics is not simply a part of mathematics, but is its own discipline. This has met with approval from statisticians, and a mixed reception from mathematicians, some of whom would still like to keep statistics firmly tucked in as a minor bedfellow of algebra and trigonometry. Regular readers will be aware of my feelings on this. They are expressed in the posts Hey mathematics, leave the stats alone and last week’s offering What mathematics teachers need to know about statistics. How I came to these views, from maths teacher, to Operations Researcher, to statistics educator, is described in another post, the End of OR at UC.

Along with the change in title to Mathematics and Statistics has come a new approach to the study of statistics at all levels of schooling. In the final year of schooling, there are now enough assessment items to provide a separate subject called Statistics. This overcomes the odd conflation, “Mathematics with Statistics”, which evolved to “Statistics and Modelling”, and now has cast off any vestiges of Operations Research (sniff) to stand proudly as “Statistics”. Serious students of mathematics need to take both subjects. In the same way that Science at Year 11 becomes three subjects (Biology, Physics and Chemistry) in Year 12, Mathematics in Year 12 now splits into two subjects, Mathematics and Statistics, at Year 13.

The most difficult aspects of the curriculum changes are experimental design, critiquing reports, and resampling. The rest of this post will address resampling.

# Cast off the t-test

George Cobb, in his article “The Introductory Statistics Course: A Ptolemaic Curriculum”, provides an overview of the problems of inferential statistics and how technological advances have freed us from the shackles of assuming normality, or making myriad corrections to allow for non-normality. But though the shackles are open, it is difficult to cast them aside. Cobb points out that the t-test is the centerpiece of the introductory statistics curriculum because that is what scientists and social scientists use most often. And scientists and social scientists use t-tests most often because that is what they were taught in introductory statistics courses.

Cobb’s argument is that we are living with the legacy of lack of computational power. Until the advent of computers there was no choice but to use analytic methods, as computation was impossible. In operations research we see that neat little work-arounds and approximations to reduce computational time are no longer needed as computers become increasingly powerful. In statistical analysis the normal distribution was used to approximate the true distribution because anything else was prohibitively difficult to compute. This is no longer the case. Elegance, which is desirable in pure mathematics, has no place in the dirty world of statistics and real data. Cobb states, “We need to throw away the old notion that the normal approximation to a sampling distribution belongs at the center of our curriculum, and create a new curriculum whose center is the core logic of inference.”

These are fighting words. And it seems that the New Zealand curriculum is the first to take up the challenge. The University of Auckland has provided computational tools to enable resampling, with supporting materials. Thanks to iNZight it is possible for all students to take repeated samples and explore the outcomes without the burden of repeated hand calculation. The graphical displays further aid understanding.

## Dumbing down or more appropriate

Mathematics teachers are concerned that the resampling approach is a “dumbing down” of the curriculum. It does seem that way at first – that we are leaving the “difficult” material of proper confidence intervals to university. However, the intention is that students will actually understand what inference is about, which will make the learning of the traditional methods (now also automated) almost trivial. I don’t have a problem with confidence intervals and p-values as they are. They are pretty easy to compute. I do see a problem with an entire exam section which simply required students to select the correct formula, plug in the values and give the result. I am happy to concede that the computational and mathematical requirements in the new statistics curriculum are reduced. But that is because the subject is statistics, not mathematics, and other skills are used and developed. The aim is to develop statistical literacy, reasoning and thinking.

Teachers have expressed the view that traditional statistics is more rigorous than the resampling method. Because traditional statistics encompasses formulas and proofs, this SEEMS more rigorous and correct. But they are wrong! Using the analytical methods gives us deceptively exact answers to what are often seriously flawed models. Fisher himself in 1936 explained that the analytical method is used because the simple and tedious method of resampling was not possible. (See the link to the Cobb paper above.) I can see why teachers might think traditional methods are preferable, as maths teachers are seldom statisticians. This is why there is a national curriculum, so that decisions about what students learn are not reliant on the knowledge of one individual. You may notice that my videos generally teach the traditional ideas of the p-value and confidence intervals. I am a recent convert to resampling. (Perhaps with the usual evangelistic zeal of a new convert.)

In “Developing Students’ Statistical Reasoning: connecting research and teaching practice”, Garfield and Ben-Zvi suggest that “ideas of informal inference are introduced early in the course, and revisited with growing complexity”. This is what will be happening year by year in the New Zealand setting, if the teachers are given enough support to enact the curriculum.

# What is resampling/randomisation/bootstrapping?

Cobb summarises resampling as “three R’s: randomize, repeat, reject. Randomize data production; repeat by simulation to see what’s typical and what’s not; reject any model that puts your data in its tail.”
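For readers who like to see an idea in code, here is a minimal sketch of the three R's as a randomisation test for a difference between two group means. It is only an illustration: the function name `randomisation_test`, its parameters and the two small data sets are made up for this example, and it is not taken from the Cobb paper or from iNZight.

```python
import random
from statistics import fmean


def randomisation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone produces a difference in group means
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    as_extreme = 0
    for _ in range(n_shuffles):
        # Randomize: deal the pooled values back into two groups by chance.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # Repeat: count how often chance matches or beats the real difference.
        if diff >= observed:
            as_extreme += 1

    # Reject: a small proportion puts the data in the tail of the
    # "no real difference" model.
    return as_extreme / n_shuffles


# Made-up measurements for two groups, purely for illustration.
control = [12, 15, 14, 16, 13]
treated = [22, 25, 24, 26, 23]
print(randomisation_test(control, treated))
```

The returned proportion plays the role of a p-value: if hardly any of the shuffles produce a difference as large as the real one, the "it was just chance" model is hard to believe.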

In essence you use the sample data to take large numbers of random samples and examine the behaviour of these samples. From there you can see the likelihood of getting a result such as the original as a matter of chance (similar to a p-value). You can also use multiple samples (taken with replacement) to create confidence intervals. It seems too simple to be true. But it is a better approach than to use a flawed approximation provided by regular statistical analysis. There is time enough to learn the traditional methods later on when you want to be published in an academic journal. Once students truly understand inference, learning other techniques will be more straightforward, and one hopes they will have enough understanding to be critical of them.
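The confidence interval idea can be sketched in a few lines too. This is a rough illustration of one common variant, the percentile bootstrap, and I am not claiming it is exactly what iNZight does. The function name `bootstrap_ci`, its parameters and the data are all invented for the example.

```python
import random
from statistics import fmean


def bootstrap_ci(sample, stat=fmean, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of a single sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)

    # Resample WITH replacement, same size as the original sample,
    # and record the statistic each time.
    resampled_stats = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )

    # Cut off an equal share of resampled statistics in each tail.
    tail = (1 - level) / 2
    lower_index = max(0, round(tail * n_resamples))
    upper_index = min(n_resamples - 1, round((1 - tail) * n_resamples) - 1)
    return resampled_stats[lower_index], resampled_stats[upper_index]


# Made-up sample, purely for illustration.
sample = [3, 5, 7, 8, 12, 13, 14, 18, 21]
lower, upper = bootstrap_ci(sample)
print(f"Sample mean {fmean(sample):.2f}, interval ({lower:.2f}, {upper:.2f})")
```

Swapping `fmean` for `statistics.median` gives an interval for the median with no new formula to learn, which is much of the appeal. The sketch still assumes the original sample was a random one; resampling does not repair a badly chosen sample.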

# How do you teach resampling?

Good question. I would begin with a small example which can be started by hand and then finished off with a simulation. There is a nice little one in the Cobb paper cited earlier. Then work through several more examples using quite different contexts. I would use the iNZight programs or Excel, depending on the nature of the problem. With a class you can get quite a few iterations of a small problem in a reasonably short time. I’m not a great believer in homework for the sake of it (a story for another day) but getting students to hand-iterate an experiment a few times at home sounds ideal. There are materials with suggestions on the Census at School site. There doesn’t seem to be much on the TKI site, however (as of Sept 2012). Let us hope that there will be more soon for teachers who are planning for the 2013 school year.

I’m excited, and I have already written too much for one day. We are at the start of a wonderful adventure in teaching and curriculum development, and yet again New Zealand is leading the world. I hope I can help to make it happen.

By the way – please comment – I can’t be getting it right all the time, and dissent is important!

## 16 thoughts on “Why resampling is better than hypothesis tests and confidence intervals”

1. Hello – can you check the link for the Cobb paper that you cited? It seems to be broken. Thanks

• Sorry about that. It should be fixed now.

• Thanks Sandra. I truly sympathise with teachers and would have been anxious in their position too. I like to think teachers have the best interests of students at heart. Hopefully more information will help to allay concerns.

2. Your title is utter nonsense.

3. I have just been referred to your articles by a Maths adviser after struggling to see why we were throwing out confidence intervals based on the central limit theorem. What you have said now makes the whole change make sense but why have I not heard about the REASONS for the changes before now? It seems to me that there has been a real lack of good communication to Statistics teachers around NZ not in centres like Auckland. Only yesterday I was at a best practice workshop where teachers of Statistics in the Nelson region were being briefed by moderators about the requirements of the new Level 3 standards and not one word was said by the moderators about WHY these changes were coming in – hence much frustration and even anger – not that we were unwilling to change but that no-one had explained to us WHY these changes were required. Thank you for your articles and links which I have spent today reading. I will be forwarding your articles as widely as I can.
John

• Thanks John. It is so good to hear that my blog is helping. There does seem to be a lack of communication. Let me know if there is anything else you would like covered.

4. Resampling does sound like a gentle way to introduce inference. I hope you give them a peep at a small sample situation before they finish though. eg 6/6 successes in a trial.

5. Dr Nic: Resampling, when it works, can* be a great way to illuminate sampling theory statistics, or what I prefer to call “error statistics”, and I’m all for it, but it is my understanding that it is the assumption of iid or random sampling that is crucial, not Normality. Bootstrapping won’t work without large enough random samples, so how do you check for that? You mention Cobb’s advice: “Randomize, repeat, reject. Randomize data production; repeat by simulation to see what’s typical and what’s not; reject any model that puts your data in its tail,” but would Cobb advocate replacing the model with one that “fits” the data? That can be a disaster unless the new model is checked for statistical adequacy—and even so, there are numerous “new models” that can be developed to supposedly “fix” the mismatch. And what counts as “misfitting”? how far in the tail? Maybe we should instead report discrepancies from the model. I am reflecting what I have learned from David Cox and separately from my colleague Aris Spanos.

I think it is quite misleading to limit your “downsides” to things like “old fogys” without mentioning serious downsides. We have tests of assumptions, both parametric and non.

*I say “can”, but I haven’t really seen the philosophical basis of bootstrapping adequately explained in a way I would consider adequate (as a philosopher of statistics). I hope to provide this in a book I’m writing on the error statistical philosophy more generally.

I very much like your blog which I try to keep up with!

• Hi. Thanks for commenting. It is really good to get all views here. Some of what you said is not clear to me, probably because I am not really a statistician, but rather a statistics communicator or teacher, having come from an operations research background. I’m happy for statisticians with a more theoretical grounding to provide contrasting viewpoints.
In New Zealand, the school curriculum has moved to bootstrapping, which we hope will lead to a better understanding of the process of inference. It can hardly lead to a worse understanding! Inference is tricky.
I find the contrast between blogging and writing academic papers is extreme and fun!

• Mine isn’t a contrasting view. I share the sampling theory view. My points get to the heart of bootstrapping, and you need to be able to address them if you are going to “communicate” the methods honestly. Even stick to the two main issues: (1) the assumption of random sampling, and (2) what do you do if you’re in the tail of the distribution and so are supposed to “reject” the model. Ignoring the underlying reasoning is why comprehension is lacking. Do not assume that “it can hardly lead to a worse understanding”, because for starters, keeping it vague allows all manner of methods to claim to be valid. That is what we now see. I’m surprised with your response, and perplexed that you’re not more concerned, Dr. Nic, especially as a “communicator”.