Divide and destroy in statistics teaching

A reductionist approach to teaching statistics destroys its very essence

I’ve been thinking a bit about systems thinking and reductionist thinking, especially with regard to statistics teaching and mathematics teaching. I used to teach a course on systems thinking, with regard to operations research. Systems thinking is concerned with the whole. The parts of the system interact and cannot be isolated without losing the essence of the system. Modern health providers and social workers realise that a child is a part of a family, which may be a part of a larger community, all of which have to be treated if the child is to be helped. My sister, a physio, always finds out about the home background of her patient, so that any treatment or exercise regime will fit in with their life. Reductionist thinking, by contrast, reduces things to their parts, and isolates them from their context.

Reductionist thinking in teaching mathematics

Mathematics teaching lends itself to reductionist thinking. You strip away the context, then break a problem down into smaller parts, solve the parts, and then put it all back together again. Students practise solving straight-forward problems over and over to make sure they can do it right. They feel that a column of little red ticks is evidence that they have learned something correctly. As a school pupil, I loved the columns of red ticks. I have written about the need for drill in some aspects of statistics teaching and learning, and can see the value of automaticity – or the ability to answer something without having to think too hard. That can be a little like learning a language – you need to be automatic on the vocabulary and basic verb structures. I used to spend my swimming training laps conjugating Latin verbs – amo, amas, amat (breathe), amamus, amatis, amant (breathe). I never did meet any ancient Romans to converse with, to see if my recitation had helped any, but five years of Latin vocab is invaluable in pub quizzes. But learning statistics has little in common with learning a language.

There is more to teaching than having students learn how to get stuff correct. Learning involves the mind, heart and hands. The best learning occurs when students actually want to know the answer. This doesn’t happen when context has been removed.

I was struck by Jo Boaler’s “The Elephant in the Classroom”, which opened my eyes to how monumentally dull many mathematics lessons can be to so many people. These people are generally the ones who are not satisfied by columns of red ticks, and either want to know more and ask questions, or want to be somewhere else. Holistic lessons that involve group work, experiential learning, multiple solution methods and even multiple solutions have been shown to improve mathematics learning and results, and have lifelong benefits to the students. The book challenged many of my ingrained feelings about how to teach and learn mathematics.

Teach statistics holistically, joyfully

Teaching statistics is inherently suited for a holistic approach. The problem must drive the model, not the other way around. Teachers of mathematics need to think more like teachers of social sciences if they are to capture the joy of teaching and learning statistics.

At one time I was quite taken with an approach suggested for students who are struggling, which is to work step-by-step through a number of examples in parallel, doing one step on each before moving on to the next step. The examples I saw are great, and use real data, and the sentences are correct. I can see how that might appeal to students who are finding the language aspects difficult, and are interested in writing an assignment that will get them a passing grade. However, I now have concerns about the approach, and it has made me think again about some of the resources we provide at Statistics Learning Centre. I don’t think a reductionist approach is suitable for the study of statistics.

Context, context, context

Context is everything in statistical analysis. Every time we produce a graph or a numerical result we should be thinking about the meaning in context. If there is a difference between the medians showing up in the graph, and reinforced by confidence intervals that do not overlap, we need to be thinking about what that means about the heart-rate in swimmers and non-swimmers, or whatever the context is. For this reason every data set needs to be real. We cannot expect students to want to find real meaning in manufactured data. And students need to spend long enough in each context in order to be able to think about the relationship between the model and the real-life situation. This is offset by the need to provide enough examples from different contexts so that students can learn what is general to all such models, and what is specific to each. It is a question of balance.

Keep asking questions

In my effort to help improve teaching of statistics, we are now developing teaching guides and suggestions to accompany our resources. I attend workshops, talk to teachers and students, read books, and think very hard about what helps all students to learn statistics in a holistic way. I do not begin to think I have the answers, but I think I have some pretty good questions. The teaching of statistics is such a new field, and so important. I hope we all keep asking questions about what we are teaching, and how and why.

Don’t teach significance testing – Guest post

The following is a guest post by Tony Hak of Rotterdam School of Management. I know Tony would love some discussion about it in the comments. I remain undecided either way, so would like to hear arguments.

GOOD REASONS FOR NOT TEACHING SIGNIFICANCE TESTING

It is now well understood that p-values are not informative and are not replicable. Soon null hypothesis significance testing (NHST) will be obsolete and will be replaced by the so-called “new” statistics (estimation and meta-analysis). This requires that undergraduate courses in statistics already teach estimation and meta-analysis as the preferred way to present and analyze empirical results. If not, the statistical skills of the graduates from these courses will be outdated on the day they leave school. But it is less evident whether or not NHST (though not preferred as an analytic tool) should still be taught. Because estimation is already routinely taught as a preparation for the teaching of NHST, the necessary reform in teaching will not require the addition of new elements to current programs but rather the removal of the current emphasis on NHST, or the complete removal of the teaching of NHST from the curriculum. The current trend is to continue the teaching of NHST. In my view, however, the teaching of NHST should be discontinued immediately because it is (1) ineffective, (2) dangerous, and (3) it serves no aim.

1. Ineffective: NHST is difficult to understand and it is very hard to teach it successfully

We know that even good researchers often do not appreciate the fact that NHST outcomes are subject to sampling variation and believe that a “significant” result obtained in one study almost guarantees a significant result in a replication, even one with a smaller sample size. Is it then surprising that our students also do not understand what NHST outcomes do tell us and what they do not tell us? In fact, statistics teachers know that the principles and procedures of NHST are not well understood by undergraduate students who have successfully passed their courses on NHST. Courses on NHST fail to achieve their self-stated objectives, assuming that these objectives include achieving a correct understanding of the aims, assumptions, and procedures of NHST as well as a proper interpretation of its outcomes. It is very hard indeed to find a comment on NHST in any student paper (an essay, a thesis) that is close to a correct characterization of NHST or its outcomes. There are many reasons for this failure, but obviously the most important one is that NHST is a very complicated and counterintuitive procedure. It requires students and researchers to understand that a p-value is attached to an outcome (an estimate) based on its location in (or relative to) an imaginary distribution of sample outcomes around the null. Another reason, connected to their failure to understand what NHST is and does, is that students believe that NHST “corrects for chance” and hence they cannot cognitively accept that p-values themselves are subject to sampling variation (i.e. chance).
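
A minimal simulation sketch makes this last point concrete; the effect size, group sizes and number of replications are invented purely for illustration. Twenty replications of exactly the same two-group study produce a wide spread of p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.4          # assumed standardised mean difference
n_per_group = 30           # assumed sample size per group

p_values = []
for _ in range(20):        # 20 replications of the "same" study
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    p_values.append(stats.ttest_ind(treatment, control).pvalue)

print(sorted(round(p, 3) for p in p_values))
# The p-values typically range from well below 0.05 to well above it,
# even though every replication samples from exactly the same populations.
```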

2. Dangerous: NHST thinking is addictive

One might argue that there is no harm in adding a p-value to an estimate in a research report and, hence, that there is no harm in teaching NHST in addition to teaching estimation. However, the mixed experience with statistics reform in clinical and epidemiological research suggests that a more radical change is needed. Reports of clinical trials and of studies in clinical epidemiology now usually report estimates and confidence intervals, in addition to p-values. However, as Fidler et al. (2004) have shown, and contrary to what one would expect, authors continue to discuss their results in terms of significance. Fidler et al. therefore concluded that “editors can lead researchers to confidence intervals, but can’t make them think”. This suggests that a successful statistics reform requires a cognitive change that should be reflected in how results are interpreted in the Discussion sections of published reports.

The stickiness of dichotomous thinking can also be illustrated with the results of a more recent study of Coulson et al. (2010). They presented estimates and confidence intervals obtained in two studies to a group of researchers in psychology and medicine, and asked them to compare the results of the two studies and to interpret the difference between them. It appeared that a considerable proportion of these researchers, first, used the information about the confidence intervals to make a decision about the significance of the results (in one study) or the non-significance of the results (of the other study) and, then, drew the incorrect conclusion that the results of the two studies were in conflict. Note that no NHST information was provided and that participants were not asked in any way to “test” or to use dichotomous thinking. The results of this study suggest that NHST thinking can (and often will) be used by those who are familiar with it.
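
A rough sketch, with invented estimates and standard errors, shows the comparison those researchers should have made: rather than labelling each confidence interval as “significant” or not, look at the confidence interval for the difference between the two estimates.

```python
import math

est1, se1 = 0.50, 0.20   # study 1: 95% CI about (0.11, 0.89), so "significant"
est2, se2 = 0.35, 0.22   # study 2: 95% CI about (-0.08, 0.78), so "not significant"

diff = est1 - est2
se_diff = math.sqrt(se1**2 + se2**2)   # assumes the two studies are independent
ci_low, ci_high = diff - 1.96 * se_diff, diff + 1.96 * se_diff

print(f"difference = {diff:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
# The interval for the difference comfortably includes zero: the two estimates
# are entirely compatible, even though only one of them is "significant" on its own.
```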

The fact that it appears to be very difficult for researchers to break the habit of thinking in terms of “testing” is, as with every addiction, a good reason to prevent future researchers from coming into contact with it in the first place and, if contact cannot be avoided, to provide them with robust resistance mechanisms. The implication for statistics teaching is that students should first learn estimation as the preferred way of presenting and analyzing research information, and should be introduced to NHST, if at all, only after estimation has become their routine statistical practice.

3. It serves no aim: Relevant information can be found in research reports anyway

Our experience that the teaching of NHST consistently fails its own aims (because NHST is too difficult to understand), and the fact that NHST appears to be dangerous and addictive, are two good reasons to stop teaching NHST immediately. But there is a seemingly strong argument for continuing to introduce students to NHST, namely that a new generation of graduates will not be able to read the (past and current) academic literature in which authors themselves routinely focus on the statistical significance of their results. It is suggested that someone who does not know NHST cannot correctly interpret outcomes of NHST practices. This argument has no value for the simple reason that it assumes that NHST outcomes are relevant and should be interpreted. But the reason we are having the current discussion about teaching is that NHST outcomes are at best uninformative (beyond the information already provided by estimation) and at worst misleading or plain wrong. The point all along is that nothing is lost by simply ignoring the NHST-related information in a research report and focusing only on the information provided about the observed effect size and its confidence interval.

Bibliography

Coulson, M., Healy, M., Fidler, F., & Cumming, G. (2010). Confidence Intervals Permit, But Do Not Guarantee, Better Inference than Statistical Significance Testing. Frontiers in Quantitative Psychology and Measurement, 20(1), 37-46.

Fidler, F., Thomason, N., Finch, S., & Leeman, J. (2004). Editors Can Lead Researchers to Confidence Intervals, But Can’t Make Them Think: Statistical Reform Lessons from Medicine. Psychological Science, 15(2), 119-126.

This text is a condensed version of the paper “After Statistics Reform: Should We Still Teach Significance Testing?” published in the Proceedings of ICOTS9.


The Myth of Random Sampling

I feel a slight quiver of trepidation as I begin this post – a little like the boy who pointed out that the emperor has no clothes.

Random sampling is a myth. Practical researchers know this and deal with it. Theoretical statisticians live in a theoretical world where random sampling is possible and ubiquitous – which is just as well really. But teachers of statistics live in a strange half-real-half-theoretical world, where no one likes to point out that real-life samples are seldom random.

The problem in general

In order for most inferential statistical conclusions to be valid, the sample we are using must obey certain rules. In particular, each member of the population must have an equal probability of being chosen. In this way we reduce the opportunity for systematic error, or bias. When a truly random sample is taken, it is almost miraculous how well we can make conclusions about the source population, with even a modest sample of a thousand. On a side note, if the general population understood this, and the opportunity for bias and corruption were eliminated, general elections and referenda could be done at much less cost, through taking a good random sample.
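
As a quick sketch of just how well a modest random sample can do – the electorate and its 52% support figure below are made up – ten random samples of a thousand each come remarkably close to the population value.

```python
import numpy as np

rng = np.random.default_rng(2024)
population = rng.random(1_000_000) < 0.52   # imaginary electorate, 52% support a proposal

estimates = [rng.choice(population, size=1000, replace=False).mean()
             for _ in range(10)]
print([round(float(e), 3) for e in estimates])
# The ten estimates cluster within a few percentage points of the true 0.52 --
# provided, and only provided, the sample really is random.
```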

However! It is actually quite difficult to take a random sample of people. Random sampling is doable in biology, I suspect, where seeds or plots of land can be chosen at random. It is also fairly possible in manufacturing processes. Medical research relies on the use of a random sample, though it is seldom of the total population. Really it is more about randomisation, which can be used to support causal claims.

But the area of most interest to most people is people. We actually want to know about how people function, what they think, their economic activity, sport and many other areas. People find people interesting. To get a really good sample of people takes a lot of time and money, and is outside the reach of many researchers. In my own PhD research I approximated a random sample by taking a stratified, cluster semi-random almost convenience sample. I chose representative schools of different types throughout three diverse regions in New Zealand. At each school I asked all the students in a class at each of three year levels. The classes were meant to be randomly selected, but in fact were sometimes just the class that happened to have a teacher away, as my questionnaire was seen as a good way to keep them quiet. Was my data of any worth? I believe so, of course. Was it random? Nope.

Problems people have in getting a good sample include cost, time and also response rate. Much of the data that is cited in papers is far from random.

The problem in teaching

The wonderful thing about teaching statistics is that we can actually collect real data and do analysis on it, and get a feel for the detective nature of the discipline. The problem with sampling is that we seldom have access to truly random data. By random I do not mean just simple random sampling, the least simple method! Even cluster, systematic and stratified sampling can be a challenge in a classroom setting. And sometimes if we think too hard we realise that what we have is actually a population, and not a sample at all.

It is a great experience for students to collect their own data. They can write a questionnaire and find out all sorts of interesting things, through their own trial and error. But mostly students do not have access to enough subjects to take a random sample. Even if we go to secondary sources, the data is seldom random, and the students do not get the opportunity to take the sample. It would be a pity not to use some interesting data, just because the collection method was dubious (or even realistic). At the same time we do not want students to think that seriously dodgy data has the same value as a carefully collected random sample.

Possible solutions

These are more suggestions than solutions, but the essence is to do the best you can and make sure the students learn to be critical of their own methods.

Teach the best way, pretend and look for potential problems.

Teach the ideal and also teach the reality. Teach about the different ways of taking random samples. Use my video if you like!

Get students to think about the pros and cons of each method, and where problems could arise. Also get them to think about the kinds of data they are using in their exercises, and what biases they may have.

We also need to teach that, used judiciously, a convenience sample can still be of value. For example I have collected data from students in my class about how far they live from university, and whether or not they have a car. This data is not a random sample of any population. However, it is still reasonable to suggest that it may represent all the students at the university – or maybe just the first year students. It possibly represents students in the years preceding and following my sample, unless something has happened to change the landscape. It has worth in terms of inference. Realistically, I am never going to take a truly random sample of all university students, so this may be the most suitable data I ever get. I have no doubt that it is better than no information.

Not all questions are of equal worth. Knowing whether students who own cars live further from university, in general, is interesting but not of great importance. Were I to be researching topics of great importance, such as safety features in roads or medicine, I would have a greater need for rigorous sampling.

So generally, I see no harm in pretending. I use the data collected from my class, and I say that we will pretend that it comes from a representative random sample. We talk about why it isn’t, but then we move on. It is still interesting data, it is real and it is there. When we write up analysis we include critical comments with provisos on how the sample may have possible bias.

What is important is for students to experience the excitement of discovering real effects (or lack thereof) in real data. What is important is for students to be critical of these discoveries, through understanding the limitations of the data collection process. Consequently I see no harm in using non-random, realistically sampled real data, with a healthy dose of scepticism.

Open Letter to Khan Academy about Basic Probability

Khan Academy probability videos and exercises aren’t good either

Dear Mr Khan

You have created an amazing resource that thousands of people all over the world get a lot of help from. Well done. Some of your materials are not very good, though, so I am writing this open letter in the hope that it might make some difference. Like many others, I believe that something as popular as Khan Academy will benefit from constructive criticism.

I fear that the reason that so many people like your mathematics videos so much is not because the videos are good, but because their experience in the classroom is so bad, and the curriculum is poorly thought out and encourages mechanistic thinking. This opinion is borne out by comments I have read from parents and other bloggers. The parents love you because you help their children pass tests.  (And these tests are clearly testing the type of material you are helping them to pass!) The bloggers are not so happy, because you perpetuate a type of mathematical instruction that should have disappeared by now. I can’t even imagine what the history teachers say about your content-driven delivery, but I will stick to what I know. (You can read one critique here)

Just over a year ago I wrote a balanced review of some of the Khan Academy videos about statistics. I know that statistics is difficult to explain – in fact one of the hardest subjects to teach. You can read my review here. I’ve also reviewed a selection of videos about confidence intervals, one of which was from Khan Academy. You can read the review here.

Consequently I am aware that blogging about the Khan Academy in anything other than glowing terms is an invitation for vitriol from your followers.

However, I thought it was about time I looked at the exercises that are available on KA, wondering if I should recommend them to high school teachers for their students to use for review. I decided to focus on one section, introduction to probability. I put myself in the place of a person who was struggling to understand probability at school.

Here is the verdict.

First of all the site is very nice. It shows that it has a good sized budget to use on graphics and site mechanics. It is friendly to get into. I was a bit confused that the first section in the Probability and Statistics Section is called “Independent and dependent events”. It was the first section though. The first section of this first section is called Basic Probability, so I felt I was in the right place. But then under the heading, Basic probability, it says, “Can I pick a red frog out of a bag that only contains marbles?” Now I have no trouble with humour per se, and some people find my videos pretty funny. But I am very careful to avoid confusing people with the humour. For an anxious student who is looking for help, that is a bit confusing.

I was excited to see that this section had five videos, and two sets of exercises. I was pleased about that, as I’ve wanted to try out some exercises for some time, particularly after reading the review from Fawn Nguyen on her experience with exercises on Khan Academy. (I suggest you read this – it’s pretty funny.)

So I watched the first video about probability and it was like any other KA video I’ve viewed, with primitive graphics and a stumbling repetitive narration. It was correct enough, but did not take into account any of the more recent work on understanding probability. It used coins and dice. Big yawn. It wastes a lot of time. It was ok. I do like that you have the interactive transcript so you can find your way around.

It dawned on me that nowhere do you actually talk about what probability is. You seem to assume that the students already know that. In the very start of the first video it says,

“What I want to do in this video is give you at least a basic overview of probability. Probability, a word that you’ve probably heard a lot of and you are probably just a little bit familiar with it. Hopefully this will get you a little deeper understanding.”

Later in the video there is a section on the idea of large numbers of repetitions, which is one way of understanding probability. But it really is a bit skimpy on why anyone would want to find or estimate a probability, and what the values actually mean. But it was ok.

The first video was about single instances – one toss of a coin or one roll of a die. Then the second video showed you how to answer the questions in the exercises, which involved two dice. This seemed ok, if rather a sudden jump from the first video. Sadly both of these examples perpetuate the common misconception that if there are, say, 6 alternative outcomes, they will necessarily be equally likely.

Exercises

Then we get to some exercises called “Probability Space”, which is not an enormously helpful heading. But my main quest was to have a go at the exercises, so that is what I did. And that was not a good thing. The exercises were not stepped, but started right away with an example involving two dice and the phrase “at least one of”. There was meant to be a graphic to help me, but instead I had the message “scratchpad not available”. I will summarise my concerns about the exercises at the end of my letter. I clicked on a link to a video that wasn’t listed on the left, called Probability Space and got a different kind of video.

This video was better in that it had moving pictures and a script. But I have problems with gambling in videos like this. There are some cultures in which gambling is not acceptable. The other problem I have is with the term “exact probability”, which was used several times. What do we mean by “exact probability”? How does he know it is exact? I think this sends the wrong message.

Then on to the next videos which were worked examples, entitled “Example: marbles from a bag, Example: Picking a non-blue marble, Example: Picking a yellow marble.” Now I understand that you don’t want to scare students with terminology too early, but I would have thought it helpful to call the second one, “complementary events, picking a non-blue marble”. That way if a student were having problems with complementary events in exercises from school, they could find their way here. But then I’m not sure who your audience is. Are you sure who your audience is?

The first marble video was ok, though the terminology was sloppy.

The second marble video, called “Example: picking a non-blue marble”, is glacially slow. There is a point, I guess, in showing students how to draw a bag and marbles, but… Then the next example is of picking numbers at random. Why would we ever want to do this? Then we come to an example of circular targets. This involves some problem-solving regarding areas of circles, and cancelling out fractions including pi. What is this about? We are trying to teach about probability, so why have you brought in some complication involving the area of a circle?

The third marble video attempts to introduce the idea of events, but doesn’t really. By trying not to confuse with technical terms, the explanation is more confusing.

Now onto some more exercises. The Khan model is that you have to get 5 correct in a row in order to complete an exercise. I hope there is some sensible explanation for this, because it sure would drive me crazy to have to do that. (As I heard expressed on Twitter.)

What are circular targets doing in with basic probability?

The first example is a circular target one. I SO could not be bothered working out the area stuff so I used the hints to find the answer so I could move onto a more interesting example. The next example was finding the probability of rolling a 4 from a fair six-sided die. This is trivial, but would not have been a bad example to start with. The next question involved three colours of marbles, and finding the probability of not green. Then another dart-board one. Sigh. Then another dart-board one. I’m never going to find out what happens if I get five right in a row if I don’t start doing these properly. Oh now – it gave me circumference. SO can’t be bothered.

And that was the end of Basic probability. I never did find out what happens if I get five correct in a row.

Venn diagrams

The next topic is called “Venn diagrams and adding probabilities”. I couldn’t resist seeing what you would do with a Venn diagram. This one nearly reduced me to tears.

As you know by now, I have an issue with gambling, so it will come as no surprise that I object to the use of playing cards in this example. It makes the assumption that students know about playing cards. You do take one and a half minutes to explain the contents of a standard pack of cards.  Maybe this is part of the curriculum, and if so, fair enough. The examples are standard – the probability of getting a Jack of Hearts etc. But then at 5:30 you start using Venn diagrams. I like Venn diagrams, but they are NOT good for what you are teaching at this level, and you actually did it wrong. I’ve put a comment in the feedback section, but don’t have great hopes that anything will change. Someone else pointed this out in the feedback two years ago, so no – it isn’t going to change.

Khan Venn diagram

This diagram is misleading, as is shown by the confusion expressed in the questions from viewers. There should be a green 3, a red 12, and a yellow 1.

Now Venn diagrams seem like a good approach in this instance, but decades of experience in teaching and communicating complex probabilities has shown that in most instances a two-way table is more helpful. The table for the Jack of Hearts problem would look like this:

            Jacks   Not Jacks   Total
Hearts        1        12         13
Not Hearts    3        36         39
Total         4        48         52

(Any teachers reading this letter – try it! Tables are SO much easier for problem solving than Venn diagrams)
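
For anyone who wants the arithmetic behind that table spelled out, here is a small sketch in Python – every probability in the Jack of Hearts example is just a cell count divided by 52.

```python
jacks_and_hearts = 1
jacks_not_hearts = 3
hearts_not_jacks = 12
total_cards = 52

p_jack = (jacks_and_hearts + jacks_not_hearts) / total_cards            # 4/52
p_heart = (jacks_and_hearts + hearts_not_jacks) / total_cards           # 13/52
p_jack_and_heart = jacks_and_hearts / total_cards                       # 1/52
p_jack_or_heart = (jacks_and_hearts + jacks_not_hearts
                   + hearts_not_jacks) / total_cards                    # 16/52

print(p_jack, p_heart, p_jack_and_heart, p_jack_or_heart)
# Note that p_jack_or_heart equals p_jack + p_heart - p_jack_and_heart,
# which is the addition rule the Venn diagram was trying to illustrate.
```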

But let’s get down to principles.

The principles of instruction that KA have not followed in the examples:

  • Start easy and work up
  • Be interesting in your examples – who gives a flying fig about two dice or random numbers?
  • Make sure the hardest part of the question is the thing you are testing. This is particularly violated with the questions involving areas of circles.
  • Don’t make me so bored that I can’t face trying to get five in a row and not succeed.

My point

Yes, I do have one. Mr Khan you clearly can’t be stopped, so can you please get some real teachers with pedagogical content knowledge to go over your materials systematically and make them correct. You have some money now, and you owe it to your benefactors to GET IT RIGHT. Being flippant and amateurish is fine for amateurs but you are now a professional, and you need to be providing material that is professionally produced. I don’t care about the production values – keep the stammers and “lellows” in there if you insist. I’m very happy you don’t have background music as I can’t stand it myself. BUT… PLEASE… get some help and make your videos and exercises correct and pedagogically sound.

Dr Nic

PS – anyone else reading this letter, take a look at the following videos for mathematics.

And of course I think my own Statistics Learning Centre videos are pretty darn good as well.

Other posts about concerns about Khan:

Another Open Letter to Sal (I particularly like the comment by Michael Paul Goldenberg)

Breaking the cycle (A comprehensive summary of the responses to criticism of Khan)

Teaching with School League tables

NCEA League tables in the newspaper

My husband ran for cover this morning when he saw high school NCEA (National Certificates of Educational Achievement) league tables in the Press. However, rather than rave at him yet again, I will grasp the opportunity to expound to a larger audience. Much as I loathe and despise league tables, they are a great opportunity to teach students to explore data-rich reports with a critical and educated eye. There are many lessons to learn from league tables. With good teaching we can help dispel some of the myths the league tables promulgate.

When a report is made short and easy to understand, there is a good chance that much of the ‘truth’ has been lost along with the complexity. The table in front of me lists 55 secondary and area schools from the Canterbury region. These schools include large “ordinary” schools and small specialist schools such as Van Asch Deaf Education Centre and Southern Regional Health School. They include single-sex and co-ed, private, state-funded and integrated. They include area schools which are in small rural communities, which cover ages 5 to 21. The “decile” of each of the schools is the only contextual information given, apart from the name of the school. (I explain the decile, along with misconceptions, at the end of the post.) For each school, the percentages of students passing at each of the three levels are given. It is not clear whether the percentages in the newspaper are based on participation rate or school roll.

This is highly motivating information for students as it is about them and their school. I had an argument recently with a student from a school which scores highly in NCEA. She was insistent that her friend should change schools from one that has lower scores. What she did not understand was that the friend had some extra learning difficulties, and that the other school was probably more appropriate for her. I tried to teach the concept of added-value, but that wasn’t going in either. However I was impressed with her loyalty to her school and I think these tables would provide an interesting forum for discussion.

Great context discussion

You could start with talking about what the students think will help a school to have high pass rates. This could include a school culture of achievement, good teaching, well-prepared students and good resources. This can also include selection and exclusion of students to suit the desired results, selection of “easy” standards or subjects, and even less rigorous marking of internal assessment. Other factors to explore might be single-sex vs co-ed school, the ethnic and cultural backgrounds of the students, private vs state-funded schools.  All of these are potential explanatory variables. Then you can point out how little of this information is actually taken into account in the table. This is a very common occurrence, with limited space and inclusion of raw data. I suspect at least one school appears less successful because some of the students sit different exams, either Cambridge or International Baccalaureate. These may be the students who would have performed well in NCEA.

Small populations

It would be good to look at the impact of small populations, and populations of very different sizes in the data. Students should think about what impact their behaviour will have on the results of the school, compared with a larger or smaller cohort. The raw data provided by the Ministry of Education does give a warning for small cohorts. For a small school, particularly in a rural area, there may be only a handful of students in year 13, so that one student’s success or failure has a large impact on the outcome. At the other end of the scale, there are schools of over 2000, which will have about 400 students in year 13. This effect is important to understand in all statistical reporting. One bad event in a small hospital, for instance, will have a larger percentage effect than in a large hospital.
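
A back-of-the-envelope sketch, with made-up cohort sizes, shows just how differently one student’s result moves the pass rate of a small school and a large school.

```python
def pass_rate(passed, cohort):
    """Percentage of the cohort gaining the qualification."""
    return 100 * passed / cohort

# Hypothetical year 13 cohorts: a small rural school and a large city school.
small_before, small_after = pass_rate(7, 9), pass_rate(8, 9)        # one extra student passes
large_before, large_after = pass_rate(310, 400), pass_rate(311, 400)

print(f"small school: {small_before:.1f}% -> {small_after:.1f}%")   # 77.8% -> 88.9%
print(f"large school: {large_before:.1f}% -> {large_after:.1f}%")   # 77.5% -> 77.8%
# The same single student moves the small school by 11 percentage points
# and the large school by a quarter of a point.
```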

Different rules

We hear a lot about comparing apples and oranges. School league tables include a whole fruit basket of different criteria. Schools use different criteria for allowing students into the school, into different courses, and whether they are permitted to sit external standards. Attitudes to students with special educational needs vary greatly. Some schools encourage students to sit levels outside their year level.

Extrapolating from a small picture

What one of the accompanying stories points out is that NCEA is only a part of what schools do. Sometimes the things that are measurable get more attention because it is easier to report in bulk. A further discussion with students could be provoked using statements such as the following, which the students can vote on, and then discuss. You could also discuss what evidence you would need to be able to refute or support them.

  • A school that does well in NCEA level 3 is a good school.
  • Girls’ schools do better than boys’ schools at NCEA because girls are smarter than boys.
  • Country schools don’t do very well because the clever students go to boarding school in the city.
  • Boys are more satisfied with doing just enough to get Achieved.

Further extension

If students are really interested you can download the full results from the Ministry of Education website and set up a pivot table on Excel to explore questions.
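
For classes working in Python rather than Excel, the same exploration can be sketched with a pandas pivot table. The file name and column names below are placeholders for illustration – check the actual headings in the Ministry of Education download.

```python
import pandas as pd

results = pd.read_csv("ncea_results.csv")   # hypothetical export of the full results

pivot = pd.pivot_table(
    results,
    values="percent_achieved",       # assumed column: pass percentage
    index="decile",                  # assumed column: school decile
    columns="ncea_level",            # assumed column: NCEA level 1, 2 or 3
    aggfunc="mean",
)
print(pivot.round(1))
```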

I can foresee some engaging and even heated discussions ensuing. I’d love to hear how they go.

Short explanation of Decile – see also official website.

The decile rating of the school is an index developed in New Zealand and is a measure of social deprivation. The decile rating is calculated from a combination of five values taken from census data for the meshblocks in which the students reside. A school with a low decile rating of 1 or 2 will have a large percentage of students from homes that are crowded, or whose parents are not in work or have no educational qualifications. A school with a decile rating of 10 will have the fewest students from homes like that. The system was set up to help with targeted funding for educational achievement. It recognises that students from disadvantaged homes will need additional resources in order to give them equal opportunity to learn. However, the term has entered the New Zealand vernacular as a measure of socio-economic status, and often even of worth. A decile 10 school is often seen as a rich school or a “top” school. The reality is that this is not the case.  Another common misconception is that one tenth of the population of school age students is in each of the ten bands. How it really works is that one tenth of schools is in each of the bands. The lower decile schools are generally smaller than other schools, and mostly primary schools. In 2002 there were nearly 40,000 secondary students in decile 10 schools, with fewer than 10,000 in decile 1 schools.

Conceptualising Probability

The problem with probability is that it doesn’t really exist. Certainly it never exists in the past.

Probability is an invention we use to communicate our thoughts about how likely something is to happen. We have collectively agreed that 1 is a certain event and 0 is impossible. 0.5 means that there is just as much chance of something happening as not. We have some shared perception that 0.9 means that something is much more likely to happen than to not happen. Probability is also useful for when we want to do some calculations about something that isn’t certain. Often it is too hard to incorporate all uncertainty, so we assume certainty and put in some allowance for error.

Sometimes probability is used for things that happen over and over again, and in that case we feel we can check to see if our prediction about how likely something is to happen was correct. The problem here is that we actually need things to happen a really big lot of times under the same circumstances in order to assess if we were correct. But when we are talking about the probability of a single event, that either will or won’t happen, we can’t test out if we were right or not afterwards, because by that time it either did or didn’t happen. The probability no longer exists.

Thus to say that there is a “true” probability somewhere in existence is rather contrived. The truth is that it either will happen or it won’t. The only way to know a true probability would be if this one event were to happen over and over and over, in the wonderful fiction of parallel universes. We could then count how many times it would turn out one way rather than another. At which point the universes would diverge!

However, for the interests of teaching about probability, there is the construct that there exists a “true probability” that something will happen.

Why think about probability?

What prompted these musings about probability was exploring the new NZ curriculum and companion documents, the Senior Secondary Guide and nzmaths.co.nz.

In Level 8 (last year of secondary school) of the senior secondary guide it says, “Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the relationship between true probability (unknown and unique to the situation), model estimates (theoretical probability) and experimental estimates.”

And at NZC level 3 (years 5 and 6 at Primary school!) in the Key ideas in Probability it talks about “Good Model, No Model and Poor Model”. This statement is referred to at all levels above level 3 as well.

I decided I needed to make sense of these two conceptual frameworks: true-model-experimental and good-poor-no, and tie it to my previous conceptual framework of classical-frequency-subjective.

Here goes!

Delicious Mandarins

Let’s make this a little more concrete with an example. We need a one-off event. What is the probability that the next mandarin I eat will be delicious? It is currently mandarin season in New Zealand, and there is nothing better than a good mandarin, with the desired combination of sweet and sour, and with plenty of juice and a good texture. But, being a natural product, there is a high level of variability in the quality of mandarins, especially when they may have parted company with the tree some time ago.

There are two possible outcomes for my future event. The mandarin will be delicious or it will not. I will decide when I eat it. Some may say that there is actually a continuum of deliciousness, but for now this is not the case. I have an internal idea of deliciousness and I will know. I think back to my previous experience with mandarins. I think about a quarter are horrible, a half are nice enough and about a quarter are delicious (using the Dr Nic scale of mandarin grading). If the mandarin I eat next belongs to the same population as the ones in my memory, then I can predict that there is a 25% probability that the mandarin will be delicious.

The NZ curriculum talks about “true” probability which implies that any value I give to the probability is only a model. It may be a model based on empirical or experimental evidence. It can be based on theoretical probabilities from vast amounts of evidence, which has given us the normal distribution. The value may be only a number dredged up from my soul, which expresses the inner feeling of how likely it is that the mandarin will be delicious, based on several decades of experience in mandarin consumption.

More examples

Let us look at some more examples:

What is the probability that:

  • I will hear a bird on the way to work?
  • the flight home will be safe?
  • it will be raining when I get to Christchurch?
  • I will get a raisin in my first spoonful of muesli?
  • I will get at least one raisin in half of my spoonfuls of muesli?
  • the shower in my hotel room will be enjoyable?
  • I will get a rare Lego ® minifigure next time I buy one?

All of these events are probabilistic and have varying degrees of certainty and varying degrees of ease of modelling.

           Easy to model                   Hard to model
Unlikely   Get a rare Lego® minifigure     Raining in Christchurch
No idea    Raisin in half my spoonfuls     Enjoyable shower
Likely     Raisin in first spoonful        Bird, safe flight home

And as I construct this table I realise also that there are varying degrees of importance. Except for the flight home, none of those examples matter. I am hoping that a safe flight home has a probability extremely close to 1. I realise that there is a possibility of an incident. And it is difficult to model. But people have modelled air safety and the universal conclusion is that it is safer than driving. So I will take the probability and fly.
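
The raisin examples sit in the “easy to model” column because, once I guess the chance of a raisin in any one spoonful, a binomial model does the rest. Here is a minimal sketch with my own made-up numbers.

```python
from scipy import stats

p_raisin = 0.6       # assumed chance that a single spoonful contains at least one raisin
n_spoonfuls = 20     # assumed number of spoonfuls in the bowl

# P(at least one raisin in the first spoonful) is simply p_raisin.
# P(at least one raisin in at least half of the spoonfuls) is a binomial tail:
p_half_or_more = stats.binom.sf(n_spoonfuls // 2 - 1, n_spoonfuls, p_raisin)
print(round(float(p_half_or_more), 3))
# The bird, the shower and the safe flight home have no such convenient model,
# which is exactly what puts them in the "hard to model" column.
```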

Conceptual Frameworks

How do we explain the different ways that probability has been described? I will now examine the three conceptual frameworks I introduced earlier, starting with the easiest.

Traditional categorisation

This is found in some form in many elementary college statistics textbooks. The traditional framework has three categories – classical or “a priori”, frequency or historical, and subjective.

Classical or “a priori” – I had thought of this as being “true” probability. To me, if there are three red and three white Lego® blocks in a bag and I take one out without looking, there is a 50% chance that I will get a red one. End of story. How could it be wrong? This definition is the mathematically interesting aspect of probability. It is elegant and has cool formulas and you can make up all sorts of fun examples using it. And it is the basis of gambling.

Frequency or historical – we draw on long term results of similar trials to gain information. For example we look at the rate of germination of a certain kind of seed by experiment, and that becomes a good approximation of the likelihood that any one future seed will germinate. And it also gives us a good estimate of what proportion of seeds in the future will germinate.

Subjective – We guess! We draw on our experience of previous similar events and we take a stab at it. This is not seen as a particularly good way to come up with a probability, but when we are talking about one off events, it is impossible to assess in retrospect how good the subjective probability estimate was. There is considerable research in the field of psychology about the human ability or lack thereof to attribute subjective probabilities to events.

In teaching the three part categorisation of sources of probability I had problems with the probability of rain. Where does that fit in the three categories? It uses previous experimental data to build a model, and current data to put into the model, and then a probability is produced. I decided that there is a fourth category, that I called “modelled”. But really that isn’t correct, as they are all models.

NZ curriculum terminology

So where does this all fit in the New Zealand curriculum pronouncements about probability? There are two conceptual frameworks that are used in the document, each with three categories as follows:

True, modelled, experimental

In this framework we start with the supposition that there exists somewhere in the universe a true probability distribution. We cannot know this. Our expressions of probability are only guesses at what this might be. There are two approaches we can take to estimate this “truth”. These two approaches are not independent of each other, but often intertwined.

One is a model estimate, based on theory, such as that the probability of a single outcome is the number of equally likely ways that it can occur over the number of possible outcomes. This accounts for the probability of a red brick as opposed to a white brick, drawn at random. Another example of a modelled estimate is the use of distributions such as the binomial or normal.

In addition there is the category of experimental estimate, in which we use data to draw conclusions about what is likely to happen. This is equivalent to the frequency or historical category above. Often modelled distributions use data from an experiment also. And experimental probability relies on models as well. The main idea is that neither the modelled nor the experimental estimate of the “true” probability distribution is the true distribution, but rather a model of some sort.

Good model, poor model, no model

The other conceptual framework stated in the NZ curriculum is that of good model, poor model and no model, which relates to fitness for purpose. When it is important to have a “correct” estimate of a probability, such as for building safety, gambling machines, and life insurance, then we would put effort into getting as good a model as possible. Conversely, sometimes little effort is required. Classical models are very good models, often of trivial examples such as dice games and coin tossing. Frequency models (aka experimental models) may or may not be good models, depending on how many observations are included, and how much the future is similar to the past. For example, a model of sales of slide rules developed before the invention of the pocket calculator will be a poor model for current sales. The ground rules have changed. And a model built on data from five observations is unlikely to be a good model. A poor model is not fit for purpose and requires development, unless the stakes are so low that we don’t care, or the cost of better fitting is greater than the reward.
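
A quick simulation, assuming a true probability of 0.3 purely for illustration, shows why five observations rarely make a good model: the experimental estimate is all over the place with n = 5 and settles down as n grows.

```python
import numpy as np

rng = np.random.default_rng(7)
true_p = 0.3    # assumed "true" probability, known only because this is a simulation

for n in (5, 50, 500):
    estimates = rng.binomial(n, true_p, size=10) / n   # ten experimental estimates each
    print(n, np.round(estimates, 2))
# The spread of the ten estimates shrinks dramatically as n grows:
# the experiment needs a lot of observations before it becomes a "good" model.
```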

I have problems with the concept of “no model”. I presume that is the starting point, from which we develop a model or do not develop a model if it really doesn’t matter. In my examples above I include the probability that I will hear a bird on the way to work. This is not important, but rather an idle musing. I suspect I probably will hear a bird, so long as I walk and listen. But if it rains, I may not. As I am writing this in a hotel in an unfamiliar area I have no experience on which to draw. I think this comes pretty close to “no model”. I will take a guess and say the probability is 0.8. I’m pretty sure that I will hear a bird. Of course, now that I have said this, I will listen carefully, as I would feel vindicated if I hear a bird. But if I do not hear a bird, was my estimate of the probability wrong? No – I could assume that I just happened to be in the 0.2 area of my prediction. But coming back to the “no model” concept – there is now a model. I have allocated the probability of 0.8 to the likelihood of hearing a bird. This is a model. I don’t even know if it is a good model or a poor model. I will not be walking to work this way again, so I cannot even test it out for the future, and besides, my model was only for this one day, not for all days of walking to work.

So there you have it – my totally unscholarly musings on the different categorisations of probability.

What are the implications for teaching?

We need to try not to perpetuate the idea that probability is the truth. But at the same time we do not wish to make students think that probability is without merit. Probability is a very useful, and at times highly precise way of modelling and understanding the vagaries of the universe. The more teachers can use language that implies modelling rather than rules, the better. It is common, but not strictly correct to say, “This process follows a normal distribution”. As Einstein famously and enigmatically said, “God does not play dice”. Neither does God or nature use normal distribution values to determine the outcomes of natural processes. It is better to say, “this process is usefully modelled by the normal distribution.”

We can have learning experiences that help students to appreciate certainty and uncertainty and the modelling of probabilities that are not equi-probable. Thanks to the overuse of dice and coins, it is too common for people to assess things as having equal probabilities. And students need to use experiments. First, they need to appreciate that it can take a large number of observations before we can be happy that it is a “good” model. Second, they need to use experiments to attempt to model an otherwise unknown probability distribution. What fun can be had in such a class!

But, oh mathematical ones, do not despair – the rules are still the same, it’s just the vigour with which we state them that has changed.

Comment away!

Post Script

In case anyone is interested, here are the outcomes which now have a probability of 1, as they have already occurred.

  • I will hear a bird on the way to work? Almost the minute I walked out the door!
  • the flight home will be safe? Inasmuch as I am in one piece, it was safe.
  • it will be raining when I get to Christchurch? No it wasn’t
  • I will get a raisin in my first spoonful of muesli? I did
  • I will get at least one raisin in half of my spoonfuls of muesli? I couldn’t be bothered counting.
  • the shower in my hotel room will be enjoyable? It was okay.
  • I will get a rare Lego minifigure next time I buy one? Still in the future!

Oh Ordinal data, what do we do with you?

What can you do with ordinal data? Or more to the point, what shouldn’t you do with ordinal data?

First of all, let’s look at what ordinal data is.

It is usual in statistics and other sciences to classify types of data in a number of ways. In 1946, Stanley Smith Stevens suggested a theory of levels of measurement, in which all measurements are classified into four categories, Nominal, Ordinal, Interval and Ratio. This categorisation is used extensively, and I have a popular video explaining them. (Though I group Interval and Ratio together as there is not much difference in their behaviour for most statistical analysis.)

Costing no more than a box of popcorn, our snack-size course will help you learn all you need to know about types of data, and appropriate statistics and graphs.

Nominal is pretty straight-forward. This category includes any data that is put into groups, in which there is no inherent order. Examples of nominal data are country of origin, sex, type of cake, or sport. Similarly it is pretty easy to explain interval/ratio data. It is something that is measured, by length, weight, time (duration), cost and similar. These two categorisations can also be given as qualitative and quantitative, or non-parametric and parametric.

Ordinal data

But then we come to the ordinal level of measurement. This is used to describe data that has a sense of order, but for which we cannot be sure that the distances between the consecutive values are equal. For example, level of qualification has a sense of order:

  • A postgraduate degree is higher than
  • a Bachelor’s degree, which is higher than
  • a high-school qualification, which is higher than
  • no qualification.

There are four steps on the scale, and it is clear that there is a logical sense of order. However, we cannot sensibly say that the difference between no qualification and a high-school qualification is equivalent to the difference between the high-school qualification and a bachelor’s degree, even though both of those are represented by one step up the scale.

Another example of ordinal level of measurement is used extensively in psychological, educational and marketing research, known as a Likert scale. (Though I believe the correct term is actually Likert item – and according to Wikipedia, the pronunciation should be Lick it, not Like it, as I have used for some decades!). A statement is given, and the response is given as a value, often from 1 to 5, showing agreement to the statement. Often the words “Strongly agree, agree, neutral, disagree, strongly disagree” are used. There is clearly an order in the five possible responses. Sometimes a seven point scale is used, and sometimes the “neutral” response is eliminated in an attempt to force the respondent to commit one way or the other.

The question at the start of this post has an ordinal response, which could be perceived as indicating how quantitative the respondent believes ordinal data to be.

What prompted this post was a question from Nancy under the YouTube video above, asking:

“Dr Nic could you please clarify which kinds of statistical techniques can be applied to ordinal data (e.g. Likert-scale). Is it true that only non-parametric statistics are possible to apply?”

Well!

As shown in the video, there are the purists, who are adamant that ordinal data is qualitative. There is no way that a mean should ever be calculated for ordinal data, and the most mathematical thing you can do with it is find the median. At the other pole are the practical types, who happily calculate means for any ordinal data, without any concern for the meaning (no pun intended).

There are differing views on finding the mean for ordinal data.

So the answer to Nancy would depend on what school of thought you belong to.

Here’s what I think:

All ordinal data is not the same. There is a continuum of “ordinality” if you like.

There are some instances of ordinal data which are pretty much nominal, with a little bit of order thrown in. These should be distinguished from nominal data, only in that they should always be graphed as a bar chart (rather than a pie-chart)* because there is inherent order. The mode is probably the only sensible summary value other than frequencies. In the examples above, I would say that “level of qualification” is only barely ordinal. I would not support calculating a mean for the level of qualification. It is clear that the gaps are not equal, and additionally any non-integer result would have doubtful interpretation.

Then there are other instances of ordinal data for which it is reasonable to treat it as interval data and calculate the mean and median. It might even be supportable to use it in a correlation or regression. This should always be done with caution, and an awareness that the intervals are not equal.

Here is an example for which I believe it is acceptable to use the mean of an ordinal scale. At the beginning and the end of a university statistics course, the class of 200 students is asked the following question: How useful do you think a knowledge of statistics will be to you in your future career? Very useful, useful, not useful.

Now this is not even a very good Likert question, as the positive and negative elements are not balanced. There are only three choices, and there is no evidence that the gaps between the elements are equal. However, if we score the elements as 3, 2 and 1 respectively, and find that the mean for the 200 students is 1.5 before the course and 2.5 after the course, I would say that there is meaning in what we are reporting. There are specific tests to use for this – and we could also look at how many students changed their minds positively or negatively. But even without a specific test, we are treating this ordinal data as something more than qualitative. What also strengthens the case for doing this is that the test is performed on the same students, who will probably perceive the scale in the same way each time, making the comparison more valid.
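
For what it is worth, here is a minimal sketch in Python (my choice of tool, not something from the course itself) of how that before-and-after comparison might look. The responses are invented, and the Wilcoxon signed-rank test is just one of the paired tests commonly used for this kind of ordinal data.

```python
# A rough sketch only: invented responses for 200 students,
# scored 3 = very useful, 2 = useful, 1 = not useful.
import random

from scipy.stats import wilcoxon  # one commonly used paired test for ordinal data

random.seed(1)
before = random.choices([1, 2, 3], weights=[60, 30, 10], k=200)
after = random.choices([1, 2, 3], weights=[10, 30, 60], k=200)

print("Mean score before course:", sum(before) / len(before))
print("Mean score after course: ", sum(after) / len(after))

# The signed-rank test compares the paired responses without assuming the scale is interval.
stat, p = wilcoxon(before, after)
print("Wilcoxon signed-rank p-value:", p)
```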

So what I’m saying is that it is wrong to make a blanket statement that ordinal data can or can’t be treated like interval data. It depends on the meaning and the number of elements in the scale.

What do we teach?

And again the answer is that it depends! For my classes in business statistics, I told the students that it depends. If you are teaching a mathematical statistics class, then a more hard-line approach is justified. However, at the same time as saying “you should never calculate the mean of ordinal data”, it would be worthwhile to point out that it is done all the time! Similarly, if you teach that it is okay to find the mean of some ordinal data, I would also point out that there are issues with regard to interpretation and mathematical correctness.

Please comment!

Foot note on Pie charts

*Yes, I too eschew pie-charts, but for two or three categories of nominal data, where there are marked differences in frequency, if you really insist, I guess you could possibly use one – so long as it is not 3D and definitely not exploding. But even then, a bar chart is better. Perhaps a post for another day, though so many have already written about this.

Which comes first – problem or solution?

In teaching it can be difficult to know whether to start with a problem or a solution method. It seems more obvious to start with the problem, but sometimes it is better to introduce the possibility of the solution before posing the problem.

Mathematics teaching

A common teaching method in mathematics is to teach the theory, followed by applications. Or not followed by applications. I seem to remember learning a lot of mathematics with absolutely no application – which was fine by me, because it was fun. My husband once came home from survey school, and excitedly told me that he was using complex numbers for some sort of transformation between two irregular surfaces. Who’d have thought? I had never dreamed there could be a real-life use for the square root of -1. I just thought it was a cool idea someone thought up for the heck of it.

But yet again we come to the point that statistics and operations research are not mathematics. Without context and real-life application they cease to exist and turn into … mathematics!

Applicable mathematics

My colleague wrote a guest post about “applicable mathematics” which he separates from “applied mathematics”. Applicable maths appears when teachers make up applications to try to make mathematics seem useful. There is little to recommend about applicable maths. A form of “applicable maths” occurs in probability assessment questions where the examiner decides not to tell the examinee all the information, and the examinee has to draw Venn diagrams and use logical thinking to find out something that clearly anyone in the real world would be able to read in the data! I actually enjoy answering questions like that, and they have a point in helping students understand the underlying structure of the data. But I do not fool myself into thinking that they are anywhere near real-life. Nor are they statistics.

Which first – theory or application?

So the question is – when teaching statistics and operations research, should you start with an application or a problem or a case, and work from there to the theory? Or do students need some theory, or at least an understanding of basic principles before a case or problem can have any meaning? Or in a sequence of learning do we move back and forward between theory and application?

My first response is that of course we should start with the data, as many books on the teaching of statistics teach us. Well, actually we should start with the problem, as that really precedes the collection of the data. But then, how can we know what sorts of problems to frame if we don’t have some idea of what is possible through modelling and statistics? So should we first begin with some theory? The New Zealand Curriculum emphasises the PPDAC cycle: Problem, Plan, Data, Analysis, Conclusion. However, in order to pose the problem in the first place, we need the theory of the PPDAC cycle itself. The answer is not simple and depends on the context.

I have recently made a set of three videos explaining confidence intervals and bootstrapping. These are two very difficult topics that become simple in an instant. What I mean by that is, until you understand a confidence interval, it makes no sense, and you can see no reason why it should make sense. You go through a “liminal space” of confusion and anxiety. Then when you emerge out the other side, instantly confidence intervals make sense, and it is equally difficult to see what it was that made them confusing. This dichotomy makes teaching difficult, as the teacher needs to try to understand what made the problem confusing.

I present the idea of a confidence interval first. Then I use examples. I present the idea of bootstrapping, then give examples. I think in this instance it is helpful to delineate the theory or the idea in reasonably abstract form, interspersed with examples. I also think diagrams are immensely useful, but that’s another topic.

Critique of AtMyPace: Statistics

What prompted these thoughts about “which comes first” was a comment made about our “AtMyPace: Statistics” iOS app.

The YouTube videos used in AtMyPace:Statistics were developed to answer specific needs in a course. They generally take the format of a quick summary of the theory, followed by an example, often related to Helen and her business selling choconutties.

The iOS app AtMyPace:Statistics was set up as a way to capitalise on the success of the YouTube videos, and we added two quizzes of ten True/False questions to complement each of the videos. We also put these same quizzes in our on-line course and found that they were surprisingly popular. In a way, they are a substitute for a textbook or notes, but require the person to commit one way or the other to an answer before reading a further explanation. We had happened on an effective way of engaging students with the material.

AtMyPace:Statistics is not designed to be a full course in statistics, but rather a tool to help students who might be struggling with concepts. We have also developed a web-based version of AtMyPace:Statistics for those who are not the happy owners of iOS devices. At present the web version is a copy of the app, but we will happily add other questions and activities when the demand arises.

I received the following critique of the AtMyPace: Statistics app:

“They are nicely done but very classical in scope. The approach is tools-oriented using a few “realistic” examples to demonstrate the tool. This could work for students who need to take exams and want accessible material.”

Very true. The material in AtMyPace:Statistics is classical in scope, as we focus on the material currently being taught in most business schools and first year statistics service courses. We are trying to make a living, and once that is happening we will set out to change the world!

The reviewer continues,

“I think that in adult education you should reverse the order and have the training problem oriented. Take a six sigma DMAIC process as an example. The backbone is a problem scheduled to be solved. The path is DMAIC and the tools are supporting the journey. If you want to do it that way you need to tailor the problem to the audience.”

In tailored adult education it is likely that a problem-based approach will work. I would strongly recommend it.

I had an interesting discussion some time ago with a young lecturer working in a prestigious case-based MBA programme in North America. The entire MBA is taught using cases, and is popular and successful. My friend had some reservations about case-based teaching for a subject like Operations Research, which has a body of skills that is needed as a foundation for analysis. Statistics would be similar. The challenge is making sure the students have the necessary skills and knowledge, along with the ability to transfer them to another setting or problem. Case-based learning is not an efficient way to accomplish this.

Criticism on Choosing the Test procedure

In another instance, David Munroe commented on our video “Choosing which statistical test to use”, which receives about 1000 views a week. In the video I suggest a three-step process involving thinking about what kind of data we have, what kind of sample, and the purpose of the analysis. The comment was:

Myself I would put purpose first. :) The purpose of the analysis determines what data should be collected – and more data is not necessarily more informative. In my view it is more useful to think ‘what am I trying to achieve’ with this analysis before collecting the data (so the right data have a chance to be collected). This in contrast to: collecting the data and then going ‘now what can I get from this data?’ (although this is sometimes an appropriate research technique). I think because we’ve already collected the data any time we’re illustrating particular modelling tools or statistical tests, we reinforce the ‘collect the data first then worry about analysis’ approach – at least subconsciously.

Thanks David! Good thinking, and if I ever redo the video I may well change the order. I chose the order I did, as it seemed to go from easy to difficult. (Actually I don’t remember consciously thinking about the order – it just fell out of individual help sessions with students.)  And the diagram was developed in response to the rather artificial problems I was posing!

I’ll step back a bit and explain. One problem I have seen in teaching Statistics and Operations Research is that students fail to make connections. They also compartmentalise the different aspects and find it difficult to work out when certain procedures would be most useful. I wrote a post about this. In the statistics course I wrote a set of scenarios describing possible applications of statistical methods in a business context. The students were required to work out which technique to use in each scenario, and found this remarkably difficult. They could perform a test on the difference of two means quite well, but were hard-pressed to discern when the test should be used. So I made up even more questions to give them more practice, and designed my three-step method for deciding on the test. This helped.

I had not thought of it as a way to decide in a real-life situation which test to use. Surely that would be part of a much bigger process.  So my questions are rather artificial, but that doesn’t make them bad questions. Their point was to help students make linkages between different parts of the course. And for that, it works.

Bring on the criticism

I would like to finish by saying how much I appreciate criticism. It is nice when people tell me they like my materials. I feel as if I am doing something useful and helping people. I get frequent comments of this type on my YouTube site.  But when people make the effort to point out gaps and flaws in the material I am extremely grateful as it helps me to clarify my thinking and improve the approach. If nothing else, it gives me something to talk about in my blog. It is difficult producing material in a feedback vacuum.  So keep it coming!

Context – if it isn’t fun…

The role of context in statistical analysis

The wonderful advantage of teaching statistics is the real-life context within which any application must exist. This can also be one of the difficulties. Statistics without context is merely the mathematics of statistics, and is sterile and theoretical. The teaching of statistics requires real data, and real data often comes with a fairly solid back-story.

One of the interesting aspects for practising statisticians is that they can find out about a wide range of applications by working in partnership with specialists. In my statistical and operations research advising I have learned about a range of subjects, including the treatment of hand injuries, children’s developmental understanding of probability, bed occupancy in public hospitals, the educational needs of blind students, growth rates of vegetables, texted comments on service at supermarkets, killing methods for chickens, rogaine route choice, co-ordinating scientific expeditions to Antarctica and the cost of neonatal intensive care. I found most of these really interesting and was keen to work with the experts on these projects. Statisticians tend to work in teams with specialists in related disciplines.

Learning a context can take time

When one is part of a long-term project, time spent learning the intricacies of the context is well spent. Without that, the meaning in the data can be lost. However, it is difficult to replicate this in the teaching of statistics, particularly in a general high school or service course. The time required to become familiar with a context takes away from the time spent learning statistics, and too much time spent on one specific project or area of interest can mean that the students are unable to generalise. You need several different examples in order to know what is specific to the context and what is general to all or most contexts.

One approach is to try to have contexts with which students are already familiar. This can be enabled by collecting the data from the students themselves. The Census at School project provides international data for students to use in just this way. This is ideal, in that the context is familiar, and yet the data is “dirty” enough to provide challenges and judgment calls.

Some teachers find that this is too low-level and would prefer to use biological data, or dietary or sports data from other sources. I have some reservations about this. In New Zealand the new statistics curriculum is in its final year of introduction, and understandably there are some bedding-in issues. One I perceive is the relative importance of the context in the students’ reports. As these reports have high-stakes grades attached to them, this is an issue. I will use as an example the time series “standard”. The assessment specification states, among other things, “Using the statistical enquiry cycle to investigate time series data involves: using existing data sets, selecting a variable to investigate, selecting and using appropriate display(s), identifying features in the data and relating this to the context, finding an appropriate model, using the model to make a forecast, communicating findings in a conclusion.”

The full “standard” is given here: Investigate Time Series Data. This would involve about five weeks of teaching and assessment, in parallel with four other subjects. (The final three years of schooling in NZ are assessed through the National Certificate of Educational Achievement (NCEA). Each year students usually take five subject areas, each of which consists of about six “achievement standards” worth between 3 and 6 credits. There is a mixture of internally and externally assessed standards.)

In this specification I see that there is a requirement for the model to be related to the context. This is a great opportunity for teachers to show how models are useful, and their limitations. I would be happy with a few sentences indicating that the student could identify a seasonal pattern and make some suggestions as to why this might relate to the context, followed by a similar analysis of the shape of the trend. However, some teachers are requiring students to do independent literature research into the area, complete with references, while forbidding them to cite Wikipedia.

This concerns me, and I call for robust discussion.

Statistics is not research methods any more than statistics is mathematics. Research methods and standards of evidence vary between disciplines. Clearly the evidence required in medical research will differ from that of marketing research. I do not think it is the place of the statistics teacher to be covering this. Mathematics teachers are already being stretched to teach the unfamiliar material of statistics, and I think asking them and the students to become expert in research methods is going too far.

It is also taking out all the fun.

Keep the fun

Statistics should be fun for the teacher and the students. The context needs to be accessible or you are just putting in another opportunity for antipathy and confusion. If you aren’t having fun, you aren’t doing it right. Or, more to the point, if your students aren’t having fun, you aren’t doing it right.

Some suggestions about the role of context in teaching statistics and operations research

  • Use real data.
  • If the context is difficult to understand, you are losing the point.
  • The results should not be obvious. It is not interesting that year 12 boys weigh more than year 9 boys.
  • Null results are still results. (We aren’t trying for academic publications!)
  • It is okay to clean up data so you don’t confuse students before they are ready for it.
  • Sometimes you should use dirty data – a bit of confusion is beneficial.
  • Various contexts are better than one long project.
  • Avoid the plodding parts of research methods.
  • Avoid boring data. Who gives a flying fish about the relative sizes of dolphin jaws?
  • Wikipedia is a great place to find out the context for most high school statistics analysis. That is where I look. It’s a great starting place for anyone.

Confidence Intervals: informal, traditional, bootstrap

Confidence Intervals

Confidence intervals are needed because there is variation in the world. Nearly all natural, human or technological processes result in outputs which vary to a greater or lesser extent. Examples of this are people’s heights, students’ scores in a well-written test and the weights of loaves of bread. Sometimes our inability, or lack of desire, to measure something down to the last microgram will leave us thinking that there is no variation, but it is there. For example, we might check the weights of chocolate bars to the nearest gram and find no variation; however, if we were to weigh them to the nearest milligram, there would be variation. Drug doses have a much smaller range of variation, but it is there all the same.

You can see a video about some of the main sources of variation – natural, explainable, sampling and due to bias.

When we wish to find out about a phenomenon, the ideal would be to measure all instances. For example we can find out the heights of all students in one class at a given time. However it is impossible to find out the heights of all people in the world at a given time. It is even impossible to know how many people there are in the world at a given time. Whenever it is impossible or too expensive or too destructive or dangerous to measure all instances in a population, we need to take a sample. Ideally we will take a sample that gives each object in the population an equal likelihood of being chosen.

You can see a video here about ways of taking a sample.
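
As a tiny illustration of the “equal likelihood of being chosen” idea, here is what drawing a simple random sample might look like in Python; the class list is made up.

```python
# A small sketch of simple random sampling: every member of this made-up
# class has the same chance of being selected.
import random

population = [f"student_{i}" for i in range(1, 31)]  # a hypothetical class of 30
sample = random.sample(population, k=8)              # sampling without replacement
print(sample)
```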

When we take a sample there will always be error. It is called sampling error. We may, by chance, get exactly the same value for our sample statistic as the “true” value that exists in the population. However, even if we do, we won’t know that we have.

The sample mean is the best estimate of the population mean, but we need to say how well it estimates the population mean. For example, say we wish to know the mean (or average) weight of the apples in an orchard. We take a sample and find that the mean weight of the apples in the sample is 153g. If we took only a few apples, this gives only a rough idea, and we might say we are pretty sure the mean weight of the apples in the orchard is between 143g and 163g. If someone else took a bigger sample, they might be able to say that they are pretty sure that the mean weight of apples in the orchard is between 158g and 166g. You can tell that the second confidence interval is giving us better information, as its range is smaller.

There are two things that affect the width of a confidence interval. The first is the sample size. If we take a really large sample we are getting a lot more information about the population, so our confidence interval will be more exact, or narrower. It is not a one-to-one relationship, but a square-root relationship: if we wish to halve the width of the confidence interval, we will need to increase our sample size by a factor of four.

The second thing to affect the width of a confidence interval is the amount of variation in the population. If all the apples in the orchard are about the same weight, then we will be able to estimate that weight quite accurately. However, if the apples are all different sizes, then it will be harder to be sure that the sample represents the population, and we will have a larger confidence interval as a result.
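
To make the square-root relationship concrete, here is a rough sketch in Python. The standard deviation of 20g and the 1.96 multiplier are assumptions for illustration only; the point is simply that quadrupling the sample size halves the width, and a more variable population widens it.

```python
# The margin of error for a mean behaves like s / sqrt(n), so quadrupling
# the sample size roughly halves the width of the interval.
import math

s = 20.0  # hypothetical standard deviation of apple weights, in grams

for n in (25, 100, 400):
    margin = 1.96 * s / math.sqrt(n)  # approximate 95% margin of error
    print(f"n = {n:3d}: roughly the sample mean +/- {margin:.1f} g")
```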

Three ways to find confidence intervals

Traditional (old-fashioned?) Approach

The standard way of calculating confidence intervals is to use formulas developed from the assumptions of normality and the Central Limit Theorem. These formulas are used to calculate confidence intervals for means, proportions and slopes, but not for medians or standard deviations, because there aren’t nice straight-forward formulas for those. The formulas were developed when there were no computers, and analytical methods were needed in the absence of computational power.

In terms of teaching, these formulas are straight-forward, and also include the concept of level of confidence, which is part of the paradigm. You can see a video teaching the traditional approach to confidence intervals, using Excel to calculate the confidence interval for a mean.
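
For readers who would rather see code than a spreadsheet, here is a minimal sketch of the same formula-based calculation in Python; the apple weights are made up and a 95% level is assumed.

```python
# A sketch of the traditional confidence interval for a mean, using the t distribution.
import math

from scipy import stats

weights = [148, 151, 160, 139, 155, 162, 150, 147, 158, 153]  # hypothetical, in grams

n = len(weights)
mean = sum(weights) / n
s = math.sqrt(sum((x - mean) ** 2 for x in weights) / (n - 1))  # sample standard deviation
t = stats.t.ppf(0.975, df=n - 1)  # critical value for 95% confidence
margin = t * s / math.sqrt(n)

print(f"95% CI for the mean: {mean - margin:.1f} g to {mean + margin:.1f} g")
```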

Rule of Thumb

In the New Zealand curriculum at year 12, students are introduced to the concept of inference using an informal method for calculating a confidence interval. The formula is median ± 1.5 × IQR / √n, where IQR is the interquartile range and n is the sample size. There is a similar formula for proportions.
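
A minimal sketch of that rule of thumb in Python, using the same made-up apple weights as before, might look like this; note that different quartile conventions will give slightly different answers.

```python
# The informal interval: median +/- 1.5 * IQR / sqrt(n).
import math

import numpy as np

weights = np.array([148, 151, 160, 139, 155, 162, 150, 147, 158, 153])  # hypothetical, in grams

median = np.median(weights)
iqr = np.percentile(weights, 75) - np.percentile(weights, 25)  # interquartile range
margin = 1.5 * iqr / math.sqrt(len(weights))

print(f"Informal interval for the median: {median - margin:.1f} to {median + margin:.1f}")
```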

Bootstrapping

Bootstrapping is a very versatile way to find a confidence interval. It has three strengths:

  1. It can be used to calculate the confidence interval for a large range of different parameters.
  2. It uses ALL the information the sample gives us, rather than just the summary values.
  3. It has been found to aid in understanding the concepts of inference better than the traditional methods.

There are also some disadvantages:

  1. Old fogeys don’t like it. (Just kidding) What I mean is that teachers who have always taught using the traditional approach find it difficult to trust what seems like a hit-and-miss method without the familiar theoretical underpinning.
  2. Universities don’t teach bootstrapping as much as the traditional methods.
  3. The common software packages do not include bootstrap confidence intervals.

The idea behind a bootstrap confidence interval is that we make use of the whole sample to represent the population. We take lots and lots of samples of the same size from the original sample. Obviously we need to sample with replacement, or the samples would all be identical. Then we use these repeated samples to get an idea of the distribution of the estimates of the population parameter. We chop the tails off at a given point – for a 95% interval, the bottom 2.5% and the top 2.5% – and what remains is the confidence interval. Voila!
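
Here is a bare-bones sketch of that procedure in Python, using the made-up apple weights again; a tool such as iNZight (mentioned below) does the same thing with far more polish.

```python
# A percentile bootstrap for the mean (the same recipe works for a median
# or other statistic). A 95% level is assumed.
import random

random.seed(42)
sample = [148, 151, 160, 139, 155, 162, 150, 147, 158, 153]  # hypothetical, in grams

boot_means = []
for _ in range(10_000):
    resample = random.choices(sample, k=len(sample))  # sampling WITH replacement
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
lower = boot_means[int(0.025 * len(boot_means))]      # chop off the bottom 2.5%
upper = boot_means[int(0.975 * len(boot_means)) - 1]  # and the top 2.5%
print(f"95% bootstrap CI for the mean: {lower:.1f} g to {upper:.1f} g")
```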

Answers to the disadvantages (burn the straw man?)

  1. There is a sound theoretical underpinning for bootstrap confidence intervals. A good place to start is a previous blog post about George Cobb’s work. Either that, or – “Trust me, I’m a Doctor!” (This would also include trusting far more knowledgeable people such as Chris Wild and Maxine Pfannkuch, and the team of statistical educators led by Joan Garfield.)
  2. We have to start somewhere. Bootstrap methods aren’t used at universities because of inertia. As an academic of twenty years I can say that there is NO PAY OFF for teaching new stuff. It takes up valuable research time and you don’t get promoted, and sometimes you even get made redundant. If students understand what confidence intervals are, and the concept of inference, then learning to use the traditional formulas is trivial. Eventually the universities will shift. I am aware that the University of Auckland now teaches the bootstrap approach.
  3. There are ways to deal with the software package problem. There is a free software interface called “iNZight” that you can download. I believe Fathom also uses bootstrapping. There may be other software. Please let me know of any and I will add them to this post.

In Summary

Confidence intervals involve the concepts of variation, sampling and inference. They are a great way to teach these really important concepts, and to help students be critical of single-value estimates. They can be taught informally, traditionally or using bootstrapping methods. Any of the approaches can lead to rote use of a formula or algorithm, and it is up to teachers to aim for understanding. I’m working on a set of videos around this topic. Watch this space.