The Myth of Random Sampling

I feel a slight quiver of trepidation as I begin this post – a little like the boy who pointed out that the emperor has no clothes.

Random sampling is a myth. Practical researchers know this and deal with it. Theoretical statisticians live in a theoretical world where random sampling is possible and ubiquitous – which is just as well really. But teachers of statistics live in a strange half-real-half-theoretical world, where no one likes to point out that real-life samples are seldom random.

The problem in general

In order for most inferential statistical conclusions to be valid, the sample we are using must obey certain rules. In particular, each member of the population must have an equal chance of being chosen. In this way we reduce the opportunity for systematic error, or bias. When a truly random sample is taken, it is almost miraculous how well we can draw conclusions about the source population, with even a modest sample of a thousand. On a side note, if the general population understood this, and the opportunity for bias and corruption were eliminated, general elections and referenda could be done at much less cost, through taking a good random sample.
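
To see just how well a modest random sample can do, here is a minimal simulation sketch in Python. The population size and the 52% figure are invented for illustration, not taken from any real election.

```python
# A minimal sketch, with invented numbers: how well a truly random sample of
# 1,000 estimates a population proportion.
import random

random.seed(1)
population = [1] * 520_000 + [0] * 480_000   # a million voters, 52% support option A
sample = random.sample(population, 1000)      # a truly random sample
estimate = sum(sample) / len(sample)
print(f"True proportion: 0.520, sample estimate: {estimate:.3f}")
# Estimates from samples of 1,000 typically land within about 3 percentage
# points of the truth (roughly 1.96 * sqrt(0.5 * 0.5 / 1000) = 0.031).
```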

However! It is actually quite difficult to take a random sample of people. Random sampling is doable in biology, I suspect, where seeds or plots of land can be chosen at random. It is also fairly feasible in manufacturing processes. Medical research relies on the use of random samples, though they are seldom drawn from the total population. Really it is more about randomisation, which can be used to support causal claims.

But the area of most interest to most people is people. We actually want to know about how people function, what they think, their economic activity, sport and many other areas. People find people interesting. To get a really good sample of people takes a lot of time and money, and is outside the reach of many researchers. In my own PhD research I approximated a random sample by taking a stratified, cluster semi-random almost convenience sample. I chose representative schools of different types throughout three diverse regions in New Zealand. At each school I asked all the students in a class at each of three year levels. The classes were meant to be randomly selected, but in fact were sometimes just the class that happened to have a teacher away, as my questionnaire was seen as a good way to keep them quiet. Was my data of any worth? I believe so, of course. Was it random? Nope.

Problems people have in getting a good sample include cost, time and also response rate. Much of the data that is cited in papers is far from random.

The problem in teaching

The wonderful thing about teaching statistics is that we can actually collect real data and do analysis on it, and get a feel for the detective nature of the discipline. The problem with sampling is that we seldom have access to truly random data. By random I do not mean just simple random sampling – the least simple method! Even cluster, systematic and stratified sampling can be a challenge in a classroom setting. And sometimes if we think too hard we realise that what we have is actually a population, and not a sample at all.

It is a great experience for students to collect their own data. They can write a questionnaire and find out all sorts of interesting things, through their own trial and error. But mostly students do not have access to enough subjects to take a random sample. Even if we go to secondary sources, the data is seldom random, and the students do not get the opportunity to take the sample. It would be a pity not to use some interesting data, just because the collection method was dubious (or even realistic). At the same time we do not want students to think that seriously dodgy data has the same value as a carefully collected random sample.

Possible solutions

These are more suggestions than solutions, but the essence is to do the best you can and make sure the students learn to be critical of their own methods.

Teach the best way, pretend and look for potential problems.

Teach the ideal and also teach the reality. Teach about the different ways of taking random samples. Use my video if you like!

Get students to think about the pros and cons of each method, and where problems could arise. Also get them to think about the kinds of data they are using in their exercises, and what biases they may have.
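
For teachers who want something concrete to show, here is a rough sketch in Python of three common sampling methods applied to an invented class roll; the group labels and sample sizes are assumptions for illustration only.

```python
# A sketch of three sampling methods on an invented roll of 100 students.
import random

random.seed(2)
roll = [(f"student_{i}", "junior" if i < 60 else "senior") for i in range(100)]

# Simple random sample: every student has an equal chance of selection.
simple = random.sample(roll, 10)

# Systematic sample: every 10th student from a random starting point.
start = random.randrange(10)
systematic = roll[start::10]

# Stratified sample: sample within each group in proportion to its size.
juniors = [s for s in roll if s[1] == "junior"]
seniors = [s for s in roll if s[1] == "senior"]
stratified = random.sample(juniors, 6) + random.sample(seniors, 4)

print(len(simple), len(systematic), len(stratified))
```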

We also need to teach that, used judiciously, a convenience sample can still be of value. For example I have collected data from students in my class about how far they live from university, and whether or not they have a car. This data is not a random sample of any population. However, it is still reasonable to suggest that it may represent all the students at the university – or maybe just the first year students. It possibly represents students in the years preceding and following my sample, unless something has happened to change the landscape. It has worth in terms of inference. Realistically, I am never going to take a truly random sample of all university students, so this may be the most suitable data I ever get. I have no doubt that it is better than no information.

Not all questions are of equal worth. Knowing whether students who own cars live further from university, in general, is interesting but not of great importance. Were I to be researching topics of greater importance, such as safety features in roads or medicine, I would have a greater need for rigorous sampling.

So generally, I see no harm in pretending. I use the data collected from my class, and I say that we will pretend that it comes from a representative random sample. We talk about why it isn’t, but then we move on. It is still interesting data, it is real and it is there. When we write up analysis we include critical comments with provisos on how the sample may have possible bias.

What is important is for students to experience the excitement of discovering real effects (or lack thereof) in real data. What is important is for students to be critical of these discoveries, through understanding the limitations of the data collection process. Consequently I see no harm in using non-random, realistically sampled real data, with a healthy dose of scepticism.

Open Letter to Khan Academy about Basic Probability

Khan Academy probability videos and exercises aren’t good either

Dear Mr Khan

You have created an amazing resource that thousands of people all over the world get a lot of help from. Well done. Some of your materials are not very good, though, so I am writing this open letter in the hope that it might make some difference. Like many others, I believe that something as popular as Khan Academy will benefit from constructive criticism.

I fear that the reason that so many people like your mathematics videos so much is not because the videos are good, but because their experience in the classroom is so bad, and the curriculum is poorly thought out and encourages mechanistic thinking. This opinion is borne out by comments I have read from parents and other bloggers. The parents love you because you help their children pass tests.  (And these tests are clearly testing the type of material you are helping them to pass!) The bloggers are not so happy, because you perpetuate a type of mathematical instruction that should have disappeared by now. I can’t even imagine what the history teachers say about your content-driven delivery, but I will stick to what I know. (You can read one critique here)

Just over a year ago I wrote a balanced review of some of the Khan Academy videos about statistics. I know that statistics is difficult to explain – in fact one of the hardest subjects to teach. You can read my review here. I’ve also reviewed a selection of videos about confidence intervals, one of which was from Khan Academy. You can read the review here.

Consequently I am aware that blogging about the Khan Academy in anything other than glowing terms is an invitation for vitriol from your followers.

However, I thought it was about time I looked at the exercises that are available on KA, wondering if I should recommend them to high school teachers for their students to use for review. I decided to focus on one section, introduction to probability. I put myself in the place of a person who was struggling to understand probability at school.

Here is the verdict.

First of all the site is very nice. It shows that it has a good sized budget to use on graphics and site mechanics. It is friendly to get into. I was a bit confused that the first section in the Probability and Statistics Section is called “Independent and dependent events”. It was the first section though. The first section of this first section is called Basic Probability, so I felt I was in the right place. But then under the heading, Basic probability, it says, “Can I pick a red frog out of a bag that only contains marbles?” Now I have no trouble with humour per se, and some people find my videos pretty funny. But I am very careful to avoid confusing people with the humour. For an anxious student who is looking for help, that is a bit confusing.

I was excited to see that this section had five videos, and two sets of exercises. I was pleased about that, as I’ve wanted to try out some exercises for some time, particularly after reading the review from Fawn Nguyen on her experience with exercises on Khan Academy. (I suggest you read this – it’s pretty funny.)

So I watched the first video about probability and it was like any other KA video I’ve viewed, with primitive graphics and a stumbling repetitive narration. It was correct enough, but did not take into account any of the more recent work on understanding probability. It used coins and dice. Big yawn. It wastes a lot of time. It was ok. I do like that you have the interactive transcript so you can find your way around.

It dawned on me that nowhere do you actually talk about what probability is. You seem to assume that the students already know that. At the very start of the first video it says,

“What I want to do in this video is give you at least a basic overview of probability. Probability, a word that you’ve probably heard a lot of and you are probably just a little bit familiar with it. Hopefully this will get you a little deeper understanding.”

Later in the video there is a section on the idea of large numbers of repetitions, which is one way of understanding probability. But it really is a bit skimpy on why anyone would want to find or estimate a probability, and what the values actually mean. But it was ok.

The first video was about single instances – one toss of a coin or one roll of a die. Then the second video showed you how to answer the questions in the exercises, which involved two dice. This seemed ok, if rather a sudden jump from the first video. Sadly both of these examples perpetuate the common misconception that if there are, say, 6 alternative outcomes, they will necessarily be equally likely.

Exercises

Then we get to some exercises called “Probability Space”, which is not an enormously helpful heading. But my main quest was to have a go at the exercises, so that is what I did. And that was not a good thing. The exercises were not stepped, but started right away with an example involving two dice and the phrase “at least one of”. There was meant to be a graphic to help me, but instead I had the message “scratchpad not available”. I will summarise my concerns about the exercises at the end of my letter. I clicked on a link to a video that wasn’t listed on the left, called Probability Space, and got a different kind of video.

This video was better in that it had moving pictures and a script. But I have problems with gambling in videos like this. There are some cultures in which gambling is not acceptable. The other problem I have is with the term “exact probability”, which was used several times. What do we mean by “exact probability”? How does he know it is exact? I think this sends the wrong message.

Then on to the next videos which were worked examples, entitled “Example: marbles from a bag, Example: Picking a non-blue marble, Example: Picking a yellow marble.” Now I understand that you don’t want to scare students with terminology too early, but I would have thought it helpful to call the second one, “complementary events, picking a non-blue marble”. That way if a student were having problems with complementary events in exercises from school, they could find their way here. But then I’m not sure who your audience is. Are you sure who your audience is?
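
For anyone following along at home, the complementary-events idea that the second video is driving at can be sketched in a few lines; the bag below is invented, not the one from the video.

```python
# A minimal sketch of complementary events: P(not A) = 1 - P(A).
# The bag's contents are invented for illustration.
bag = {"blue": 4, "red": 3, "yellow": 2}
total = sum(bag.values())

p_blue = bag["blue"] / total
p_non_blue = 1 - p_blue                    # same as (3 + 2) / 9
print(f"P(non-blue marble) = {p_non_blue:.3f}")

# The same trick handles "at least one" questions, e.g. at least one six
# in two rolls of a fair die: 1 - (5/6) * (5/6).
print(f"P(at least one six in two rolls) = {1 - (5/6) ** 2:.3f}")
```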

The first marble video was ok, though the terminology was sloppy.

The second marble video, called “Example: picking a non-blue marble”, is glacially slow. There is a point, I guess, in showing students how to draw a bag and marbles, but… Then the next example is of picking numbers at random. Why would we ever want to do this? Then we come to an example of circular targets. This involves some problem-solving regarding areas of circles, and cancelling out fractions including pi. What is this about? We are trying to teach about probability, so why have you brought in some complication involving the area of a circle?

The third marble video attempts to introduce the idea of events, but doesn’t really. By trying not to confuse with technical terms, the explanation is more confusing.

Now onto some more exercises. The Khan model is that you have to get 5 correct in a row in order to complete an exercise. I hope there is some sensible explanation for this, because it sure would drive me crazy to have to do that. (As I heard expressed on Twitter)

What are circular targets doing in with basic probability?

The first example is a circular target one. I SO could not be bothered working out the area stuff so I used the hints to find the answer so I could move on to a more interesting example. The next example was finding the probability of rolling a 4 with a fair six-sided die. This is trivial, but would not have been a bad example to start with. The next question involved three colours of marbles, and finding the probability of not green. Then another dart-board one. Sigh. Then another dart-board one. I’m never going to find out what happens if I get five right in a row if I don’t start doing these properly. Oh no – it gave me circumference. SO can’t be bothered.

And that was the end of Basic probability. I never did find out what happens if I get five correct in a row.

Venn diagrams

The next topic is called “Venn diagrams and adding probabilities”. I couldn’t resist seeing what you would do with a Venn diagram. This one nearly reduced me to tears.

As you know by now, I have an issue with gambling, so it will come as no surprise that I object to the use of playing cards in this example. It makes the assumption that students know about playing cards. You do take one and a half minutes to explain the contents of a standard pack of cards.  Maybe this is part of the curriculum, and if so, fair enough. The examples are standard – the probability of getting a Jack of Hearts etc. But then at 5:30 you start using Venn diagrams. I like Venn diagrams, but they are NOT good for what you are teaching at this level, and you actually did it wrong. I’ve put a comment in the feedback section, but don’t have great hopes that anything will change. Someone else pointed this out in the feedback two years ago, so no – it isn’t going to change.

Khan Venn diagram

This diagram is misleading, as is shown by the confusion expressed in the questions from viewers. There should be a green 3, a red 12, and a yellow 1.

Now Venn diagrams seem like a good approach in this instance, but decades of experience in teaching and communicating complex probabilities has shown that in most instances a two-way table is more helpful. The table for the Jack of Hearts problem would look like this:

            Jacks   Not Jacks   Total
Hearts        1        12         13
Not Hearts    3        36         39
Total         4        48         52

(Any teachers reading this letter – try it! Tables are SO much easier for problem solving than Venn diagrams)
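
To make that concrete, here is a small Python sketch of how the table answers the usual questions; the cell counts are simply the ones in the table above.

```python
# The two-way table as a dictionary of cell counts for a standard deck.
table = {
    ("jack", "heart"): 1,
    ("jack", "not heart"): 3,
    ("not jack", "heart"): 12,
    ("not jack", "not heart"): 36,
}
total = sum(table.values())                                                    # 52

p_jack_of_hearts = table[("jack", "heart")] / total                            # 1/52
p_jack = (table[("jack", "heart")] + table[("jack", "not heart")]) / total     # 4/52
p_heart = (table[("jack", "heart")] + table[("not jack", "heart")]) / total    # 13/52

# "Jack or heart": add the three qualifying cells, or equivalently use
# P(A) + P(B) - P(A and B). Both give 16/52.
p_jack_or_heart = (4 + 13 - 1) / total
print(p_jack_of_hearts, p_jack, p_heart, p_jack_or_heart)
```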

But let’s get down to principles.

The principles of instruction that KA have not followed in the examples:

  • Start easy and work up
  • Be interesting in your examples – who gives a flying fig about two dice or random numbers?
  • Make sure the hardest part of the question is the thing you are testing. This is particularly violated with the questions involving areas of circles.
  • Don’t make me so bored that I can’t face trying to get five in a row and not succeed.

My point

Yes, I do have one. Mr Khan you clearly can’t be stopped, so can you please get some real teachers with pedagogical content knowledge to go over your materials systematically and make them correct. You have some money now, and you owe it to your benefactors to GET IT RIGHT. Being flippant and amateurish is fine for amateurs but you are now a professional, and you need to be providing material that is professionally produced. I don’t care about the production values – keep the stammers and “lellows” in there if you insist. I’m very happy you don’t have background music as I can’t stand it myself. BUT… PLEASE… get some help and make your videos and exercises correct and pedagogically sound.

Dr Nic

PS – anyone else reading this letter, take a look at the following videos for mathematics.

And of course I think my own Statistics Learning Centre videos are pretty darn good as well.

Other posts about concerns about Khan:

Another Open Letter to Sal (I particularly like the comment by Michael Paul Goldenberg)

Breaking the cycle (A comprehensive summary of the responses to criticism of Khan)

Teaching with School League tables

NCEA League tables in the newspaper

My husband ran for cover this morning when he saw high school NCEA (National Certificates of Educational Achievement) league tables in the Press. However, rather than rave at him yet again, I will grasp the opportunity to expound to a larger audience. Much as I loathe and despise league tables, they are a great opportunity to teach students to explore data-rich reports with a critical and educated eye. There are many lessons to learn from league tables. With good teaching we can help dispel some of the myths the league tables promulgate.

When a report is made short and easy to understand, there is a good chance that much of the ‘truth’ has been lost along with the complexity. The table in front of me lists 55 secondary and area schools from the Canterbury region. These schools include large “ordinary” schools and small specialist schools such as Van Asch Deaf Education Centre and Southern Regional Health School. They include single-sex and co-ed, private, state-funded and integrated. They include area schools in small rural communities, which cover ages 5 to 21. The “decile” of each of the schools is the only contextual information given, apart from the name of the school. (I explain the decile, along with misconceptions, at the end of the post.) For each school, the percentages of students passing at the three levels are given. It is not clear whether the percentages in the newspaper are based on the participation rate or the school roll.

This is highly motivating information for students as it is about them and their school. I had an argument recently with a student from a school which scores highly in NCEA. She was insistent that her friend should change schools from one that has lower scores. What she did not understand was that the friend had some extra learning difficulties, and that the other school was probably more appropriate for her. I tried to teach the concept of added-value, but that wasn’t going in either. However I was impressed with her loyalty to her school and I think these tables would provide an interesting forum for discussion.

Great context discussion

You could start with talking about what the students think will help a school to have high pass rates. This could include a school culture of achievement, good teaching, well-prepared students and good resources. It can also include selection and exclusion of students to suit the desired results, selection of “easy” standards or subjects, and even less rigorous marking of internal assessment. Other factors to explore might be single-sex vs co-ed schools, the ethnic and cultural backgrounds of the students, and private vs state-funded schools. All of these are potential explanatory variables. Then you can point out how little of this information is actually taken into account in the table. This is a very common occurrence, given limited space and the use of raw data. I suspect at least one school appears less successful because some of its students sit different exams, either Cambridge or International Baccalaureate. These may be the students who would have performed well in NCEA.

Small populations

It would be good to look at the impact of small populations, and populations of very different sizes in the data. Students should think about what impact their behaviour will have on the results of the school, compared with a larger or smaller cohort. The raw data provided by the Ministry of Education does give a warning for small cohorts. For a small school, particularly in a rural area, there may be only a handful of students in year 13, so that one student’s success or failure has a large impact on the outcome. At the other end of the scale, there are schools of over 2000, which will have about 400 students in year 13. This effect is important to understand in all statistical reporting. One bad event in a small hospital, for instance, will have a larger percentage effect than in a large hospital.
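
A quick sketch with invented cohort sizes makes the point: the same single failure moves a small school’s pass rate far more than a large school’s.

```python
# How much one student's failure changes the pass rate, for invented cohort sizes.
for cohort in (5, 50, 400):
    all_pass = 100 * cohort / cohort
    one_fail = 100 * (cohort - 1) / cohort
    print(f"Cohort of {cohort}: one failure drops the pass rate "
          f"from {all_pass:.1f}% to {one_fail:.1f}%")
```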

Different rules

We hear a lot about comparing apples and oranges. School league tables include a whole fruit basket of different criteria. Schools use different criteria for allowing students into the school, into different courses, and whether they are permitted to sit external standards. Attitudes to students with special educational needs vary greatly. Some schools encourage students to sit levels outside their year level.

Extrapolating from a small picture

What one of the accompanying stories points out is that NCEA is only a part of what schools do. Sometimes the things that are measurable get more attention because it is easier to report in bulk. A further discussion with students could be provoked using statements such as the following, which the students can vote on, and then discuss. You could also discuss what evidence you would need to be able to refute or support them.

  • A school that does well in NCEA level 3 is a good school.
  • Girls’ schools do better than boys’ schools at NCEA because girls are smarter than boys.
  • Country schools don’t do very well because the clever students go to boarding school in the city.
  • Boys are more satisfied with doing just enough to get achieved.

Further extension

If students are really interested you can download the full results from the Ministry of Education website and set up a pivot table on Excel to explore questions.
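
For classes working in Python rather than Excel, here is a sketch of the same kind of pivot-table exploration. The column names and values are invented; the real Ministry of Education download will have its own layout.

```python
# A pivot-table sketch in pandas, using made-up results data.
import pandas as pd

results = pd.DataFrame({
    "school":    ["A", "A", "B", "B", "C", "C"],
    "decile":    [3, 3, 9, 9, 6, 6],
    "level":     ["Level 1", "Level 2"] * 3,
    "pass_rate": [62, 55, 91, 88, 74, 70],
})

# Pass rates by decile and NCEA level, as an Excel pivot table would show them.
pivot = results.pivot_table(values="pass_rate", index="decile", columns="level")
print(pivot)
```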

I can foresee some engaging and even heated discussions ensuing. I’d love to hear how they go.

Short explanation of Decile – see also official website.

The decile rating of the school is an index developed in New Zealand and is a measure of social deprivation. The decile rating is calculated from a combination of five values taken from census data for the meshblocks in which the students reside. A school with a low decile rating of 1 or 2 will have a large percentage of students from homes that are crowded, or whose parents are not in work or have no educational qualifications. A school with a decile rating of 10 will have the fewest students from homes like that. The system was set up to help with targeted funding for educational achievement. It recognises that students from disadvantaged homes will need additional resources in order to give them equal opportunity to learn. However, the term has entered the New Zealand vernacular as a measure of socio-economic status, and often even of worth. A decile 10 school is often seen as a rich school or a “top” school. The reality is that this is not the case.  Another common misconception is that one tenth of the population of school age students is in each of the ten bands. How it really works is that one tenth of schools is in each of the bands. The lower decile schools are generally smaller than other schools, and mostly primary schools. In 2002 there were nearly 40,000 secondary students in decile 10 schools, with fewer than 10,000 in decile 1 schools.

Conceptualising Probability

The problem with probability is that it doesn’t really exist. Certainly it never exists in the past.

Probability is an invention we use to communicate our thoughts about how likely something is to happen. We have collectively agreed that 1 is a certain event and 0 is impossible. 0.5 means that there is just as much chance of something happening as not. We have some shared perception that 0.9 means that something is much more likely to happen than to not happen. Probability is also useful for when we want to do some calculations about something that isn’t certain. Often it is too hard to incorporate all uncertainty, so we assume certainty and put in some allowance for error.

Sometimes probability is used for things that happen over and over again, and in that case we feel we can check to see if our prediction about how likely something is to happen was correct. The problem here is that we actually need things to happen a really large number of times, under the same circumstances, in order to assess whether we were correct. But when we are talking about the probability of a single event, that either will or won’t happen, we can’t test whether we were right or not afterwards, because by that time it either did or didn’t happen. The probability no longer exists.

Thus to say that there is a “true” probability somewhere in existence is rather contrived. The truth is that it either will happen or it won’t. The only way to know a true probability would be if this one event were to happen over and over and over, in the wonderful fiction of parallel universes. We could then count how many times it would turn out one way rather than another. At which point the universes would diverge!

However, for the purposes of teaching about probability, there is the construct that there exists a “true probability” that something will happen.

Why think about probability?

What prompted these musings about probability was exploring the new NZ curriculum and companion documents, the Senior Secondary Guide and nzmaths.co.nz.

In Level 8 (last year of secondary school) of the senior secondary guide it says, “Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the relationship between true probability (unknown and unique to the situation), model estimates (theoretical probability) and experimental estimates.”

And at NZC level 3 (years 5 and 6 at primary school!) in the Key Ideas in Probability it talks about “Good Model, No Model and Poor Model”. This statement is referred to at all levels above level 3 as well.

I decided I needed to make sense of these two conceptual frameworks: true-model-experimental and good-poor-no, and tie it to my previous conceptual framework of classical-frequency-subjective.

Here goes!

Delicious Mandarins

Let’s make this a little more concrete with an example. We need a one-off event. What is the probability that the next mandarin I eat will be delicious? It is currently mandarin season in New Zealand, and there is nothing better than a good mandarin, with the desired combination of sweet and sour, and with plenty of juice and a good texture. But, being a natural product, there is a high level of variability in the quality of mandarins, especially when they may have parted company with the tree some time ago.

There are two possible outcomes for my future event. The mandarin will be delicious or it will not. I will decide when I eat it. Some may say that there is actually a continuum of deliciousness, but for now this is not the case. I have an internal idea of deliciousness and I will know. I think back to my previous experience with mandarins. I think about a quarter are horrible, a half are nice enough and about a quarter are delicious (using the Dr Nic scale of mandarin grading). If the mandarin I eat next belongs to the same population as the ones in my memory, then I can predict that there is a 25% probability that the mandarin will be delicious.

The NZ curriculum talks about “true” probability which implies that any value I give to the probability is only a model. It may be a model based on empirical or experimental evidence. It can be based on theoretical probabilities from vast amounts of evidence, which has given us the normal distribution. The value may be only a number dredged up from my soul, which expresses the inner feeling of how likely it is that the mandarin will be delicious, based on several decades of experience in mandarin consumption.

More examples

Let us look at some more examples:

What is the probability that:

  • I will hear a bird on the way to work?
  • the flight home will be safe?
  • it will be raining when I get to Christchurch?
  • I will get a raisin in my first spoonful of muesli?
  • I will get at least one raisin in half of my spoonfuls of muesli?
  • the shower in my hotel room will be enjoyable?
  • I will get a rare Lego ® minifigure next time I buy one?

All of these events are probabilistic and have varying degrees of certainty and varying degrees of ease of modelling.

            Easy to model                   Hard to model
Unlikely    Get a rare Lego® minifigure     Raining in Christchurch
No idea     Raisin in half my spoonfuls     Enjoyable shower
Likely      Raisin in first spoonful        Bird, safe flight home

And as I construct this table I realise also that there are varying degrees of importance. Except for the flight home, none of those examples matter. I am hoping that a safe flight home has a probability extremely close to 1. I realise that there is a possibility of an incident. And it is difficult to model. But people have modelled air safety and the universal conclusion is that it is safer than driving. So I will take the probability and fly.

Conceptual Frameworks

How do we explain the different ways that probability has been described? I will now examine the three conceptual frameworks I introduced earlier, starting with the easiest.

Traditional categorisation

This is found in some form in many elementary college statistics textbooks. The traditional framework has three categories – classical or “a priori”, frequency or historical, and subjective.

Classical or “a priori” – I had thought of this as being “true” probability. To me, if there are three red and three white Lego® blocks in a bag and I take one out without looking, there is a 50% chance that I will get a red one. End of story. How could it be wrong? This definition is the mathematically interesting aspect of probability. It is elegant and has cool formulas and you can make up all sorts of fun examples using it. And it is the basis of gambling.

Frequency or historical – we draw on long term results of similar trials to gain information. For example we look at the rate of germination of a certain kind of seed by experiment, and that becomes a good approximation of the likelihood that any one future seed will germinate. And it also gives us a good estimate of what proportion of seeds in the future will germinate.
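
Here is a minimal simulation sketch of that frequency idea; the “true” 70% germination rate is an assumption baked into the simulation rather than a real figure.

```python
# Frequency (experimental) estimates settle down as the number of trials grows.
import random

random.seed(3)
true_p = 0.7                                  # assumed germination rate
for n in (10, 100, 1000, 10000):
    germinated = sum(random.random() < true_p for _ in range(n))
    print(f"{n:>6} seeds: estimated probability of germination {germinated / n:.3f}")
```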

Subjective – We guess! We draw on our experience of previous similar events and we take a stab at it. This is not seen as a particularly good way to come up with a probability, but when we are talking about one off events, it is impossible to assess in retrospect how good the subjective probability estimate was. There is considerable research in the field of psychology about the human ability or lack thereof to attribute subjective probabilities to events.

In teaching the three part categorisation of sources of probability I had problems with the probability of rain. Where does that fit in the three categories? It uses previous experimental data to build a model, and current data to put into the model, and then a probability is produced. I decided that there is a fourth category, that I called “modelled”. But really that isn’t correct, as they are all models.

NZ curriculum terminology

So where does this all fit in the New Zealand curriculum pronouncements about probability? There are two conceptual frameworks that are used in the document, each with three categories as follows:

True, modelled, experimental

In this framework we start with the supposition that there exists somewhere in the universe a true probability distribution. We cannot know this. Our expressions of probability are only guesses at what this might be. There are two approaches we can take to estimate this “truth”. These two approaches are not independent of each other, but often intertwined.

One is a model estimate, based on theory, such as that the probability of a single outcome is the number of equally likely ways that it can occur over the number of possible outcomes. This accounts for the probability of a red brick as opposed to a white brick, drawn at random. Another example of a modelled estimate is the use of distributions such as the binomial or normal.

In addition there is the category of experimental estimate, in which we use data to draw conclusions about what is likely to happen. This is equivalent to the frequency or historical category above. Often modelled distributions use data from an experiment also, and experimental probability relies on models as well. The main idea is that neither the modelled nor the experimental estimate of the “true” probability distribution is the true distribution, but rather a model of some sort.

Good model, poor model, no model

The other conceptual framework stated in the NZ curriculum is that of good model, poor model and no model, which relates to fitness for purpose. When it is important to have a “correct” estimate of a probability, such as for building safety, gambling machines, and life insurance, then we would put effort into getting as good a model as possible. Conversely, sometimes little effort is required. Classical models are very good models, often of trivial examples such as dice games and coin tossing. Frequency models, aka experimental models, may or may not be good models, depending on how many observations are included, and how much the future is similar to the past. For example, a model of sales of slide rules developed before the invention of the pocket calculator will be a poor model for current sales. The ground rules have changed. And a model built on data from five observations is unlikely to be a good model. A poor model is not fit for purpose and requires development, unless the stakes are so low that we don’t care, or the cost of better fitting is greater than the reward.

I have problems with the concept of “no model”. I presume that is the starting point, from which we develop a model or do not develop a model if it really doesn’t matter. In my examples above I include the probability that I will hear a bird on the way to work. This is not important, but rather an idle musing. I suspect I probably will hear a bird, so long as I walk and listen. But if it rains, I may not. As I am writing this in a hotel in an unfamiliar area I have no experience on which to draw. I think this comes pretty close to “no model”. I will take a guess and say the probability is 0.8. I’m pretty sure that I will hear a bird. Of course, now that I have said this, I will listen carefully, as I would feel vindicated if I hear a bird. But if I do not hear a bird, was my estimate of the probability wrong? No – I could assume that I just happened to be in the 0.2 area of my prediction. But coming back to the “no model” concept – there is now a model. I have allocated the probability of 0.8 to the likelihood of hearing a bird. This is a model. I don’t even know if it is a good model or a poor model. I will not be walking to work this way again, so I cannot even test it out for the future, and besides, my model was only for this one day, not for all days of walking to work.

So there you have it – my totally unscholarly musings on the different categorisations of probability.

What are the implications for teaching?

We need to try not to perpetuate the idea that probability is the truth. But at the same time we do not wish to make students think that probability is without merit. Probability is a very useful, and at times highly precise, way of modelling and understanding the vagaries of the universe. The more teachers can use language that implies modelling rather than rules, the better. It is common, but not strictly correct, to say, “This process follows a normal distribution”. As Einstein famously and enigmatically said, “God does not play dice”. Neither does God or nature use normal distribution values to determine the outcomes of natural processes. It is better to say, “this process is usefully modelled by the normal distribution.”

We can have learning experiences that help students to appreciate certainty and uncertainty, and the modelling of outcomes that are not equally likely. Thanks to the overuse of dice and coins, it is too common for people to assess things as having equal probabilities. And students need to use experiments. First they need to appreciate that it can take a large number of observations before we can be happy that we have a “good” model. Secondly they need to use experiments to attempt to model an otherwise unknown probability distribution. What fun can be had in such a class!

But, oh mathematical ones, do not despair – the rules are still the same, it’s just the vigour with which we state them that has changed.

Comment away!

Post Script

In case anyone is interested, here are the outcomes which now have a probability of 1, as they have already occurred.

  • I will hear a bird on the way to work? Almost the minute I walked out the door!
  • the flight home will be safe? Inasmuch as I am in one piece, it was safe.
  • it will be raining when I get to Christchurch? No it wasn’t
  • I will get a raisin in my first spoonful of muesli? I did
  • I will get at least one raisin in half of my spoonfuls of muesli? I couldn’t be bothered counting.
  • the shower in my hotel room will be enjoyable? It was okay.
  • I will get a rare Lego minifigure next time I buy one? Still in the future!

Oh Ordinal data, what do we do with you?

What can you do with ordinal data? Or more to the point, what shouldn’t you do with ordinal data?

First of all, let’s look at what ordinal data is.

It is usual in statistics and other sciences to classify types of data in a number of ways. In 1946, Stanley Smith Stevens suggested a theory of levels of measurement, in which all measurements are classified into four categories, Nominal, Ordinal, Interval and Ratio. This categorisation is used extensively, and I have a popular video explaining them. (Though I group Interval and Ratio together as there is not much difference in their behaviour for most statistical analysis.)

Nominal is pretty straight-forward. This category includes any data that is put into groups, in which there is no inherent order. Examples of nominal data are country of origin, sex, type of cake, or sport. Similarly it is pretty easy to explain interval/ratio data. It is something that is measured, by length, weight, time (duration), cost and similar. These two categorisations can also be given as qualitative and quantitative, or non-parametric and parametric.

Ordinal data

But then we come to ordinal level of measurement. This is used to describe data that has a sense of order, but for which we cannot be sure that the distances between the consecutive values are equal. For example, level of qualification has a sense of order:

  • A postgraduate degree is higher than
  • a Bachelor's degree, which is higher than
  • a high-school qualification, which is higher than
  • no qualification.

There are four steps on the scale, and it is clear that there is a logical sense of order. However, we cannot sensibly say that the difference between no qualification and a high-school qualification is equivalent to the difference between the high-school qualification and a bachelor’s degree, even though both of those are represented by one step up the scale.

Another example of ordinal level of measurement is used extensively in psychological, educational and marketing research, known as a Likert scale. (Though I believe the correct term is actually Likert item – and according to Wikipedia, the pronunciation should be Lick it, not Like it, as I have used for some decades!). A statement is given, and the response is given as a value, often from 1 to 5, showing agreement to the statement. Often the words “Strongly agree, agree, neutral, disagree, strongly disagree” are used. There is clearly an order in the five possible responses. Sometimes a seven point scale is used, and sometimes the “neutral” response is eliminated in an attempt to force the respondent to commit one way or the other.

The question at the start of this post has an ordinal response, which could be perceived as indicating how quantitative the respondent believes ordinal data to be.

What prompted this post was a question from Nancy under the YouTube video above, asking:

“Dr Nic could you please clarify which kinds of statistical techniques can be applied to ordinal data (e.g. Likert-scale). Is it true that only non-parametric statistics are possible to apply?”

Well!

As shown in the video, there are the purists, who are adamant that ordinal data is qualitative. There is no way that a mean should ever be calculated for ordinal data, and the most mathematical thing you can do with it is find the median. At the other pole are the practical types, who happily calculate means for any ordinal data, without any concern for the meaning (no pun intended).

There are differing views on finding the mean for ordinal data.

So the answer to Nancy would depend on what school of thought you belong to.

Here’s what I think:

Not all ordinal data is the same. There is a continuum of “ordinality”, if you like.

There are some instances of ordinal data which are pretty much nominal, with a little bit of order thrown in. These should be distinguished from nominal data, only in that they should always be graphed as a bar chart (rather than a pie-chart)* because there is inherent order. The mode is probably the only sensible summary value other than frequencies. In the examples above, I would say that “level of qualification” is only barely ordinal. I would not support calculating a mean for the level of qualification. It is clear that the gaps are not equal, and additionally any non-integer result would have doubtful interpretation.

Then there are other instances of ordinal data for which it is reasonable to treat it as interval data and calculate the mean and median. It might even be supportable to use it in a correlation or regression. This should always be done with caution, and an awareness that the intervals are not equal.

Here is an example for which I believe it is acceptable to use the mean of an ordinal scale. At the beginning and the end of a university statistics course, the class of 200 students is asked the following question: How useful do you think a knowledge of statistics will be to you in your future career? Very useful, useful, not useful.

Now this is not even a very good Likert question, as the positive and negative elements are not balanced. There are only three choices. There is no evidence that the gaps between the elements are equal. However, if we score the elements as 3, 2 and 1 respectively, and find that the mean for the 200 students is 1.5 before the course and 2.5 after the course, I would say that there is meaning in what we are reporting. There are specific tests to use for this – and we could also look at how many students changed their minds positively or negatively. But even without the specific test, we are treating this ordinal data as something more than qualitative. What also strengthens the evidence for doing this is that the test is performed on the same students, who will probably perceive the scale in the same way each time, making the comparison more valid.
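
Here is a minimal sketch of that before-and-after calculation. The response counts are invented so that the means come out at 1.5 and 2.5; the 3/2/1 scoring is the one suggested above.

```python
# Mean of an ordinal item, scored 3/2/1, for invented before/after counts.
scores = {"Very useful": 3, "Useful": 2, "Not useful": 1}

before = {"Very useful": 20, "Useful": 60, "Not useful": 120}
after  = {"Very useful": 110, "Useful": 80, "Not useful": 10}

def mean_score(counts):
    total = sum(counts.values())
    return sum(scores[k] * n for k, n in counts.items()) / total

print(f"Mean before: {mean_score(before):.2f}, mean after: {mean_score(after):.2f}")
```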

So what I’m saying is that it is wrong to make a blanket statement that ordinal data can or can’t be treated like interval data. It depends on meaning and number of elements in the scale.

What do we teach?

And again the answer is that it depends! For my classes in business statistics I told them that it depends. If you are teaching a mathematical statistics class, then a more hard line approach is justified. However, at the same time as saying, “you should never calculate the mean of ordinal data”, it would be worthwhile to point out that it is done all the time! Similarly if you teach that it is okay to find the mean of some ordinal data, I would also point out that there are issues with regard to interpretation and mathematical correctness.

Please comment!

Foot note on Pie charts

*Yes, I too eschew pie-charts, but for two or three categories of nominal data, where there are marked differences in frequency, if you really insist, I guess you could possibly use them, so long as they are not 3D and definitely not exploding. But even then, a bar chart is better – perhaps a post for another day, though so many others have already written one.

Which comes first – problem or solution?

In teaching it can be difficult to know whether to start with a problem or a solution method. It seems more obvious to start with the problem, but sometimes it is better to introduce the possibility of the solution before posing the problem.

Mathematics teaching

A common teaching method in mathematics is to teach the theory, followed by applications. Or not followed by applications. I seem to remember learning a lot of mathematics with absolutely no application – which was fine by me, because it was fun. My husband once came home from survey school, and excitedly told me that he was using complex numbers for some sort of transformation between two irregular surfaces. Who’d have thought? I had never dreamed there could be a real-life use for the square root of -1. I just thought it was a cool idea someone thought up for the heck of it.

But yet again we come to the point that statistics and operations research are not mathematics. Without context and real-life application they cease to exist and turn into … mathematics!

Applicable mathematics

My colleague wrote a guest post about “applicable mathematics” which he separates from “applied mathematics”. Applicable maths appears when teachers make up applications to try to make mathematics seem useful. There is little to recommend about applicable maths. A form of “applicable maths” occurs in probability assessment questions where the examiner decides not to tell the examinee all the information, and the examinee has to draw Venn diagrams and use logical thinking to find out something that clearly anyone in the real world would be able to read in the data! I actually enjoy answering questions like that, and they have a point in helping students understand the underlying structure of the data. But I do not fool myself into thinking that they are anywhere near real-life. Nor are they statistics.

Which first – theory or application?

So the question is – when teaching statistics and operations research, should you start with an application or a problem or a case, and work from there to the theory? Or do students need some theory, or at least an understanding of basic principles before a case or problem can have any meaning? Or in a sequence of learning do we move back and forward between theory and application?

My first off response is that of course we should start with the data, as many books on the teaching of statistics teach us. Well actually we should start with the problem, as that really precedes the collection of the data. But then, how can we know what sorts of problems to frame if we don’t have some idea of what is possible through modelling and statistics? So should we first begin with some theory? The New Zealand Curriculum emphasises the PPDAC cycle, Problem, Plan, Data, Analysis, Conclusion. However, in order to pose the problem in the first place, we need the theory of the PPDAC cycle itself. The answer is not simple and depends on the context.

I have recently made a set of three videos explaining confidence intervals and bootstrapping. These are two very difficult topics that become simple in an instant. What I mean by that is, until you understand a confidence interval, it makes no sense, and you can see no reason why it should make sense. You go through a “liminal space” of confusion and anxiety. Then when you emerge out the other side, instantly confidence intervals make sense, and it is equally difficult to see what it was that made them confusing. This dichotomy makes teaching difficult, as the teacher needs to try to understand what made the problem confusing.

I present the idea of a confidence interval first. Then I use examples. I present the idea of bootstrapping, then give examples. I think in this instance it is helpful to delineate the theory or the idea in reasonably abstract form, interspersed with examples. I also think diagrams are immensely useful, but that’s another topic.

Critique of AtMyPace: Statistics

What prompted these thoughts about “which comes first” was a comment made about our “AtMyPace: Statistics” iOS app.


The YouTube videos used in AtMyPace:Statistics were developed to answer specific needs in a course. They generally take the format of a quick summary of the theory, followed by an example, often related to Helen and her business selling choconutties.

The iOS app, AtMyPace:Statistics, was set up as a way to capitalise on the success of the YouTube videos, and we added two quizzes of ten True/False questions to complement each of the videos. We also put these same quizzes in our on-line course and found that they were surprisingly popular. In a way, they are a substitute for a textbook or notes, but require the person to commit one way or the other to an answer before reading a further explanation. We had happened on an effective way of engaging students with the material.

AtMyPace:Statistics is not designed to be a full course in statistics, but rather a tool to help students who might be struggling with concepts. We have also developed a web-based version of AtMyPace:Statistics for those who are not the happy owners of iOS devices. At present the web version is a copy of the app, but we will happily add other questions and activities when the demand arises.

I received the following critique of the AtMyPace: Statistics app:

“They are nicely done but very classical in scope. The approach is tools-oriented using a few “realistic” examples to demonstrate the tool. This could work for students who need to take exams and want accessible material.”

Very true. The material in AtMyPace:Statistics is classical in scope, as we focus on the material currently being taught in most business schools and first year statistics service courses. We are trying to make a living, and once that is happening we will set out to change the world!

The reviewer continues,

“I think that in adult education you should reverse the order and have the training problem oriented. Take a six sigma DMAIC process as an example. The backbone is a problem scheduled to be solved. The path is DMAIC and the tools are supporting the journey. If you want to do it that way you need to tailor the problem to the audience.”

In tailored adult education it is likely that a problem-based approach will work. I would strongly recommend it.

I had an interesting discussion some time ago with a young lecturer working in a prestigious case-based MBA programme in North America. The entire MBA is taught using cases, and is popular and successful. My friend had some reservations about case-based teaching for a subject like Operations Research which has a body of skills which are needed as a foundation for analysis. Statistics would be similar. The question is making sure the students have the necessary skills and knowledge, with the ability to transfer to another setting or problem. Case-based learning is not an efficient way to accomplish this.

Criticism on Choosing the Test procedure

In another instance, David Munroe commented on our video “Choosing which statistical test to use”, which receives about 1000 views a week.  In the video I suggest a three step process involving thinking about what kind of data we have, what kind of sample, and the purpose of the analysis. The comment was:

Myself I would put purpose first. :) The purpose of the analysis determines what data should be collected – and more data is not necessarily more informative. In my view it is more useful to think ‘what am I trying to achieve’ with this analysis before collecting the data (so the right data have a chance to be collected). This in contrast to: collecting the data and then going ‘now what can I get from this data?’ (although this is sometimes an appropriate research technique). I think because we’ve already collected the data any time we’re illustrating particular modelling tools or statistical tests, we reinforce the ‘collect the data first then worry about analysis’ approach – at least subconsciously.

Thanks David! Good thinking, and if I ever redo the video I may well change the order. I chose the order I did, as it seemed to go from easy to difficult. (Actually I don’t remember consciously thinking about the order – it just fell out of individual help sessions with students.)  And the diagram was developed in response to the rather artificial problems I was posing!

I’ll step back a bit and explain. One problem I have seen in teaching Statistics and Operations Research is that students fail to make connections. They also compartmentalise the different aspects and find it difficult to work out when certain procedures would be most useful. I wrote a post about this. In the statistics course I wrote a set of scenarios describing possible applications of statistical methods in a business context. The students were required to work out which technique to use in each scenario and found this remarkably difficult. They could perform a test on difference of two means quite well, but were hard-pressed to discern when the test should be used. So I made up even more questions to give them more practice, and designed my three step method for deciding on the test.  This helped.

I had not thought of it as a way to decide in a real-life situation which test to use. Surely that would be part of a much bigger process.  So my questions are rather artificial, but that doesn’t make them bad questions. Their point was to help students make linkages between different parts of the course. And for that, it works.

Bring on the criticism

I would like to finish by saying how much I appreciate criticism. It is nice when people tell me they like my materials. I feel as if I am doing something useful and helping people. I get frequent comments of this type on my YouTube site.  But when people make the effort to point out gaps and flaws in the material I am extremely grateful as it helps me to clarify my thinking and improve the approach. If nothing else, it gives me something to talk about in my blog. It is difficult producing material in a feedback vacuum.  So keep it coming!

Context – if it isn’t fun…

The role of context in statistical analysis

The wonderful advantage of teaching statistics is the real-life context within which any application must exist. This can also be one of the difficulties. Statistics without context is merely the mathematics of statistics, and is sterile and theoretical. The teaching of statistics requires real data. And real data often comes with a fairly solid back-story.

One of the interesting aspects for practising statisticians is that they can find out about a wide range of applications, by working in partnership with specialists. In my statistical and operations research advising I have learned about a range of subjects, including the treatment of hand injuries, children’s developmental understanding of probability, bed occupancy in public hospitals, the educational needs of blind students, growth rates of vegetables, texted comments on service at supermarkets, killing methods of chickens, rogaine route choice, co-ordinating scientific expeditions to Antarctica and the cost of care for neonates in intensive care. I found most of these really interesting and was keen to work with the experts on these projects. Statisticians tend to work in teams with specialists in related disciplines.

Learning a context can take time

When one is part of a long-term project, time spent learning the intricacies of the context is well spent. Without that, the meaning from the data can be lost. However, it is difficult to replicate this in the teaching of statistics, particularly in a general high school or service course. The amount of time required to become familiar with the context takes away from the time spent learning statistics. Too much time spent on one specific project or area of interest can mean that the students are unable to generalise. You need several different examples in order to know what is specific to the context and what is general to all or most contexts.

One approach is to try to have contexts with which students are already familiar. This can be enabled by collecting the data from the students themselves. The Census at School project provides international data for students to use in just this way. This is ideal, in that the context is familiar, and yet the data is “dirty” enough to provide challenges and judgment calls.

Some teachers find that this is too low-level and would prefer to use biological data, or dietary or sports data from other sources. I have some reservations about this. In New Zealand the new statistics curriculum is in its final year of introduction, and understandably there are some bedding-in issues. One I perceive is the relative importance of the context in the students’ reports. As these reports have high-stakes grades attached to them, this is an issue. I will use as an example the time series “standard”. The assessment specification states, among other things, “Using the statistical enquiry cycle to investigate time series data involves: using existing data sets, selecting a variable to investigate, selecting and using appropriate display(s), identifying features in the data and relating this to the context, finding an appropriate model, using the model to make a forecast, communicating findings in a conclusion.”

The full “standard” is given here: Investigate Time Series Data. This would involve about five weeks of teaching and assessment, in parallel with four other subjects. (The final three years of schooling in NZ are assessed through the National Certificate of Educational Achievement (NCEA). Each year students usually take five subject areas, each of which consists of about six “achievement standards” worth between 3 and 6 credits. There is a mixture of internally and externally assessed standards.)

In this specification I see that there is a requirement for the model to be related to the context. This is a great opportunity for teachers to show how models are useful, and their limitations. I would be happy with a few sentences indicating that the student could identify a seasonal pattern and make some suggestions as to why this might relate to the context, followed by a similar analysis of the shape of the trend. However, some teachers are requiring students to do independent literature exploration of the area, complete with references, while forbidding the referencing of Wikipedia.

This concerns me, and I call for robust discussion.

Statistics is not research methods any more than statistics is mathematics. Research methods and standards of evidence vary between disciplines. Clearly the evidence required in medical research will differ from that of marketing research. I do not think it is the place of the statistics teacher to be covering this. Mathematics teachers are already being stretched to teach the unfamiliar material of statistics, and I think asking them and the students to become expert in research methods is going too far.

It is also taking out all the fun.

Keep the fun

Statistics should be fun for the teacher and the students. The context needs to be accessible or you are just putting in another opportunity for antipathy and confusion. If you aren’t having fun, you aren’t doing it right. Or, more to the point, if your students aren’t having fun, you aren’t doing it right.

Some suggestions about the role of context in teaching statistics and operations research

  • Use real data.
  • If the context is difficult to understand, you are losing the point.
  • The results should not be obvious. It is not interesting that year 12 boys weigh more than year 9 boys.
  • Null results are still results. (We aren’t trying for academic publications!)
  • It is okay to clean up data so you don’t confuse students before they are ready for it.
  • Sometimes you should use dirty data – a bit of confusion is beneficial.
  • Various contexts are better than one long project.
  • Avoid the plodding parts of research methods.
  • Avoid boring data. Who gives a flying fish about the relative sizes of dolphin jaws?
  • Wikipedia is a great place to find out the context for most high school statistics analysis. That is where I look. It’s a great starting place for anyone.

Confidence Intervals: informal, traditional, bootstrap

Confidence Intervals

Confidence intervals are needed because there is variation in the world. Nearly all natural, human or technological processes result in outputs which vary to a greater or lesser extent. Examples of this are people’s heights, students’ scores in a well written test and weights of loaves of bread. Sometimes our inability or lack of desire to measure something down to the last microgram will leave us thinking that there is no variation, but it is there. For example we would check the weights of chocolate bars to the nearest gram, and may well find that there is no variation. However if we were to weigh them to the nearest milligram, there would be variation. Drug doses have a much smaller range of variation, but it is there all the same.

You can see a video about some of the main sources of variation – natural, explainable, sampling and due to bias.

When we wish to find out about a phenomenon, the ideal would be to measure all instances. For example we can find out the heights of all students in one class at a given time. However it is impossible to find out the heights of all people in the world at a given time. It is even impossible to know how many people there are in the world at a given time. Whenever it is impossible or too expensive or too destructive or dangerous to measure all instances in a population, we need to take a sample. Ideally we will take a sample that gives each object in the population an equal likelihood of being chosen.

You can see a video here about ways of taking a sample.
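If you want to see what this looks like in practice, here is a minimal sketch in R of taking a simple random sample, so that every member of the population has the same chance of being chosen. The population of heights below is invented purely for illustration.

```r
# An invented "population" of 10,000 heights in cm - the numbers are purely illustrative
set.seed(2024)
population <- rnorm(10000, mean = 170, sd = 10)

# A simple random sample of 30, where every member has an equal chance of selection
my_sample <- sample(population, size = 30)
mean(my_sample)   # the sample mean estimates the population mean
```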

When we take a sample there will always be error. It is called sampling error. We may, by chance, get exactly the same value for our sample statistic as the “true” value that exists in the population. However, even if we do, we won’t know that we have.

The sample mean is the best estimate for the population mean, but we need to say how well it is estimating the population mean. For example, say we wish to know the mean (or average) weight of apples in an orchard. We take a sample and find that the mean weight of the apples in the sample is 153g. If we took only a few apples, this gives only a rough idea, and we might say we are pretty sure the mean weight of the apples in the orchard is between 143g and 163g. If someone else took a bigger sample, they might be able to say that they are pretty sure that the mean weight of apples in the orchard is between 158g and 166g. You can tell that the second confidence interval is giving us better information, as the range of the confidence interval is smaller.

There are two things that affect the width of a confidence interval. The first is the sample size. If we take a really large sample we are getting a lot more information about the population, so our confidence interval will be more exact, or narrower. It is not a one-to-one relationship, but a square-root relationship. If we wish to reduce the width of the confidence interval by a factor of two, we will need to increase our sample size by a factor of four.

The second thing to affect the width of a confidence interval is the amount of variation in the population. If all the apples in the orchard are about the same weight, then we will be able to estimate that weight quite accurately. However, if the apples are all different sizes, then it will be harder to be sure that the sample represents the population, and we will have a larger confidence interval as a result.
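To see both effects in action, here is a minimal sketch in R using an invented orchard of apple weights. The numbers and the helper function ci_width are mine, purely for illustration; the intervals are the traditional 95% intervals for a mean.

```r
set.seed(2024)
# An invented orchard of apple weights in grams, purely for illustration
orchard <- rnorm(100000, mean = 155, sd = 20)

# Width of a traditional 95% confidence interval for the mean, from a sample of size n
ci_width <- function(n) {
  s <- sample(orchard, n)
  diff(t.test(s)$conf.int)
}

ci_width(25)    # on average about twice as wide as ...
ci_width(100)   # ... the interval from a sample four times the size
```

Re-running the last two lines with a more variable orchard (say sd = 40 instead of 20) roughly doubles both widths, which is the second effect described above.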

Three ways to find confidence intervals

Traditional (old-fashioned?) Approach

The standard way of calculating confidence intervals is by using formulas developed on the assumptions of normality and the Central Limit Theorem. These formulas are used to calculate the confidence intervals of means, proportions and slopes, but not for medians or standard deviations. That is because there aren’t nice straightforward formulas for these. The formulas were developed when there were no computers, and analytical methods were needed in the absence of computational power.

In terms of teaching, these formulas are straightforward, and also include the concept of level of confidence, which is part of the paradigm. You can see a video teaching the traditional approach to confidence intervals, using Excel to calculate the confidence interval for a mean.
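The video uses Excel, but the same calculation can be sketched in a few lines of R. The sample of apple weights below is invented, purely to show the usual formula (mean ± t × s/√n) in action.

```r
# A hypothetical sample of apple weights in grams (invented numbers)
weights <- c(148, 160, 152, 157, 149, 163, 151, 155, 158, 146)

n      <- length(weights)
xbar   <- mean(weights)
s      <- sd(weights)
t_crit <- qt(0.975, df = n - 1)   # 95% confidence, n - 1 degrees of freedom

c(xbar - t_crit * s / sqrt(n), xbar + t_crit * s / sqrt(n))

# The built-in t.test gives the same interval in one line
t.test(weights)$conf.int
```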

Rule of Thumb

In the New Zealand curriculum at year 12, students are introduced to the concept of inference using an informal method for calculating a confidence interval. The formula is median ± 1.5 × IQR ÷ √n (the median, plus or minus 1.5 times the interquartile range divided by the square root of the sample size). There is a similar formula for proportions.
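Here is a minimal sketch of that informal interval in R, using the same invented apple weights as above. This is my reading of the year 12 formula, so treat it as illustrative rather than official.

```r
weights <- c(148, 160, 152, 157, 149, 163, 151, 155, 158, 146)  # invented data

m   <- median(weights)
iqr <- IQR(weights)    # interquartile range
n   <- length(weights)

# Informal interval: median plus or minus 1.5 x IQR / sqrt(n)
c(m - 1.5 * iqr / sqrt(n), m + 1.5 * iqr / sqrt(n))
```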

Bootstrapping

Bootstrapping is a very versatile way to find a confidence interval. It has three strengths:

  1. It can be used to calculate the confidence interval for a large range of different parameters.
  2. It uses ALL the information the sample gives us, rather than just the summary values.
  3. It has been found to aid in understanding the concepts of inference better than the traditional methods.

There are also some disadvantages:

  1. Old fogeys don’t like it. (Just kidding) What I mean is that teachers who have always taught using the traditional approach find it difficult to trust what seems like a hit-and-miss method without the familiar theoretical underpinning.
  2. Universities don’t teach bootstrapping as much as the traditional methods.
  3. The common software packages do not include bootstrap confidence intervals.

The idea behind a bootstrap confidence interval is that we make use of the whole sample to represent the population. We take lots and lots of samples of the same size from the original sample. Obviously we need to sample with replacement, or the samples would all be identical. Then we use these repeated samples to get an idea of the distribution of the estimates of the population parameter. We chop the tails off at a level matching our chosen confidence (for a 95% interval we discard the lowest 2.5% and the highest 2.5% of the estimates), and what remains is the confidence interval. Voila!
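Here is a minimal sketch of a percentile bootstrap interval in R, again with the invented apple weights, this time for the median to show that the method is not limited to means. This is just one of several ways to form a bootstrap interval.

```r
set.seed(2024)
weights <- c(148, 160, 152, 157, 149, 163, 151, 155, 158, 146)  # invented data

# 10,000 resamples of the same size, taken with replacement from the original sample
boot_medians <- replicate(10000, median(sample(weights, replace = TRUE)))

# Chop 2.5% off each tail to get a 95% percentile bootstrap interval for the median
quantile(boot_medians, c(0.025, 0.975))
```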

Answers to the disadvantages (burn the straw man?)

  1. There is a sound theoretical underpinning for bootstrap confidence intervals. A good place to start is a previous blog about George Cobb’s work. Either that or – “Trust me, I’m a Doctor!” (This would also include trusting far more knowledgeable people such as Chris Wild and Maxine Pfannkuch, and the team of statistical educators led by Joan Garfield.)
  2. We have to start somewhere. Bootstrap methods aren’t used at universities because of inertia. As an academic of twenty years I can say that there is NO PAY OFF for teaching new stuff. It takes up valuable research time and you don’t get promoted, and sometimes you even get made redundant. If students understand what confidence intervals are, and the concept of inference, then learning to use the traditional formulas is trivial. Eventually the universities will shift. I am aware that the University of Auckland now teaches the bootstrap approach.
  3. There are ways to deal with the software package problem. There is a free software interface called “iNZight” that you can download. I believe Fathom also uses bootstrapping. There may be other software. Please let me know of any and I will add them to this post.

In Summary

Confidence intervals involve the concepts of variation, sampling and inference. They are a great way to teach these really important concepts, and to help students be critical of single-value estimates. They can be taught informally, traditionally or using bootstrapping methods. Any of the approaches can lead to rote use of formulas or algorithms, and it is up to teachers to aim for understanding. I’m working on a set of videos around this topic. Watch this space.

Excel, SPSS, Minitab or R?

I often hear this question: Should I use Excel to teach my class? Or should I use R? Which package is the best?

It depends on the class

The short answer is: it depends on your class. You have to ask yourself what attitudes, skills and knowledge you wish the students to gain from the course. What is it that you want them to feel and do and understand?

If the students are never likely to do any more statistics, what matters most is that they understand the elementary ideas, feel happy about what they have done, and recognise the power of statistical analysis, so they can later employ a statistician.

If the students are strong in programming, such as engineering or computer science students, then they are less likely to find the programming a barrier, and will want to explore the versatility of the package.

If they are research students and need to take the course as part of a research methods paper, then they should be taught on the package they are most likely to use in their research.

Over the years I have taught statistics using Excel, Minitab and SPSS. These days I am preparing materials for courses using iNZight, which is a specifically designed user interface with an R engine. I have dabbled in R, but have never had students for whom teaching with R would be suitable.

Here are my pros and cons for each of these, and when they are most suitable.

Excel

I have already written somewhat about the good and bad aspects of Excel, and the evils of Excel histograms. There are many problems with statistical analysis with Excel. I am told there are parts of the Analysis ToolPak which are wrong, though I’ve never found them myself. There is no straightforward way to do a hypothesis test for a mean. The data-handling capabilities of the spreadsheet are fantastic, but the ToolPak cannot even deal well with missing values. The output is idiosyncratic, and not at all intuitive. There are programming quirks which should have been eliminated many years ago. For example, when you click on a radio button to say where you wish the output to go, the entry box for the data is activated, rather than the one for the output. It would take only elementary Visual Basic to correct this, but it has never happened. Each time Excel upgrades I look for this small fix, and have repeatedly been disappointed.

So, given these shortcomings, why would you use Excel? Because it is there, because you are helping students gain other skills in spreadsheeting at the same time, because it is less daunting to use a familiar interface. These reasons may not apply to all students. Excel is the best package for first year business students for so many reasons.

PivotTables in Excel are nasty to get your head around, but once you do, they are fantastic. I resisted teaching PivotTables for some years, but I was wrong. They may well be one of the most useful things I have ever taught at university. I made my students create comparative bar charts in Excel, using PivotTables. One day Helen and I will make a video about PivotTables.

Minitab

Minitab is a lovely little package, and has very nice output. Its roots as a teaching package are obvious from the user-friendly presentation of results. It has been some years since I taught with Minitab. The main reason for this is that the students are unlikely ever to have access to Minitab again, and there is a lot of extra learning required in order to make it run.

SPSS

Most of my teaching at second year undergraduate and MBA and Masters of Education level has been with SPSS. Much of the analysis for my PhD research was done on SPSS. It’s a useful package, with its own peculiarities. I really like the data-handling in terms of excluding data, transforming variables and dealing with missing values. It has a much larger suite of analysis tools, including factor analysis, discriminant analysis, clustering and multi-dimensional scaling, which I taught to second year business students and research students.  SPSS shows its origins as a suite of barely related packages, in the way it does things differently between different areas. But it’s pretty good really.

R

R is what you expect from a command-line open-source program. It is extremely versatile, and pretty daunting for an arts or business major. I can see that R is brilliant for second-level and up in statistics, preferably for students who have already mastered similar packages/languages like MATLAB or Maple. It is probably also a good introduction to high-level programming for Operations Research students.

iNZight

This brings us to iNZight, which is a suite of routines using R, set in a semi-friendly user interface. It was specifically written to support the innovative New Zealand school curriculum in statistics, and has a strong emphasis on visual representation of data and results. It includes alternatives that use bootstrapping as well as traditional hypothesis testing. The time series package allows only one kind of seasonal model. I like iNZight. If I were teaching at university still, I would think very hard about using it. I certainly would use it for Time Series analysis at first year level. For high school teachers in New Zealand, there is nothing to beat it.

It has some issues. The interface is clunky and takes a long time to unzip if you have a dodgy computer (as I do). The graphics are unattractive. Sorry guys, I HATE the eyeball, and the colours don’t do it for me either. I think they need to employ a professional designer. SOON! The data has to be just right before the interface will accept it. It is a little bit buggy in a non-disastrous sort of way. It can have dimensionality/rounding issues. (I got a zero slope coefficient for a linear regression with an r of 0.07 the other day.)

But – iNZight does exactly what you want it to do, with lots of great graphics and routines to help with understanding. It is FREE. It isn’t crowded with all the extras that you don’t really need. It covers all of the New Zealand statistics curriculum, so the students need only to learn one interface.

There are other packages such as Genstat, Fathom and TinkerPlots, aimed at different purposes. My university did not have any of these, so I didn’t learn them. They may well be fantastic, but I haven’t the time to do a critique just now. Feel free to add one as a comment below!

Protectionism vs empowerment in the teaching of statistics

Where are you on the Fastidiousness Scale?

Sometimes statisticians just have to let go, and accept that some statistical analysis will be done in less than ideal conditions, with fairly dodgy data and more than a few violated assumptions.  Sometimes the wrong graph will be used. Sometimes people will claim causation from association. Just as sometimes people put apostrophes where they should not and misuse the word “comprise”.

When we are teaching, particularly non-majors, we need to think hard about where we sit on the fastidiousness scale. (In my experience just about all statistics teaching is to non-majors, which may say something about the attitudes of people to statistics.)

The fastidiousness scale is best described by its two extremes. At one extreme, statistical analysis is performed only by mathematical statisticians, using tools like SAS and R, but only if they know exactly how each formula works (and preferably have proved it as well) and have done small examples by hand. All data is perfectly random, unbiased and representative. We could call this end protectionism.

At the other end of the fastidiousness scale just about anyone can do statistical analysis, using Excel. They accept that the formulas do what the instructor tells them they do. It is a black box approach. The data goes into the black box, and the results come out. Any graph is better than no graph. Any data is better than no data. This end is probably best labelled “cavalier”.

Some instructors teach as if the mathematical extreme were the ideal, and they reluctantly allow people to do really basic summary statistics so long as the data is random, with a large sample size. They fill their teaching with warnings, and include the finite population correction in their early lectures. This protectionism could be construed as professional snobbery. This is evident in attitudes to the use of Excel for statistical analysis. I accept that the Analysis ToolPak in Excel leaves a lot to be desired. (See post about Excel and post about Excel histograms.) But at the same time, lots of people have access to Excel and are at home using it. When Excel is used to introduce the statistical concepts it is building on current skills, and empowering people.

Two positions on the scale are protectionism and empowerment.

Protectionism has the advantage that no bad statistical analysis is ever done. Any results that are published are properly explained, and are totally sound with regard to sample size and sampling method, choice of variable, choice of analysis, interpretation and data display. One concern is that the mathematical focus may mean that the practical aspects are neglected.

I do not recommend the cavalier end of the fastidiousness scale either. But somewhere in that direction lies empowerment. The advantages of empowerment are legion! Even if people do bad statistical analysis it is better than none at all. Taking a sample and drawing conclusions from it is better than not taking a sample. As people are empowered to do and understand statistics, they may better understand statistical ideas when they are presented to them in other contexts.

Teaching Statistics to Physios

Some years ago my sister asked me to be a keynote speaker at a hand-therapy conference. At the time I had mainly taught Operations Research and some regression analysis. But it included a free trip to Queenstown away from my children, so how could I resist? I was to do a one-hour plenary session on statistics and an elective workshop on quantitative research methods. It was scheduled first thing in the morning after the “dinner” the night before. Attendance at my session was compulsory if they were to get credit towards their professional accreditation. I did wonder if my sister actually liked me! My audience was over a hundred physiotherapists and occupational therapists who specialise in the treatment of hands, from Australia and New Zealand. They are all clever people, who generally had little knowledge of statistics. I assumed, correctly, that most of them were nervous of statistics. They had been taught by protectionists, and felt afraid, like over-protected children.

I decided to take an approach of empowerment – that all statistics boiled down to a few main ideas and that if they could understand those, they would be able to read academic reports on statistical analysis critically and, with help, do their own research. I taught about levels of data, the concept of sampling, and the meaning of the p-value. I used examples about hands. And I took an enabling, encouraging approach without being patronising.

It worked. The attendees felt empowered, and a large number came to my follow-up workshop. I don’t know if any of them went on to apply much of what I taught them, but I do know that a lot of them changed their attitude to statistical analysis.

Attitudes outlast skills and knowledge

Sometimes we forget that we are teaching attitudes, skills and knowledge – in that order of importance. If our students finish our course feeling that statistics is interesting, possible and relevant, then we have accomplished a great thing. People will forget skills and knowledge, but attitudes stick. If the students know that at one point they knew how to perform a comparison of two means, and that it wasn’t that difficult, then if the time comes again, they are more likely to work out how to do it again. They have been empowered!

Imagine if only people who can spell well and write with correct grammar were allowed to write, if  only the best chefs could cook and the rest of us would just watch in awe, if only professional musicians were allowed to play instruments, if only professional sport people were allowed to participate. Just as amateur writers, musicians, sportspeople and chefs have a better appreciation of the true nature of the endeavour, empowered amateur statisticians are in a better position to appreciate the worth and importance of rigorous, fastidious statistical analysis.

Let us cast off the shackles of protectionism and start empowering. Or at least move a little way down the fastidiousness scale when teaching non-majors.