The problem with probability is that it doesn’t really exist. Certainly it never exists in the past.
Probability is an invention we use to communicate our thoughts about how likely something is to happen. We have collectively agreed that 1 is a certain event and 0 is impossible. 0.5 means that there is just as much chance of something happening as not. We have some shared perception that 0.9 means that something is much more likely to happen than to not happen. Probability is also useful for when we want to do some calculations about something that isn’t certain. Often it is too hard to incorporate all uncertainty, so we assume certainty and put in some allowance for error.
Sometimes probability is used for things that happen over and over again, and in that case we feel we can check to see if our prediction about how likely something is to happen was correct. The problem here is that we actually need things to happen a very large number of times under the same circumstances in order to assess if we were correct. But when we are talking about the probability of a single event, that either will or won’t happen, we can’t test out if we were right or not afterwards, because by that time it either did or didn’t happen. The probability no longer exists.
Thus to say that there is a “true” probability somewhere in existence is rather contrived. The truth is that it either will happen or it won’t. The only way to know a true probability would be if this one event were to happen over and over and over, in the wonderful fiction of parallel universes. We could then count how many times it would turn out one way rather than another. At which point the universes would diverge!
However, in the interests of teaching about probability, there is the construct that there exists a “true probability” that something will happen.
Why think about probability?
What prompted these musings about probability was exploring the new NZ curriculum and companion documents, the Senior Secondary Guide and nzmaths.co.nz.
In Level 8 (last year of secondary school) of the senior secondary guide it says, “Selects and uses an appropriate distribution to solve a problem, demonstrating understanding of the relationship between true probability (unknown and unique to the situation), model estimates (theoretical probability) and experimental estimates.”
And at NZC level 3 (years 5 and 6 at Primary school!) the Key ideas in Probability talk about “Good Model, No Model and Poor Model”. This statement is referred to at all levels above level 3 as well.
I decided I needed to make sense of these two conceptual frameworks: true-model-experimental and good-poor-no, and tie it to my previous conceptual framework of classical-frequency-subjective.
Let’s make this a little more concrete with an example. We need a one-off event. What is the probability that the next mandarin I eat will be delicious? It is currently mandarin season in New Zealand, and there is nothing better than a good mandarin, with the desired combination of sweet and sour, and with plenty of juice and a good texture. But, being a natural product, there is a high level of variability in the quality of mandarins, especially when they may have parted company with the tree some time ago.
There are two possible outcomes for my future event. The mandarin will be delicious or it will not. I will decide when I eat it. Some may say that there is actually a continuum of deliciousness, but for now this is not the case. I have an internal idea of deliciousness and I will know. I think back to my previous experience with mandarins. I think about a quarter are horrible, a half are nice enough and about a quarter are delicious (using the Dr Nic scale of mandarin grading). If the mandarin I eat next belongs to the same population as the ones in my memory, then I can predict that there is a 25% probability that the mandarin will be delicious.
The NZ curriculum talks about “true” probability which implies that any value I give to the probability is only a model. It may be a model based on empirical or experimental evidence. It can be based on theoretical probabilities from vast amounts of evidence, which has given us the normal distribution. The value may be only a number dredged up from my soul, which expresses the inner feeling of how likely it is that the mandarin will be delicious, based on several decades of experience in mandarin consumption.
Let us look at some more examples:
What is the probability that:
- I will hear a bird on the way to work?
- the flight home will be safe?
- it will be raining when I get to Christchurch?
- I will get a raisin in my first spoonful of muesli?
- I will get at least one raisin in half of my spoonfuls of muesli?
- the shower in my hotel room will be enjoyable?
- I will get a rare Lego ® minifigure next time I buy one?
All of these events are probabilistic and have varying degrees of certainty and varying degrees of ease of modelling.
| Easy to model | Hard to model |
|---|---|
| Get a rare Lego ® minifigure | Raining in Christchurch |
| Raisin in half my spoonfuls | |
| Raisin in first spoonful | Bird, safe flight home |
And as I construct this table I realise also that there are varying degrees of importance. Except for the flight home, none of those examples matter. I am hoping that a safe flight home has a probability extremely close to 1. I realise that there is a possibility of an incident. And it is difficult to model. But people have modelled air safety and the universal conclusion is that it is safer than driving. So I will take the probability and fly.
How do we explain the different ways that probability has been described? I will now examine the three conceptual frameworks I introduced earlier, starting with the easiest.
The traditional framework is found in some form in many elementary college statistics textbooks. It has three categories: classical or “a priori”, frequency or historical, and subjective.
Classical or “a priori” – I had thought of this as being “true” probability. To me, if there are three red and three white Lego® blocks in a bag and I take one out without looking, there is a 50% chance that I will get a red one. End of story. How could it be wrong? This definition is the mathematically interesting aspect of probability. It is elegant and has cool formulas and you can make up all sorts of fun examples using it. And it is the basis of gambling.
Frequency or historical – we draw on long term results of similar trials to gain information. For example we look at the rate of germination of a certain kind of seed by experiment, and that becomes a good approximation of the likelihood that any one future seed will germinate. And it also gives us a good estimate of what proportion of seeds in the future will germinate.
Subjective – We guess! We draw on our experience of previous similar events and we take a stab at it. This is not seen as a particularly good way to come up with a probability, but when we are talking about one off events, it is impossible to assess in retrospect how good the subjective probability estimate was. There is considerable research in the field of psychology about the human ability or lack thereof to attribute subjective probabilities to events.
In teaching the three part categorisation of sources of probability I had problems with the probability of rain. Where does that fit in the three categories? It uses previous experimental data to build a model, and current data to put into the model, and then a probability is produced. I decided that there is a fourth category, that I called “modelled”. But really that isn’t correct, as they are all models.
NZ curriculum terminology
So where does this all fit in the New Zealand curriculum pronouncements about probability? There are two conceptual frameworks that are used in the document, each with three categories as follows:
True, modelled, experimental
In this framework we start with the supposition that there exists somewhere in the universe a true probability distribution. We cannot know this. Our expressions of probability are only guesses at what this might be. There are two approaches we can take to estimate this “truth”. These two approaches are not independent of each other, but often intertwined.
One is a model estimate, based on theory, such as that the probability of a single outcome is the number of equally likely ways that it can occur over the number of possible outcomes. This accounts for the probability of a red brick as opposed to a white brick, drawn at random. Another example of a modelled estimate is the use of distributions such as the binomial or normal.
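As a rough illustration of these two kinds of model estimate, here is a minimal Python sketch. The function names are made up for this example, and the figure of 0.5 for a raisin turning up in any one spoonful is purely an assumption, not something measured.

```python
from fractions import Fraction
from math import comb


def classical_probability(favourable: int, total: int) -> Fraction:
    """Equally likely outcomes: favourable outcomes over all possible outcomes."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need total > 0 and 0 <= favourable <= total")
    return Fraction(favourable, total)


def binomial_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k), where X counts successes in n independent trials, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Three red and three white bricks in the bag, one drawn without looking.
print(classical_probability(3, 6))  # 1/2

# Hypothetical: 10 spoonfuls of muesli, each ASSUMED to hold a raisin with probability 0.5.
# Probability of a raisin in at least half of the spoonfuls under that binomial model:
print(binomial_at_least(5, 10, 0.5))
```

Both numbers are outputs of a model. The first rests on the assumption that every brick is equally likely to be drawn, the second on the assumption that spoonfuls are independent and share the same raisin probability.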
In addition there is the category of experimental estimate, in which we use data to draw conclusions about what is likely to happen. This is equivalent to the frequency or historical category above. Often modelled distributions use data from an experiment also. And experimental probability relies on models as well. The main idea is that neither the modelled nor the experimental estimate of the “true” probability distribution is the true distribution, but rather a model of some sort.
Good model, poor model, no model
The other conceptual framework stated in the NZ curriculum is that of good model, poor model and no model, which relates to fitness for purpose. When it is important to have a “correct” estimate of a probability such as for building safety, gambling machines, and life insurance, then we would put effort into getting as good a model as possible. Conversely, sometimes little effort is required. Classical models are very good models, often of trivial examples such as dice games and coin tossing. Frequency models, aka experimental models, may or may not be good models, depending on how many observations are included, and how much the future is similar to the past. For example, a model of sales of slide rules developed before the invention of the pocket calculator will be a poor model for current sales. The ground rules have changed. And a model built on data from five observations is unlikely to be a good model. A poor model is not fit for purpose and requires development, unless the stakes are so low that we don’t care, or the cost of better fitting is greater than the reward.
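One way to see why five observations make a shaky frequency model is to put an interval around the estimate. The sketch below uses the Wilson score interval, which is my choice for the illustration rather than anything prescribed by the curriculum, and the germination counts are invented.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: the same observed rate (80%) from 5 seeds and from 500 seeds.
for germinated, planted in [(4, 5), (400, 500)]:
    low, high = wilson_interval(germinated, planted)
    print(f"{germinated}/{planted} germinated: roughly {low:.2f} to {high:.2f}")
```

The observed proportion is identical in both cases, but the interval from five seeds is so wide that it says very little. Note that no interval can help with the slide rule problem: if the ground rules have changed, more old data does not make the model better.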
I have problems with the concept of “no model”. I presume that is the starting point, from which we develop a model or do not develop a model if it really doesn’t matter. In my examples above I include the probability that I will hear a bird on the way to work. This is not important, but rather an idle musing. I suspect I probably will hear a bird, so long as I walk and listen. But if it rains, I may not. As I am writing this in a hotel in an unfamiliar area I have no experience on which to draw. I think this comes pretty close to “no model”. I will take a guess and say the probability is 0.8. I’m pretty sure that I will hear a bird. Of course, now that I have said this, I will listen carefully, as I would feel vindicated if I hear a bird. But if I do not hear a bird, was my estimate of the probability wrong? No – I could assume that I just happened to be in the 0.2 area of my prediction. But coming back to the “no model” concept – there is now a model. I have allocated the probability of 0.8 to the likelihood of hearing a bird. This is a model. I don’t even know if it is a good model or a poor model. I will not be walking to work this way again, so I cannot even test it out for the future, and besides, my model was only for this one day, not for all days of walking to work.
So there you have it – my totally unscholarly musings on the different categorisations of probability.
What are the implications for teaching?
We need to try not to perpetuate the idea that probability is the truth. But at the same time we do not wish to make students think that probability is without merit. Probability is a very useful, and at times highly precise way of modelling and understanding the vagaries of the universe. The more teachers can use language that implies modelling rather than rules, the better. It is common, but not strictly correct to say, “This process follows a normal distribution”. As Einstein famously and enigmatically said, “God does not play dice”. Neither does God or nature use normal distribution values to determine the outcomes of natural processes. It is better to say, “this process is usefully modelled by the normal distribution.”
We can have learning experiences that help students to appreciate certainty and uncertainty and the modelling of probabilities that are not equi-probable. Thanks to the overuse of dice and coins, it is too common for people to assess things as having equal probabilities. And students need to use experiments. First they need to appreciate that it can take a large number of observations before we can be happy that it is a “good” model. Secondly they need to use experiments to attempt to model an otherwise unknown probability distribution. What fun can be had in such a class!
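Where a physical experiment is impractical, a simulation can give students a feel for how many observations are needed. This is only a sketch: the “true” probability of 0.25 is an assumed value that we would never know in practice, and the function name is invented for the example.

```python
import random


def relative_frequency(true_p: float, trials: int, rng: random.Random) -> float:
    """Run `trials` simulated yes/no events and return the proportion of successes."""
    successes = sum(rng.random() < true_p for _ in range(trials))
    return successes / trials


rng = random.Random(2024)  # fixed seed so a class can reproduce the same run
ASSUMED_P = 0.25  # hypothetical "true" probability, hidden from us in real life

for trials in (10, 100, 1_000, 10_000):
    estimate = relative_frequency(ASSUMED_P, trials, rng)
    print(f"{trials:>6} trials: experimental estimate {estimate:.3f}")
```

Changing the seed and rerunning shows how much the small-sample estimates jump around compared with the large-sample ones, which is the point students need to experience for themselves.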
But, oh mathematical ones, do not despair – the rules are still the same, it’s just the vigour with which we state them that has changed.
In case anyone is interested, here are the outcomes which now have a probability of 1, as they have already occurred.
- I will hear a bird on the way to work? Almost the minute I walked out the door!
- the flight home will be safe? Inasmuch as I am in one piece, it was safe.
- it will be raining when I get to Christchurch? No, it wasn’t.
- I will get a raisin in my first spoonful of muesli? I did.
- I will get at least one raisin in half of my spoonfuls of muesli? I couldn’t be bothered counting.
- the shower in my hotel room will be enjoyable? It was okay.
- I will get a rare Lego minifigure next time I buy one? Still in the future!