# The concept of “random” is a tough one.

First there is the problem of lexical ambiguity. There are colloquial meanings for random that don’t totally tie in with the technical or domain-specific meanings for random.

Then there is the fact that people can’t actually be random.

Then there is the problem of equal chance vs displaying a long-term distribution.

And there is the problem that there are several conflicting ideas associated with the word “random”.

In this post I will look at these issues, and ask some questions about how we can better teach students about randomness and random sampling. This problem exists for many domain-specific terms that have colloquial meanings that hinder comprehension of the idea in question. You can read about more of these words, and some teaching ideas, in the post Teaching Statistical Language.

## Lexical ambiguity

First there is lexical ambiguity, which means that a word has more than one meaning. Kaplan, Rogness and Fisher write about this in their 2014 paper “Exploiting Lexical Ambiguity to help students understand the meaning of Random.” I recently studied this paper closely in order to present the ideas and findings to a group of high school teachers. I found the concept of leveraging lexical ambiguity very interesting. As a useful intervention, Kaplan et al introduced a picture of “random zebras” to represent the colloquial meaning of random, and a picture of a hat to represent the idea of taking a random sample. I think it is a great idea to have pictures representing the different meanings, and it might be good to get students to come up with their own.

So what are the different meanings for random? I consulted some on-line dictionaries.

## Different meanings

## Without method

The first meaning of random describes something happening without pattern, method or conscious decision. An example is “random violence”.

Example: She dressed in a rather random fashion, putting on whatever she laid her hand on in the dark.

## Statistical meaning

Most on-line dictionaries also give a statistical definition, which includes that each item has an equal probability of being chosen.

Example: The students’ names were taken at random from a pile, to decide who would represent the school at the meeting.

## Informal or colloquial

One meaning: Something random is either unknown, unidentified, or out of place.

Example: My father brought home some random strangers he found under a bridge.

Another colloquial meaning for random is odd and unpredictable in an amusing way.

Example: My social life is so random!

## People cannot be random

There has been considerable research into why people cannot provide a sequence of random numbers that is like a truly randomly generated sequence. In our minds we like things to be shared out evenly, so a series made up by a person will generally have fewer long runs of the same number than a truly random one.
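The point about runs is easy to check with a quick simulation. The following is a minimal Python sketch, not something taken from the research mentioned above: the helper name `longest_run`, the seed and the choice of 100 coin flips are all assumptions made for illustration. Students could write down their own “random” sequence of 100 heads and tails first, and then compare its longest run with the simulated one.

```python
import random


def longest_run(seq):
    """Return the length of the longest run of identical consecutive items."""
    if not seq:
        return 0
    best = current = 1
    for prev, nxt in zip(seq, seq[1:]):
        # Extend the current run if the item repeats, otherwise start a new run.
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best


rng = random.Random(2024)  # arbitrary seed, only so the sketch is repeatable
flips = [rng.choice("HT") for _ in range(100)]

print("".join(flips))
print("longest run:", longest_run(flips))
```

Changing the seed gives a different sequence each time, which makes it possible to see how long the runs in computer-generated sequences tend to be compared with hand-written ones.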

Animals aren’t very random either, it seems. Yesterday I saw a whole lot of sheep in a paddock, and while they weren’t exactly lined up, there was a pretty similar distance between all the sheep.

## Equal chance vs long-term distribution

In the paper quoted earlier, Kaplan et al used the following definition of random:

“We call a phenomenon random if individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions.” From Moore (2007) The Basic Practice of Statistics.

Now to me, that does not insist that each outcome be equally likely, which matches with my idea of randomness. In my mind, random implies chance, but not equal likelihood. When creating simulation models we would generate random variates following all sorts of distributions. The outcomes would be far from even, but in the long run they would display a distribution similar to the one being modelled.
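To make the idea of “uncertain individual outcomes, regular long-run distribution” concrete, here is a small Python sketch. It is only an illustration: the three outcomes, their weights (0.6, 0.3, 0.1) and the function name `long_run_frequencies` are hypothetical choices, not part of any model discussed in this post. The outcomes are deliberately not equally likely.

```python
import random
from collections import Counter


def long_run_frequencies(outcomes, weights, n_draws, seed=1):
    """Draw n_draws outcomes with the given weights; return observed proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    draws = rng.choices(outcomes, weights=weights, k=n_draws)
    counts = Counter(draws)
    return {outcome: counts[outcome] / n_draws for outcome in outcomes}


outcomes = ["A", "B", "C"]
weights = [0.6, 0.3, 0.1]  # hypothetical, deliberately unequal probabilities

# Small samples can look erratic; large ones settle near the weights.
for n in (10, 100, 10_000):
    print(n, long_run_frequencies(outcomes, weights, n))
```

With only ten draws the proportions can be well away from the weights, but with ten thousand draws they should sit close to 0.6, 0.3 and 0.1, even though no single draw can be predicted.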

Yet the dictionaries, and the later parts of the Kaplan paper, insist that randomness requires equal opportunity to be chosen. What’s a person to do?

I propose that the meaning of the adjective “random” may depend on the noun that it is qualifying. There are random samples and random variables. There is also randomisation and randomness.

A random sample is a sample in which each object has an equal opportunity of being chosen, and each choice of object is by chance, and independent of the previous objects chosen. A random variable is one that can take a number of values, and will generally display a pattern of outcomes similar to a given distribution.

I wonder if the problem is that randomness is somehow equated with fairness. Our most familiar examples of true randomness come from gambling, with dice, cards, roulette wheels and lotto balls. In each case there is the requirement that each outcome be equally likely.

Bearing in mind the overwhelming evidence that the “statistical meaning” of randomness includes equality, I begin to think that it might not really matter if people equate randomness with equal opportunity.

However, if you think about medical or hazard risk, the story changes. Apart from known risk-increasing factors associated with lifestyle, whether a person succumbs to a disease appears to be random. But the likelihood of succumbing is not equal to the likelihood of not succumbing. Similarly, there is a clear random element in whether a future child has a disability known to be caused by an autosomal recessive gene. It is definitely random, in that there is an element of chance, and the effects on successive children are independent. But the probability of a disability is one in four (when both parents are carriers). I suppose if you look at the outcomes as being which children are affected, there is an equal chance for each child.

But then think about a “lucky dip” containing many cheap prizes and a few expensive prizes. The choice of prize is random, but there is not an even chance of getting a cheap prize or an expensive prize.

I think I have mused enough. I’m interested to know what the readers think. Whatever the conclusion is, it is clear that we need to spend some time making clear to the students what is meant by randomness, and a random sample.

“A random sample is a sample in which each object has an equal opportunity of being chosen, and each choice of object is by chance, and independent of the previous objects chosen.”

I don’t like this much, Nic. We should at least start with the standard, orthodox definition of a simple random sample of size n from a finite population of size N: it’s chosen in such a way that any of the possible N choose n samples is equally likely.

I number a class of 100 from 1 to 100, and choose a sample of size 50 by tossing a coin, choosing the even numbers if it’s a head and the odd numbers if it’s a tail. That is a “random” (probability mechanism) sample, and each person has the same chance of being in it. But only two possible samples can be chosen, and it’s not a simple random sample.

Also, your definition assumes objects get into the sample sequentially, which need not be the case.
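The coin-toss scheme in this comment can be put next to a simple random sample in a few lines of Python. This is a sketch only: the function names `coin_toss_sample` and `simple_random_sample` are made up for illustration, and `random.sample` is used here as a stand-in for drawing a simple random sample without replacement. In both schemes each person has a one in two chance of being included, but the number of samples that can possibly occur is very different.

```python
import random
from math import comb

population = list(range(1, 101))  # the class, numbered 1 to 100


def coin_toss_sample(rng):
    """Heads: take all the even numbers. Tails: take all the odd numbers."""
    heads = rng.random() < 0.5
    return [i for i in population if (i % 2 == 0) == heads]


def simple_random_sample(rng, n=50):
    """Draw n people without replacement; every subset of size n is equally likely."""
    return sorted(rng.sample(population, n))


# Number of distinct samples each scheme can ever produce.
possible_coin_toss_samples = 2
possible_simple_random_samples = comb(100, 50)

rng = random.Random(5)  # arbitrary seed for repeatability
print(coin_toss_sample(rng)[:10])
print(simple_random_sample(rng)[:10])
print(possible_coin_toss_samples, possible_simple_random_samples)
```

Under the coin-toss scheme, persons 1 and 2 can never appear in the same sample, which is one way to show students why equal individual chances are not enough to make a simple random sample.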

Hi Nic, I haven’t given it much thought, but I wonder if the topic is a little too big from a statistical perspective. “Random” is often used as an adjective in statistics, as in random sample, random variable, missing at random, etc.

A random draw from a standard normal distribution is definitely not one where “each item has an equal probability of being chosen”. In fact numbers nearest to 0 have the greatest probability of being chosen, numbers with an absolute value of 100 or more have an infinitesimally small probability of being chosen.
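A short simulation can illustrate the point about the standard normal distribution. Because the distribution is continuous, any single exact value has probability zero, so the sketch below compares ranges of values instead. The function name `share_in_ranges`, the seed and the number of draws are assumptions made for illustration; theory gives roughly 68% of draws within one unit of 0 and roughly 0.3% more than three units away.

```python
import random


def share_in_ranges(n_draws=50_000, seed=3):
    """Return the share of standard normal draws with |x| < 1 and with |x| > 3."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    near_zero = sum(abs(x) < 1 for x in draws) / n_draws
    far_out = sum(abs(x) > 3 for x in draws) / n_draws
    return near_zero, far_out


near_zero, far_out = share_in_ranges()
print(f"share with |x| < 1: {near_zero:.3f}")
print(f"share with |x| > 3: {far_out:.4f}")
```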

I guess it depends at what level you are teaching. “Random” is one of those concepts which improves and refines with age 🙂

The best source I ever came across (I was looking at tests for randomness when defining pseudo-random-number generators) was one of Knuth’s volumes. It’s a bit heavy going in places, but very interesting.

Cath

“In my mind, random implies chance, but not equal likelihood. When creating simulation models we would generate random variates following all sorts of distributions.”

I think the likelihood of an outcome does not imply the randomness of the event. Suppose you had an urn (yeah, it’s me, the urn guy) with 5 red marbles and 10 blue marbles. Selecting any of the marbles is a matter of chance and each has a probability of 1 in 15 of being chosen. But the outcome of choosing a red versus blue marble is different. The selection is “random”; the “likelihood” depends on the definition of the desired outcome.
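The urn arithmetic in this comment can be written out exactly. The sketch below is an illustration only, with variable names chosen for the purpose; it uses Python’s `fractions` module so that the probabilities stay as exact fractions. Every marble has the same chance of being drawn, yet the two colours do not.

```python
from collections import Counter
from fractions import Fraction

# The urn from the comment: 5 red marbles and 10 blue marbles.
urn = ["red"] * 5 + ["blue"] * 10

# Each individual marble is equally likely to be drawn.
p_each_marble = Fraction(1, len(urn))

# The chance of a colour is the number of marbles of that colour times 1/15.
p_colour = {colour: count * p_each_marble for colour, count in Counter(urn).items()}

print("each marble:", p_each_marble)  # 1/15
print("red:", p_colour["red"])        # 1/3
print("blue:", p_colour["blue"])      # 2/3
```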

Hi Dr Nic

I enjoy reading your blog, and this time you’ve tempted me into replying as well, with not one but two points.

It’s already been said, but “random” and “equal probability” are quite distinct concepts.

In the area where I used to work (survey sampling from finite populations) that is very obvious. You begin with simple random samples (which may be with or without replacement) as an introduction, but they are almost never used in practice. There is all the fun of stratified sampling, clustered sampling, multi-stage designs, multi-frame designs, multi-phase designs etc. to look forward to once you get past the very basics. With all the fancy labels they might seem intimidating to students, but almost all of them arise as solutions to very practical problems, and if approached that way can be quite easy to understand (OK, I know I am not a good test; I worked with it for so long I can’t remember what it was like when I started). For example, how do I make sure I get enough males and females in my sample? – stratify. Is there a correct way to get a better (ugh, I hate using the word without defining what I mean, but ploughing on anyway) sample when I know there are both big and small businesses out there? – sure, stratify by business size and maybe even have a ‘take-all stratum’. What if I don’t know the size of the businesses to begin with? – well, ask them in the first phase of your collection, and then subsample in the second phase (because otherwise you’ll get “too many” small businesses and “not enough” medium and large ones). What if it costs me a lot to get to where the respondents are so I can ask them questions? – well, cluster the design to reduce costs – it usually makes the variance go up but overall it’s more efficient, as long as you weight the sample estimator correspondingly. Because I’m old-fashioned, all these approaches involve random sampling where each unit in the population must have a ‘known non-zero chance of selection’ – but it is almost unheard of for those chances to be equal for every unit.
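The stratify-by-business-size idea, including a take-all stratum and weighting by the known chance of selection, can be sketched as follows. Everything in this example is hypothetical: the population of 900 small and 100 large businesses, the turnover figures, the sample sizes and the function name `stratified_estimate_total` are invented for illustration and do not describe any real survey design.

```python
import random

rng = random.Random(7)  # arbitrary seed for repeatability

# Hypothetical population: turnover values invented for illustration.
strata = {
    "small": [rng.uniform(1, 10) for _ in range(900)],
    "large": [rng.uniform(100, 1000) for _ in range(100)],
}

# Unequal sampling fractions; "large" is a take-all stratum.
sample_sizes = {"small": 45, "large": 100}

# Known, non-zero, but unequal chances of selection for each unit.
inclusion_probs = {name: sample_sizes[name] / len(units) for name, units in strata.items()}


def stratified_estimate_total(strata, sample_sizes, rng):
    """Estimate the population total, weighting each unit by 1 / inclusion probability."""
    total = 0.0
    for name, units in strata.items():
        n = sample_sizes[name]
        sample = rng.sample(units, n)  # simple random sample within the stratum
        weight = len(units) / n        # inverse of the inclusion probability n / N
        total += weight * sum(sample)
    return total


true_total = sum(sum(units) for units in strata.values())
estimate = stratified_estimate_total(strata, sample_sizes, rng)
print(inclusion_probs)
print(f"true total: {true_total:.0f}, estimate: {estimate:.0f}")
```

A small business has a 5% chance of selection and a large one is always selected, so the chances are far from equal, but because they are known the weighted estimate can still land close to the true total.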

Second, “unknown” and “random” are distinct concepts.

Things that are random may well begin as unknown, but there are many things that are unknown to me that are certainly not random! Try asking me about the right temperature to bake a cake.

More seriously, and more challengingly, I recall a talk by Persi Diaconis (apologies if I’ve spelt his name wrong). He pointed out that flipping a coin (even a ‘fair’ one, if you are brave enough to try and define what that is) is actually a deterministic situation. Coins are large enough that Newtonian mechanics works for them, so if you know (I’ve just slipped in an important three words) the starting position of the coin, its orientation, the force applied when it’s flipped, the distance to the floor, etc., then the outcome, heads or tails, can be calculated using relatively basic mechanics. He reckoned he used to be able to win a lot of bets, using a machine he had made so that he could control the initial force of the toss etc. Of course, if you don’t know the initial conditions, the outcome is unknown to you beforehand, but it’s not actually random, and you’d have been unwise to bet against him. Unfortunately the security guards in modern airports weren’t keen to let him on planes with his device, and it was quite delicate since it had to deliver fairly precise and repeatable tosses, so I didn’t get to actually see it in action.

Anyway, good luck with defining random, and keep the blogs coming.

Regards

Geoff

There are many terms where the mathematical and the common definitions differ. Take “expected”, for example. Mathematically, the expected value of a single roll of a fair dice is 3.5, but for those not mathematically inclined, it doesn’t make sense to “expect” to roll a 3.5.
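The 3.5 in this comment is just the probability-weighted average of the six faces, and it can be checked in a couple of lines. This is a small illustrative sketch with made-up variable names; it also confirms that the “expected” value is not a value the die can actually show.

```python
from fractions import Fraction

faces = list(range(1, 7))  # the six faces of a fair die

# Each face has probability 1/6, so the expected value is the sum of face * 1/6.
expected_value = sum(Fraction(face, 6) for face in faces)

print(expected_value)           # 7/2
print(float(expected_value))    # 3.5
print(expected_value in faces)  # False
```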

P.S. “People cannot be random”: not entirely true. According to IMDB, there is a screenwriter called John Random. There may be others.

This MAA article may help:

What Is a Random Sequence by Sérgio B. Volchan

http://www.maa.org/programs/maa-awards/writing-awards/what-is-a-random-sequence

Richard von Mises thought random, chance and probability all involved “insensitivity to place selection” and the “impossibility of a successful gambling system.” This outcome-based description may make a stronger impression on students than the classic probability-based descriptions. For details, see Schield & Burnham (2008). Von Mises’ Frequentist Approach to Probability. Copy at http://www.statlit.org/PDF/2008SchieldBurnhamASA.pdf
