# Proving causation

## Aeroplanes cause hot weather

In Christchurch we have a weather phenomenon known as the “Nor-wester”, which is a warm dry wind, preceding a cold southerly change. When the wind is from this direction, aeroplanes make their approach to the airport over the city. Our university is close to the airport in the direct flightpath, so we are very aware of the planes. A new colleague from South Africa drew the amusing conclusion that the unusual heat of the day was caused by all the planes flying overhead.

Statistics experts and educators spend a lot of time refuting claims of causation. “Correlation does not imply causation” has become a catch-cry of people trying to avoid the common trap. This is a great advance in understanding that even journalists (notoriously math-phobic) seem to have caught on to. My own video on important statistical concepts ends with the causation issue. (You can jump to it at 3:51.)

So we are aware that it is not easy to prove causation.

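To prove causation we need a randomised experiment. Randomly assigning the treatment spreads every other factor that could cause or contribute to the effect evenly, on average, across the groups being compared, so that a difference between the groups can be attributed to the treatment.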
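As a rough sketch of why randomisation matters, here is a small simulation of the aeroplane story. Every number in it is made up for illustration (the share of nor'wester days, the plane counts and the temperatures are assumptions, not Christchurch data). In the observational version the wind decides both the flightpath and the heat. In the randomised version a coin toss decides the flightpath.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(days, randomise_flightpath, seed=1):
    """Correlation between planes overhead and temperature over many days."""
    rng = random.Random(seed)
    planes, temps = [], []
    for _ in range(days):
        # Assumption: a nor'wester blows on about 30% of days.
        norwester = rng.random() < 0.3
        if randomise_flightpath:
            # Experiment: a coin toss, not the wind, sends planes over the city.
            over_city = rng.random() < 0.5
        else:
            # Observation: planes approach over the city only in a nor'wester.
            over_city = norwester
        # Assumed plane counts overhead and temperatures in degrees Celsius.
        planes.append(rng.gauss(40, 5) if over_city else rng.gauss(5, 2))
        temps.append(rng.gauss(30, 3) if norwester else rng.gauss(20, 3))
    return pearson(planes, temps)


observational_r = simulate(2000, randomise_flightpath=False)
randomised_r = simulate(2000, randomise_flightpath=True)
print(f"observational correlation: {observational_r:.2f}")
print(f"randomised correlation:    {randomised_r:.2f}")
```

In the observational run, planes and heat are strongly correlated even though the planes have no effect on temperature in this model. Once the flightpath is randomised, the correlation falls to roughly zero, because the wind no longer drives both variables.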

There is also the related problem of generalizability. If we do have a randomised experiment, we can prove causation. But unless the sample is also a random, representative sample of the population in question, we cannot infer that the results will transfer to that population. This is nicely illustrated in this matrix from The Statistical Sleuth by Fred L. Ramsey and Daniel W. Schafer.

The relationship between the type of sample and study and the conclusions that may be drawn.

The top left-hand quadrant is the one in which we can draw causal inferences for the population.
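The logic of the matrix can be written down as a tiny lookup. This is my paraphrase of the two ideas in the text (random assignment supports causal claims, random sampling supports generalising to the population), not a reproduction of the book's figure, and the function and key names are invented for this sketch.

```python
def scope_of_inference(random_sample, random_assignment):
    """Which conclusions a study design supports (a simplified paraphrase)."""
    return {
        # Random assignment of treatments is what licenses a causal claim.
        "causal_claim": random_assignment,
        # Random sampling is what licenses inference to the population.
        "generalises_to_population": random_sample,
    }


# A randomised experiment on a random sample supports both conclusions.
print(scope_of_inference(random_sample=True, random_assignment=True))
# An observational study of volunteers supports neither.
print(scope_of_inference(random_sample=False, random_assignment=False))
```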

## Causal claims from observational studies

A student posed this question: Is it possible to prove a causal link based on an observational study alone?

It would be very useful if we could. It is not always possible to use a randomised trial, particularly when people are involved. Before we became more aware of human rights, experiments were performed on unsuspecting human lab rats. A classic example is the Vipeholm experiments where patients at a mental hospital were the unknowing subjects. They were given large quantities of sweets in order to determine whether sugar caused cavities in teeth. This happened into the early 1950s. These days it would not be acceptable to randomly assign people to groups who are made to smoke or drink alcohol or consume large quantities of fat-laden pastries. We have to let people make those lifestyle choices for themselves. And observe. Hence observational studies!

There is a call for “evidence-based practice” in education to follow the philosophy in medicine. But getting educational experiments through ethics committee approval is very challenging, and it is difficult to use rats or fruit-flies to impersonate the higher learning processes of humans. The changing landscape of the human environment makes it even more difficult to perform educational experiments.

To find out the criteria for justifying causal claims in an observational study I turned to one of my favourite statistics textbooks, Chance Encounters by Wild and Seber (page 27). They cite the Surgeon General of the United States. The criteria for the establishment of a cause and effect relationship in an epidemiological study are the following:

1. Strong relationship: For example, illness is four times as likely among people exposed to a possible cause as among those who are not exposed.
2. Strong research design.
3. Temporal relationship: The cause must precede the effect.
4. Dose-response relationship: Higher exposure leads to a higher proportion of people affected.
5. Reversible association: Removal of the cause reduces the incidence of the effect.
6. Consistency: Multiple studies in different locations produce similar effects.
7. Biological plausibility: There is a supportable biological mechanism.
8. Coherence with known facts.
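Two of these criteria are easy to check with a few lines of arithmetic. The sketch below computes a relative risk (criterion 1) and checks whether the proportion affected rises with exposure (criterion 4). The counts and proportions are hypothetical, chosen only so that the relative risk matches the "four times as likely" example, and the function names are my own.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of illness among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def shows_dose_response(proportions_by_dose):
    """True if the proportion affected strictly increases with each dose level."""
    pairs = zip(proportions_by_dose, proportions_by_dose[1:])
    return all(lower < higher for lower, higher in pairs)


# Hypothetical counts: 40 of 1000 exposed people fall ill, 10 of 1000 unexposed.
rr = relative_risk(40, 1000, 10, 1000)
print(f"relative risk: {rr:.1f}")

# Hypothetical proportions affected at none, low, medium and high exposure.
print(shows_dose_response([0.01, 0.02, 0.04, 0.07]))
```

Neither number proves anything on its own. A large relative risk and a clear dose-response pattern are simply two of the eight strands of evidence in the list.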

In high school and entry-level statistics courses, the focus is often on statistical literacy. The concept of causation is pivotal to a correct understanding of what statistics can and cannot claim. It is worth spending some time in the classroom discussing what would constitute reasonable proof and what would not. In particular it is worthwhile to come up with alternative explanations for common fallacies, or even truths, in causation. Some examples for discussion might be drink-driving and accidents, smoking and cancer, gender and success in any number of areas, home game advantage in sport, and the use of lucky charms, socks and undies. This also links nicely with probability theory, helping to tie the year’s curriculum together.

This entry was posted in concepts, inference, statistics, teaching by Dr Nic.

I love to teach just about anything. My specialties are statistics and operations research. I have insider knowledge on Autism through my family. I have a lovely husband, two grown-up sons, a fabulous daughter-in-law and an adorable grandson. I have several blogs - Learn and Teach Statistics, and Building a Statistics Learning Community, are the main ones.

## 9 thoughts on “Proving causation”

1. One case that might make an interesting teaching example: http://en.wikipedia.org/wiki/Schizophrenia_and_smoking

If you tell students that 80% of people with a certain life-threatening disease smoke, compared to 20% of the general population (US figures, but other countries show similar correlation), I think most would assume that the disease is caused by smoking – after all, we know of so many other diseases where smoking certainly does have a causal role.

But in this case there are several alternate theories for the correlation. One is that the same brain chemistry that predisposes people to schizophrenia may also make nicotine more attractive; another is that nicotine may help alleviate the symptoms of schizophrenia. (See link above for more complexities.)

Whatever the explanation, it’s still a bad thing; schizophrenics suffer from a high rate of heart and lung disease, with smoking likely to be a major contributor in that. But if smoking is a form of self-medication, that has implications for how the issue should best be handled.

2. Dr Nic,

The path from correlation to causation is a minefield, no less. The conditions you described in this blog post were first set out, AFAIR, by a British epidemiologist and statistician at an occupational health conference in 1965. His name is Sir Austin Bradford Hill, and this list erroneously bears his name, :-), as Hill’s Criteria. Hill himself called these considerations, :-).

3. Hi Dr Nic,
Interesting post. The criteria you mentioned were first suggested by Sir Austin Bradford Hill, a British doctor, in 1965. He, of course, called this a list of considerations.

Arin Basu

4. Dr Nic

Great stimulating article as always.

I have an amusing (and true) anecdote that I use when teaching about correlation / causation. In my school, if you look at the leaving grades of our students and correlate them against a whole bunch of baseline data (gender, test scores, reading age etc.), the “best” predictor of leaving grade is none of these but rather the primary (kindergarten) school that the learners went to.

In fact, living in an odd or even numbered house correlates “more” than gender. Taken to the extreme, if you investigate which console your learner has, you find that owning an XBOX vs PS3 is the best correlation of all.

In the UK there is an obsession with the observation that there appears to be a correlation (read: causation) between gender and results, and we have initiatives to “fix” this imbalance.

If my students learn any stats from me, it’s that Correlation is not Causation 😉

Great stuff.

Glen

5. The other day while looking through a high-school maths test, I was reminded that there’s a world of difference between remembering these concepts and internalising them:

Q1: Research on people with Type 2 diabetes finds a strong positive correlation between drinking coffee and high blood sugar/insulin levels.
1a: Does this show that drinking coffee causes high blood sugar/insulin?
1b: Based on this data, what advice would you give to people with Type 2 diabetes?

Answers provided: “No, correlation is not causation”, “Reduce coffee intake”. I sent it back with a suggestion to consider how part (a) might influence the answer to part (b)…

6. Hi,

I am studying the report on “Do vaccines cause autism” and trying to understand causation better. I have been told many things on both sides. I was thinking that causation is probably impossible to prove statistically due to all the variables. Do you have any insight on the studies published in JAMA? Here is a link: http://jama.jamanetwork.com/article.aspx?articleid=2275444

Good videos by the way! Thank you.

• Thanks Daniel. That is a rabbit-hole down which I do not have time to descend just now. I am convinced there is no link between autism and vaccination, but I have not read the studies myself.