Lies and statistics

One of the most famous sayings about statistics is the line: “There are three types of lies: lies, damned lies and statistics.” It was popularised by author Mark Twain (Samuel Clemens), who attributed it to British statesman Benjamin Disraeli. There is a book entitled “How to Lie with Statistics”. In high school, students are taught about misleading graphs. It seems clear that statistics and facts are not the same thing. Yet one True/False question many of my students continue to get wrong says “Statistical analysis is an objective science, unaffected by the researcher’s opinions.” The correct response is False, yet 44% of students put True. Referring back to my earlier post “You’re teaching it wrong”, I realise that I have work to do in helping students to recognise the subjective aspects of statistics.

Statistics is not an objective science

It may be that the students are not sure about what is meant by subjective. Any post-modern researcher realises that very little is objective. We strive in science and analysis for “the facts” unsullied by human interpretation, but objectivity remains elusive in most endeavours. Like it or not, our own world-views affect the decisions we are required to make. We do not see the world as IT is, but rather as we are. Two people seeing the same scene can describe it totally differently, each convinced that he or she is correct and the other in error.

Subjectivity is generally unintentional. As part of the qualitative part of my mixed methods PhD research I was required to include a “statement of bias”, wherein I described my own views and circumstances which may have influenced my understanding of the data. As my research related to the education of children with vision impairment, it was clear that having a son who is totally blind would affect my interpretation. However, it is also important to bear in mind that a person who did NOT have a child with vision impairment would also be influenced by their own circumstances. It was also instructive to see how my views were affected by the research. My opinions of groups of people and circumstances and rights all changed over the eight years of the study.

Subjective bias can creep into statistical analysis at all stages. I tell the students that when they read a statistical report it is important to think of the possible biases the person publishing it might have. The choice of significance level at which to reject the null hypothesis is a value judgement. The sample size, questions asked, order of the questions, manner of sampling, data cleaning methods and choice of which aspects to report or ignore are all judgements made by the person performing the analysis. The way data is represented in graphs and even the choice of vocabulary affect the interpretation of the “facts”. Sometimes the bias may seem clear, such as when the research is funded by a company with vested interests. It is less clear when the biases are similar to our own. It can be difficult to find flaws in research which supports our own opinion.
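To make the point about significance levels concrete, here is a minimal sketch in Python. The scenario is an assumption chosen for illustration, not data from my course: a coin is tossed 100 times and lands heads 60 times, and we test the null hypothesis that the coin is fair. The function name `two_sided_binomial_p` is my own, and it only handles the symmetric case where the null proportion is 0.5.

```python
from math import comb

def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for H0: p = 0.5 (symmetric case only)."""
    # Work with the larger tail so the same code handles 60 heads or 40 heads.
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    # Doubling the tail is valid here because the null distribution is symmetric.
    return min(1.0, 2 * upper_tail)

# Hypothetical result: 60 heads in 100 tosses.
p_value = two_sided_binomial_p(60, 100)

# The data never change; only the analyst's chosen threshold does.
decisions = {alpha: p_value < alpha for alpha in (0.10, 0.05, 0.01)}

print(f"p-value: {p_value:.4f}")
for alpha, reject in decisions.items():
    verdict = "reject the null" if reject else "do not reject the null"
    print(f"significance level {alpha:.2f}: {verdict}")
```

The p-value comes out a little under 0.06, so an analyst who chose a 10% level would report evidence that the coin is biased, while one who chose 5% or 1% would not. Same data, same arithmetic, different conclusion.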

The presence of subjectivity is important to teach at all levels of statistics, and is one of the places where mathematics and the decision sciences of Statistics and Operations Research part company. Not being a pure mathematician, I can only postulate that pure mathematicians believe that mathematics is objective and free of the taint of human bias.  But with statistics it is possible right from the early stages to point out how different students in a class have shown different things in graphs using the same data. This is exactly when statistics can become really exciting and thought-provoking rather than mechanistic number crunching. This is why statistics may be better taught in a social studies or science class, or at least in a cross-disciplinary setting.
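The same thing happens with summary numbers as with graphs. As a small sketch, assume ten made-up weekly pay figures (invented purely for illustration, not taken from any real organisation), one of which is much larger than the rest. Two students asked for “the typical pay” could honestly report quite different values.

```python
from statistics import mean, median

# Hypothetical weekly pay (dollars) for ten staff; invented for illustration.
weekly_pay = [600, 620, 650, 650, 680, 700, 720, 750, 800, 3000]

average = mean(weekly_pay)    # pulled upwards by the single large value
typical = median(weekly_pay)  # midpoint of the sorted values

print(f"Mean weekly pay:   {average:.0f}")
print(f"Median weekly pay: {typical:.0f}")
```

Here the mean is 917 and the median is 690. Neither is wrong, but which one is reported depends on the story the person reporting wants to tell, and that is a useful starting point for class discussion.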

It is not difficult to teach the subjective nature of statistics. It can be brought in through class discussions. Data should ALWAYS be within a context, which then means any discussion or evaluation of outcomes is rooted in the students’ experience and can be further analysed for validity and applicability to real life. It may require an attitude shift, away from the unique and satisfying correctness of mathematics, and it may also need care not to undermine confidence in all statistical analysis. It is important that this is seen as pivotal to statistical analysis, and not the messy stuff that happens around the edges. Case studies are useful for this. As usual in writing my blog I have come up with ideas that would improve my own course! I would love to hear if any of my readers implement any of these ideas.

As part of our first-year Management Science paper we include a section on ethics. Students are required to identify possible conflicts of interest in scenarios, and the concept of worldviews is touched on. This is quite difficult for students in their late teens as they tend to be rather naive and “black and white” in their thinking. But to me this is the role of the university: to challenge their ways of thinking so that steam comes out of their ears. It may be that the business students we get are less used to playing with ideas in the way that history or arts students may be. Whatever the reason, it is fun to challenge them.

In closing I’d like to say thanks for the support expressed in response to my previous post about the demise of Operations Research at UC. It is a loss to the country, as Mike Trick and others pointed out. And it is a tough time for my colleagues who are now looking for other work. The insights from the discipline of OR are valuable and I hope that the thousands of students we have taught over the years remember the subjective aspects of OR and statistics.

4 thoughts on “Lies and statistics”

  1. Excellent point about social studies. My second stats class was in political science (taught by Ed Tufte before he became famous). It taught me a healthy distrust for facile analyses.

    I’ve occasionally seen authors use rather obscure tests to reject an hypothesis. I always wonder how many tests they burned through before they got the “right” result.

    • Thanks. I had a maths colleague who insisted that statistics WAS mathematics. I guess to him it was, but I’m not sure he had done much data analysis. The same discussion can ensue about Operations Research, even among practitioners.

      • Oops. “Autos” was “authors” (Swype strikes again).

        When I got my MS in stats, I had a choice of two tracks. I took the less mathematical track, which was pretty much all theorem-proof, with no significant encounters with actual data. The more mathematical track was adventures in measure theory. The first I heard about “data cleaning” was when I actually had to work with messy data, long after the degree was behind me.

        You’re correct about the parallel in OR. I was teaching optimization (as mathematical theory) well before I had meaningful practical experience solving problems (and discovering all the things that can go wrong doing so).

      • I did wonder about the autos! (I’ve fixed it for you.)

        I have seen evidence of a totally separate branch of statistical analysis used only in marketing research. In this it is acceptable to remove the middle sections of your data and perform a chi sq analysis only on the outer quartiles. Not surprisingly this gives an effect that did not show up in correlation.
