Graphs – beauty and truth (with apologies to Keats)

A good graph is elegant

I really like graphs. I like the way graphs turn numbers into pictures. A good graph is elegant. It uses a few well-placed lines to communicate what would take a paragraph of text. And like a good piece of literature or art, a good graph continues to give, beyond the first reading. I love looking at my YouTube and WordPress graphs. These graphs tell me stories. The WordPress analytics tell me that when I put up a new post, I get more hits, but that every day more than 1000 people read one of my posts. The YouTube analytics tell me stories about when people want to know about different aspects of statistics. It is currently the end of the North American school year, and the demand is for my video on Choosing which statistical test to use. Earlier in the year, the video about levels of measurement is the most popular. And not many people view videos about statistics on the 25th of December. I’m happy to report that the YouTube and WordPress graphs are good graphs.

Spreadsheets have made it possible for anyone and everyone to create graphs. I like that graphs are easier to make. Drawing graphs by hand is a laborious task and fraught with error. But sometimes my heart aches when I see a graph used badly. I suspect that this is when a graphic artist has taken control, and the search for beauty has over-ridden the need for truth.

Three graphs spurred me to write this post.

Graph One: Bad-tasting Donut on house occupation

The first was on a website to find out about property values. I must have clicked onto something to find out about the property values in my area, and was taken to the qv website. And this is the graph that disturbed me.

Graphs named after food are seldom a good idea

Sure it is pretty – uses pretty colours and shading, and you can find out what it is saying by looking at the key – with the numbers beside it. But a pie or donut chart should not be used for data which has inherent order. The result here is that the segments are not in order. Or rather they are ordered from most frequent to least frequent, which is not intuitive. Ordinal data is best represented in a bar or column chart. To be honest, most data is best represented in a bar or column chart. My significant other suggested that bar charts aren’t as attractive as pie charts. Circles are prettier than rectangles. Circles are curvy and seem friendlier than straight lines and rectangles. So prettiness has triumphed over truth.

Graph Two: Misleading pictogram (a tautology?)

It may be a little strong to call bad communication lack of truth. Let’s look at another example. In a way it is cheating to cite a pictogram in a post like this. Pictograms are the lowest form of graph and are so often incorrect, that finding a bad one is easier than finding a good one. In the graph below of fatalities it is difficult to work out what one little person represents.

What does one little person represent?

A quick glance, ignoring the numbers, suggests that the road toll in 2014 is just over half what it was in 2012. However, the truth, calculated from the numbers, is that the relative size is 80%. 2012 has 12 people icons, representing 280 fatalities. One icon is removed for 2013, representing a drop of 9 fatalities. 2011 has one icon fewer again, representing a drop of 2 fatalities. There is so much wrong in the reporting of road fatalities, that I will stop here. Perhaps another day…
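The icon arithmetic can be checked with a few lines of Python. This is only a sketch using the figures quoted above for 2012 and 2013 (12 icons for 280 fatalities, then one icon removed for a drop of 9); the function and variable names are made up for illustration.

```python
def fatalities_per_icon(fatalities, icons):
    """Average number of fatalities that one icon stands for."""
    return fatalities / icons

# Figures quoted in the text for 2012 and 2013.
per_icon_2012 = fatalities_per_icon(280, 12)
fatalities_2013 = 280 - 9
per_icon_2013 = fatalities_per_icon(fatalities_2013, 11)

# If the icons were drawn to the 2012 scale, 2013 would need this many.
icons_needed_2013 = fatalities_2013 / per_icon_2012

print(f"2012: one icon = {per_icon_2012:.1f} fatalities")
print(f"2013: one icon = {per_icon_2013:.1f} fatalities")
print(f"2013 at the 2012 scale: {icons_needed_2013:.1f} icons")
```

On these figures one icon stands for roughly 23 fatalities, so removing a whole icon to show a drop of 9 overstates the change.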

Graph Three: Mysterious display on Household income

And here is the other graph that perplexed me for some time. It came in the Saturday morning magazine from our newspaper, as part of an article about inequality in New Zealand. Anyone who reads my blog will be aware that my politics place me well left of centre, and I find inequality one of the great ills of the modern day. So I was keen to see what this graph would tell me. And the answer is…

See how long it takes for you to find where you appear on the graph. (Pretending you live in NZ)

I have no idea. Now, I have expertise in the promulgation of statistics, and this graph stumped me for some time. Take a good look now, before I carry on.

I did work out in the end, what was going on in the graph, but it took far longer than it should. This article is aimed at an educated but not particularly statistically literate audience, and I suspect there will be very few readers who spent long enough working out what was going on here. This graph is probably numerically correct. I had a quick flick back to the source of the data (who, by the way, are not to be blamed for the graph, as the data was presented in a table) and the graph seems to be an accurate depiction of the data. However, the graph is so confusing as to be worse than useless. Please post critiques in the comments. This graph commits several crimes. It is difficult to understand. It poses a question and then fails to help the reader find the answer. And it does not provide insights that an educated reader could not get from a table. In fact, I believe it has obscured the data.

Graphs are the main way that statistical analysts communicate with the outside world. Graphs like these ones do us no favours, even if they are not our fault. We need to do better, and make sure that all students learn about graphs.

Teaching suggestion – a graph a day

Here is a suggestion for teachers at all levels. Have a “graph a day” display – maybe for a month? Students can contribute graphs from the news media. Each day discuss what the graph is saying, and critique the way the graph is communicating. I have a helpful structure for reading graphs in my post: There’s more to reading graphs than meets the eye.

Here is a summary of what I’ve said and what else I could say on the topic.

Thoughts about Statistical Graphs

  • The choice of graph depends on the purpose
  • The text should state the purpose of the graph
  • There is not a graph for everything you wish to communicate
  • Sometimes a table communicates better than a graph
  • Graphs are part of the analysis as well as part of the reporting. But some graphs are better to stay hidden.
  • If it takes more than a few seconds to work out what a graph is communicating it should either be dumped or have an explanation in the text
  • Truth (or communication) is more important than beauty
  • There is beauty in simplicity
  • Be aware that many people are colour-blind, or cannot easily differentiate between different shades.

Feedback from previous post on which graph to use

Late last year I posted four graphs of the same data and asked for people’s opinions. You can link back to the post here and see the responses: Which Graph to Use.

The interesting thing is not which graph was selected as the most popular, but rather that each graph had a considerable number of votes. My response is that it depends.  It depends on the question you are answering or the message you are sending. But yes – I agree with the crowd that Graph A is the one that best communicates the various pieces of information. I think it would be improved by ordering the categories differently. It is not very pretty, but it communicates.

I recently posted a new video on YouTube about graphs. It is a quick once-over of important types of graphs, and can help to clarify what they are about. There are examples of good graphs in there.


I have written about graphs previously, and you can find those posts on the Collected Works page.

I’m interested in your thoughts. And I’d love to see some beautiful and truthful graphs in the comments.

Play and learning mathematics and statistics

The role of play in learning

I have been reading further about teaching mathematics and came across this interesting assertion:

Play, understood as something frivolous, opposed to work, off-task behaviour, is not welcomed into most mathematics classrooms. But play is exactly what is needed. It is only play that can entice us to the type of repetition that is needed to learn how to inhabit the mathematical landscape and how to create new mathematics.
Friesen (2000) – unpublished thesis, cited in Stordy, Children Count (2015)

Play and practice

It is an appealing idea that as children play, they have opportunities to engage in repetition that is needed in mastering some mathematical skills. The other morning I decided to do some exploration of prime numbers and factorising even before I got out of bed. (Don’t judge me!). It was fun, and I discovered some interesting properties, and came up with a way of labelling numbers as having two, three and more dimensions. 12 is a three dimensional number, as is 20, whereas 35 and 77 are good examples of two dimensional numbers. As I was thus playing on my own, I was aware that I was practising my tables and honing my ability to think multiplicatively. In this instance the statement from Friesen made sense. I admit I’m not sure what it means to “create new mathematics”. Perhaps that is what I was doing with my 2 and 3 dimensional numbers.
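This kind of play is easy to try in Python. The sketch below assumes that the “dimension” of a number is how many prime factors it has, counting repeats. That reading fits the four examples above, but it is only a guess at the definition, and the function names are made up for illustration.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2), with repeats, in ascending order."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever is left over is itself a prime factor.
        factors.append(n)
    return factors

def dimension(n):
    """Assumed reading of 'dimension': the number of prime factors, counting repeats."""
    return len(prime_factors(n))

for number in (12, 20, 35, 77):
    print(number, prime_factors(number), dimension(number))
```

Under this reading, 12 = 2 × 2 × 3 and 20 = 2 × 2 × 5 each have three factors, while 35 = 5 × 7 and 77 = 7 × 11 have two.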

You may be wondering what this has to do with teaching statistics to adults. Bear with…

Traditional vs recent teaching methods for mathematics

Today on Twitter, someone asked what to do when a student says that they like being shown what to do, and then practising on textbook examples. This is the traditional method for teaching mathematics, and is currently not seen as ideal among many maths teachers (particularly those who inhabit the MathTwitterBlogosphere or MTBoS, as it is called). There is strong support for a more investigative, socially constructed approach to learning and teaching mathematics.  I realise that as a learner, I was happy enough learning maths by being shown what to do and then practising. I suspect a large proportion of maths teachers also liked doing that. Khan Academy videos are wildly popular with many learners and far too many teachers because they perpetuate this procedural view of mathematics. So is the procedural approach wrong? I think what it comes down to is what we are trying to teach. Were I to teach mathematics again I would not use “show then practise” as my modus operandi. I would like to teach children to become mathematicians rather than mathematical technicians. For this reason, the philosophies and methods of Youcubed, Dan Meyer and other MTBoS bloggers have appeal.

Play and statistics

Now I want to turn my thoughts to statistics. Is there a need for more play in statistics? Can statistics be playful in the way that mathematics can be playful? Operations Research is just one game after another! Simulation, critical path, network analysis, travelling salesperson, knapsack problem: they are all big games. Probability is immensely playful, but what about statistical analysis? Can and should statistics be playful?

My first response is that there is no play in statistics. Statistics is serious and important, and deals with reality, not joyous abstract ideas like prime numbers and the Fibonacci series – and two and three dimensional numbers.

The excitement of a fresh set of data

But there is that frisson of excitement as you finally finish cleaning your database and a freshly minted set of variables and observations beckons to you, with SPSS, SAS or even Excel at your fingertips. A new set of data is a new journey of discovery. Of course a serious researcher has already worked out a methodical route through her hypotheses… maybe. Or do we mostly all fossick about looking for patterns and insights, growing more and more familiar with the feel of the data, as if we were squeezing it through our fingers? So yes – my experience of data exploration is playful. It is an adventure, with wrong turns, forgetting the path, starting again, finding something only to lose it again and finally saying “enough” and taking a break, not because the data has been exhausted, but because I am.

Writing the report is like cleaning up

Writing up statistical analysis is less exciting. It feels like picking up the gardening tools and putting them away after weeding the garden. Or cleaning the paintbrushes after creating a masterpiece. That was not one of my strengths – finishing and tidying up afterwards. The problem was that I felt I had finished when the original task had been completed – when the weeds had been pulled or the painting completed. In my view, cleaning and putting away the tools was an afterthought that dragged on after the completion of the task, and too often got ignored. Happily I have managed to change my behaviour by rethinking the nature of the weeding task. The weeding task is complete when the weeds are pulled and in the compost and the implements are resting clean and safe where they belong. Similarly a statistical analysis is not what comes before the report-writing, but is rather the whole process, ending when the report is complete, and the data is carefully stored away for another day. I wonder if that is the message we give our students – a thought for another post.

Can statistics be playful?

For I have not yet answered the question. Can statistics be playful in the way that mathematics can be playful? We want to embed play in order to make our task of repetition more enjoyable, and learning statistics requires repetition, in order to develop skills and learn to differentiate the universal from the individual. One problem is that statistics can seem so serious. When we use databases about global warming, species extinction, cancer screening, crime detection, income discrepancies and similarly adult topics, it can seem almost blasphemous to be too playful about it.

I suspect that one reason our statistics videos on YouTube are so popular is because they are playful.

Helen has an attitude problem

Helen has a real attitude problem and hurls snarky comments at her brother, Luke. The apples fall in an odd way, and Dr Nic pops up in strange places. This playfulness keeps the audience engaged in a way that serious, grown up themes may not. This is why we invented Ear Pox in our video about Risk and screening, because being playful about cancer is inappropriate.

Ear Pox is an imaginary disease for which we are studying the screening risk.

A set of 240 Dragonistics data cards provides light-hearted data which yields satisfying results.

When I began this post I did not intend to bring it around to the videos and the Dragonistics data cards, but I have ended up there anyway. Maybe that is the appeal of the Dragonistics data cards –  that they avoid the gravitas of true and real grown-up data, and maintain a playfulness that is more engaging than reality. There is a truthiness about them – the two species – green and red dragons are different enough to present as different animal species, and the rules of danger and breath-type make sense. But students may happily play with the dragon cards without fear of ignorance or even irreverence of a real-life context.

What started me thinking about play with regards to learning maths and statistics is our Cat Maths cards. There are just so many ways to play with them that I can see Cat Maths cards playing an integral part in a junior primary classroom. This is why we created them and want them to make their way into classrooms. Sadly, our Kickstarter campaign was unsuccessful, but we hope to work with an established game manufacturer to bring them to the market by the end of 2017.

We'd love your help.

Your thoughts about play and statistics

And maybe we need to be thinking a little more about the role of play in learning statistics – even for adults! What do you think? Can and should statistics be playful? And for what age group? Do you find statistical analysis fun?

What does it mean to understand statistics?

It is possible to get a passing grade in a statistics paper by putting numbers into formulas and words into memorised phrases. In fact I suspect that this is a popular way for students to make their way through a required and often unwanted subject.

Most teachers of statistics would say that they would like students to understand what they are doing. This was a common sentiment expressed by participants in the excellent MOOC, Teaching statistics through data investigations (which is running again from January to May 2016).

Understanding

This makes me wonder what it means for students to understand statistics. There are many levels to understanding things. The concept of understanding has many nuances. If a person understands English, it means that they can use English with proficiency. If they are native speakers they may have little understanding of how grammar works, but they can still speak with correct grammar. We talk about understanding how a car works. I have no idea how a car works, apart from some idea that it requires petrol and the pistons go really, really fast. I can name parts of a car engine, such as distributor and drive shaft. But that doesn’t stop me from driving a car.

Understanding statistics

I propose that when we talk about teaching students to understand statistics, we want our students to know why they are doing something, and have an idea of how it works. Students also need to be fluent in the language of statistics. I would not expect any student of an introductory or high school statistics class to be able to explain how least squares regression works in terms of matrix algebra, but I would expect them to have an idea that the fitted line in a bivariate plot is a model that minimises the sum of the squared errors. I’m not sure anyone needs to know why “degrees of freedom” are called that – or even really what degrees of freedom do. These days computer packages look after degrees of freedom for us. We DO need to understand what a p-value is, and what it is telling us. For many people it is not necessary to know how a p-value is calculated.
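The idea that the fitted line minimises squared error can be shown without any matrix algebra. Here is a minimal sketch in Python, using made-up data and the standard formulas for the slope and intercept of a simple regression line; the function names are for illustration only.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for a bivariate data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)

# Nudging the fitted line in any direction makes the squared error larger.
for d_slope, d_intercept in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    nudged = sum_squared_errors(xs, ys, slope + d_slope, intercept + d_intercept)
    print(f"nudged SSE {nudged:.3f} > fitted SSE {best:.3f}: {nudged > best}")
```

Students can try their own nudges and see that none of them beats the fitted line, which is the whole point of “least squares”.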

Ways to teach statistics

There are several approaches to teaching statistics. The approach needs to be tailored to the students and the context of the course. I prefer a hands-on, conceptual approach rather than a mathematical one. In current literature and practice there is a push for learning through investigations, often based around the statistical inquiry cycle. The problem with one long project is that students don’t get opportunities to apply principles in different situations, in a way that will help them transfer their learning to other situations. There are some people who still teach statistics through the mathematical formulas, but I fear they are missing out on the opportunity to help students really enjoy statistics.

I do not claim to have all the answers, but we did discover one way to help students learn, alongside other methods. This approach is to use a short video, followed by a ten question true/false quiz. The quiz serves to reinforce and elaborate on concepts taught in the video, challenge students’ misconceptions, and help students be more familiar with the vocabulary and terminology of statistics. The quizzes we develop have multiple questions that randomise, giving students the opportunity to try multiple times, which seems to help understanding.

This short and entertaining video gives an illustration of how you can use videos and quizzes to help students learn difficult concepts.

And here is a link to a listing of all our videos and how you can get access to them: Statistics Learning Centre Videos.

We have just started a newsletter letting people know of new products and hints for teaching. You can sign up here: Sign up for newsletter.

Understanding Statistical Inference

Inference is THE big idea of statistics. This is where people come unstuck. Most people can accept the use of summary descriptive statistics and graphs. They can understand why data is needed. They can see that the way a sample is taken may affect how things turn out. They often understand the need for control groups. Most statistical concepts or ideas are readily explainable. But inference is a tricky, tricky idea. Well actually – it doesn’t need to be tricky, but the way it is generally taught makes it tricky.

Procedural competence with zero understanding

I cast my mind back to my first encounter with confidence intervals and hypothesis tests. I learned how to calculate them (by hand  – yes I am that old) but had not a clue what their point was. Not a single clue. I got an A in that course. This is a common occurrence. It is possible to remain blissfully unaware of what inference is all about, while answering procedural questions in exams correctly.

But, thanks to the research and thinking of a lot of really smart and dedicated statistics teachers, we are able to put a stop to that. And we must.

We need to explicitly teach what statistical inference is. Students do not learn to understand inference by doing calculations. We need to revisit the ideas behind inference frequently. The process of hypothesis testing is counter-intuitive and so confusing that it spills its confusion over into the concept of inference. Confidence intervals are less confusing, so they are a better intermediate point for understanding statistical inference. But we need to start with the concept of inference.

What is statistical inference?

The idea of inference is actually not that tricky if you unbundle the concept from the application or process.

The concept of statistical inference is this –

We want to know stuff about a large group of people or things (a population). We can’t ask or test them all so we take a sample. We use what we find out from the sample to draw conclusions about the population.

That is it. Now was that so hard?

Developing understanding of statistical inference in children

I have found the paper by Makar and Rubin, presenting a “framework for thinking about informal statistical inference”, particularly helpful. In this paper they summarise studies done with children learning about inference. They suggest that “three key principles … appeared to be essential to informal statistical inference: (1) generalization, including predictions, parameter estimates, and conclusions, that extend beyond describing the given data; (2) the use of data as evidence for those generalizations; and (3) employment of probabilistic language in describing the generalization, including informal reference to levels of certainty about the conclusions drawn.” This can be summed up as Generalisation, Data as evidence, and Probabilistic Language.

We can lead into informal inference early on in the school curriculum. The Key Ideas in the NZ curriculum suggest that “teachers should be encouraging students to read beyond the data. Eg ‘If a new student joined our class, how many children do you think would be in their family?’” In other words, though we don’t specifically use the terms population and sample, we can conversationally draw attention to what we learn from this set of data, and how that might relate to other sets of data.

Explaining directly to Adults

When teaching adults we may use a more direct approach, explaining explicitly, alongside experiential learning, to build an understanding of inference. We have just completed a video: Understanding Inference. Within the video we have presented three basic ideas condensed from the Five Big Ideas in the very helpful book published by NCTM, “Developing Essential Understanding of Statistics, Grades 9-12” by Peck, Gould, Miller and Zbiek.

Ideas underlying inference

  • A sample is likely to be a good representation of the population.
  • There is an element of uncertainty as to how well the sample represents the population.
  • The way the sample is taken matters.
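These three ideas can be seen in a small simulation. The sketch below is in Python and uses an entirely made-up population of apple weights; the numbers, the seed and the variable names are assumptions for illustration, not data from the video.

```python
import random
import statistics

rng = random.Random(2016)  # fixed seed so the run is repeatable

# A made-up population: weights (in grams) of 10,000 apples.
population = [rng.gauss(150, 20) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Ideas 1 and 2: random samples land near the population mean,
# but not exactly on it, and they differ from one another.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(200)]
print(f"population mean: {population_mean:.1f}")
print(f"sample means range from {min(sample_means):.1f} to {max(sample_means):.1f}")

# Idea 3: a sample taken badly (only the 50 heaviest apples) misleads.
biased_sample = sorted(population)[-50:]
print(f"mean of the 50 heaviest apples: {statistics.mean(biased_sample):.1f}")
```

The random samples cluster around the population mean with some spread, while the sample of the heaviest apples sits well above it.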

These ideas help to provide a rationale for thinking about inference, and allow students to justify what has often been assumed or taught mathematically. In addition several memorable examples involving apples, chocolate bars and opinion polls are provided. This is available for free use on YouTube. If you wish to have access to more of our videos than are available there, do email me at n.petty@statslc.com.

Please help us develop more great resources

We are currently developing exciting innovative materials to help students at all levels of the curriculum to understand and enjoy statistical analysis. We would REALLY appreciate it if any readers here today would help us out by answering this survey about fast food and dessert. It will take 10 minutes at a maximum. We don’t mind what country you are from, and will do the currency conversions. And in a few months I will let you know how we got on. We would also love you to forward it to your friends and students to fill out – the more the merrier! It is an example of a well-designed questionnaire, with a meaningful purpose.

Introducing Probability

I have a guilty secret. I really love probability problems. I am so happy to be making videos about probability just now, and conditional probability and distributions and all that fun stuff. I am a little disappointed that we won’t be doing decision trees with Bayesian review, calculating EVPI. That is such fun, but I gave up teaching that some years ago.

The reason probability is fun is because it is really mathematics, and puzzles and logic. I love permutations and combinations too – there is something cool about working out how many ways something can happen.

So why should I feel guilty? Well, in all honesty I have to admit that there is very little need for most of that in a course about statistics at high-school or entry level university. When I taught statistical methods for management, we did some probability, but only from an applied viewpoint, and we never touched intersection and union signs or anything like that. We applied some distributions, but without much theoretical underpinning.

The GAISE (Guidelines for Assessment and Instruction in Statistics Education) Report says, “Teachers and students must understand that statistics and probability are not the same. Statistics uses probability, much as physics uses calculus.”

The question is, why do we teach probability – apart from the fact that it’s fun and makes a nice change from writing reports on time series and bivariate analysis, inference and experiments? The GAISE report also says, “Probability is an important part of any mathematical education. It is a part of mathematics that enriches the subject as a whole by its interactions with other uses of mathematics. Probability is an essential tool in applied mathematics and mathematical modeling. It is also an essential tool in statistics.”

The concept of probability is as important as it is misunderstood. It is vital to have an understanding of the nature of chance and variation in life, in order to be a well-informed (or “efficient”) citizen. One area in which this is extremely important is in understanding risk and relative risk. When a person is told that their chances of dying of some rare disease have just doubled, it is important that they know that it may be because they have gone from one chance in a million to two chances in a million. Sure it has doubled, but it still is pretty trivial. An understanding of probability is also important in terms of gambling and resistance to the allures of games of chance. And more socially acceptable gambling, such as stockmarket trading, also requires an understanding of chance and variation.
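The difference between a relative change and an absolute change in risk is easy to compute. Here is a minimal sketch in Python using the one-in-a-million example above; the function name is made up for illustration.

```python
def describe_risk_change(before, after):
    """Return the relative risk and the absolute change between two probabilities."""
    relative_risk = after / before
    absolute_change = after - before
    return relative_risk, absolute_change

# The example from the text: one chance in a million becomes two in a million.
relative_risk, absolute_change = describe_risk_change(1 / 1_000_000, 2 / 1_000_000)

print(f"relative risk: {relative_risk:.1f} times the original")
print(f"absolute change: {absolute_change * 1_000_000:.0f} extra case per million people")
```

The same doubling would matter far more if the starting risk were one in ten, which is why both numbers need to be reported together.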

The concept of probability is important, and a few rules of probability may help with understanding, but I suspect the mathematicians get carried away and create problems that are unlikely (probability close to zero) to ever occur in reality. Anything requiring a three-way Venn Diagram has moved from applied problem to logic puzzle. This is in stark contrast to the very applied data-driven approach used in teaching statistics in New Zealand.

Teaching Probability

The traditional approach to teaching probability is to start with the coin and the dice and the balls in the urns. As well as being mind-bogglingly boring and pointless, this also projects an artificial certainty about the probabilities, which is confusing when we start discussing models. If you look at the Khan Academy videos (but don’t) you will find trivial examples about coloured balls or sweets or strangely complex problems involving hitting a circular target. The traditional approach is also to teach probability as truth. “The probability of getting a boy is one-half”. What does that even mean?

I am currently reading the new Springer volume, Probabilistic Thinking, and intend to write a review and post it on this blog, if I can get through enough before my review copy expires. It is inspiring and surprisingly gripping (but I don’t think that is enough of a review to earn me a hard copy to keep). There are many great ideas for teaching in it, that I hope to pass on in due time.

The New Zealand approach to teaching probability comes from a modelling perspective, right from the start. At level 1, the first two years of schooling, children are exploring chance situations, playing games with a chance element and describing possible outcomes. By years 5 and 6 they are assigning numeric values to the likelihood of an occurrence. They (in the curriculum) are being introduced to model estimates and experimental estimates of probability. Bearing in mind how difficult high school maths teachers are finding the new approach, I don’t have a lot of confidence that the primary teachers are equipped yet to make the philosophical changes, let alone enact them in the classroom.
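The distinction between a model estimate and an experimental estimate can be illustrated with a simple chance game. The Python sketch below is my own illustration, not taken from the curriculum: the game (two dice adding to seven), the seed and the function name are all assumptions.

```python
import random
from itertools import product

rng = random.Random(7)  # fixed seed so the run is repeatable

# Model estimate: assume two fair dice, list all 36 equally likely outcomes,
# and count those that add to seven.
outcomes = list(product(range(1, 7), repeat=2))
model_estimate = sum(1 for a, b in outcomes if a + b == 7) / len(outcomes)

def experimental_estimate(games):
    """Proportion of simulated games in which the two dice add to seven."""
    sevens = sum(
        1 for _ in range(games)
        if rng.randint(1, 6) + rng.randint(1, 6) == 7
    )
    return sevens / games

print(f"model estimate: {model_estimate:.3f}")
for games in (20, 200, 20_000):
    print(f"{games:>6} games: experimental estimate {experimental_estimate(games):.3f}")
```

With only a few games the experimental estimate can be some way from the model estimate; with many games the two usually come close, provided the model is a reasonable description of the dice.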

A helpful structure for analysing graphs

Mathematicians teaching English

“I became a maths teacher so I wouldn’t have to mark essays”
“I’m having trouble getting the students to write down their own ideas”
“When I give them templates I feel as if it’s spoon-feeding them”

These are comments I hear as I visit mathematics teachers who are teaching the new statistics curriculum in New Zealand. They have a point. It is difficult for a mathematics teacher to teach in a different style. But – it can also be rewarding and interesting, and you never get asked, “Where is this useful?”

The statistical enquiry cycle provides a structure for all statistical investigations and learning.

We start with a problem or question, and undergo an investigation, either using extant data, an experiment or observational study to answer the question. Writing skills are key in several stages of the cycle. We need to be able to write an investigative question (or hypotheses). We need to write down a plan, and sometimes an entire questionnaire. We need to write down what we find in the analysis and we need to write a conclusion to answer the original question. That’s a whole heap of writing!

And for teachers who may not be all that happy about writing themselves, and students who chose mathematical subjects to avoid writing, it can be a bridge too far.

In previous posts on teaching report writing I promote the use of templates, and give some teaching suggestions.

In this post I am concentrating on analysing graphs, using a handy acronym, OSEM. OSEM was developed by Jeremy Brocklehurst from Lincoln High School near Christchurch NZ. There are other acronyms that would work just as well, but we like this one, not least for its link with kiwi culture. We think it is awesome (OSEM). You could Google “o for awesome”, to get the background. OSEM stands for Obvious, Specific, Evidence and Meaning. It is a process to follow, rather than a checklist.

I like the use of O for obvious. I think students can be scared to say what they think might be too obvious, and look for tricky things. By including “obvious” in the process, it allows them to write about the important, and usually obvious, features of a graph. I also like the emphasis on meaning. Unless the analysis of the data links back to the context and purpose of the investigation, it is merely a mathematical exercise.

Is this spoon-feeding? Far from it. We are giving students a structure that will help them to analyse any graph, including timeseries, scatter plots, and histograms, as well as boxplots and dotplots. It emphasises the use of quantitative information, linked with context. There is nothing revolutionary about it, but I think many statistics teachers may find it helpful as a way to break down and demystify the commenting process.

Class use of OSEM

In a class setting, OSEM is a helpful framework for students to work in groups. Students individually (perhaps on personal whiteboards) write down something obvious about the graph. Then they share answers in pairs, and decide which one to carry on with. In the pair they specify and give evidence for their “obvious” statement. Then the pairs form groups of four, and they come up with statements of meaning, that are then shared with the class as a whole.

Spoon feeding has its place

On a side-note – spoon-feeding is a really good way to make sure children get necessary nutrition until they learn to feed themselves. It is preferable to letting them starve before they get the chance to develop sufficient skills and co-ordination to get the food to their mouths independently.

Those who can, teach statistics

The phrase I despise more than any in popular use (and believe me there are many contenders) is “Those who can, do, and those who can’t, teach.” I like many of the sayings of George Bernard Shaw, but this one is dismissive, and ignorant and born of jealousy. To me, the ability to teach something is a step higher than being able to do it. The PhD, the highest qualification in academia, is a doctorate. The word “doctor” comes from the Latin word for teacher.

Teaching is a noble profession, on which all other noble professions rest. Teachers are generally motivated by altruism, and often go well beyond the requirements of their job-description to help students. Teachers are derided for their lack of importance, and the easiness of their job. Yet at the same time teachers are expected to undo the ills of society. Everyone “knows” what teachers should do better. Teachers are judged on their output, as if they were the only factor in the mix. Yet how many people really believe their success or failure is due only to the efforts of their teacher?

For some people, teaching comes naturally. But even then, there is the need for pedagogical content knowledge. Teaching is not a generic skill that transfers seamlessly between disciplines. You must be a thinker to be a good teacher. It is not enough to perpetuate the methods you were taught with. Reflection is a necessary part of developing as a teacher. I wrote in an earlier post, “You’re teaching it wrong”, about the process of reflection. Teachers need to know their material, and keep up-to-date with ways of teaching it. They need to be aware of ways that students will have difficulties. Teachers, by sharing ideas and research, can be part of a communal endeavour to increase both content knowledge and pedagogical content knowledge.

There is a difference between being an explainer and being a teacher. Sal Khan, maker of the Khan Academy videos, is a very good explainer. Consequently many students who view the videos are happy that elements of maths and physics that they couldn’t do, have been explained in such a way that they can solve homework problems. This is great. Explaining is an important element in teaching. My own videos aim to explain in such a way that students make sense of difficult concepts, though some videos also illustrate procedure.

Teaching is much more than explaining. Teaching includes awakening a desire to learn and providing the experiences that will help a student to learn.  In these days of ever-expanding knowledge, a content-driven approach to learning and teaching will not serve our citizens well in the long run. Students need to be empowered to seek learning, to criticize, to integrate their knowledge with their life experiences. Learning should be a transformative experience. For this to take place, the teachers need to employ a variety of learner-focussed approaches, as well as explaining.

It cracks me up, the way sugary cereals are advertised as “part of a healthy breakfast”. It isn’t exactly lying, but the healthy breakfast would do pretty well without the sugar-filled cereal. Explanations really are part of a good learning experience, but need to be complemented by discussion, participation, practice and critique.  Explanations are like porridge – healthy, but not a complete breakfast on their own.

Why statistics is so hard to teach

“I’m taking statistics in college next year, and I can’t wait!” said nobody ever!

Not many people actually want to study statistics. Fortunately many people have no choice but to study statistics, as they need it. How much nicer it would be to think that people were studying your subject because they wanted to, rather than because it is necessary for psychology/medicine/biology etc.

In New Zealand, with the changed school curriculum that gives greater focus to statistics, there is a possibility that one day students will be excited to study stats. I am impressed at the way so many teachers have embraced the changed curriculum, despite limited resources, and late changes to assessment specifications. In a few years as teachers become more familiar with and start to specialise in statistics, the change will really take hold, and the rest of the world will watch in awe.

In the meantime, though, let us look at why statistics is difficult to teach.

  1. Students generally take statistics out of necessity.
  2. Statistics is a mixture of quantitative and communication skills.
  3. It is not clear which are right and wrong answers.
  4. Statistical terminology is both vague and specific.
  5. It is difficult to get good resources, using real data in meaningful contexts.
  6. One of the basic procedures, hypothesis testing, is counter-intuitive.
  7. Because the teaching of statistics is comparatively recent, there is little developed pedagogical content knowledge. (Though this is growing)
  8. Technology is forever advancing, requiring regular updating of materials and teaching approaches.

On the other hand, statistics is also a fantastic subject to teach.

  1. Statistics is immediately applicable to life.
  2. It links in with interesting and diverse contexts, including subjects students themselves take.
  3. Studying statistics enables class discussion and debate.
  4. Statistics is necessary and does good.
  5. The study of data and chance can change the way people see the world.
  6. Technological advances have put the power for real statistical analysis into the hands of students.
  7. Because the teaching of statistics is new, individuals can make a difference in the way statistics is viewed and taught.

I love to teach. These days many of my students are scattered over the world, watching my videos (for free) on YouTube. It warms my heart when they thank me for making something clear, that had been confusing. I realise that my efforts are small compared to what their teacher is doing, but it is great to be a part of it.

The Knife-edge of Competence

I do my own video-editing using a very versatile and complex program called Adobe Premiere Pro. I have had no formal training, and get help by ringing my son, who taught me all I know and can usually rescue me with patient instructions over the phone. At times, especially in the early stages, I have felt myself wobbling along the knife-edge of competence. All I needed was for something new to go wrong, or for me to click a button inadvertently, and I would fall off the knife-edge and the whole project would disappear into a mass of binary. This was not without good reason. Premiere Pro wasn’t always stable on our computer, and at one point it took us several weeks to get our hard-drive replaced. (Apple “Time Machine” saved me from despair). And sometimes I would forget to save regularly and a morning’s work was lost. (Even Time Machine can’t help with that level of incompetence.)

But despite my severe limitations I have managed to edit over twenty videos that now receive due attention (and at times adulation!) on YouTube. It isn’t an easy feeling, to be teetering on the brink of disaster, real or imagined. But there was no alternative, and there is a sense of pride at having made it through with only a few scars and not too much inappropriate language.

There are some things at which I feel totally competent. I can speak to a crowd of any number of people and feel happy that they will be entertained, edified and perhaps even educated. I can analyse data using basic statistical methods. I can teach a person about inference. Performing these tasks is a joy, because I know I have the prerequisite skills and knowledge to cope with whatever happens. But on the way to getting to this point, I had to walk the knife-edge of competence.

Many teachers of statistics know too well this knife-edge. In New Zealand at present there are a large number of teachers of Year 13 statistics who are teaching about bootstrapping, when their own understanding of it is sketchy. They are teaching how to write statistical reports, when they have never written one themselves. They are assessing statements about statistics that they are not actually sure about. This is a knife-edge. They feel that any minute a student will ask them a question about the content that they cannot answer. These are not beginning teachers, but teachers with years and decades of experience in teaching mathematics and mathematical statistics. But the innovations of the curriculum have put them in an uncomfortable position. Inconsistent, tardy and even incorrect information from the qualification agency is not helping, but that is a story for another day.

In another arena there are professors and lecturers of statistics (in the antipodes we do not throw around the title “professor” with the abandon of our North American cousins) who are extremely competent at statistical mathematics and analysis but who struggle to teach in a satisfactory way. Their knife-edge concerns teaching, appropriate explanation and the generation of effective learning activities and assessments in the absence of any educational training. They fear that someone will realise one day that they don’t really know how to devise learning objectives, and provide fair assessments. I am hoping that this blog is going some way to helping these people to ask for help! Unfortunately the frequent response is avoidance behaviour, which is alarmingly supported by a system that rewards research publications rather than effective educational endeavours.

So what do you do when you are walking the knife-edge of competence?

You do the best you can.

And sometimes you fake it.

I am led to believe there is a gender-divide on this. Some people are better at hiding their incompetence than others, and just about all the people I know like that are men. I had a classmate in my honours year who was at a similar level of competence to me, but he applied for jobs I wouldn’t have contemplated. The fear of being shown up as a fake, or not knowing EXACTLY what to do at any point stopped me from venturing. He horrified me further a few years later when he set up his own company. Nearly three decades, two children and a PhD later I am not so fastidious or “nice” in the Jane Austen meaning of the word. If I think I can probably learn how to do something in time to make a reasonable fist of it and not cause actual harm, I’m likely to have a go. Hence taking my redundancy and running!

When I first lectured in statistics for management,  I did not know much beyond what I was teaching. I lived in fear that someone would ask me a question that I couldn’t answer and I would be revealed as the fake I was. Well you know, it never happened! I even taught students who were statistics majors, who did know more than I, and post-graduate students in psychology and heads of mathematics departments, and my fears were never realised. In fact the stats students told me that they finally understood the central limit theorem, thanks to my nifty little exercise using dotplots on minitab. (Which was how I had finally understood the central limit theorem – or at least the guts of it.)

I’m guessing that this is probably true for most of the mathematics teachers who are worrying. Despite their fear, they have not been challenged or called out.

The teachers’ other unease is the feeling that they are not giving the best service to their students, and the students will suffer, miss out on scholarships, decide not to get a higher education and live their lives on the street.  I may be exaggerating a little here, but certainly few of us like to give a service that is less than what we are accustomed to. We feel bad when we do something that feels substandard.

There are two things I learned in my twenty years of lecturing that may help here:

We don’t know how students perceive what we do. Every now and again I would come out of a lecture with sweat trickling down my spine because something had gone wrong. It might be that in the middle of an explanation I had had second thoughts about it, changed tack, then realised I was right in the first place and ended up confusing myself. Or perhaps part way through a worked example it was pointed out to me that there was a numerical error in line three. To me these were bad, bad things to happen. They undermined my sense of competence. But you know, the students seldom even noticed. What felt like the worst lecture of my life was in fact still just fine.

The other thing I learned is that we flatter ourselves when we think how much difference our knowledge may make.  Now don’t get me wrong here – teachers make an enormous difference. People who become teachers do so because we want to help people. We want to make a difference in students’ lives. We often have a sense of calling. There may be some teachers who do it because they don’t know what else to do with their degree, but I like to think that most of us teachers teach because to not teach is unthinkable. I despise, to the point of spitting as I talk, the expression “Those who can, do, and those who can’t, teach.” One day when the mood takes me I will write a whole post about the noble art of teaching and the fallacy of that dismissive statement. My next statement is so important I will give it a paragraph of its own.

A teacher who teaches from love, who truly cares about what happens to their students, even if they are struggling on the knife-edge of competence will not ruin their students’ lives through temporary incompetence in an aspect of the curriculum.

There are many ways that a teacher can have devastating effects on their students, but being, for a short time, on the knife-edge of competence, is not one of them.

Take heart, keep calm and carry on!

Difficult concepts in statistics

Recently someone asked: “I don’t suppose you’d like to blog a little on the pedagogical knowledge relevant to statistics teaching, would you? A ‘top five statistics student misconceptions (and what to do about them)’ would be kind of a nice thing to see …”

I wish it were that easy. Here goes:

Things that I have found students find difficult to understand and what I have done about them.

Observations

When I taught second year regression we would get students to collect data and fit their own multiple regressions. The interesting thing was that quite often students would collect unrelated data. The columns of the data would not be of the same observations. These students had made it all the way through first year statistics without really understanding about multivariate data.

So from then on when I taught about regression I would specifically begin by talking about observations (or data points) and explain how they were connected. It doesn’t hurt to be explicit. In the NZ curriculum materials for high school students are exercises using data cards which correspond to individuals from a database. This helps students to see that each card, which corresponds to a line of data, is one person or thing. In my video about Levels of measurement, I take the time to show this.
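For anyone who wants to show the same idea on a screen rather than with cards, here is a minimal sketch. The names and measurements are made up for illustration and do not come from any curriculum database. It shows that each record is one observation, and that sorting one column on its own breaks the link between the columns, which is in effect what the students with unrelated data had done.

```python
# Hypothetical data cards: each dictionary is ONE student (one observation).
data_cards = [
    {"name": "Aroha", "height_cm": 158, "arm_span_cm": 156},
    {"name": "Ben", "height_cm": 171, "arm_span_cm": 173},
    {"name": "Chen", "height_cm": 165, "arm_span_cm": 163},
    {"name": "Dina", "height_cm": 180, "arm_span_cm": 182},
]

# Linked columns: position i in each list comes from the same card.
heights = [card["height_cm"] for card in data_cards]
arm_spans = [card["arm_span_cm"] for card in data_cards]
linked_pairs = list(zip(heights, arm_spans))

# Unlinked columns: rearranging one column by itself means the pairs
# no longer describe any real person, so a regression on them is meaningless.
unlinked_pairs = list(zip(heights, sorted(arm_spans, reverse=True)))

print(linked_pairs)
print(unlinked_pairs)
```

The first list pairs each height with the arm span of the same student. The second pairs the shortest student with the longest arm span, which describes nobody.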

First suggestion is “Don’t assume”.  This applies to so much!

And this is also why it is vital that instructors do at least some of their own marking (grading). High school teachers are going, “Of course”. College professors – you know you ought to! The only way you find out what the students don’t understand, or misunderstand, or replicate by rote from your own notes, is by reading what they write. This is tedious, painful and sometimes funny in a head-banging sort of way, but necessary. I also check the prevalence of answers to multiple choice questions in my on-line materials. If there is a distracter scoring highly it is worthwhile thinking about either the question or the teaching that is leading to incorrect responses.
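Checking the prevalence of answers does not need anything fancy. The following is only a sketch with invented responses, and the 30% threshold for "scoring highly" is my own arbitrary assumption rather than any established rule.

```python
from collections import Counter

# Hypothetical responses to one multiple choice question.
correct_answer = "B"
responses = ["B", "C", "B", "C", "A", "C", "B", "D", "C", "B"]

counts = Counter(responses)
n = len(responses)

# Share of students choosing each option.
shares = {option: counts[option] / n for option in sorted(counts)}

# Flag any wrong option chosen by at least 30% of students (assumed threshold).
threshold = 0.3
popular_distracters = [
    option
    for option, share in shares.items()
    if option != correct_answer and share >= threshold
]

print(shares)
print(popular_distracters)
```

Here option C attracts as many students as the correct answer, which is a prompt to look again at either the question or the teaching.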

Inference

Well duh! Inference is a really, really difficult concept and is the key to inferential statistics. The basic idea, that we use information from a sample to draw conclusions about the population seems straight-forward. But it isn’t. Students need lots and lots of practice at identifying what is the population and what is the sample in any given situation. This needs to be done with different types of observations, such as people, commercial entities, plants or animals, geographical areas, manufactured products, instances of a physical experiment (Barbie bungee jumping), and times.

Second suggestion is “Practice”. And given the choice between one big practical project and a whole lot of small applied exercises, I would go with the exercises. A big real-life project is great for getting an idea of the big picture, and helping students to learn about the process of statistical analysis. But the problem with one big project is that it is difficult to separate the specific from the general. Context is at the core of any analysis in statistics, and makes every analysis different. Learning occurs through experiencing many different contexts and from them extracting what is general to all analysis, what is common to many analyses and what is specific to that example. The more different examples a student is exposed to, the better opportunity they have for constructing that learning. An earlier post extols the virtues of practice, even drill!

Connections

One of the most difficult things is for students to make connections between parts of the curriculum. A traditional statistics course can seem like a toolbox of unrelated but confusingly different techniques. It takes a high level of understanding to link the probability, data and evidence aspects together in a meaningful way. It is good to have exercises that help students to make these connections. I wrote about this with regard to Operations Research and Statistics. But students need also to be making connections before they get to the end of the course.

The third suggestion is “get students to write”.

Get students to write down what is the same and what is different between chi-squared analysis and correlation. Get them to write down how a Poisson distribution is similar to and different from a binomial distribution. Get them to write down how bar charts and histograms are similar and different. The reason students must write is that it is in the writing that they become aware of what they know or don’t know. We even teach ourselves things as we write.

Graphs and data

Another type of connection that students have trouble with is that between the data and the graph, and in particular identifying variation and distribution in a histogram or similar. There are many different graphs, that can look quite similar, and students have problems identifying what is going on. The “value graph” which is produced so easily in Excel does nothing to help with these problems. I wrote a full post on the problems of interpreting graphs.

The fourth suggestion is “think hard”. (or borrow)

Teaching statistics is not for wusses. We need to think really hard about what students are finding difficult, and come up with solutions. We need to experiment with different ways of explaining and teaching. One thing that has helped my teaching is the production of my videos. I wish to use both visual and text (verbal) inputs as best as possible to make use of the medium. I have to think of ways of representing concepts visually, that will help both understanding and memory. This is NOT easy, but is extremely rewarding. And if you are not good at thinking up new ideas, borrow other people’s ideas. A good idea collector can be as good as or better than a good creator of ideas.

To think of a fifth suggestion I turned to my favourite book, “The Challenge of Developing Statistical Literacy, Reasoning and Thinking”, edited by Dani Ben-Zvi and Joan Garfield. I feel somewhat inadequate in the suggestions given above. The book abounds with studies that have shown areas of challenge for students and teachers. It is exciting that so many people are taking seriously the development of pedagogical content knowledge regarding the discipline of statistics. Some statisticians would prefer that the general population leave statistics to the experts, but they seem to be in the minority. And of course it depends on what you define “doing statistics” to mean.

But the ship of statistical protectionism has sailed, and it is up to statisticians and statistical educators to do our best to teach statistics in such a way that each student can understand and apply their knowledge confidently, correctly and appropriately.

The median outclasses the mean

The median suffers from poor marketing.

All my time at school the “average” was always calculated as the arithmetic mean, by adding up all the scores and then dividing by the number of scores. When we were taught about the median, it seemed like an inferior version of the mean. It was the thing you worked out when you weren’t smart enough to add and divide. It was used for house prices, and that was about it. Of course the mean was the superior product! Why wouldn’t you use the mean?

I’ve been preparing resources for teaching the fabulous new New Zealand curriculum, and have been brought face-to-face with my prejudices. It strikes me that the median has had very poor representation.

Public opinion of the median and mean

I put a question on Facebook and Twitter to see what people felt about the mean and the median. I briefly explained what each was, then asked which one they thought was better. Some people had no idea what I was talking about, but most felt that the mean was the superior statistic. The following are a selection of responses:

The mean, but I don’t know why.. maybe that’s just what we were taught to use when I was back in school (a long time ago!) lol

When I think of “average” I always think of the mean. I don’t know if it’s actually better though

well the median is a real pain to work out. you have to make a list of all the numbers, in order, and then count how many they are and then go to the middle. PAIN IN THE BUM. the average… well that is somewhat quicker to do, no? and i don’t see the point in the median at all. unless well no, there is just no need for it. who cares what the15th person in the class got on a test? the lowes and highes is much more interesting. As i remember it, the mode is the most commonly occuring number out of a set of numbers… i think of this as the “mode” or in English (not French), the ‘fashionable” number. oh and it stresses me how all 3 start with Ms cos that is confusing. which is why i like to use the word average.

The mean, which I’m guessing is the same as the average? When the media refer to real estate stats they always use median price, which can distort reality, we would prefer the average price. (From a real estate agent)

I don’t really think it’s a case of which is better. They’re two different things aren’t they? I think it’s usually easier to work out the average.

A number of my Facebook friends did know about statistics, and responded in favour of the median in most cases. This was an interesting comment:

“It depends. Everyone who proof read my thesis was like why on earth are you using the median – no one uses it. And most of the other similar primate studies I’ve read use the mean (except one, that was published by my associate supervisor). But my means were off their rocker, and I’m pretty sure my medians were a much better representation of reality in this case. It makes making comparisons between studies a little awkward though.”

Why NOT use the median all the time?

I am hard pressed to find an instance where the mean is actually a better measure of central tendency than the median. The purpose of the mean or median (or mode) is to provide a one number summary of a set of data. The whole idea of the mean is actually quite tricky, as you can read in one of my early posts about explaining what the mean is. Generally the summary value is used to compare with another sample or population.

In my lectures I often illustrated times when the median is a better summary measure of a sample or population than the mean. This is quite common in notes and YouTube videos. Never once did I show where the mean was preferred to the median! So why were/are we so loyal to the mean, bringing out the median for special occasions and real estate?

I think there are two answers, both of them no longer valid. It is a question of legacy.

Time and ease to calculate

Despite first appearances, for anything larger than a trivial sample the mean is actually easier to calculate than the median. Putting a set of 100 values in order by hand is no easy task. (Pain in the bum, as my friend so elegantly expressed it.) Adding up scores and dividing by 100 is a walk in the park in comparison.  In the early 1980s when I learned programming (in Fortran, Pascal and Cobol), writing a sorting program was far from trivial and a large set of numbers would take a large amount of time to sort. Only in later years, as computing power has expanded, has it been possible to get a computer to calculate a median.

Formulas for confidence intervals

Means behave nicely and give nice mathematical results when manipulated. Because of this we can calculate confidence intervals using a nifty little formula and statistical tables. Until bootstrapping by computer  became do-able on a large and small scale, there was no practical way to perform inference on a number of very useful statistics, including the median and the inter-quartile range.
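To give a feel for what bootstrapping by computer involves, here is a minimal sketch of a percentile bootstrap interval for the median. The function name, the number of resamples, the seed and the data are all my own choices for illustration. It is one simple version of the idea, not the method used in any particular curriculum or package.

```python
import random
from statistics import median


def bootstrap_median_ci(sample, n_resamples=2000, confidence=0.95, seed=1):
    """Percentile bootstrap interval for the median (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample WITH replacement, same size as the original sample,
    # and record the median of each resample.
    medians = sorted(
        median(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = medians[round(tail * n_resamples)]
    upper = medians[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# Made-up data: weekly hours of study, with one extreme value.
weekly_hours = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 12, 15, 40]
low, high = bootstrap_median_ci(weekly_hours)
print(low, high)
```

No formula or statistical table is needed, only a great deal of repeated resampling, which is why this was impractical before cheap computing.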

Conclusion: the median is better

A median is intrinsically understandable. It is the middle number when the values are put in order. End of story. – Well not quite – you do have that slightly tricky thing where the sample size is even and you have to average the middle two terms, but apart from that it is easy!
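Written out as a short sketch, with small made-up samples, the whole procedure including the slightly tricky even case looks like this:

```python
def median(values):
    """Middle value of the ordered data; average the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd sample size: there is a single middle value.
        return ordered[middle]
    # Even sample size: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_sample = [7, 3, 9, 5, 1]   # ordered: 1, 3, 5, 7, 9
even_sample = [7, 3, 9, 5]     # ordered: 3, 5, 7, 9

print(median(odd_sample))   # 5
print(median(even_sample))  # 6.0
```

The only real work is the sorting, which is the part that was painful by hand.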

A median is not affected by outliers. I learned a new term for this when I was reading up in preparation for writing this post. The term is “resistant” and I learned it from one of Mr Tarrou’s videos for AP Statistics. I found these videos after my tirade against videos on confidence intervals. Tarrou’s videos are long and a bit more mathematical than I would like. (He can’t help it – he is a maths teacher and the AP Statistics syllabus seems to have been devised by mathematical statisticians trying to put students off ever taking the subject again.) But they are GOOD. Tarrou’s videos are sound, and interesting and well put together. I will be recommending them as complementary to my own offerings. (Because I sure as heck don’t want to have to do all that icky mathsy stuff).

But I digress. The median is “resistant” because it is not at the mercy of outliers. There are lots of great examples, including in Mr Tarrou’s video. If you have a median of 5 and then add another observation of 80, the median is unlikely to stray far from the 5. However a mean is a fickle beast, and easily swayed by a flashy outlier.
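A quick check with made-up numbers shows the difference. This sketch uses the median and mean functions from Python's standard statistics module.

```python
from statistics import mean, median

# Made-up scores with a median (and mean) of 5.
scores = [3, 4, 5, 6, 7]

# Add one flashy outlier of 80.
with_outlier = scores + [80]

median_before, median_after = median(scores), median(with_outlier)
mean_before, mean_after = mean(scores), mean(with_outlier)

# The median shifts from 5 to 5.5; the mean jumps from 5 to 17.5.
print(median_before, median_after)
print(mean_before, mean_after)
```

One extra observation moves the median by half a point and more than triples the mean.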

The main disadvantage I can see for the median is that it can be a bit jumpy in small samples made up of discrete values. I guess if you have two well-behaved populations that are very similar and you want to see precise differences then the means might just be better – but even then you would possibly be over-interpreting small differences.

I have found it very interesting observing the behaviour of confidence intervals for the difference of two medians, compared with confidence intervals for the difference in two means. While I was preparing materials for our on-line resource, I performed nine such tests on different real data taken from students at university. The scores are very jumpy, and the differences between the medians often include exactly zero. Consequently the confidence intervals of the difference of two medians quite often have zero as their lower bound. This provides a challenge in interpretation, as I had not met this often when looking at the differences between means. However, it also illuminates the odd relationship we have with zero. Just because a confidence interval for a difference of two means is (-0.13, 3.98) and includes a zero, it is tempting to conclude that there is no significant difference. But is -0.13 really any different from zero in practical terms? The other point is that we should be leaving the confidence interval as it is, rather than stretching it into further inference.
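The jumpiness is easy to see in a small simulation. This is only a sketch under my own assumptions: the two sets of whole-number quiz scores are invented, not the university data mentioned above, and the interval is a simple percentile bootstrap.

```python
import random
from statistics import median


def bootstrap_median_difference(group_a, group_b, n_resamples=2000, seed=1):
    """Percentile bootstrap for median(a) - median(b) (illustrative sketch)."""
    rng = random.Random(seed)
    diffs = sorted(
        median(rng.choices(group_a, k=len(group_a)))
        - median(rng.choices(group_b, k=len(group_b)))
        for _ in range(n_resamples)
    )
    lower = diffs[round(0.025 * n_resamples)]
    upper = diffs[round(0.975 * n_resamples) - 1]
    return lower, upper, diffs


# Invented whole-number scores for two similar groups.
quiz_a = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
quiz_b = [2, 3, 4, 4, 4, 5, 5, 5, 6, 7]

lower, upper, diffs = bootstrap_median_difference(quiz_a, quiz_b)

# With discrete scores the resampled differences can only land on a
# handful of values (whole and half numbers), so the interval's end
# points land on such values too, and zero is one of the candidates.
distinct_differences = sorted(set(diffs))
print(lower, upper)
print(distinct_differences)
```

Because there are so few possible values for the difference, an end point of exactly zero is not the oddity it would be for a difference of two means.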

Word on the web

I did a little surfing to see what the word on the web was. To find out who said what, drop the entire phrase into Google. (Ah ‘tis a wonderful world we live in, indeed.)

  • “The mean is the one to use with symmetrically distributed data; otherwise, use the median.” Hmm – but if the data is symmetric, surely the mean = the median?
  • “An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.” Ok – hard to argue with that.
  • “Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.” Totally!
  • “However, when the sample size is large and does not include outliers, the mean score usually provides a better measure of central tendency.” (Then goes on to give an example of when the median is better.)
  • “Use the median to describe the middle of a set of data that does have an outlier. Advantages of the median: Extreme values (outliers) do not affect the median as strongly as they do the mean, useful when comparing sets of data, it is unique – there is only one answer.
    Disadvantages of the median: Not as popular as mean.” (Not as popular??!)
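Two of the claims in that list are easy to check for yourself. The sketch below uses made-up numbers, stored as exact fractions so that rounding does not muddy the result. It shows that the deviations from the mean sum to exactly zero while the deviations from the median generally do not, and that for a symmetric set of values the mean and the median coincide.

```python
from fractions import Fraction
from statistics import mean, median

# Made-up, right-skewed data, held as exact fractions.
data = [Fraction(x) for x in (2, 3, 3, 7, 10)]

centre_mean = mean(data)      # 25 / 5 = 5
centre_median = median(data)  # middle value = 3

# Sum of deviations about each measure of centre.
deviations_about_mean = sum(x - centre_mean for x in data)
deviations_about_median = sum(x - centre_median for x in data)

# Symmetric data: mean and median are the same number.
symmetric = [1, 2, 3, 4, 5]
same_centre = mean(symmetric) == median(symmetric)

print(deviations_about_mean)    # 0
print(deviations_about_median)  # 10
print(same_centre)              # True
```

So the zero-sum property is real, but it is a statement about arithmetic rather than a reason to prefer the mean as a summary.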

Sorry median  – you do not win X-Factor for summary statistics. You may be more robust, and less fickle, not to mention easier to understand, but you just aren’t as popular!

I can feel a video coming on – the median has been relegated to the periphery long enough!