Graphs – beauty and truth (with apologies to Keats)

A good graph is elegant

I really like graphs. I like the way graphs turn numbers into pictures. A good graph is elegant. It uses a few well-placed lines to communicate what would take a paragraph of text. And like a good piece of literature or art, a good graph continues to give, beyond the first reading. I love looking at my YouTube and WordPress graphs. These graphs tell me stories. The WordPress analytics tell me that when I put up a new post, I get more hits, but that every day more than 1000 people read one of my posts. The YouTube analytics tell me stories about when people want to know about different aspects of statistics. It is currently the end of the North American school year, and the demand is for my video on Choosing which statistical test to use. Earlier in the year, the video about levels of measurement was the most popular. And not many people view videos about statistics on the 25th of December. I’m happy to report that the YouTube and WordPress graphs are good graphs.

Spreadsheets have made it possible for anyone and everyone to create graphs. I like that graphs are easier to make. Drawing graphs by hand is laborious and fraught with error. But sometimes my heart aches when I see a graph used badly. I suspect that this is when a graphic artist has taken control, and the search for beauty has overridden the need for truth.

Three graphs spurred me to write this post.

Graph One: Bad-tasting Donut on house occupation

The first was on a website to find out about property values. I must have clicked on a link about property values in my area, and was taken to the QV website. And this is the graph that disturbed me.

Graphs named after food are seldom a good idea

Sure, it is pretty. It uses pretty colours and shading, and you can find out what it is saying by looking at the key, with the numbers beside it. But a pie or donut chart should not be used for data that has an inherent order. The result here is that the segments are not in their natural order; rather, they are ordered from most frequent to least frequent, which is not intuitive. Ordinal data is best represented in a bar or column chart. To be honest, most data is best represented in a bar or column chart. My significant other suggested that bar charts aren’t as attractive as pie charts. Circles are prettier than rectangles. Circles are curvy and seem friendlier than straight lines and rectangles. So prettiness has triumphed over truth.

Graph Two: Misleading pictogram (a tautology?)

It may be a little strong to call bad communication a lack of truth. Let’s look at another example. In a way it is cheating to cite a pictogram in a post like this. Pictograms are the lowest form of graph and are so often incorrect that finding a bad one is easier than finding a good one. In the graph of road fatalities below, it is difficult to work out what one little person represents.

What does one little person represent?

A quick glance, ignoring the numbers, suggests that the road toll in 2014 is just over half what it was in 2012. However, the truth, calculated from the numbers, is that the relative size is 80%. 2012 has 12 person icons, representing 280 fatalities. One icon is removed for 2013, representing a drop of 9 fatalities. 2014 has one icon fewer again, representing a drop of only 2 fatalities. So an icon does not stand for any consistent number of deaths. There is so much wrong with the reporting of road fatalities that I will stop here. Perhaps another day…

Graph Three: Mysterious display on Household income

And here is the other graph that perplexed me for some time. It came in the Saturday morning magazine from our newspaper, as part of an article about inequality in New Zealand. Anyone who reads my blog will be aware that my politics place me well left of centre, and I find inequality one of the great ills of the modern day. So I was keen to see what this graph would tell me. And the answer is…

See how long it takes you to find where you appear on the graph (pretending you live in NZ).

I have no idea. Now, I have expertise in the promulgation of statistics, and this graph stumped me for some time. Take a good look now, before I carry on.

I did work out in the end what was going on in the graph, but it took far longer than it should have. This article is aimed at an educated but not particularly statistically literate audience, and I suspect very few readers spent long enough to work out what was going on here. This graph is probably numerically correct. I had a quick flick back to the source of the data (who, by the way, are not to be blamed for the graph, as the data was presented in a table), and the graph seems to be an accurate depiction of the data. However, the graph is so confusing as to be worse than useless. Please post critiques in the comments. This graph commits several crimes. It is difficult to understand. It poses a question and then fails to help the reader find the answer. And it does not provide insights that an educated reader could not get from a table. In fact, I believe it has obscured the data.

Graphs are the main way that statistical analysts communicate with the outside world. Graphs like these ones do us no favours, even if they are not our fault. We need to do better, and make sure that all students learn about graphs.

Teaching suggestion – a graph a day

Here is a suggestion for teachers at all levels. Have a “graph a day” display, maybe for a month. Students can contribute graphs from the news media. Each day, discuss what the graph is saying, and critique the way the graph is communicating. I have a helpful structure for reading graphs in my post “There’s more to reading graphs than meets the eye”.

Here is a summary of what I’ve said and what else I could say on the topic.

Thoughts about Statistical Graphs

  • The choice of graph depends on the purpose
  • The text should state the purpose of the graph
  • There is not a graph for everything you wish to communicate
  • Sometimes a table communicates better than a graph
  • Graphs are part of the analysis as well as part of the reporting. But some graphs are better left hidden.
  • If it takes more than a few seconds to work out what a graph is communicating, it should either be dumped or be explained in the text
  • Truth (or communication) is more important than beauty
  • There is beauty in simplicity
  • Be aware that many people are colour-blind, or cannot easily differentiate between different shades.

Feedback from previous post on which graph to use

Late last year I posted four graphs of the same data and asked for people’s opinions. You can link back to the post here and see the responses: Which Graph to Use.

The interesting thing is not which graph was selected as the most popular, but rather that each graph had a considerable number of votes. My response is that it depends. It depends on the question you are answering or the message you are sending. But yes – I agree with the crowd that Graph A is the one that best communicates the various pieces of information. I think it would be improved by ordering the categories differently. It is not very pretty, but it communicates.

I recently posted a new video on YouTube about graphs. It is a quick once-over of important types of graphs, and can help to clarify what they are about. There are examples of good graphs in there.

I have written about graphs previously, and you can find those posts here on the Collected Works page.

I’m interested in your thoughts. And I’d love to see some beautiful and truthful graphs in the comments.


Khan Academy Statistics videos are not good

I don’t like the Khan Academy videos about statistics. But I can see why some people do. Some are okay, though some are very bad. I’m rather sorry they exist though, as they perpetuate the idea of statistics as mathematics.

Khan Academy, critics and supporters

Just in case you have been living under a rock, with respect to mathematics education, I will explain what Khan Academy is.

Sal Khan made little YouTube videos to teach a family member maths. Other people watched them and found them useful. Bill Gates discovered them and threw money at them. Now there are heaps of videos, with some back up exercises, and some people think this is the best thing to happen to maths (and other) education. Other people think that the videos lack pedagogical content knowledge. Sal agrees – he says he just makes them up as he goes along.

Diane Ravitch linked into the Khan Academy debate, beginning with this post, which is what got me looking into this. Two mathematics teachers made videos after the style of Mystery Science Theater 3000 starring two of Khan’s poorer mathematical contributions. The one on multiplying negative numbers was particularly poor and has since been replaced. Critiques of Khan seem to meet with two kinds of comments. One group is people who know about teaching, who are pleased that someone is pointing out that the emperor, though not naked, is poorly clad. The other lot are generally telling the mean teachers to leave Khan alone, that he is the saviour of mathematics teaching, and they would never have understood mathematics without him. The supporters also either suggest vested interest (for people who make educational materials) or that the writers should try to do better (for those people who don’t make educational materials). To be fair, the first group are also calling for other people to make better videos and put them out there.

For a good summary of the pros and cons of KA, here is a recent article in the Washington Post: “How well does Khan Academy teach?”

Khan Academy Statistics videos

So I took a look at Khan Academy statistics videos. I know something about the teaching of statistics. I have many years of experience of successful teaching, I have done research and I have read some of the literature. I have pedagogical content knowledge (I understand what makes it hard for people to learn statistics). And I have made my own statistics teaching videos, which have been well received. I wrote some time ago about the educational principles based on research into multimedia, which have been used in developing these videos. Unlike Khan, I have thought long and hard about how to present these tricky concepts. I have written and rewritten the scripts. I have edited my audio to remove errors and hesitations, I have…anyway – back to Khan Academy.

To be fair, statistics is one of the most difficult subjects to teach, so I didn’t have high hopes.

To start with, the list of topics under the statistics heading showed a strong mathematics influence. This may reflect the state of the curriculum in the United States, but in no way reflects current understanding of how statistics is best taught. I couldn’t find anything about variation, levels of measurement or sampling methods, which are all foundation concepts of statistics. I think it would be more correct to call the collection of videos “the mathematics of statistics”. It starts with the “Mean, Median and Mode.” Not exactly a great way to enter the exciting world of statistics. And he mispronounces the adjectival use of “arithmetic”, which is a bit embarrassing. (Note in 2017: it has now been corrected. Yay!)

Reading Pie Graphs

I summoned up courage to view the video on reading Pie Graphs. It was not good. The example was percentages of ticket sales for Mediterranean cruises over a year. That data should never have been put into a Pie Graph. For two reasons! First, there are too many slices of pie. A pie chart should never be used for that many categories. But worse than that, the categories are ordinal – they are months. The best choice of graph is a bar or column chart, with the months in order, as you would then be able to see the trend! (I have to stop myself here or I could rave on much longer). My point is that Khan has used a very BAD graph as his example. This is one of the worst things a teacher can do, as it entrenches in the students’ minds that this is acceptable. The only good thing about the graph was that it was not three-dimensional and not exploded. It didn’t even have a title. Bad, bad, bad. (Sorry, I was meant to be stopping)
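Here is a minimal sketch of the alternative, in Python: a plain-text bar chart that keeps the months in calendar order, so any seasonal trend shows up at a glance. The percentages are invented for illustration (they are not the figures from the video), and `text_bar_chart` is a hypothetical helper; in practice you would use a spreadsheet or a plotting library.

```python
# Hypothetical monthly shares of cruise ticket sales, in percent.
# Invented numbers for illustration only; they sum to 100.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
SHARES = [2, 3, 5, 8, 11, 15, 17, 16, 11, 7, 3, 2]


def text_bar_chart(labels, values, width=40):
    """Return one text bar per category, in the order given.

    The categories are deliberately NOT sorted by size: for ordinal data
    such as months, the natural order is what lets a reader see the trend.
    """
    biggest = max(values)
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * round(width * value / biggest)  # longest bar is `width` long
        lines.append(f"{label:>3} | {bar} {value}%")
    return lines


for line in text_bar_chart(MONTHS, SHARES):
    print(line)
```

Sorting the shares from largest to smallest, as pie charts so often are, would put July first and hide the rise and fall through the year.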

Confidence Intervals

I am tempted to say Khan is arrogant to think he can produce something after a few minutes’ thought. Actually I was tempted to say something rather stronger than that. I have to admit I haven’t watched many of the videos, but I really don’t want to spend too much of my life doing that. I chose one on Confidence Intervals, which nearly had me throwing things at the computer. It never explains what a confidence interval is. The bumbling around was so painful I couldn’t watch the video in its entirety. I’m pretty sure he got it wrong, but he was so confused by the end that I can’t say for sure. My own confidence intervals video is one of my earlier ones, so it is a little rough, but I’d wager most people understand better what a confidence interval is after watching it. (UPDATE: Since writing this post I have made a better video about confidence intervals. It explains what confidence intervals ARE!)
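Since the complaint is that the video never says what a confidence interval is: a 95% confidence interval for a mean is a range of values, built from one sample, using a method that captures the true population mean in about 95% of repeated samples. Below is a minimal sketch in Python, assuming a reasonably large random sample so that the normal (z) approximation is acceptable. The name `mean_confidence_interval` and the sample are hypothetical, and this is not the method from either video.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal (z) critical value, which assumes a reasonably large
    random sample; for small samples a t critical value is more appropriate.
    """
    n = len(data)
    centre = mean(data)
    standard_error = stdev(data) / math.sqrt(n)
    # Critical value leaving (1 - confidence) / 2 in each tail; about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical sample of 40 observations, for illustration only.
sample = [8, 9, 10, 11, 12] * 8
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

Asking for `confidence=0.99` gives a wider interval from the same data: more confidence costs precision.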

So then I decided I should look at the video entitled p-value and hypothesis tests. This is something I know many people struggle with. It is crucial to understanding inferential statistics. I have spent many hours working out ways in which to teach this that will help people to understand.

The p-value and hypothesis testing

Well I watched most of the p-value video, and was pleasantly surprised. The explanation of how we get the p-value is sound, and once he gets into his flow, the hesitations get less irritating. There is a small error – talking about 100 samples, rather than a sample of 100 observations. Also it is a bad idea to have a sample size of 100 in an example as this can get confused with the 100 in the expression of the confidence interval as a percentage. But it does give a good mathematical explanation of how the p-value is calculated. I’m not sure how well it helps students to understand what a p-value is. For a mathematically capable student, this would probably be enlightening. I have my doubts about most of the business students I have taught over the last two decades.
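For readers who do want the mathematics, here is a minimal sketch in Python of how a p-value is calculated in the simplest case, a two-sided one-sample z-test. It assumes the population standard deviation is known, a common textbook simplification; the function name and the numbers are hypothetical. Note that the example uses one sample of 100 observations, not 100 samples.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, hypothesised_mean, sd, n):
    """p-value for a one-sample z-test of H0: population mean == hypothesised_mean.

    Assumes the population standard deviation `sd` is known.
    """
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - hypothesised_mean) / standard_error
    # Probability, if H0 were true, of a test statistic at least as far
    # from zero as the one observed, in either tail.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example: a single sample of n = 100 observations.
p = two_sided_p_value(sample_mean=52.0, hypothesised_mean=50.0, sd=10.0, n=100)
print(f"p-value: {p:.4f}")
```

Here the standard error is 10 / sqrt(100) = 1, so z = 2 and the p-value is about 0.0455: a sample mean this far from 50 would be fairly unusual if the population mean really were 50. Knowing how the number is produced is a separate skill from knowing what to do with it.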

My main criticism is that the video is dull. It doesn’t provide anything more than the mathematics. But apart from alienating non-mathematical students it isn’t harmful. In fact if I had a student who wanted to know the mathematics behind the statistics, I would be happy to send them there. People have commented that my videos don’t tell you how the p-value is calculated. This is true. That is not the aim. Maybe I’ll do one about that one day, but I figured it was more important to know what to do with one.

Khan Academy videos on statistics aren’t good

My point is, surely we can do better than that! Bill Gates has thrown money at the Khan Academy. Wouldn’t it be wonderful if it were the purveyor of really good practice rather than mediocrity? One blogger suggests that if Khan Academy could use really good videos, it really could be useful.

I have gone on long enough.

I realise now, that asking a busy person to watch my videos is a bit of a cheek. They aren’t that long though. They are funny and clever. They are NOT like Khan Academy. I think they are worth the six to ten minutes each.

Here are links to my three most popular ones. Enjoy.