# Beware of Excel Histograms

Excel histograms are a disgrace. Microsoft should be embarrassed to have them associated with their ubiquitous and generally wonderful spreadsheet, Excel. I have previously posted on how useful and versatile Excel is for enabling people to bypass the number crunching, and get to the ideas behind statistics. This is mostly true. But the histogram add-in should come with a health and safety warning.

To start with, the default look for the histogram is outrageously poor. I have some data on times a person takes to solve a Rogo puzzle. (Collected as part of our research on what factors affect solution time.) I put the data in the spreadsheet, and use the data analysis toolpak to create a histogram using the default settings. Voila!

Histogram produced using default settings in Excel

I’ll stretch it out a bit so you can see it in all its glory:

Histogram using Excel defaults, stretched out.

And here is what can be produced from this with a fair degree of manipulation:

Excel Histogram whipped into shape

This is not just a question of cosmetics. The way the horizontal axis is labelled makes it very difficult to read correctly an Excel-produced histogram unless the adaptation shown above is used. And sometimes, an Excel-produced histogram is just plain incorrect.

# Bad histogram in NCEA sample exam

What prompted this tirade is a question from the sample external examination question in NCEA level 2 “Apply probability methods in solving problems”. This is an exam that over 15,000 students are likely to take. Fortunately this is only an examplar and not the real thing. It includes a badly labelled histogram, which I am almost certain was made in Excel.

Histogram taken from an exam exemplar

The introduction says: “Ali has a farm in Southland. She records the weights of 32 lambs born on her farm. The results are shown on the histogram above.”

The first question asks: “What proportion of the lambs weighs less than 1.25 kg?”

Go on – work it out.

I can’t answer this for certain. The labelling of the horizontal axis renders this question unanswerable. I suspect the desired answer is one out of 32. To get this answer I assume that the lambs are weighed to a precision of one decimal place, and the numbers under the graph are the inclusive upper bounds for the area above. I make this assumption because I know that is what Excel does. That is two too many assumptions for students in an exam. This is too many assumptions for any graph. Graphs exist in order to communicate, not confuse.

A histogram always has “bins” which cover a range of values. If you went to school last century and learnt to draw them by hand, you would put the boundary number between the bins on the tick mark on the graph that was the boundary between the bins. Intuitive!

A Google search on the word histogram shows most of the histograms with the tick marks at the boundaries, and quite a few using the Excel work around shown above. That is because the only way you can get the number and the tick mark to line up, is to move the tick mark to the centre. An Excel column chart is designed to be a value graph for nominal data, and it is being pressed into service in an unnatural way.

# Another example

This is a simple mockup to illustrate

Example A

The question is, how many people scored 3 or less in the test?

It isn’t clear. Did one person score between 0 and 3, and then three between 3 and 6? The data is actually:  0, 0.5, 1, 2, 5, 5.5, 7, 8, 9,10 and the answer is that four people scored three or less. The following histogram shows this.

Example B (same data as Example A)

All it takes is some relabelling and the meaning is clear.

# Teaching implications

We thought long and hard about the teaching of histograms within a Business Statistics course. We concluded that any student who is likely to need to produce a histogram in the future, is likely to (ought to?) have a better statistics package than Excel to use. Teaching them this bizarre work-around in Excel is a waste of student time (We decided this after we made students do this in a course.) It is more important for students to be able to interpret histograms correctly, and be aware of the pitfalls of badly labelled histograms. Consequently we taught students to interpret and then critique histograms, rather than construct them themselves, and assessed the same way.

If you ever use a histogram yourself, make sure you do not fall into the pit shown above!

And for those of you who persist in teaching histograms in Excel (or need help yourself in knowing how to do it – hence avoiding said pit), here is a pdf handout.

Drawing_a_histogram_2007

Good luck.

The best outcome would be that Microsoft get their very poor data analysis add-in fixed up, and the world would be a better place. Any chance of that?

This entry was posted in statistics, teaching, technology and tagged , , , by Dr Nic. Bookmark the permalink.

I love to teach just about anything. My specialties are statistics and operations research. I have insider knowledge on Autism through my family. I have a lovely husband, two grown-up sons, a fabulous daughter-in-law and an adorable grandson. I have several blogs - Learn and Teach Statistics, and Building a Statistics Learning Community, are the main ones.

## 23 thoughts on “Beware of Excel Histograms”

1. And it isn’t just the histograms that are substandard in the toolpak. We use toolpak for teaching elementary concepts and have to warn about errors and inadequacies. Where are the diagnostic graphs (for checking assumptions) in ANOVA? Are the diagnostic graphs in regression correct and adequate? etc.

Like

2. Dr Nic

The horizontal scale of a histogram is a continuous scale. It doesn’t have gaps as yours does (for numbers between 0 and 0.1). It’s not saying “my measurements are between a and b inclusive”. It’s saying that if we get any measurements between a and b then they will be put into that specific bin.

And having intervals of varying widths without compensating in some way could be quite misleading – as your 0 and 0.1 to 3 are. The height of the bar is then often a measure of how wide the interval is, rather than as a guide to the relative frequency of data in that bin. (Of course you will not get many “0” compared to “0.1 to 3” as there is only one way of scoring 0 but many ways to score between 0.1 and 3 inclusive. Your “0” interval ought to be very narrow if it only includes values between 0.000000 and 0.100000.

I do agree with your disdain of many aspects of Excel. I can’t for the life of me figure out why they don’t fix these problems. I guess the “Microsoft” before the “Excel” partially explains it.

I love your blog. Thanks. It gets me thinking.

Like

• How rude! 😉
You are absolutely correct, but it is the best I can do with Excel without even more messing around. One way is to put a text box over the horizontal axis, but this is even less dynamic and more prone to error. And I know about the zero thing but I was trying to show how messed up Excel histograms can be. (That’s my excuse anyway)
Thanks for reading! I love blogging – it is so much more fun to say totally unsubstantiated outrageous things than construct well thought out academic papers. It’s also good to know that people will pick on mistakes.

Like

• I thought you would take it the right way. Thanks for that.

I was only having a go at the histograms – not you. So the rudeness – which I apologise for – can be deflected at the graphs.

Also – I reckon you meant “is that four people scored 3 or less” in your articel above.

And if you visualise a tick mark in the centre of each interval I’m sure that the birth weight of lambs histogram is meaningful.

Thanks again

Like

• I fixed the error – thanks. Again you might be right about the lambs, but I think they should be clear as they are, especially in an exam, unless the aim is for students to find errors, which it isn’t.

Like

3. Actuall, Nic, it is possible to do a bit better, and create proper histograms in Excel – it’s just messy and long-winded. You need to take the data and transform it into something that you could plot using a flatbed XY plotter. (ie, create a sequence of (x,y) coordinates that correspond to the vertices of the desired histogram, and then draw the histogram using the XY plot with connected points) Your primary point regarding the use of this in an exam is of course exactly to the point – this is outrageous, and does our discipline a serious disservice.
Keep up the good work!

Like

4. (My two bob’s worth). In general, I think the multiple drop down, point a click menus in the newer versions of excel result in poorer graphics. The editing features are very limited. I use Excel 2003 which is much more versatile and it is easier to edit and tweak the graphs anyway you want, without copious and nauseating clicking. Most of the stats software out there produce graphs that are not very editable either, I find. Even less so than excel.

Like

5. why not just create a good template and make available to all students
creating & using templates is a handy skill as well
i would also recommend omitting the legend, which is a waste of space for 1 variable,
legends above rather than on right also leave more real estate for the interpretable part – another sully excel default – but easily fixed

Like

6. I love this blog. Great blog.

Like

7. Dr Nic, Excel’s histogram is really horrible especially for occasional users (moi). However I have come across a little (free) add-in called ‘better histogram’ that seems to work in 2010 even though it looks old. The best thing about it is that it doesn’t need ‘adding in’. I have it in the top right hand corner of my desktop and if I need to do a histogram whilst I am working in excel, I just shrink excel, click on the desktop icon and then it becomes available in the first workbook under the ‘add-Ins’ tab 🙂

Like

8. THERE **IS** AN EASIER WAY. It uses pivot tables and grouping (in other wording binning data. See http://youtu.be/ebiC71Wh5sY. I agree using the analysis toolpak is a bad idea – as is using the frequency function. PivotTables and PivotCharts though are relatively simple and are very useful to learn in general,

Like

9. I stumbled on this block seeking advice on the dodgy histogram function in Excel 2010. The poptools add-in has an alternative version, which is slightly more useful but not especially user friendly either. Then I realised that I had Golden Software’s Grapher installed (!), imported the Excel file, marked the column, and kazaam I had a perfectly formatted and customisable histogram. Expensive software though, but well worth it if you graph a lot and have reached the limits of Excel.

Like

10. PS. Thanks to Jonathan for the pivot chart tip, very useful indeed. Grapher is still the way to go for publishable figures though.

Like

11. Hi Dr Nic, I am agree with you. We thought long and hard about the teaching of histograms within a Business Statistics course. We concluded that any student who is likely to need to produce a histogram in the future, is likely to (ought to?) have a better statistics package than Excel to use.
Learn Analytics

Like

12. Microsoft apparently doesnt care about quality until it receives enough complaints, if then maybe; so, how can we get more public concern to help send a deluge of complaints that may result in getting excel to include real histograms, which have been neglected for so many years ?!

Like

• Thanks. I wasn’t aware of the Frequency function, so that was a big help.

Like