Summarising with Box and Whisker plots

In the Northern Hemisphere, it is the start of the school year, and thousands of eager students are beginning their study of statistics. I know this because this is the time of year when lots of people watch my video, Types of Data. On 23rd August the hits on the video bounced up out of their holiday slumber, just as they do every year. They gradually dwindle away until the end of January, when they jump in popularity again, I suspect because the second semester is starting.

One of the first topics in many statistics courses is summary statistics. The greatest hits of summary statistics tend to be the mean and the standard deviation. I’ve written previously about what a difficult concept a mean is, and then another post about why the median is often preferable to the mean. In that one I promised a video. Over two years ago – oops. But we have now put these ideas into a video on summary statistics. Enjoy! In 5 minutes you can get a conceptual explanation of summary measures of position (also known as location or central tendency).


I was going to follow up with a video on spread, and started to think about range, interquartile range, mean absolute deviation, variance and standard deviation. That is a lot to cover in one video, so I decided instead to make a video on the wonderful boxplot, again comparing the shoe-owning habits of male and female students in a university in New Zealand.

Boxplots are great. When you combine them with dotplots as done in iNZight and various other packages, they provide a wonderful way to get an overview of the distribution of a sample. More importantly, they provide a wonderful way to compare two samples or two groups within a sample. A distribution on its own has little meaning.

John Tukey was the first to make a box and whisker plot out of the 5-number summary way back in 1969. This was not long before I went to high school, so I never really heard about them until many years later. Drawing them by hand is less tedious than drawing a dotplot by hand, but still time-consuming. We are SO lucky to have computers that create graphs at the click of a mouse.

Sample distributions and summaries are not enormously interesting on their own, so I would suggest introducing boxplots as a way to compare two samples. Their worth is then apparent.

A colleague recently pointed out an interesting confusion and distinction. The interquartile range is the distance between the upper quartile and the lower quartile. The box in the boxplot contains the middle 50% of the values in the sample. It is tempting to focus on this and miss the fact that the interquartile range is a good resistant measure of spread for the WHOLE sample. (Resistant means that it is not unduly affected by extreme values.) By contrast, the range is a poor summary statistic, as it is so easily affected by extreme values.
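To make the idea of resistance concrete, here is a minimal Python sketch that computes a five-number summary, the interquartile range and the range. The shoe counts are made-up numbers for illustration, not the survey data used in the video. The quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`; other statistics packages may use a different quartile convention and give slightly different values.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles use statistics.quantiles with method="inclusive"; other
    software may use a different convention and differ slightly.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

def spread(values):
    """Range (max - min) and interquartile range (Q3 - Q1)."""
    low, q1, _, q3, high = five_number_summary(values)
    return {"range": high - low, "iqr": q3 - q1}

# Made-up numbers of pairs of shoes owned, purely for illustration
shoes = [4, 5, 6, 6, 7, 8, 9, 10, 12, 15]
# Same sample, but with the largest value replaced by an extreme one
with_outlier = shoes[:-1] + [60]

print("Five-number summary:", five_number_summary(shoes))
print("Original:    ", spread(shoes))
print("With outlier:", spread(with_outlier))
```

Replacing the largest value with an extreme one stretches the range from 11 to 56, while the IQR stays at 3.75.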

And now we come to our latest video, about the boxplot. This one is four and a half minutes long, and also uses the shoe sample as an example. I hope you and your students find it helpful. We have produced over 40 statistics videos, some of which are available for free on YouTube. If you are interested in using our videos in your teaching, do let us know and we will arrange access to the remainder of them.

Engaging students in learning statistics using The Islands.

Three Problems and a Solution

Modern teaching methods for statistics have gone beyond the mathematical calculation of trivial problems. Computers make large-scale studies possible, bringing reality to the subject, but this is not without its own problems.

Problem 1: Giving students experience of the whole statistical process

There are many reasons for students to learn statistics through running their own projects, following the complete statistical enquiry process: posing a problem, planning the data collection, collecting and cleaning the data, analysing the data and drawing conclusions that relate back to the original problem. However, individual projects can be both time-consuming and risky, as the quality of the report, and the resulting grade, can depend on the quality of the data collected, which may be beyond the student’s control.

The Statistical Enquiry Cycle, which underpins the NZ statistics curriculum.

Problem 2: Giving students experience of different types of sampling

If students are given an existing database and then asked to sample from it, this can be confusing for them, and it sends the misleading message that we would not want to use all the data available. But physically taking a sample, based on a sampling frame, can be prohibitively time-consuming.

Problem 3: Giving students experience conducting human experiments

The problem here is obvious. It is not ethical to perform experiments on humans simply to learn about performing experiments.

An innovative solution: The Islands virtual world.

I recently ran an exciting workshop for teachers on using The Islands. My main difficulty was getting the participants to stop doing the assigned tasks long enough to discuss how we might implement this in their own classrooms. They were too busy clicking around different villages and people, finding subjects of the right age and getting them to run down a 15-degree slope – all without leaving the classroom.

The Island was developed by Dr Michael Bulmer of the University of Queensland as a synthetic learning environment. The Islands, its second version, is a free, online, virtual human population created for simulating data collection.

The synthetic learning environment overcomes practical and ethical issues with applied human research, and is used for teaching students at many different levels. For a login, email james.baglin @ (without the spaces in the email address).

There are now approximately 34,000 inhabitants of the Islands, who are born, have families (or not) and die in an accelerated time frame, where 1 Island year is equivalent to about 28 Earth days. They each carry a genetic code that affects their health and other characteristics. The database is dynamic, so every student will get different results from it.

Some of the Islanders

Two magnificent features

To me, one of the two best features is the difficulty of acquiring data on individuals. It takes time for students to collect samples, as each subject must be asked individually, and the results recorded in a database. There is no easy access to the population. This is still much quicker than asking people in real life (or “irl”, as it is known on social media). It is obvious that you need to sample and to have a good sampling plan, and you need to work out how to record and deal with your data.

The other outstanding feature is the ability to run experiments. You can get a group of subjects and split them randomly into treatment and control groups. Then you can perform interventions, such as making them sit quietly or run about, or drink something, and then evaluate their performance on some other task. All of this is possible without real-life ethical approval and informed consent. However, in a touch of reality, the people of the Islands sometimes lie, and they don’t always give consent.
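The random split into treatment and control groups is the heart of a randomised experiment. The Islands handles recruitment and allocation through its own interface; the sketch below is only a generic Python illustration of random allocation, with hypothetical subject names and a fixed seed so that the split can be reproduced.

```python
import random

def randomise(subjects, seed=None):
    """Shuffle a copy of the subjects and split it into two groups.

    With an odd number of subjects, the second group gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects; in The Islands these would be Islanders you recruited
islanders = ["Ana", "Bo", "Cai", "Dee", "Eli", "Fay", "Gus", "Hal"]
treatment, control = randomise(islanders, seed=1)
print("Treatment (e.g. run about):", treatment)
print("Control (e.g. sit quietly):", control)
```

Because chance alone decides who gets the intervention, any systematic difference in the later task can more reasonably be attributed to the treatment.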

There are over 200 tasks that you can assign to your people, covering a wide range of topics. They include blood tests, urine tests, physiology, food and drinks, injections, tablets, mental tasks, coordination, exercise, music, environment etc. The tasks occur in real (reduced) time, so you are not inclined to include more tasks than are necessary. There is also the opportunity to survey your Islanders, with more than fifty possible questions. These also take time to answer, which encourages judicious choice of questions.


In the workshop we used the Islands to learn about sampling distributions. First each teacher took a sample of one male and one female and timed them running down a hill. We made (fairly awful) dotplots on the whiteboard using sticky notes with the individual times on them. Then each teacher took a sample and found the median time. We used very small samples of 7 each as we were constrained by time, but larger samples would be preferable. We then looked at the distributions of the medians and compared that with the distribution of our first sample. The lesson was far from polished, but the message was clear, and it gave a really good feel for what a sampling distribution is.
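The same classroom exercise can be mimicked in a few lines of Python, as a rough sketch. The population of running times here is simulated from an assumed normal distribution (mean 20 s, SD 4 s); it is not Islands data. Each of 500 repetitions plays the part of one teacher taking a sample of 7 and recording the median.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of running times in seconds (NOT Islands data),
# assumed roughly normal with mean 20 s and SD 4 s.
population = [rng.gauss(20, 4) for _ in range(5000)]

def sample_median(pop, n, rng):
    """Median of one simple random sample (without replacement) of size n."""
    return statistics.median(rng.sample(pop, n))

# Each repetition plays the role of one teacher taking a sample of 7.
medians = [sample_median(population, 7, rng) for _ in range(500)]

print("SD of individual times:", round(statistics.stdev(population), 2))
print("SD of sample medians:  ", round(statistics.stdev(medians), 2))
```

The spread of the 500 medians comes out noticeably smaller than the spread of individual times, which is the key feature a dotplot of the teachers’ medians should reveal. Larger samples would shrink it further.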

Within the New Zealand curriculum, we could also use The Islands to learn about bivariate relationships, sampling methods and randomised experiments.

In my workshop I had educators from across the age groups, and a primary teacher assured me that Year 4 students would be able to make use of this. Fortunately there is a maturity filter so that you can remove options relating to drugs and sexual activity.

James Baglin from RMIT University has successfully trialled the Island with high school students and psychology research methods students. The owners of the Island generously allow free access to it. Thanks to James Baglin, who helped me prepare this post.

Here are links to some interesting papers that have been written about the use of The Islands in teaching. We are excited about the potential of this teaching tool.

Huynh, Baglin, & Bedford (2014). Improving the attitudes of high school students towards statistics: An Island-based approach. ICOTS9.

Baglin, Reece, Bulmer, & Di Benedetto (2013). Simulating the data investigative cycle in less than two hours: Using a virtual human population, cloud collaboration and a statistical package to engage students in a quantitative research methods course.

Bulmer, M. (2010). Technologies for enhancing project assessment in large classes. In C. Reading (Ed.), Proceedings of the Eighth International Conference on Teaching Statistics, July 2010. Ljubljana, Slovenia.

Bulmer, M., & Haladyn, J. K. (2011). Life on an Island: A simulated population to support student projects in statistics. Technology Innovations in Statistics Education, 5(1).

Baglin, J., Bedford, A., & Bulmer, M. (2013). Students’ experiences and perceptions of using a virtual environment for project-based assessment in an online introductory statistics course. Technology Innovations in Statistics Education, 7(2), 1–15.

Learning to teach statistics, in a MOOC

I am participating in a MOOC, Teaching statistics through data investigations. A MOOC is a fancy name for an online, free, correspondence course. The letters stand for Massive Open Online Course. I decided to enrol for several reasons. First, I am always keen to learn new things. Second, I wanted to experience what it is like to be a student in a MOOC. And third, I wanted to see what materials we could produce that might help teachers or learners of statistics in the US. We are doing well in the NZ market, but it isn’t really big enough to earn us enough money to do some of the really cool things we want to do in teaching statistics to the masses.

I am now up to Unit 4, and here is what I have learned so far:

Motivation and persistence

It is really difficult to stay motivated even in the best possible MOOC. Life gets in the way and there is always something more pressing than reading the materials, taking part in discussions and watching the videos. I looked up the rate of completion for MOOCs, and this article from IEEE gives the completion rate at 5%. Obviously it will differ between MOOCs, depending on the content, the style, the reward. I have found it best to schedule time for the MOOC each week, or it just doesn’t happen.

I know more than I thought I did

It is reassuring to find out that I really do have some expertise. (This may be a bit of a worry to those of you who regularly read my blog and think I am an expert in teaching statistics.) My efforts to read and ponder, to discuss and to experiment have meant that I do know more than teachers who are just beginning to teach statistics. Phew!

The investigative process matters

I finally get the importance of the Statistical Enquiry Cycle (PPDAC in New Zealand) or Statistical Investigation Cycle (Pose, Collect, Analyse, Interpret in the US). I sort of got it before, but now it is falling into place. In the old-fashioned approach to teaching statistics, almost all the emphasis was on the calculations. There would be questions asking students to find the mean of a set of numbers, with no context. This is not statistics, but an arithmetic exercise. Unless a question is embedded in the statistical process, it is not statistics. There needs to be a reason, a question to answer, real data and a conclusion to draw. Every time we develop a teaching exercise for students, we need to think about where it sits in the process, and provide the context.

Brilliant questions

I was happy to participate in the LOCUS quiz to evaluate my own statistical understanding. I was relieved to get 100%. But I was SO impressed with the questions, which reflected the work and thinking that had gone into them. I understand how difficult it is to write questions to teach and assess statistical understanding, as I have written hundreds of them myself. The LOCUS questions are great questions. I will be writing some of my own following their style. I loved the ones that asked what would be the best way to improve an experimental design. Inspired!

It’s easier to teach the number stuff

I’m sure I knew this, but seeing so many teachers say it cemented it. Teacher after teacher commented that teaching procedure is so much easier than teaching concepts. Testing knowledge of procedure is so much easier than assessing conceptual understanding. Maths teachers are really good at procedure. That fluffy, hand-waving meaning stuff is just…difficult. And it all depends. Every answer depends! The implication of this is that we need to help teachers become more confident in helping students to learn the concepts of statistics. We need to develop materials that focus on the concepts. I’m pretty happy that most of my videos do just that – my “Understanding Confidence Intervals” is possibly the only video on confidence intervals that does not include a calculation or procedure.

You learn from other participants

I’ve never been keen on group work. I suspect this is true of most over-achievers. We don’t like to work with other people on assignments as they might freeload, or worse – drag our grade down. Over the years I’ve forced students to do group assignments, as they learn so much more in the process. And I hate to admit that I have also learned more when forced to do group assignments. It isn’t just about reducing the marking load. In this MOOC we are encouraged to engage with other participants through the discussion forums. This is an important part of online learning, particularly on a solely online platform (as opposed to blended learning). I just love reading what other people say. I get ideas, and I understand better where other people are coming from.

I have something to offer

It was pretty exciting to see my own video used as a resource in the course, and to hear from the instructor how she loves our Statistics Learning Centre videos.

What now?

I still have a few weeks to run on the MOOC and I will report back on what else I learn. And then in late May I am going to USCOTS (US Conference on Teaching Statistics). It’s going to cost me a bit to get there, living as I do in the middle of nowhere in Middle Earth. But I am thrilled to be able to meet with the movers and shakers in US teaching of statistics. I’ll keep you posted!

Nominal, Ordinal, Interval, Schmordinal

Everyone wants to learn about ordinal data!

I have a video channel with about 40 videos about statistics, and I love watching to see which videos are getting the most views each day. As the Fall term has recently started in the northern hemisphere, the most popular video over the last month is “Types of Data: Nominal, Ordinal, Interval/Ratio.” Similarly, one of the most consistently viewed posts in this blog is one I wrote over a year ago, entitled “Oh Ordinal Data, what do we do with you?”. Understanding the different levels of data, and what we do with them, is obviously an important introductory topic in many statistical courses. In this post I’m going to look at why this is, as it may prove useful to learner and teacher alike.

And I’m happy to announce the launch of our new Snack-size course: Types of Data. For $2.50US, anyone can sign up and get access to video, notes, quizzes and activities that will help them, in about an hour, gain a thorough understanding of types of data.

Costing no more than a box of popcorn, our snack-size course will help you learn all you need to know about types of data.

The Big Deal

Data is essential to statistical analysis. Without data there is no investigative process. Data can be generated through experiments, through observational studies, or dug out from historic sources. I get quite excited at the thought of the wonderful insights that good statistical analysis can produce, and the stories it can tell. A new database to play with is like Christmas morning!

But not all data is the same. We need to categorise the data to decide what to do with it for analysis, and what graphs are most appropriate. There are many good and not-so-good statistical tools available, thanks to the wonders of computer power, but they need to be driven by someone with some idea of what is sensible or meaningful.

A video that becomes popular later in the semester is entitled, “Choosing the test”. This video gives a procedure for deciding which of seven common statistical tests is most appropriate for a given analysis. It lists three things to think about – the level of data, the number of samples, and the purpose of the analysis. We developed this procedure over several years with introductory quantitative methods students. A more sophisticated approach may be necessary at higher levels, but for a terminal course in statistics, this helped students to put their new learning into a structure. Being able to discern what level of data is involved is pivotal to deciding on the appropriate test.

Categorical Data

In many textbooks and courses, the types of data are split into two – categorical and measurement. Most state that nominal and ordinal data are categorical. With categorical data we can only count the responses in each category, rather than collect values that are measurements or counts themselves. Examples of categorical data are colour of car, ethnicity, choice of vegetable, or type of chocolate.

With nominal data, we report frequencies or percentages, and display our data with a bar chart, or occasionally a pie chart. We can’t find a mean of nominal data. However, if the different responses are coded as numbers for ease of use in a database, it is technically possible to calculate the mean and standard deviation of those numbers. A novice analyst may do so and produce nonsense output.
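Here is a small Python sketch of that novice trap, using made-up vegetable preferences. Frequencies and the mode are the sensible summaries. The two numeric codings are arbitrary assumptions of the kind a database might use; because switching from one equally valid coding to the other changes the “mean”, it clearly carries no meaning.

```python
from collections import Counter
import statistics

# Made-up survey responses: favourite vegetable (nominal data)
responses = ["carrot", "pea", "carrot", "broccoli", "pea", "carrot", "corn"]

# Sensible summary: frequencies and percentages, plus the mode
counts = Counter(responses)
for veg, k in counts.most_common():
    print(f"{veg:<9} {k}  ({100 * k / len(responses):.0f}%)")
print("Mode:", statistics.mode(responses))

# Two arbitrary but equally valid numeric codings a database might use
coding_a = {"broccoli": 1, "carrot": 2, "corn": 3, "pea": 4}
coding_b = {"pea": 1, "corn": 2, "carrot": 3, "broccoli": 4}
coded_a = [coding_a[r] for r in responses]
coded_b = [coding_b[r] for r in responses]

# Both "means" can be computed, and they disagree: neither means anything
print("Mean under coding A:", round(statistics.mean(coded_a), 2))
print("Mean under coding B:", round(statistics.mean(coded_b), 2))
```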

The very first data most children will deal with is nominal data. They collect counts of objects and draw pictograms or bar charts of them. They ask questions such as “How many children have a cat at home?” or “Do more boys than girls like Lego as their favourite toy?” In each of these cases the data is nominal, probably collected by a survey asking questions like “What pets do you have?” and “What is your favourite toy?”

Ordinal data

Another category of data is ordinal, and this is the one that causes the most problems in understanding. My earlier blog post discusses this. Ordinal data has order, and numbers assigned to responses are meaningful, in that each level is “more” than the previous level. We are frequently exposed to ordinal data in opinion polls, asking whether we strongly disagree, disagree, agree or strongly agree with something. It would be acceptable to put the responses in the opposite order, but it would be confusing to list them in alphabetical order: agree, disagree, strongly agree, strongly disagree. What stops ordinal data from being measurement data is that we can’t be sure how far apart the different levels on the scale are. Sometimes it is obvious that we can’t tell how far apart they are. An example of this might be the star rating assigned by a movie reviewer. It is clear that a 4-star movie is better than a 3-star movie, but we can’t say how much better. Other times, when a scale is well defined and the circumstances are right, ordinal data may appropriately, but cautiously, be treated as interval data.
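One cautious way to summarise ordinal responses without assuming equal spacing is the median category, which uses only the order of the levels. Below is a sketch in Python with hypothetical poll responses. Using `statistics.median_low`, so that the answer is always an actual level rather than a point halfway between two, is our own choice for illustration, not a standard rule.

```python
import statistics

# Levels in order; list position records order only, not distance
LEVELS = ["strongly disagree", "disagree", "agree", "strongly agree"]

def ordinal_median(responses, levels=LEVELS):
    """Return the median category of ordinal responses.

    median_low always returns an observed position, so the result is a
    real level rather than an invented point halfway between two levels.
    """
    positions = [levels.index(r) for r in responses]
    return levels[statistics.median_low(positions)]

def frequencies_in_order(responses, levels=LEVELS):
    """Counts for each level, listed in scale order (not alphabetical)."""
    return {level: responses.count(level) for level in levels}

# Hypothetical opinion-poll responses
poll = ["agree", "disagree", "strongly agree", "agree", "strongly disagree", "agree"]
print(frequencies_in_order(poll))
print("Median response:", ordinal_median(poll))
```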

Measurement Data

The most versatile data is measurement data, which can be split into interval or ratio data, depending on whether ratios of numbers have meaning. For example, temperature in degrees Celsius or Fahrenheit is interval data: zero degrees does not mean an absence of heat, so it makes no sense to say that 70 degrees is twice as hot as 35 degrees. Weight, on the other hand, is ratio data, as it is true to say that 70 kg is twice as heavy as 35 kg.
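Converting units shows the difference. In this Python sketch the temperatures are assumed to be in degrees Celsius. Converting to Fahrenheit shifts the zero point, so the apparent 2:1 ratio is not preserved. Converting kilograms to pounds only rescales, so the 2:1 weight ratio survives. The pounds factor 2.20462 is approximate.

```python
def celsius_to_fahrenheit(c):
    """Interval-scale conversion: rescale AND shift the zero point."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Ratio-scale conversion: rescale only (zero stays zero)."""
    return kg * 2.20462  # approximate conversion factor

hot, mild = 70, 35  # assumed to be degrees Celsius
temp_ratio_c = hot / mild
temp_ratio_f = celsius_to_fahrenheit(hot) / celsius_to_fahrenheit(mild)

heavy, light = 70, 35  # kilograms
weight_ratio_kg = heavy / light
weight_ratio_lb = kg_to_lb(heavy) / kg_to_lb(light)

print(f"Temperature ratio: {temp_ratio_c:.2f} in C, {temp_ratio_f:.2f} in F")
print(f"Weight ratio:      {weight_ratio_kg:.2f} in kg, {weight_ratio_lb:.2f} in lb")
```

The temperature “ratio” drops from 2.00 in Celsius to about 1.66 in Fahrenheit, while the weight ratio stays at 2.00 in both units.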

A more useful way to split up measurement data, for statistical analysis purposes, is into discrete or continuous data. I had always explained that discrete data was counts, recorded as whole numbers, and that continuous data was measurements, which could take any value within a range. This definition works to a certain degree, but I recently found a better way of looking at it in Chance Encounters, the textbook by Wild and Seber published by Wiley.

“In analyzing data, the main criterion for deciding whether to treat a variable as discrete or continuous is whether the data on that variable contains a large number of different values that are seldom repeated or a relatively small number of distinct values that keep reappearing. Variables with few repeated values are treated as continuous. Variables with many repeated values are treated as discrete.”

An example of this is the price of apps in the App store. There are only about twenty prices that can be charged – 0.99, 1.99, 2.99 etc. These are neither whole numbers nor counts, but as you cannot have a price in between the given numbers, and there is only a small number of possibilities, this is best treated as discrete data. Conversely, the number of people attending a rock concert is a count, and you cannot get fractions of people. However, as there is a wide range of possible values, and it is unlikely that you will get exactly the same number of people at more than one concert, this data is best treated as continuous.
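Wild and Seber’s criterion is a judgement, not a formula, but a rough Python sketch of the idea is to look at what share of the observations are distinct values. The 0.5 cut-off below is an arbitrary assumption for illustration, and the app prices and concert attendances are simulated, not real data (five price points are used for brevity).

```python
import random

def suggest_treatment(values, threshold=0.5):
    """Hypothetical rule of thumb: treat as continuous if most values are distinct.

    The 0.5 threshold is an illustrative assumption, not from Wild and Seber.
    """
    distinct_share = len(set(values)) / len(values)
    return "continuous" if distinct_share > threshold else "discrete"

rng = random.Random(7)

# Simulated app prices: few distinct values that keep reappearing
app_prices = [rng.choice([0.99, 1.99, 2.99, 4.99, 9.99]) for _ in range(200)]

# Simulated concert attendances: wide range, values seldom repeated
attendance = [rng.randint(2_000, 60_000) for _ in range(200)]

print("App prices:", suggest_treatment(app_prices))
print("Attendance:", suggest_treatment(attendance))
```

In practice the decision also depends on the analysis planned, so treat a rule like this as a prompt for thinking rather than an automatic classifier.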

Maybe I need to redo my video now, in light of this!

And please take a look at our new course. If you are an instructor, you might like to recommend it for your students.

A Statistics-centric curriculum

Calculus is the wrong summit of the pyramid.

“The mathematics curriculum that we have is based on a foundation of arithmetic and algebra. And everything we learn after that is building up towards one subject. And at top of that pyramid, it’s calculus. And I’m here to say that I think that that is the wrong summit of the pyramid … that the correct summit — that all of our students, every high school graduate should know — should be statistics: probability and statistics.”

TED talk by Arthur Benjamin in February 2009. Watch it – it’s only 3 minutes long.

He’s right, you know.

And New Zealand would be the place to start. In New Zealand, statistics is the second most popular subject in our final year of schooling, with a cohort of 12,606. By comparison, the cohort for English is 16,445, and calculus has a final-year cohort of 8,392, similar in size to Biology (9,038), Chemistry (8,183) and Physics (7,533).

Some might argue that statistics is already the summit of our curriculum pyramid, but I would see it more as an overly large branch that threatens to unbalance the mathematics tree. I suspect many maths teachers would see it more as a parasite that threatens to suck the life out of their beloved calculus tree. The pyramid needs some reconstruction if we are really to have a statistics-centric curriculum. (Or the tree needs pruning and reshaping – I think I have too many metaphors!)

Statistics-centric curriculum

So, to use a popular phrase, what would a statistics-centric curriculum look like? And what would be the advantages and disadvantages of such a curriculum? I will deal with implementation issues later.

To start with, the base of the pyramid would look little different from the calculus-pinnacled pyramid. In the early years of schooling the emphasis would be on number skills (arithmetic), measurement and other practical and concrete aspects. There would also be a small but increased emphasis on data collection and uncertainty. This is in fact present in the NZ curriculum. Algebra would be introduced, but as a part of the curriculum, rather than the central idea. There would be much more data collection, and probability-based experimentation. Uncertainty would be embraced, rather than ignored.

In the early years of high school, probability and statistics would take a more central place in the curriculum, so that students develop important skills ready for their pinnacle course in the final two years. They would know about the statistical enquiry cycle, how to plan and collect data and write questionnaires. They would perform their own experiments, preferably in tandem with other curriculum areas such as biology, food-tech or economics. They would understand randomness and modelling. They would be able to make critical comments about reports in the media. They would use computers to create graphs and perform analyses.

As they approach the summit, most students would focus on statistics, while those who were planning to pursue a career in engineering would also take calculus. In the final two years students would be ready to build their own probabilistic models to simulate real-world situations and solve problems. They would analyse real data and write coherent reports. They would truly understand the concept of inference, and why confidence intervals are needed, rather than calculating them by hand or deriving formulas.

There is always a trade-off. Here is my take on the skills developed in each of the curricula.

Calculus-centric curriculum | Statistics-centric curriculum
Logical thinking | Communication
Abstract thinking | Dealing with uncertainty and ambiguity
Problem-solving | Probabilistic models
Modelling (mainly deterministic) | Argumentation, deduction
Proof, induction | Critical thinking
Plotting deterministic graphs from formulas | Reading and creating tables and graphs from data

I actually think you also learn many of the calc-centric skills in the stats-centric curriculum, but I wanted to look even-handed.

Implementation issues

Benjamin suggests, with charming optimism, that the new focus would be “easy to implement and inexpensive.” I have been a very interested observer of the implementation of the new statistics curriculum in New Zealand. It has not happened easily, being inexpensive has been costly, and there has been fallout. Teachers from other countries (of which there are many in mathematics teaching in NZ) have expressed amazement at how much NZ teachers accept with only murmurs of complaint. We are a nation with a “can do” attitude, who, by virtue of a small population and a one-tier government, can be very flexible. Where we have refrained from following the follies of our big siblings, the UK, US and Australia, NZ has managed to maintain a world-class education system. And when a new curriculum is implemented, though there is unrest and stress, there is seldom outright rebellion.

In my business, I get the joy of visiting many schools and talking with teachers of mathematics and statistics. I am fascinated by the difference between schools, which is very much a function of the head of mathematics and principal. Some have embraced the changes in focus, and are proactively developing pathways to help all students and teachers to succeed. Others are struggling to accept that statistics has a place in the mathematics curriculum, and put the teachers of statistics into a ghetto where they are punished with excessive marking demands.

The problem is that the curriculum change has been done “on the cheap”. As well as being small and nimble, NZ is not exactly rich. The curriculum change needed more advisors, more release time for teachers to develop and more computer power. These all cost. And then you have the problem of “me too” from other subjects who have had what they feel are similar changes.

And this is not really embracing a full stats-centric curriculum. Primary school teachers need training in probability and statistics if we are really to implement Benjamin’s idea fully. The cost here is much greater as there are so many more primary school teachers. It may well take a generation of students to go through the curriculum and enter back as teachers with an improved understanding.

Computers make it possible

Without computers, the only statistical analysis possible in the classroom was trivial. Statistics was reduced to mechanistic and boring hand calculation of lightweight statistics and time-filling graph construction. With computers, graphs and analyses can be produced at the click of a mouse, making graphs a tool rather than an endpoint. With computing power available, real data can be used and real problems can be addressed. High-level thinking is needed to make sense of the output, make judgements and avoid wrong conclusions.

Conversely, the computer has made much of calculus superfluous. With programs that can bash their way happily through millions of iterations of a heuristic algorithm, the need for analytic methods is seriously reduced. When even simple apps on an iPad can solve an algebraic equation, and Excel can use “What if” to find solutions, the need for algebra is also questionable.

Efficient citizens

In H.G. Wells’ popular but misquoted words, efficient citizenry calls for the ability to make sense of data. As the science-fiction writer that he was, he foresaw the masses of data that would be collected and available to the great unwashed. The levelling nature of the web has made everyone a potential statistician.

According to the engaging new site from the ASA, “This is statistics”, statisticians make a difference, have fun, satisfy curiosity and make money. And these days they don’t all need to be good at calculus.

Let’s start redesigning our pyramid.

Support Dr Nic and Statistics Learning Centre videos

This is a short post, of the kind sometimes called e-begging!
I had been toying with the idea of a Kickstarter project, as a way for supporters of my work to help us keep going. Kickstarter is a form of crowdfunding, which lets a whole lot of people each contribute a little bit to get a project off the ground.

But we don’t have one big project; rather, we have a stream of videos and web posts to support the teaching and learning of statistics. Patreon provides a more incremental way for appreciative fans to support the work of content creators.

You can see a video about it here:

And here is a link to the Patreon page: Link to Patreon

Rather than producing for one big publishing company, who then hold the rights to our material, we would love to keep making our content freely available to all. You can help, with just a few dollars per video.

Those who can, teach statistics

The phrase I despise more than any in popular use (and believe me there are many contenders) is “Those who can, do, and those who can’t, teach.” I like many of the sayings of George Bernard Shaw, but this one is dismissive, ignorant and born of jealousy. To me, the ability to teach something is a step higher than being able to do it. The PhD, the highest qualification in academia, is a doctorate. The word “doctor” comes from the Latin word for teacher.

Teaching is a noble profession, on which all other noble professions rest. Teachers are generally motivated by altruism, and often go well beyond the requirements of their job-description to help students. Teachers are derided for their lack of importance, and the easiness of their job. Yet at the same time teachers are expected to undo the ills of society. Everyone “knows” what teachers should do better. Teachers are judged on their output, as if they were the only factor in the mix. Yet how many people really believe their success or failure is due only to the efforts of their teacher?

For some people, teaching comes naturally. But even then, there is the need for pedagogical content knowledge. Teaching is not a generic skill that transfers seamlessly between disciplines. You must be a thinker to be a good teacher. It is not enough to perpetuate the methods you were taught with. Reflection is a necessary part of developing as a teacher. I wrote in an earlier post, “You’re teaching it wrong”, about the process of reflection. Teachers need to know their material, and keep up-to-date with ways of teaching it. They need to be aware of ways that students will have difficulties. Teachers, by sharing ideas and research, can be part of a communal endeavour to increase both content knowledge and pedagogical content knowledge.

There is a difference between being an explainer and being a teacher. Sal Khan, maker of the Khan Academy videos, is a very good explainer. Consequently many students who view the videos are happy that elements of maths and physics they couldn’t do have been explained in such a way that they can now solve homework problems. This is great. Explaining is an important element in teaching. My own videos aim to explain in such a way that students make sense of difficult concepts, though some videos also illustrate procedure.

Teaching is much more than explaining. Teaching includes awakening a desire to learn and providing the experiences that will help a student to learn. In these days of ever-expanding knowledge, a content-driven approach to learning and teaching will not serve our citizens well in the long run. Students need to be empowered to seek learning, to criticize, to integrate their knowledge with their life experiences. Learning should be a transformative experience. For this to take place, the teachers need to employ a variety of learner-focussed approaches, as well as explaining.

It cracks me up, the way sugary cereals are advertised as “part of a healthy breakfast”. It isn’t exactly lying, but the healthy breakfast would do pretty well without the sugar-filled cereal. Explanations really are part of a good learning experience, but need to be complemented by discussion, participation, practice and critique. Explanations are like porridge – healthy, but not a complete breakfast on their own.

Why statistics is so hard to teach

“I’m taking statistics in college next year, and I can’t wait!” said nobody ever!

Not many people actually want to study statistics. Fortunately many people have no choice but to study statistics, as they need it. How much nicer it would be to think that people were studying your subject because they wanted to, rather than because it is necessary for psychology/medicine/biology etc.

In New Zealand, with the changed school curriculum that gives greater focus to statistics, there is a possibility that one day students will be excited to study stats. I am impressed at the way so many teachers have embraced the changed curriculum, despite limited resources, and late changes to assessment specifications. In a few years as teachers become more familiar with and start to specialise in statistics, the change will really take hold, and the rest of the world will watch in awe.

In the meantime, though, let us look at why statistics is difficult to teach.

  1. Students generally take statistics out of necessity.
  2. Statistics is a mixture of quantitative and communication skills.
  3. It is not clear which are right and wrong answers.
  4. Statistical terminology is both vague and specific.
  5. It is difficult to get good resources, using real data in meaningful contexts.
  6. One of the basic procedures, hypothesis testing, is counter-intuitive.
  7. Because the teaching of statistics is comparatively recent, there is little developed pedagogical content knowledge. (Though this is growing)
  8. Technology is forever advancing, requiring regular updating of materials and teaching approaches.

On the other hand, statistics is also a fantastic subject to teach.

  1. Statistics is immediately applicable to life.
  2. It links in with interesting and diverse contexts, including subjects students themselves take.
  3. Studying statistics enables class discussion and debate.
  4. Statistics is necessary and does good.
  5. The study of data and chance can change the way people see the world.
  6. Technological advances have put the power for real statistical analysis into the hands of students.
  7. Because the teaching of statistics is new, individuals can make a difference in the way statistics is viewed and taught.

I love to teach. These days many of my students are scattered over the world, watching my videos (for free) on YouTube. It warms my heart when they thank me for making clear something that had been confusing. I realise that my efforts are small compared to what their teacher is doing, but it is great to be a part of it.

On-line learning and teaching resources

Twenty-first century Junior Woodchuck Guidebook

I grew up reading Donald Duck comics. I love the Junior Woodchucks, and their Junior Woodchuck Guidebook. The Guidebook is a small paperback book, containing information on every conceivable subject, including geography, mythology, history, literature and the Rubaiyat of Omar Khayyam. In our family, when we want to know something or check some piece of information, we talk about consulting the Junior Woodchuck Guidebook. (Imagine my joy when I discovered that a woodchuck is another name for a groundhog, the star of my favourite movie!) What we are referring to is the internet, the source of all possible information! Thanks to search engines, there is very little we cannot find out on the internet. And very big thanks to Wikipedia, to which I make an annual financial contribution, as should all who use it and can afford to.

You can learn just about anything on the internet. Problem is, how do you know what is good? And how do you help students find good stuff? And how do you use the internet wisely? And how can it help us as learners and teachers of statistics and operations research? These questions will take more than my usual 1000 words, so I will break it up a bit. This post is about the ways the internet can help in teaching and learning. In a later post I will talk about evaluating resources, and in particular multimedia resources.


Both the disciplines in which I am interested, statistics and operations research, apply mathematical and analytic methods to real-world problems. In statistics we are generally trying to find things out, and in operations research we are trying to make them better. Either way, the context is important. The internet enables students to find background knowledge regarding the context of the data or problem they are dealing with. It also enables instructors to write assessments and exercises that have a degree of veracity to them even if the actual raw data proves elusive. How I wish people would publish standard deviations as well as means when reporting results!


Which brings us to the second use for on-line resources. Real problems with real data are much more meaningful for students, and totally possible now that we don’t need to calculate anything by hand. Sadly, it is more difficult than first appears to find good quality raw data to analyse, but there is some available. You can see some sources in a previous post and the helpful comments.


If you are struggling to understand a concept, or to know how to teach or explain it, do a web search. I have found some great explanations, and diagrams especially, that have helped me. Or I have discovered a dearth of good diagrams, which has prompted me to make my own.


Videos can help with background knowledge, with explanations, and with inspiring students with the worth of the discipline. The problem with videos is that it takes a long time to find good ones and weed out the others. One suggestion is to enlist the help of your students. They can each watch two or three videos and decide which are the most helpful. The teacher then watches the most popular ones to check for pedagogical value. It is great when you find a site that you can trust, but even then you can’t guarantee the approach will be compatible with your own.

Social support

I particularly love Twitter, from which I get connection with other teachers and learners, and ideas and links to blogs. I belong to a Facebook group for teachers of statistics in New Zealand, and another Facebook group called “I love Operations Research”. These wax and wane in activity, and can be very helpful at times. Students and teachers can gain a lot from social networking.


There is good open-source software available, and 30-day trial versions for other software. Many schools in New Zealand use the R-based iNZight collection of programmes, which provide purpose-built means for time-series analysis, bootstrapping and line fitting.

Answers to questions

The other day I lost the volume control off my toolbar. (Windows Vista, I’m embarrassed to admit.) So I put in the search box “Lost my volume control” and was directed to a YouTube video that took me step-by-step through the convoluted process of reinstating my volume control! I was so grateful I made a donation. Just about any computer-related question can be answered through a search.

Interactive demonstrations

I love these. There are two sites I have found great:

The National Library of Virtual Manipulatives, based in Utah.

NRich – It has some great ideas in the senior statistics area. From the UK.

A problem with some of these is the use of Flash, which does not play on all devices. And again – how do we decide if they are any good or not?

On-line textbooks

Why would you buy a textbook when you can get one on-line? I routinely directed my second-year statistical methods for business students to “Concepts and Applications of Inferential Statistics”. I’ve found it just the right level. Another source is Stattrek. I particularly like their short explanations of the different probability distributions.

Practice quizzes

There aren’t too many practice quizzes around for free. Obviously, as a provider of statistical learning materials, I believe quizzes and exercises have merit for practice with immediate and focussed feedback. However, it can be very time-consuming to evaluate practice quizzes, and some just aren’t very good. On the other hand, some may argue that any time students spend learning is better than none.

Live help

There are some places that provide live, or slightly delayed help for students. I got hooked into a very fun site where you earned points by helping students. Sadly I can’t find it now, but as I was looking I found vast numbers of on-line help sites, often associated with public libraries. And there are commercial sites that provide some free help as an intro to their services. In New Zealand there is the StudyIt service, which helps students preparing for assessments in the senior high school years. At StatsLC we provide on-line help as part of our resources, and will be looking to develop this further. From time to time I get questions as a result of my YouTube videos, and enjoy answering them, unless I am obviously doing someone’s homework! I also discovered “ShowMe” which looks like a great little iPad app, that I can use to help people more.

This has just been a quick guide to how useful the internet can be in teaching and learning. Next week I will address issues of quality and equity.

Open Letter to Khan Academy about Basic Probability

Khan Academy probability videos and exercises aren’t good either

Dear Mr Khan

You have created an amazing resource that thousands of people all over the world get a lot of help from. Well done. Some of your materials are not very good, though, so I am writing this open letter in the hope that it might make some difference. Like many others, I believe that something as popular as Khan Academy will benefit from constructive criticism.

I fear that the reason that so many people like your mathematics videos so much is not because the videos are good, but because their experience in the classroom is so bad, and the curriculum is poorly thought out and encourages mechanistic thinking. This opinion is borne out by comments I have read from parents and other bloggers. The parents love you because you help their children pass tests. (And these tests are clearly testing the type of material you are helping them to pass!) The bloggers are not so happy, because you perpetuate a type of mathematical instruction that should have disappeared by now. I can’t even imagine what the history teachers say about your content-driven delivery, but I will stick to what I know. (You can read one critique here)

Just over a year ago I wrote a balanced review of some of the Khan Academy videos about statistics. I know that statistics is difficult to explain – in fact one of the hardest subjects to teach. You can read my review here. I’ve also reviewed a selection of videos about confidence intervals, one of which was from Khan Academy. You can read the review here.

Consequently I am aware that blogging about the Khan Academy in anything other than glowing terms is an invitation for vitriol from your followers.

However, I thought it was about time I looked at the exercises that are available on KA, wondering if I should recommend them to high school teachers for their students to use for review. I decided to focus on one section, introduction to probability. I put myself in the place of a person who was struggling to understand probability at school.

Here is the verdict.

First of all the site is very nice. It shows that it has a good sized budget to use on graphics and site mechanics. It is friendly to get into. I was a bit confused that the first section in the Probability and Statistics Section is called “Independent and dependent events”. It was the first section though. The first section of this first section is called Basic Probability, so I felt I was in the right place. But then under the heading, Basic probability, it says, “Can I pick a red frog out of a bag that only contains marbles?” Now I have no trouble with humour per se, and some people find my videos pretty funny. But I am very careful to avoid confusing people with the humour. For an anxious student who is looking for help, that is a bit confusing.

I was excited to see that this section had five videos, and two sets of exercises. I was pleased about that, as I’ve wanted to try out some exercises for some time, particularly after reading the review from Fawn Nguyen on her experience with exercises on Khan Academy. (I suggest you read this – it’s pretty funny.)

So I watched the first video about probability and it was like any other KA video I’ve viewed, with primitive graphics and a stumbling repetitive narration. It was correct enough, but did not take into account any of the more recent work on understanding probability. It used coins and dice. Big yawn. It wastes a lot of time. It was ok. I do like that you have the interactive transcript so you can find your way around.

It dawned on me that nowhere do you actually talk about what probability is. You seem to assume that the students already know that. In the very start of the first video it says,

“What I want to do in this video is give you at least a basic overview of probability. Probability, a word that you’ve probably heard a lot of and you are probably just a little bit familiar with it. Hopefully this will get you a little deeper understanding.”

Later in the video there is a section on the idea of large numbers of repetitions, which is one way of understanding probability. But it really is a bit skimpy on why anyone would want to find or estimate a probability, and what the values actually mean. But it was ok.
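For readers who want to see the long-run idea in action, here is a minimal sketch in Python (my own illustration, not taken from the Khan Academy video). It assumes a hypothetical event with probability 0.3, say a bus arriving late, and simulates it many times using Python’s standard random module. The proportion of trials in which the event happens bounces around for small numbers of trials and tends to settle close to 0.3 as the number of trials grows, which is what “probability as long-run relative frequency” means.

```python
import random

def relative_frequency(p, n_trials, seed=2024):
    """Simulate n_trials independent trials of an event with probability p
    and return the proportion of trials in which the event occurred."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = sum(rng.random() < p for _ in range(n_trials))
    return successes / n_trials

# Hypothetical event with probability 0.3 (for example, a bus arriving late)
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(0.3, n))
```

A context like a late bus gives students a reason to care about the probability, which coins and dice rarely do.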

The first video was about single instances – one toss of a coin or one roll of a die. Then the second video showed you how to answer the questions in the exercises, which involved two dice. This seemed ok, if rather a sudden jump from the first video. Sadly both of these examples perpetuate the common misconception that if there are, say, 6 alternative outcomes, they will necessarily be equally likely.
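A quick way to show students that a list of possible outcomes need not be equally likely is to list all 36 ordered pairs from two fair dice and tally the totals. This sketch assumes fair dice and uses only the Python standard library. The eleven possible totals, 2 to 12, have quite different probabilities, from 1/36 for a total of 2 up to 6/36 for a total of 7.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs (first die, second die) for two fair dice
pairs = list(product(range(1, 7), repeat=2))

# Count how many pairs give each total
sum_counts = Counter(a + b for a, b in pairs)

# Eleven possible totals, but they are NOT equally likely
for total in sorted(sum_counts):
    print(total, Fraction(sum_counts[total], len(pairs)))
```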


Then we get to some exercises called “Probability Space”, which is not an enormously helpful heading. But my main quest was to have a go at the exercises, so that is what I did. And that was not a good thing. The exercises were not stepped, but started right away with an example involving two dice and the phrase “at least one of”. There was meant to be a graphic to help me, but instead I had the message “scratchpad not available”. I will summarise my concerns about the exercises at the end of my letter. I clicked on a link to a video that wasn’t listed on the left, called Probability Space and got a different kind of video.
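“At least one of” questions are usually easiest to tackle through the complement. As a sketch, assuming the question is the probability of at least one six when two fair dice are rolled (a hypothetical example, since I am not reproducing the exercise itself), a direct count over the probability space and the complement rule give the same answer, 11/36.

```python
from fractions import Fraction
from itertools import product

# The probability space: 36 equally likely ordered pairs for two fair dice
pairs = list(product(range(1, 7), repeat=2))

# Direct count: pairs in which at least one die shows a six
direct = Fraction(sum(1 for a, b in pairs if 6 in (a, b)), len(pairs))

# Complement rule: 1 - P(no six on either die)
via_complement = 1 - Fraction(5, 6) ** 2

print(direct, via_complement)  # 11/36 11/36
```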

This video was better in that it had moving pictures and a script. But I have problems with gambling in videos like this. There are some cultures in which gambling is not acceptable. The other problem I have is with the term “exact probability”, which was used several times. What do we mean by “exact probability”? How does he know it is exact? I think this sends the wrong message.

Then on to the next videos which were worked examples, entitled “Example: marbles from a bag, Example: Picking a non-blue marble, Example: Picking a yellow marble.” Now I understand that you don’t want to scare students with terminology too early, but I would have thought it helpful to call the second one, “complementary events, picking a non-blue marble”. That way if a student were having problems with complementary events in exercises from school, they could find their way here. But then I’m not sure who your audience is. Are you sure who your audience is?

The first marble video was ok, though the terminology was sloppy.

The second marble video, called “Example: picking a non-blue marble”, is glacially slow. There is a point, I guess, in showing students how to draw a bag and marbles, but… Then the next example is of picking numbers at random. Why would we ever want to do this? Then we come to an example of circular targets. This involves some problem-solving regarding areas of circles, and cancelling out fractions including pi. What is this about? We are trying to teach about probability, so why have you brought in some complication involving the area of a circle?

The third marble video attempts to introduce the idea of events, but doesn’t really. By trying not to confuse with technical terms, the explanation is more confusing.

Now onto some more exercises. The Khan model is that you have to get 5 correct in a row in order to complete an exercise. I hope there is some sensible explanation for this, because it sure would drive me crazy to have to do that. (As I heard expressed on Twitter)

What are circular targets doing in with basic probability?

The first example is a circular target one. I SO could not be bothered working out the area stuff, so I used the hints to find the answer and move on to a more interesting example. The next example was finding the probability of rolling a 4 on a fair six-sided die. This is trivial, but would not have been a bad example to start with. The next question involved three colours of marbles, and finding the probability of not green. Then another dartboard one. Sigh. Then another dartboard one. I’m never going to find out what happens if I get five right in a row if I don’t start doing these properly. Oh no – it gave me the circumference. SO can’t be bothered.

And that was the end of Basic probability. I never did find out what happens if I get five correct in a row.

Venn diagrams

The next topic is called “Venn diagrams and adding probabilities”. I couldn’t resist seeing what you would do with a Venn diagram. This one nearly reduced me to tears.

As you know by now, I have an issue with gambling, so it will come as no surprise that I object to the use of playing cards in this example. It makes the assumption that students know about playing cards. You do take one and a half minutes to explain the contents of a standard pack of cards. Maybe this is part of the curriculum, and if so, fair enough. The examples are standard – the probability of getting a Jack of Hearts etc. But then at 5:30 you start using Venn diagrams. I like Venn diagrams, but they are NOT good for what you are teaching at this level, and you actually did it wrong. I’ve put a comment in the feedback section, but don’t have great hopes that anything will change. Someone else pointed this out in the feedback two years ago, so no – it isn’t going to change.

Khan Venn diagram

This diagram is misleading, as is shown by the confusion expressed in the questions from viewers. There should be a green 3, a red 12, and a yellow 1.

Now Venn diagrams seem like a good approach in this instance, but decades of experience in teaching and communicating complex probabilities has shown that in most instances a two-way table is more helpful. The table for the Jack of Hearts problem would look like this:

               Jacks   Not Jacks   Total
Hearts             1          12      13
Not Hearts         3          36      39
Total              4          48      52

(Any teachers reading this letter – try it! Tables are SO much easier for problem solving than Venn diagrams)
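To show why the table is so much easier to work with, here is a minimal Python sketch (my own illustration, assuming a standard 52-card pack with no jokers). The probability of getting a Jack or a Heart is just the sum of the cells in the Hearts row or the Jacks column, counting the Jack of Hearts only once, divided by 52. The same answer comes from the addition rule, P(Jack) + P(Heart) - P(Jack and Heart). The cells 3, 12 and 1 are exactly the green, red and yellow regions the Venn diagram should have shown.

```python
from fractions import Fraction

# Cell counts from the two-way table: (suit row, rank column) -> number of cards
table = {
    ("Hearts", "Jack"): 1,
    ("Hearts", "Not Jack"): 12,
    ("Not Hearts", "Jack"): 3,
    ("Not Hearts", "Not Jack"): 36,
}
total = sum(table.values())  # 52 cards altogether

# Jack OR Heart: add each qualifying cell exactly once
jack_or_heart = sum(
    count for (suit, rank), count in table.items()
    if suit == "Hearts" or rank == "Jack"
)
p_table = Fraction(jack_or_heart, total)

# Addition rule: P(Jack) + P(Heart) - P(Jack and Heart)
p_rule = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)

print(p_table, p_rule)  # 4/13 4/13
```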

But let’s get down to principles.

The principles of instruction that KA have not followed in the examples:

  • Start easy and work up
  • Be interesting in your examples – who gives a flying fig about two dice or random numbers?
  • Make sure the hardest part of the question is the thing you are testing. This is particularly violated with the questions involving areas of circles.
  • Don’t make me so bored that I can’t face trying, and failing, to get five in a row.

My point

Yes, I do have one. Mr Khan you clearly can’t be stopped, so can you please get some real teachers with pedagogical content knowledge to go over your materials systematically and make them correct. You have some money now, and you owe it to your benefactors to GET IT RIGHT. Being flippant and amateurish is fine for amateurs but you are now a professional, and you need to be providing material that is professionally produced. I don’t care about the production values – keep the stammers and “lellows” in there if you insist. I’m very happy you don’t have background music as I can’t stand it myself. BUT… PLEASE… get some help and make your videos and exercises correct and pedagogically sound.

Dr Nic

PS – anyone else reading this letter, take a look at the following videos for mathematics.

And of course I think my own Statistics Learning Centre videos are pretty darn good as well.

Other posts about concerns about Khan:

Another Open Letter to Sal (I particularly like the comment by Michael Paul Goldenberg)

Breaking the cycle (A comprehensive summary of the responses to criticism of Khan)

Teaching a service course in statistics

Most students who enrol in an initial course in statistics at university level do so because they have to. I did some research on attitudes to statistics in my entry level quantitative methods course, and fewer than 1% of the students had chosen to be in that course. This is a little demoralising, if you happen to think that statistics is worthwhile and interesting.

Teaching a service course in statistics is one of the great challenges of teaching. A “Service Course” is a course in statistics for students who are majoring in some other subject, such as Marketing or Medicine or Education. For some students it is a terminating course – they will never have to look at a p-value again (they hope). For some students it is the precursor to further applied statistics such as marketing research or biological research. Having said that, statistics for citizens is important and interesting and engaging if taught that way. And we might encourage some students to carry on.

Yet the teachers and textbook writers seem to do their best to remove the joy. Statistics is a difficult subject to understand. Often the way the instructor thinks is at odds with the way the students think and learn. The mathematical nature of the subject is invested with all sorts of emotional baggage.

Here are some of the challenges of teaching a statistics service course.

Limited mathematical ability

It is important to appreciate how limited the mathematical understanding is of some of the students in service courses. In my first year quantitative methods course, I made sure my students knew basic algebra, including rearranging and solving equations. This was all done within a business context. Even elementary algebra was quite a stumbling block to some students, for whom algebra had been a bridge too far at school. There were students in a postgrad course I taught who were not sure which was larger, out of 0.05 and 0.1, and talked about crocodiles with regard to greater than and less than signs. And these were schoolteachers! Another senior maths teacher in that group had been teaching the calculation of confidence intervals, without actually understanding what they were.

The students are not like statisticians. Methods that worked to teach statisticians and mathematicians are unlikely to work for them. I wrote about this in my post about the Golden Rule, and how it applies at a higher level for teaching.

I realised a few years ago that I am not a mathematician. I do not have the ability to think in the abstract that is part of a true mathematician. Operations Research was my thing, because I was good at mathematics, but my understanding was concrete. This has been a surprising gift for me as a teacher, as it has meant that I can understand better what the students find difficult. Formulas do not tell them anything. Calculating by hand does not lead to understanding. It is from this philosophy that I approach the production of my videos. I am particularly pleased with my recent video about confidence intervals, which explains the ideas, with nary a formula in sight, but plenty of memorable images.


One of my most frequently accessed posts is “Excel, SPSS, Minitab or R?”. This consistent interest indicates that the choice of software is a universal problem. People are very quick to say how evil Excel is, and I am under no illusions as to its many shortcomings. The main point of my post, however, was that it depends on the class you are teaching.

As I have taught mainly business students, I still hold that for them, Excel is ideal. Not so much for the statistical aspects, but because they learn to use Excel. Last Saturday the ideas for today’s posts were just forming in my mind when the phone rang, and despite my realising it was probably a telemarketer (we have caller ID on our phone) I answered it. It was a nice young woman asking me to take part in a short survey about employment opportunities for women in the Christchurch Rebuild. After I’d answered the questions, explaining that I was redundant from the university because of the earthquakes and that I had taught statistics, she realised that I had taught her. (This is a pretty common occurrence for me in our small town-city – even when I buy sushi I am served by ex-students). So I asked her about her experience in my course, and she related how she would never have taken the course, but enjoyed it and passed. I asked about Excel, and she told me that she had never realised what you could do with Excel before, and now still used it. This is not an isolated incident. When students are taught Excel as a tool, they use it as a tool, and continue to do so after the course has ended.

When business students learn using Excel, it has the appearance of relevance. They are aware that spreadsheets are used in business. It doesn’t seem like time wasted. So I stand by my choice to use Excel. However if I were still teaching at University, I would also be using iNZight. And if I taught higher levels I would continue to use SPSS, and learn more about R.


As I said in a previous post Statistics Textbooks suck out all the fun. Very few textbooks do no harm. I wonder if this site could provide a database of statistics texts and reviews. I would be happy to review textbooks and include them here. My favourite elementary textbook is, sadly, out of print. It is called “Taking the Fear out of Data Analysis”, by the fabulously named Adamantis Diamantopoulos and Bodo Schlegelmilch. It takes a practical approach, and has a warm, nurturing style. It lacks exercises. I have used extracts from it over the years. The choice of textbook, like the choice of software, is “horses for courses”, but I think there are some horses that should not be put anywhere near a course. I do wonder how many students use textbooks as anything other than a combination lucky charm and paper weight.

In comparison with the plethora of college texts of varying value, at high-school level the pickings for textbooks are thin. This probably reflects the newness of the teaching of statistics at high-school level. A major problem with textbooks is that they are so quickly out of date, and at school level it is not practical to replace class sets too often.

Perhaps the answer is online resources, which can be updated as needed, and are flexible and give immediate feedback. ;-)

Emotional baggage

I was less than gentle with a new acquaintance in the weekend. When asked about my business, I told him that I make on-line materials to help people teach and learn statistics. He proceeded to relate a story of a misplaced use of a percentage as a reason why he never takes any notice of statistics. I have tired of the “Lies, damned lies, and statistics” jibe and decided not to take it lying down. I explained that the world is a better place because of statistical analysis. Much research, including medical research, would not be possible in the absence of methods for statistical analysis. An understanding of the concepts of statistics is a vital part of intelligent citizenship, especially in these days of big and ubiquitous data.

I stopped at that point, but have pondered since. What is it that makes people so quick to denigrate the worth of statistics? I suspect it is ignorance and fear. They make themselves feel better about their inadequacies by devaluing the things they lack. Just a thought.

This is not an isolated instance. In fact I was so surprised when a lighthouse keeper said that statistics sounded interesting and wanted to know more, that I didn’t really know what to say next! You can read about that in a previous post. Statistics is an interesting subject – really!

But the students in a service course in statistics may well be in the rather large subset of humanity who have yet to appreciate the worth of the subject. They may even have fear and antipathy towards the subject, as I wrote about previously in “Anxiety, fear and antipathy for maths, stats and OR”.

People are less likely to learn if they have negative attitudes towards the subject. And when they do learn it may well be “learning to pass” rather than actual learning which is internalised.

So what?

Keep the faith! Statistics is an important subject. Keep trying new things. If you never have a bad moment in your teaching, you are not trying enough new things. And when you hear from someone whose life was changed because of your teaching, there is nothing like it!