Teachers and resource providers – uneasy bedfellows

Trade stands and cautious teachers

It is interesting to provide a trade stand at a teachers’ conference. Some teachers are keen to find out about new things, and come to see how we can help them. Others studiously avoid eye-contact in the fear that we might try to sell them something. Trade stand holders regularly put sweets and chocolate out as “bait” so that teachers will approach close enough to engage. Maybe it gives the teachers an excuse to come closer? Either way it is representative of the uneasy relationship that “trade” has with salaried educators.

Money and education

Money and education have an uneasy relationship. For schools to function, they need considerable funding – always more than what they get. In New Zealand, and in many countries, education is predominantly funded by the state. Schools are built and equipped, teachers are paid and resources are purchased with money provided by the taxpayer. Extras are raised through donations from parents and fund-raising efforts. However, because it is not apparent that money is changing hands, schools are perceived as virtuous establishments, existing only because of the goodness of the teachers. This contrasts with the attitude to resource providers, who are sometimes treated as parasitic with their motives being all about the money. It is possible that some resource providers are in it just for the money, but it seems to me that there are richer seams to mine in health, sport, retail etc.

Statistics Learning Centre is a social enterprise

Statistics Learning Centre is a social enterprise. We fit in the fuzzy area between “not-for-profit” and commercial enterprise. We measure our success by the impact we are having in empowering teachers to teach statistics and all people to understand statistics. We need money in order to continue to make an impact. Statistics Learning Centre has made considerable contributions to the teaching and learning of statistics in New Zealand and beyond for several years. This post lists just some of the impact we have had.  We believe in what we are doing, and work hard so that our social enterprise is on a solid financial footing.

StatsLC empowers teachers

Soon after the change to the NCEA Statistics standards, there was a shortage of good quality practice external exams. Even the ones provided as official exemplars did not really fit the curriculum. Teachers approached us, requesting that we create practice exams that they could trust were correct and aligned to the curriculum. We did so in 2015 and 2016, at considerable personal effort and only marginal financial recompense. We see that as helping statistics to be better understood in schools and the wider community.

We, at Statistics Learning Centre, grasp at opportunities to teach teachers how to teach statistics better, to empower all teachers to teach statistics. Our workshops are well received, and we have regular attenders who know they will get value for their time. We use an inclusive, engaging approach, and participants have a good time. I believe in our resources – the videos, the quizzes, the data cards, the activities, the professional development. I believe that they are among the best you can get. So when I give workshops, I do talk about the resources. It would seem counter-productive for all concerned, not to mention contrived, to do otherwise. They are part of a full professional development session. Many mathematical associations have no trouble with this, and I love to go to conferences, and contribute.

I am aware that there are some commercial enterprises who wish to give commercial presentations at conferences. If their materials are not of a high standard, this can put the organisers in a difficult position. Consequently some organisations have a blanket ban on any presentations that reference any paid product. I feel this is a little unfortunate, as teachers miss out on worthwhile contributions. But I understand the problem.

The Open Market model – supply and demand

I believe that there is value in a market model for resources.  People have suggested that we should get the Government to fund access to Statistics Learning Centre resources for all schools. That would be delightful, and give us the freedom and time to create even better resources. But that would make it almost impossible for any other new provider, who may have an even better product, to get a look in. When such a monopoly occurs, it reduces the incentives for providers to keep improving.

Saving work for the teachers, and building on a product

Teachers want the best for their students, and have limited budgets. They may spend considerable amounts of time printing, cutting and laminating in order to provide teaching resources at a low cost. This was one of the drivers for producing our Dragonistics data cards – to provide at a reasonable cost, some ready-made, robust resources, so that teachers did not have to make their own. As it turned out we were able to provide interesting data with clear relationships, and engaging graphics so that we provide something more than just data turned into datacards.

Free resources

There are free resources available on the internet. Other resources are provided by teachers who are sharing what they have done while teaching their own students. Resources provided for free can be of a high pedagogical standard. Having a high production standard, however, can be prohibitively expensive for individual producers who are working in their spare time.  It can also be tricky for another teacher to know what is suitable, and a lot of time can be spent trying to find high quality, reliable resources.

Teachers and resource providers – a symbiotic relationship

Teachers need good resource providers. It makes sense for experts to create high quality resources, drawing on current thinking with regard to content specific pedagogy. These can support teachers, particularly in areas in which they are less confident, such as statistics. And they do need to be paid for their work.

It helps when people recognise that our materials are sound and innovative, when they give us opportunities to contribute and when they include us at the decision-making table. Let us know how we can help you, and in partnership we can become better bed-fellows.

What do you think?


(Note that this post is also being published on our blog, Building a Statistics Learning Community, as I felt it was important.)


Enriching mathematics with statistics

Statistics enriches everything!

In many school systems in the world, subjects are taught separately. In primary school, children  learn reading and writing, maths and social studies at different times of the day. But more than that, many topics within subjects are also taught separately. In mathematics we often teach computational skills, geometry, measurement and statistics in separate topics throughout the school year. Textbooks tend to encourage this segmentation of the curriculum. This causes problems as students compartmentalise their learning.  They think that something learned in mathematics can’t possibly be used in Physics. They complain in mathematics if they are asked to write a sentence or a report, saying that it belongs in English.

I participated in an interesting discussion on Twitter recently about Stretch and Challenge. (Thanks #mathschat) My interpretation of “Stretch and challenge” is ways of getting students to extend their thinking beyond the original task so that they are learning more and feeling challenged. This reminds me a lot of the idea of “Low floor High Ceiling” that Jo Boaler talks about. We need tasks that are easy for students to get started on, but that do not limit students, particularly ones who have really caught onto the task and wish to keep going.


As a statistics educator, I see applications of statistics and probability everywhere. At a workshop on proportional thinking we were each asked to represent three-quarters, having been told that our A5 piece of paper was “one”. When I saw the different representations used by the participants, I could see a graph as a great way to represent it. You could make a quick set of axes on a whiteboard, and get people to put crosses on which representation they used. The task of categorising all the representations reinforces the idea that there are many ways to show the same thing. It also gets students more aware of the different representations. Then the barchart/dotplot provides a reminder of the outcome of the task. Students who are excited about this idea could make up a little questionnaire to take home and get other family members to draw different fractions, and look at the representations, adding them to the graph back at school.


Measurement is an area of the mathematics curriculum that is just begging to be combined with statistics. Just physically measuring an object leads to a variation in responses, which can be graphed. Getting each child to measure each object three times and take the middle value should lead to a distribution of values with less spread. And then there is estimation. I love the example Dan Meyer uses in his 2010 TED talk of filling a tank with water. Students could be asked their estimate of the filling time, simply by guessing, and then use mathematical modelling to refine their estimate. Both values can be graphed and compared.
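To see why taking the middle of three measurements should reduce the spread, here is a minimal simulation sketch in Python. The true length, the size of the measurement error and the assumption that errors are normally distributed are all made up for illustration.

```python
import random
import statistics


def measure(true_length, error_sd, rng):
    """One noisy measurement of an object (normal error is an assumption)."""
    return rng.gauss(true_length, error_sd)


def simulate_class(n_children=30, true_length=20.0, error_sd=0.5, seed=1):
    """Return (spread of single measurements, spread of middle-of-three values)."""
    rng = random.Random(seed)
    # Each child measures once.
    single = [measure(true_length, error_sd, rng) for _ in range(n_children)]
    # Each child measures three times and keeps the middle value.
    middle = [
        statistics.median([measure(true_length, error_sd, rng) for _ in range(3)])
        for _ in range(n_children)
    ]
    return statistics.stdev(single), statistics.stdev(middle)


if __name__ == "__main__":
    sd_single, sd_middle = simulate_class(n_children=2000)
    print(f"Spread of single measurements: {sd_single:.3f}")
    print(f"Spread of middle-of-three:     {sd_middle:.3f}")
```

With a large simulated class the second spread should come out noticeably smaller than the first, which is the pattern students can look for in their own graphs.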

Area and Probability

Area calculations can be used nicely with probability. Children can invent games that involve tossing a coin onto a shape or shapes. The score depends on whether the coin lands within the shape, outside the shape or on a line. They can estimate what the score will be from 10 throws, simply by looking at the shape, then try it out with one lot of ten throws. Now do some area calculations. Students may have different ways of dealing with the overlap issue. Use the area calculations to improve their theoretical estimates of the probability of each outcome, and from there work out the expected value. Then do multiple trials of ten throws and see how you need to modify the model.  So much learning in one task!
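The step from area calculations to probabilities and an expected score can be sketched in Python. This is only an illustration under stated assumptions: a square board, a circular target in the middle, a coin whose centre lands uniformly at random on the board, and a made-up scoring rule. None of these numbers come from a real classroom game.

```python
import math
import random

BOARD = 30.0     # side of the square board in cm (assumed)
TARGET_R = 8.0   # radius of the circular target in cm (assumed)
COIN_R = 1.0     # radius of the coin in cm (assumed)
SCORES = {"inside": 5, "line": 2, "outside": 0}  # made-up scoring rule


def theoretical_probabilities():
    """Probabilities from areas, based on where the coin's centre can land."""
    board_area = BOARD ** 2
    # Coin is wholly inside the target if its centre is within TARGET_R - COIN_R.
    inside = math.pi * (TARGET_R - COIN_R) ** 2
    # Coin overlaps the line if its centre is in the ring out to TARGET_R + COIN_R.
    touching = math.pi * (TARGET_R + COIN_R) ** 2 - inside
    return {
        "inside": inside / board_area,
        "line": touching / board_area,
        "outside": 1 - (inside + touching) / board_area,
    }


def expected_score(probs, throws=10):
    """Expected total score from a given number of throws."""
    return throws * sum(SCORES[k] * p for k, p in probs.items())


def throw(rng):
    """Simulate one throw and classify the outcome."""
    x = rng.uniform(0, BOARD)
    y = rng.uniform(0, BOARD)
    d = math.hypot(x - BOARD / 2, y - BOARD / 2)
    if d <= TARGET_R - COIN_R:
        return "inside"
    if d < TARGET_R + COIN_R:
        return "line"
    return "outside"


def simulate(trials, throws=10, seed=0):
    """Average total score over many trials of `throws` throws each."""
    rng = random.Random(seed)
    totals = [sum(SCORES[throw(rng)] for _ in range(throws)) for _ in range(trials)]
    return sum(totals) / trials


if __name__ == "__main__":
    probs = theoretical_probabilities()
    print("Model:", round(expected_score(probs), 2))
    print("Simulated:", round(simulate(1000), 2))
```

Real throws are unlikely to be uniform across the board, so comparing the model with classroom results is where the discussion about modifying the model begins.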

Statistics obviously fits well in much topic work as well. The Olympics are looming, with all the interest and the flood of statistics they provide. Students can be given the fascinating question of which country does the best? There are so many ways to measure and to account for population. Drawing graphs gives an idea of spread and distribution.

There is so much you can do with statistics and other strands and other curriculum areas!  Statistics requires a context, and it is economical use of time if the context is something else you are teaching.

Can you tell me some ways you have incorporated statistics into other strands of mathematics or other subject areas?

Teaching sampling with dragon data cards

Data cards for teaching statistics

Data cards are a wonderful way for students to get a feel for data. As a University lecturer in the 1990s, I found that students often didn’t understand about the multivariate nature of data. This may well be an artifact of the kind of statistics they studied at school, which was univariate (finding the confidence interval for the mean of a set of numbers) or bivariate at best. And back then, when statistical analysis was done by hand calculation, this was all you could expect. How times have changed!

At the NZAMT (NZ Association of Mathematics Teachers) conference in 2015, both Dick de Veaux and Rob Gould suggested in their keynote addresses that students need to be exposed to multivariate data. Rob endorsed the use of data cards to enable this. Data cards are a wonderful tool for all levels of learning. In the New Zealand “Figure it out” series, there are several lessons that use data cards, generally made by the students themselves. We were inspired by this and have developed a set of 240 data cards with information about dragons, to help teachers and students learn and be successful in their statistical endeavours. In an earlier post I discuss the pros and cons of fictional data.

The Dragonistics data cards are now available to purchase, and we have a range of supporting materials for lessons and activities at various levels. You can find out more about data cards by clicking on this link.

Teaching Sampling using Dragonistics data cards

A small sample of Dragonistics data cards


The real advantage of using data cards to teach sampling is that it is difficult, approaching prohibitive, to record and analyse all the information. When you have a spreadsheet of data on a computer, taking a sample is contrived and can confuse students. They wonder why you would not simply analyse all the data for the population. Physically collecting data can take more time than is practical. With the data cards, we know we cannot easily process the data from all 240 or 480 dragons (depending on how many boxes you use). Sampling then becomes a sensible solution. Different groups of students take different samples, and perform their own analysis, leading to similar, but not identical results. This shows the concept of variation due to sampling in a concrete and memorable way.
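The way different groups arrive at similar but not identical answers can be mimicked in a few lines of Python. The population below is invented (random strengths from 1 to 10 for 240 hypothetical dragons); it is not the real Dragonistics data, and the group and sample sizes are assumptions.

```python
import random
import statistics


def make_population(n=240, seed=42):
    """Invented stand-in for a box of data cards (not the real card values)."""
    rng = random.Random(seed)
    return [
        {"sex": rng.choice(["female", "male"]), "strength": rng.randint(1, 10)}
        for _ in range(n)
    ]


def group_means(population, n_groups=6, sample_size=24, seed=7):
    """Each group draws its own random sample of cards and finds the mean strength."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_groups):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        means.append(statistics.mean(card["strength"] for card in sample))
    return means


if __name__ == "__main__":
    population = make_population()
    for group, mean in enumerate(group_means(population), start=1):
        print(f"Group {group}: mean strength {mean:.2f}")
```

Plotting the group means on a dot plot gives a picture of variation due to sampling that parallels what the class sees with the physical cards.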

Some decades ago I developed a set of counters of four different colours, with data with different means and standard deviations. I used these to teach about the concept of sampling, and the students did ANOVA analysis on them to see if the means of the four groups were the same. This was a good way to teach this principle. However there were two limitations. The first limitation is that the data is not multivariate. There are just two variables, colour and the number. And the second limitation is that there was no context. I made up a context to go with it, something around sales I think, as this was for an MBA class, which partly overcame that problem.

The old technology – two variables, and no embedded context


I’d like to think that I have learned from all the reading, research, experience, seminars etc on how to teach statistics that I have participated in. Consequently, were I to teach an MBA Quantitative methods class again, I think I would use the Dragon data cards. We have recently produced this lesson plan, that teaches about the concept of sampling and variation due to sampling. Dragon data cards could also be used for teaching about the mechanics of sampling, such as stratification and systematic sampling. There needs to be a story behind the analysis or there is no point to the conclusion. In the lesson previously alluded to, the scenario is that we are building separate shelters for male and female dragons, and it would be useful to have an idea of the relative strengths of male and female dragons.
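For readers who like to see the mechanics written down, here is a rough Python sketch of systematic and stratified sampling from a deck of cards. The deck is a hypothetical stand-in with only an id and a sex on each card, and the function names are my own for illustration.

```python
import random

# Hypothetical deck: 240 cards, alternating female and male (not the real cards).
deck = [{"id": i, "sex": "female" if i % 2 == 0 else "male"} for i in range(240)]


def systematic_sample(cards, k, rng):
    """Take every k-th card, starting from a random position among the first k."""
    start = rng.randrange(k)
    return cards[start::k]


def stratified_sample(cards, key, fraction, rng):
    """Split the cards into strata by `key`, then sample the same fraction of each."""
    strata = {}
    for card in cards:
        strata.setdefault(card[key], []).append(card)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


if __name__ == "__main__":
    rng = random.Random(3)
    print(len(systematic_sample(deck, 10, rng)), "cards from systematic sampling")
    print(len(stratified_sample(deck, "sex", 0.1, rng)), "cards from stratified sampling")
```

With physical cards the same ideas are dealing out every tenth card from a shuffled pile, or sorting the cards into piles by sex and drawing from each pile.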

Evidence and Distribution

Using data cards gives a wonderful opportunity to explore the concepts of evidence and of distribution. The students lay out their cards in a nice bar chart arrangement, and say, “See  – there is a difference.” Teachers should then ask for evidence. Students need to be able to articulate what evidence there is for the effect they have observed, and place it in context. We have found this to be a useful process when teaching students of all levels.

With regard to distribution, if we work only with numbers, and find the medians of the two groups and observe that the median is higher for one group than the other, this is rather limited information. By observing the distribution of the dragon cards between the two sexes, we can see that there is overlap. It is not a clearcut difference. Additionally we may observe other effects, such as due to colour, which we might like to explore further in another journey around the Statistical Enquiry Cycle.

Data cards are a win

It is fascinating that the concept of data cards is so new. It seems like an obvious idea, and makes concrete some very tricky abstract ideas. Data cards are useful at almost any level of understanding. As the need for understanding of statistics grows, there has been an emphasis on finding out better ways to teach for understanding. Clearly data cards are a win!



There’s more to reading graphs than meets the eye


For those of us who know how to read a graph, it can be difficult to imagine what another person could find difficult. But then when I am presented with an unusual style of graph, or one where the data has been presented badly, I suddenly feel empathy for those who are less graph-literate.

Graphs are more common now as we have Excel to make them for us – for better or worse. An important skill for the citizens of tomorrow and today is to be able to read a graph or table and to be critical of how well it accomplishes its goals.

Here are some stages of reading a graph, much of which also applies to reading a table.

Reading about the graph

When one is familiar with graphs, and the graph is well made, we can become oblivious to the conventions. Just as readers know that English is written from left to right, graph readers understand that the height of a bar chart corresponds to the quantity of something. When people familiar with graphs look at a graph, they take in information unconsciously. This would include what type of graph it is – bar chart, line graph, scatterplot…and what it is about – the title, axis labels and legend tell us this. And they are also able to ignore unimportant aspects. For example if someone has made a 3-D bar chart, experienced graph-readers know that the thickness of the bar does not express information. Colours are generally used to distinguish different elements, but the choice of which colour is used is seldom part of the message. Other aspects about graphs, which may or may not be apparent, include the purpose of the graph and the source of the data.

Beginner graph readers need to learn how to use the various conventions to read ABOUT the data or graph. Any exploration of a graph needs to start with the question, “What is this graph about?”

Identifying one piece of data

When children start making and reading graphs, it is good for them to start with data about themselves, often represented in a picture graph, where each individual observation is shown.  A picture graph is concrete. Each child may point out their particular piece of data – the one that says that they like Wheaties, or prefer mushrooms on their pizza. This is an early stage in  the process of abstraction, that leads eventually to understanding less intuitive graphs such as the box and whisker or a time series chart. It is also important for all graph readers to be aware what each piece of data, or observation, represents and how it is represented.

Identifying one piece of data may help avoid the confusion of graphs which show raw data rather than summary data. For an example, a class may have data about the number of people in households. If this data is entered raw into a spreadsheet, and a graph created, we can end up with something like the graph immediately below (Graph 1).


Graph 1: This is not a good graph, but is what a naive user may well get out of Excel

In this we can identify that each member is represented by a bar, and the height gives the number of people in their family. I usually call this a value graph, as it shows only the individual values, with no aggregation.

A more useful representation of this same data is a summary bar chart, as shown below. (Graph 2) There are two dimensions operating. Horizontally we have the number of people in a household, and vertically we have the number of class members that have the corresponding number of people in their household. Note that it is less intuitive seeing where each class member is. Dividing the bar up into individual blocks can help with that.

Household size

Graph 2: A summary of the size of household for a group of people
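The difference between a value graph and a summary bar chart comes down to one aggregation step: counting how many class members share each household size. A minimal Python sketch, using made-up class data, shows that step and prints a simple text bar chart.

```python
from collections import Counter

# Made-up raw data: the household size reported by each class member.
household_sizes = [4, 3, 5, 4, 2, 6, 4, 3, 5, 4, 6, 3]


def summarise(values):
    """Count how many class members report each household size."""
    counts = Counter(values)
    # Include every size in the range so that gaps show up as empty bars.
    return {size: counts[size] for size in range(min(values), max(values) + 1)}


def text_bar_chart(summary):
    """One row per household size, one '#' per class member."""
    return "\n".join(f"{size}: " + "#" * n for size, n in summary.items())


if __name__ == "__main__":
    print(text_bar_chart(summarise(household_sizes)))
```

Each '#' is one class member, which mirrors the idea of dividing a bar into individual blocks so that students can still find themselves in the summary.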

Reading off the graph

In order to make sense of a graph, we often need to look at two dimensions simultaneously. If we wish to know how many people in the class come from a household of 5, we need to select along the horizontal axis, the value 5. Then we follow the bar up to the top and take our eye back to the vertical axis to see how high this value is. A ruler can help with this process.  When we read off a graph, our statements tend to be summaries of a single attribute, such as “There are 2 people who come from households of 6.”  “There are 17 dragons that breathe fire.”

Reading within the graph (comparisons, relationships)

Reading within the graph is a more complex task, even with simple graphs. When we read within a graph we are interested in comparisons and relationships. For example we may wish to see which breath type is most common among our herd of dragons. In order to answer this using the graph below, we first need to find the highest bar, by drawing our eye along the top, or drawing a ruler down the page. Then we look down that bar, and read off the name of the breath type. There are many more complex relationships, such as whether green dragons tend to be taller or shorter than red dragons, and which are more likely to be friendly. By introducing another attribute, we are in fact adding a dimension to our analysis.

This is a column chart (or bar chart) summarising the breath types.


Reading beyond the graph, beyond the data

This idea of reading beyond the data has been suggested as a step towards informal and then formal inference. We can perceive that our data does not represent all existing instances, and can make predictions or suppositions about what might happen in the other instances. For example, for our sample of dragons, we have seen that the green dragons tend to be more likely to be friendly than the red dragons. We could surmise that this holds over the other dragons as well. We can introduce this idea by asking the students, “I wish to have a new dragon join the herd and would prefer it to be friendly. Would I be better to get a green dragon or a red dragon?”

Judging the graph

The advantage of programs like Excel is that many people can make graphs without too much trouble. This is also a problem, as often the graph Excel produces is not really suitable for the task, and can have all sorts of visual clutter which obscures the information displayed. Learners need to think about the graph, either their own, or one they are reading and ask whether it is successful in communicating correctly the information that needs to be communicated. Does the graph serve the purpose it was created for?

I suggest that the steps listed here are a worthwhile structure to use in reading graphs, particularly for beginners. This then leads into another process, summarised as OSEM. You can read about this here in this post, A helpful structure for analysing graphs.

Papamoa College statistics excursion to Hamilton Zoo

Pizza in the park


Last week I had a lovely experience. I visited the Hamilton Observatory and Zoo as part of a Statistics excursion with the Year 13 statistics class of Papamoa College.

The trip was organised to help students learn about where data comes from. I went along because I really love teachers and students, and it was an opportunity to experience innovation by a team of wonderful teachers.  The students travelled from Papamoa to Hamilton, stopping for pizza in Cambridge. When we got to the Hamilton Observatory, Dave welcomed us and gave an excellent talk about the stars and data. I found it fascinating to think how much data there is, and also the level of (in)accuracy of their measurements.  I then gave a short talk on the importance of statistics in terms of citizenship, and how the students can be successful in learning statistics. I talked about analysis of the Disney Princess movies and the Zika virus.


My favourite animal of the day

The next morning we went over to the Hamilton Zoo for breakfast followed by a talk by Ken on the use of data in the Zoo. That too was fascinating, and got my brain whirring. Zoos these days are all about education and helping endangered species to survive. They have records of weights of all the animals over time, making for some very interesting data. Weights are used as an indication of health in the animals. Ken shared pictures of animals being weighed – including tricky keas and fantastically large rhinos. The Zoo also collects a wide range of other data, such as the visitor numbers, satisfaction surveys, quantity of waste and food consumption. We visited the food preparation area and heard how the diets are carefully worked out, and the food fed in such a way as to give the animals something to think about.

Papamoa stats class

Dr Nic and the teachers and students of Papamoa College give statistics two thumbs up!

Though most of my work these days is in the field of statistics education, a part of my heart still belongs to Operations Research. I saw so many ways in which OR could help with things such as diets, logistics etc. I’m not saying that they are doing anything wrong, but there is always room for improvement. Were I still teaching OR to graduate students I would be looking for a project with a zoo.

I am sure the students benefited from the experience of seeing first-hand the use of data in multiple contexts. I was glad to be able to meet with the students and talk to many about the assignments they will be doing throughout the year. Each student has the opportunity to choose an application area for the multiple assessments. I was impressed with their level of motivation, which will lead to better learning outcomes.

Well done team at Papamoa!


Data for teaching – real, fake, fictional

There is a push for teachers and students to use real data in learning statistics. In this post I am going to address the benefits and drawbacks of different sources of real data, and make a case for the use of good fictional data as part of a statistical programme.

Here is a video introducing our fictional data set of 180 or 240 dragons, so you know what I am referring to.

Real collected, real database, trivial, fictional

There are two main types of real data. There is the real data that students themselves collect and there is real data in a dataset, collected by someone else, and available in its entirety. There are also two main types of unreal data. The first is trivial and lacking in context and useful only for teaching mathematical manipulation. The second is what I call fictional data, which is usually based on real-life data, but with some extra advantages, so long as it is skilfully generated. Poorly generated fictional data, as often found in case studies, is very bad for teaching.


When deciding what data to use for teaching statistics, it matters what it is that you are trying to teach. If you are simply teaching how to add up 8 numbers and divide the result by 8, then you are not actually doing statistics, and trivial fake data will suffice. Statistics only exists when there is a context. If you want to teach about the statistical enquiry process, then having the students genuinely involved at each stage of the process is a good idea. If you are particularly wanting to teach about fitting a regression line, you generally want to have multiple examples for students to use. And it would be helpful for there to be at least one linear relationship.

I read a very interesting article in “Teaching Children Mathematics” entitled, “Practical Problems: Using Literature to Teach Statistics”. The authors, Hourigan and Leavy, used a children’s book to generate the data on the number of times different characters appeared. But what I liked most, was that they addressed the need for a “driving question”. In this case the question was provided by a pre-school teacher who could only afford to buy one puppet for the book, and wanted to know which character appears the most in the story. The children practised collecting data as the story is read aloud. They collected their own data to analyse.

Let’s have a look at the different pros and cons of student-collected data, provided real data, and high-quality fictional data.

Collecting data

When we want students to experience the process of collecting real data, they need to collect real data. However real-time data collection is time consuming, and probably not necessary every year. Student data collection can be simulated by a program such as The Islands, which I wrote about previously. Data students collect themselves is much more likely to have errors in it, or be “dirty” (which is a good thing). When students are only given clean datasets, such as those usually provided with textbooks, they do not learn the skills of deciding what to do with an errant data point. Fictional databases can also have dirty data generated into them. The fictional inhabitants of The Islands sometimes lie, and often refuse to give consent for data collection on them.


One of the species of dragons included in our database


I have heard that after a few years of school, graphs about cereal preference, number of siblings and type of pet get a little old. These topics, relating to the students, are motivating at first, but often there is no purpose to the investigation other than to get data for a graph.  Students need to move beyond their own experience and are keen to try something new. Data provided in a database can be motivating, if carefully chosen. There are opportunities to use databases that encourage awareness of social justice, the environment and politics. Fictional data must be motivating or there is no point! We chose dragons as a topic for our first set of fictional data, as dragons are interesting to boys and girls of most ages.

A meaningful  question

Here I refer again to that excellent article that talks about a driving question. There needs to be a reason for analysing the data. Maybe there is concern about food provided at the tuck shop, with healthy alternatives. Or can the question be tied into another area of the curriculum, such as which type of bean plant grows faster? Or can we increase the germination rate of seeds? The Census@school data has the potential for driving questions, but they probably need to be helped along. For existing datasets the driving question used by students might not be the same as the one (if any) driving the original collection of data. Sometimes that is because the original purpose is not ‘motivating’ for the students or not at an appropriate level. If you can’t find or make up a motivating meaningful question, the database is not appropriate. For our fictional dragon data, we have developed two scenarios – vaccinating for Pacific Draconian flu, and building shelters to make up for the deforestation of the island. With the vaccination scenario, we need to know about behaviour and size. For the shelter scenario we need to make decisions based on size, strength, behaviour and breath type. There is potential for a number of other scenarios that will also create driving questions.

Getting enough data

It can be difficult to get enough data for effects to show up. When students are limited to their class or family, this limits the number of observations. Only some databases have enough observations in them. There is no such problem with fictional databases, as you can just generate as much data as you need! There are special issues with regard to teaching about sampling, where you would want a large database with constrained access, like the Islands data, or the use of cards.


A problem with the data students collect is that it tends to be categorical, which limits the types of analysis that can be used. In databases, it can also be difficult to find measurement level data. In our fictional dragon database, we have height, strength and age, which all take numerical values. There are also four categorical variables. The Islands database has a large number of variables, both categorical and numerical.

Interesting Effects

Though it is good for students to understand that quite often there is no interesting effect, we would like students to have the satisfaction of finding interesting effects in the data, especially at the start. Interesting effects can be particularly exciting if the data is real, and students can apply their findings to the real-world context. Student-collected data is risky in terms of finding any noticeable relationships. It can be disappointing to do a long and involved study and find no effects. Databases from known studies can provide good effects, but unfortunately the variables with no effect tend to be left out of the databases, giving a false sense that there will always be effects. When we generate our fictional data, we make sure that the relationships we would like are there, with enough interaction and noise. This is a highly skilled process, honed by decades of making up data for student assessment at university. (Guilty admission)
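To make the idea of building an effect into fictional data concrete, here is a minimal sketch in Python. It is not how our dragon database is actually generated. The variable names, the size of the breath-type effect (15 units) and the noise levels are all made up for illustration.

```python
import random


def make_dragons(n, seed=1):
    """Generate n fictional dragons with a deliberate breath-type effect on strength."""
    rng = random.Random(seed)  # fixed seed so the same "island" can be regenerated
    dragons = []
    for _ in range(n):
        breath = rng.choice(["fire", "ice"])      # categorical variable
        height = rng.gauss(200, 30)               # numerical variable, made-up scale
        effect = 15 if breath == "fire" else 0    # the relationship we want students to find
        strength = 0.1 * height + effect + rng.gauss(0, 5)  # signal plus noise
        dragons.append({"breath": breath, "height": height, "strength": strength})
    return dragons


def mean_strength(dragons, breath):
    """Average strength of the dragons with the given breath type."""
    values = [d["strength"] for d in dragons if d["breath"] == breath]
    return sum(values) / len(values)


dragons = make_dragons(500)
difference = mean_strength(dragons, "fire") - mean_strength(dragons, "ice")
print(round(difference, 1))  # close to the built-in effect of 15, but not exactly 15
```

Making the noise larger, or the sample smaller, hides the effect. That is the dial you turn when deciding how hard students should have to work to find it.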


There are ethical issues to be addressed in the collection of real data from people the students know. Informed consent should be granted, and there needs to be thorough vetting. Young students (and not so young) can be damagingly direct in their questions. You may need to explain that it can be upsetting for people to be asked if they have been beaten or bullied. When using fictional data that may appear real, such as the Islands data, it is important for students to be aware that the data is not real, even though it is based on real effects. This was one of the reasons we chose to build our first database on dragons, as we hope that will remove any concerns about whether the data is real or not!

The following table summarises the post.

| | Real data collected by the students | Real existing database | Fictional data (The Islands, Kiwi Kapers, Dragons, Desserts) |
|---|---|---|---|
| Data collection | Real experience | Nil | Sometimes |
| Dirty data | Always | Seldom | Can be controlled |
| Motivating | Can be | Can be | Must be! |
| Enough data | Time consuming, difficult | Hard to find | Always |
| Meaningful question | Sometimes. Can be trivial | Can be difficult | Part of the fictional scenario |
| Variables | Tend towards nominal | Often too few variables | Generate as needed |
| Ethical issues | Often | Usually fine | Need to manage reality |
| Effects | Unpredictable | Can be obvious or trivial, or difficult | Can be managed |

What does it mean to understand statistics?

It is possible to get a passing grade in a statistics paper by putting numbers into formulas and words into memorised phrases. In fact I suspect that this is a popular way for students to make their way through a required and often unwanted subject.

Most teachers of statistics would say that they would like students to understand what they are doing. This was a common sentiment expressed by participants in the excellent MOOC, Teaching statistics through data investigations (which is currently running again in January to May 2016.)


This makes me wonder what it means for students to understand statistics. There are many levels to understanding things. The concept of understanding has many nuances. If a person understands English, it means that they can use English with proficiency. If they are native speakers they may have little understanding of how grammar works, but they can still speak with correct grammar. We talk about understanding how a car works. I have no idea how a car works, apart from some idea that it requires petrol and the pistons go really, really fast. I can name parts of a car engine, such as distributor and drive shaft. But that doesn’t stop me from driving a car.

Understanding statistics

I propose that when we talk about teaching students to understand statistics, we want our students to know why they are doing something, and have an idea of how it works. Students also need to be fluent in the language of statistics. I would not expect any student of an introductory or high school statistics class to be able to explain how least squares regression works in terms of matrix algebra, but I would expect them to have an idea that the fitted line in a bivariate plot is a model that minimises the squared error terms. I’m not sure anyone needs to know why “degrees of freedom” are called that – or even really what degrees of freedom do. These days computer packages look after degrees of freedom for us. We DO need to understand what a p-value is, and what it is telling us. For many people it is not necessary to know how a p-value is calculated.
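For readers who like to see the idea in action, here is a small Python sketch of what "the fitted line minimises the squared errors" means. The five data points are made up, and the formulas are the standard ones for simple least squares regression. Nudging the slope or the intercept away from the fitted values always makes the sum of squared errors larger.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up bivariate data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)
nudged_slope = sum_squared_errors(xs, ys, slope + 0.1, intercept)
nudged_intercept = sum_squared_errors(xs, ys, slope, intercept - 0.5)

print(best < nudged_slope and best < nudged_intercept)  # any other line does worse
```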

Ways to teach statistics

There are several approaches to teaching statistics. The approach needs to be tailored to the students and the context of the course. I prefer a hands-on, conceptual approach rather than a mathematical one. In current literature and practice there is a push for learning through investigations, often based around the statistical inquiry cycle. The problem with one long project is that students don’t get opportunities to apply principles in different situations, in such a way that will help in transfer of learning to other situations. There are some people who still teach statistics through the mathematical formulas, but I fear they are missing out on the opportunity to help students really enjoy statistics.

I do not claim to have all the answers, but we did discover one way to help students learn, alongside other methods. This approach is to use a short video, followed by a ten-question true/false quiz. The quiz serves to reinforce and elaborate on concepts taught in the video, challenge students’ misconceptions, and help students become more familiar with the vocabulary and terminology of statistics. The quizzes we develop have multiple questions that randomise, giving students the opportunity to try multiple times, which seems to help understanding.

This short and entertaining video gives an illustration of how you can use videos and quizzes to help students learn difficult concepts.

And here is a link to a listing of all our videos and how you can get access to them. Statistics Learning Centre Videos

We have just started a newsletter letting people know of new products and hints for teaching. You can sign up here. Sign up for newsletter

The normal distribution – three tricky bits

There are several tricky things about teaching and understanding the normal distribution, and in this post I’m going to talk about three of them. They are the idea of a model, the limitations of the normal distribution, and the idea of the probability being the area under the graph.

It’s a model!

When people hear the term distribution, they tend to think of the normal distribution. It is an appealing idea, and remarkably versatile. The normal distribution is an appropriate model for the outcome of many natural, manufacturing and human endeavours. However, it is only a model, not a rule. But sometimes the way we talk about things as “being normally distributed” can encourage incorrect thinking.

This problem can be seen in exam questions about the application of the normal distribution. They imply that the normal distribution controls the universe.

Here are examples of question starters taken from a textbook:

  1. “The time it takes Steve to walk to school follows a normal distribution with mean 30 minutes…”.
  2. Or “The time to failure for a new component is normally distributed with a mean of…”

This terminology is too prescriptive. There is no rule that says that Steve has to time his walks to school to fit a certain distribution. Nor does a machine create components that purposefully follow a normal distribution with regard to failure time. I remember, as a student, being intrigued by this idea, not really understanding the concept of a model.

When we are teaching, and at other times, it is preferable to say that things are appropriately modelled by a normal distribution. This reminds students that the normal distribution is a model. The above examples could be rewritten as

  1. “The time it takes Steve to walk to school is appropriately modelled using a normal distribution with mean 30 minutes…”.
  2. And “The time to failure for a new component is found to have a distribution well modelled by the normal, with a mean of…”

They may seem a little clumsy, but send the important message that the normal distribution is the approximation of a random process, not the other way around.

Not everything is normal

It is also important that students do not get the idea that all distributions, or even all continuous distributions, are normal. The uniform and negative exponential distributions are both useful in different circumstances, and look nothing like the normal distribution. And distributions of real entities can often have many zero values that make a distribution far from normal-looking.

The normal distribution is great for things that measure mostly around a central value, and there are increasingly fewer things as you get further from the mean in both directions. I suspect most people can understand that in many areas of life you get lots of “average” people or things, and some really good and some really bad. (Except at Lake Wobegon “where all the women are strong, all the men are good looking, and all the children are above average.”)

However the normal distribution is not useful for modelling distributions that are heavily skewed. For instance, house prices tend to have a very long tail to the right, as there are some outrageously expensive houses, even several times the value of the median. At the same time there is a clear lower bound at zero, or somewhere above it.

Inter-arrival times are not well modelled by the normal distribution, but are well modelled by a negative exponential distribution. If we want to model how long it is likely to be before the next customer arrives, we would expect many short gaps and progressively fewer long ones.

Daily rainfall is not well modelled by the normal distribution as there will be many days of zero rainfall. Amounts claimed in medical insurance, or any kind of insurance, are not going to be well modelled by the normal distribution as there are zero claims, and also the effect of excesses. Guest stay lengths at a hotel would not be well modelled by the normal distribution. Most guests will stay one or two days, and the longer the time, the fewer people would stay that long.
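A quick simulation can show students why the normal model fails for something like inter-arrival times. The following Python sketch is only an illustration. The average gap of 5 minutes is made up, and the gaps are drawn from a negative exponential distribution using the standard library.

```python
import random
import statistics

rng = random.Random(42)
mean_gap = 5.0  # assumed average minutes between customers (made up)

# expovariate takes the rate, which is 1 / mean
gaps = [rng.expovariate(1 / mean_gap) for _ in range(10_000)]

sample_mean = statistics.fmean(gaps)
sample_median = statistics.median(gaps)
sample_sd = statistics.stdev(gaps)

# Fit a normal model with the same mean and standard deviation,
# then ask how much of its area falls on impossible negative gaps.
normal_model = statistics.NormalDist(sample_mean, sample_sd)
share_below_zero = normal_model.cdf(0)

print(f"mean {sample_mean:.2f}, median {sample_median:.2f}")
print(f"normal model puts {share_below_zero:.0%} of its area below zero")
```

The mean sits well above the median because of the long right tail, and the fitted normal model wastes a noticeable share of its area on negative times that can never happen.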

Area under the graph – idea of sand

The idea of the area under the graph being the probability of an outcome’s happening in that range is conceptually challenging. I was recently introduced to the sand metaphor by Holly-Lynne and Todd Lee. If you think about each outcome as being a grain of sand (or a pixel in a picture) then you think about how likely it is to occur, by the size of the area that encloses it. I found the metaphor very appealing, and you can read the whole paper here:

Visual representations of empirical probability distributions when using the granular density metaphor
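As a rough illustration of the link between area and probability (my own sketch, not taken from the paper), the Python below models Steve's walk to school with a normal distribution. The mean of 30 minutes comes from the example earlier in this post. The standard deviation of 4 minutes is made up. It compares the area under the curve between 28 and 35 minutes with the share of simulated "grains" that land in that range.

```python
import random
from statistics import NormalDist

# Model for Steve's walking time in minutes; sigma=4 is an assumed value
walk_time = NormalDist(mu=30, sigma=4)

# Area under the normal curve between 28 and 35 minutes
area = walk_time.cdf(35) - walk_time.cdf(28)

# Treat each simulated walk as one grain of sand and count the grains in the range
rng = random.Random(7)
grains = [rng.gauss(30, 4) for _ in range(100_000)]
share_of_grains = sum(28 <= g <= 35 for g in grains) / len(grains)

print(f"area under curve: {area:.3f}")
print(f"share of grains:  {share_of_grains:.3f}")
```

The two numbers agree closely, which is the point of the metaphor: the more sand piled over a range, the more likely an outcome in that range.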

There are other aspects of the normal distribution that can be challenging. Here is our latest video to help you to teach and learn and understand the normal distribution.

Understanding Statistical Inference

Inference is THE big idea of statistics. This is where people come unstuck. Most people can accept the use of summary descriptive statistics and graphs. They can understand why data is needed. They can see that the way a sample is taken may affect how things turn out. They often understand the need for control groups. Most statistical concepts or ideas are readily explainable. But inference is a tricky, tricky idea. Well actually – it doesn’t need to be tricky, but the way it is generally taught makes it tricky.

Procedural competence with zero understanding

I cast my mind back to my first encounter with confidence intervals and hypothesis tests. I learned how to calculate them (by hand  – yes I am that old) but had not a clue what their point was. Not a single clue. I got an A in that course. This is a common occurrence. It is possible to remain blissfully unaware of what inference is all about, while answering procedural questions in exams correctly.

But, thanks to the research and thinking of a lot of really smart and dedicated statistics teachers, we are able to put a stop to that. And we must.

We need to explicitly teach what statistical inference is. Students do not learn to understand inference by doing calculations. We need to revisit the ideas behind inference frequently. The process of hypothesis testing is counter-intuitive, and so confusing that it spills its confusion over into the concept of inference. Confidence intervals are less confusing, so they are a better intermediate point for understanding statistical inference. But we need to start with the concept of inference.

What is statistical inference?

The idea of inference is actually not that tricky if you unbundle the concept from the application or process.

The concept of statistical inference is this –

We want to know stuff about a large group of people or things (a population). We can’t ask or test them all so we take a sample. We use what we find out from the sample to draw conclusions about the population.

That is it. Now was that so hard?
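The concept can be sketched in a few lines of Python. This is an illustration only. The population here is simulated (100,000 people, about 40% of whom support some proposal), which is an assumption made so that we can peek at the "truth" afterwards. In real life we never get to.

```python
import random

rng = random.Random(2024)

# Hypothetical population: 1 = supports the proposal, 0 = does not
population = [1 if rng.random() < 0.4 else 0 for _ in range(100_000)]

# We can't ask everyone, so we take a random sample of 1000 people
sample = rng.sample(population, 1000)

# Use what we find in the sample to draw a conclusion about the population
sample_share = sum(sample) / len(sample)
population_share = sum(population) / len(population)

print(f"sample says {sample_share:.1%}, population is really {population_share:.1%}")
```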

Developing understanding of statistical inference in children

I have found the paper by Makar and Rubin, presenting a “framework for thinking about informal statistical inference”, particularly helpful. In this paper they summarise studies done with children learning about inference. They suggest that “three key principles … appeared to be essential to informal statistical inference: (1) generalization, including predictions, parameter estimates, and conclusions, that extend beyond describing the given data; (2) the use of data as evidence for those generalizations; and (3) employment of probabilistic language in describing the generalization, including informal reference to levels of certainty about the conclusions drawn.” This can be summed up as Generalisation, Data as evidence, and Probabilistic Language.

We can lead into informal inference early on in the school curriculum. The Key Ideas in the NZ curriculum suggest that “teachers should be encouraging students to read beyond the data. Eg ‘If a new student joined our class, how many children do you think would be in their family?’” In other words, though we don’t specifically use the terms population and sample, we can conversationally draw attention to what we learn from this set of data, and how that might relate to other sets of data.

Explaining directly to Adults

When teaching adults we may use a more direct approach, explaining explicitly, alongside experiential learning, to develop an understanding of inference. We have just completed a video: Understanding Inference. Within the video we have presented three basic ideas condensed from the Five Big Ideas in the very helpful book published by NCTM, “Developing Essential Understanding of Statistics, Grades 9-12” by Peck, Gould, Miller and Zbiek.

Ideas underlying inference

  • A sample is likely to be a good representation of the population.
  • There is an element of uncertainty as to how well the sample represents the population.
  • The way the sample is taken matters.
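These three ideas can be demonstrated with a short simulation. The Python below is a sketch under made-up assumptions: a population of 20,000 apples with weights of roughly 150 grams, samples of 50, and a deliberately bad sampling method that only picks from the heaviest tenth.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population of apple weights in grams
population = [rng.gauss(150, 20) for _ in range(20_000)]
truth = statistics.fmean(population)

# Ideas 1 and 2: random samples land near the truth, but never exactly on it
estimates = [statistics.fmean(rng.sample(population, 50)) for _ in range(200)]
centre = statistics.fmean(estimates)
spread = statistics.stdev(estimates)

# Idea 3: the way the sample is taken matters.
# Sampling only from the heaviest tenth gives a misleading picture.
heaviest = sorted(population)[-2000:]
biased_estimate = statistics.fmean(rng.sample(heaviest, 50))

print(f"population mean {truth:.1f}")
print(f"random samples: centre {centre:.1f}, typical wobble {spread:.1f}")
print(f"biased sample: {biased_estimate:.1f}")
```

The random samples wobble by a few grams either side of the true mean, while the biased sample misses by far more than that wobble, no matter how many apples it contains.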

These ideas help to provide a rationale for thinking about inference, and allow students to justify what has often been assumed or taught mathematically. In addition several memorable examples involving apples, chocolate bars and opinion polls are provided. This is available for free use on YouTube. If you wish to have access to more of our videos than are available there, do email me at n.petty@statslc.com.

Please help us develop more great resources

We are currently developing exciting innovative materials to help students at all levels of the curriculum to understand and enjoy statistical analysis. We would REALLY appreciate it if any readers here today would help us out by answering this survey about fast food and dessert. It will take 10 minutes at a maximum. We don’t mind what country you are from, and will do the currency conversions. We would also love you to forward it to your friends and students to fill out – the more the merrier! It is an example of a well-designed questionnaire, with a meaningful purpose. And in a few months I will let you know how we got on.



Summarising with Box and Whisker plots

In the Northern Hemisphere, it is the start of the school year, and thousands of eager students are beginning their study of statistics. I know this because this is the time of year when lots of people watch my video, Types of Data. On 23rd August the hits on the video bounced up out of their holiday slumber, just as they do every year. They gradually dwindle away until the end of January when they have a second jump in popularity, I suspect at the start of the second semester.

One of the first topics in many statistics courses is summary statistics. The greatest hits of summary statistics tend to be the mean and the standard deviation. I’ve written previously about what a difficult concept a mean is, and then another post about why the median is often preferable to the mean. In that one I promised a video. Over two years ago – oops. But we have now put these ideas into a video on summary statistics. Enjoy! In 5 minutes you can get a conceptual explanation on summary measures of position. (Also known as location or central tendency)
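If you want a quick demonstration of why the median is often preferable to the mean, the following Python sketch may help. The numbers of pairs of shoes are made up for illustration and are not the data from the video.

```python
import statistics

# Made-up numbers of pairs of shoes owned by seven students
shoes = [3, 4, 4, 5, 5, 6, 7]

# Add one enthusiastic collector with 60 pairs
with_collector = shoes + [60]

mean_before = statistics.mean(shoes)
mean_after = statistics.mean(with_collector)
median_before = statistics.median(shoes)
median_after = statistics.median(with_collector)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

One extreme value drags the mean from under 5 to nearly 12, a number that describes nobody in the group, while the median stays at 5.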


I was going to follow up with a video on spread and started to think about range, interquartile range, mean absolute deviation, variance and standard deviation. So I decided instead to make a video on the wonderful boxplot, again comparing the shoe-owning habits of male and female students in a university in New Zealand.

Boxplots are great. When you combine them with dotplots, as done in iNZight and various other packages, they provide a wonderful way to get an overview of the distribution of a sample. More importantly, they provide a wonderful way to compare two samples or two groups within a sample. A distribution on its own has little meaning.

John Tukey was the first to make a box and whisker plot out of the 5-number summary way back in 1969. This was not long before I went to High School, so I never really heard about them until many years later. Drawing them by hand is less tedious than drawing a dotplot by hand, but still time consuming. We are SO lucky to have computers to make it possible to create graphs at the click of a mouse.

Sample distributions and summaries are not enormously interesting on their own, so I would suggest introducing boxplots as a way to compare two samples. Their worth then is apparent.

A colleague recently pointed out an interesting confusion and distinction. The interquartile range is the distance between the upper quartile and the lower quartile. The box in the box plot contains the middle 50% of the values in the sample. It is tempting for people to point this out and miss the point that the interquartile range is a good resistant measure of spread for the WHOLE sample. (Resistant means that it is not unduly affected by extreme values.) The range is a poor summary statistic as it is so easily affected by extreme values.
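Here is a small Python sketch of the five-number summary behind a boxplot, showing how the interquartile range resists an extreme value while the range does not. The shoe counts are made up. Note also that packages use slightly different conventions for calculating quartiles, so the values from `statistics.quantiles` (with its default method) may differ a little from those in your textbook or software.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default quartile convention
    return min(values), q1, q2, q3, max(values)


def spread_measures(values):
    """Return (interquartile range, range) for a sample."""
    low, q1, _, q3, high = five_number_summary(values)
    return q3 - q1, high - low


# Made-up numbers of pairs of shoes, without and with one extreme value
shoes = [3, 4, 4, 5, 5, 6, 7]
with_collector = shoes + [60]

iqr_before, range_before = spread_measures(shoes)
iqr_after, range_after = spread_measures(with_collector)

print(f"IQR:   {iqr_before} -> {iqr_after}")
print(f"range: {range_before} -> {range_after}")
```

The range leaps from 4 to 57 because of a single student, while the interquartile range barely moves, which is what makes it a good resistant measure of spread for the whole sample.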

And now we come to our latest video, about the boxplot. This one is four and a half minutes long, and also uses the shoe sample as an example. I hope you and your students find it helpful. We have produced over 40 statistics videos, some of which are available for free on YouTube. If you are interested in using our videos in your teaching, do let us know and we will arrange access to the remainder of them.