# Spreadsheets, statistics, mathematics and computational thinking

We need to teach all our students how to design, create, test, debug and use spreadsheets. We need to teach this integrated with mathematics, statistics and computational thinking. Spreadsheets can be a valuable tool in many other subject areas including biology, physics, history and geography, thus facilitating integrated learning experiences.

Spreadsheets are versatile and ubiquitous – and most have errors. A web search on “How many spreadsheets have errors?” gives alarming results. The commonly quoted figure is 88%. These spreadsheets with errors are not just little home spreadsheets for cataloguing your Lego collection or planning your next vacation. These spreadsheets with errors involve millions of dollars, and life-affecting medical and scientific research.

# Using spreadsheets to teach statistics

## Use a spreadsheet to draw graphs

One of the great contributions computers make to statistical analysis is the ability to display graphs of non-trivial sets of data without onerous drawing by hand. In the early 1980s I had a summer job as a research assistant to a history professor. One of my tasks was to create a series of graphs of the imports and exports for New Zealand over several decades, illustrating the effect of the UK joining the Common Market (now the EU). It required fastidious drawing and considerable time (and correcting fluid). These same graphs can now be created almost instantaneously, and the requirement has shifted to interpreting these graphs.

Similarly, in the classroom we should not be requiring students of any age to draw statistical graphs by hand. Drawing statistical graphs by hand is a waste of time. Students may enjoy creating the graphs by hand – I understand that – it is rewarding and not cognitively taxing. So is colouring in. The important skill that students need is to be able to read the graph – to find out what it is telling them and what it is not telling them. Their time would be far better spent looking at multiple graphs of different types, and learning how to report and critique them. They also need to be able to decide what graph will best show what they are looking for or communicating. (There will be teachers saying students need to draw graphs by hand to understand them. I’d like to know the evidence for this claim. People have said for years that students need to calculate standard deviation by hand to understand it, and I reject that also.)

At primary school level, the most useful graph is almost always the bar or column chart. These are easily created physically using data cards, or by entering category totals and using a spreadsheet. Here is a video showing just how easy it is.

## Use a spreadsheet for statistical calculations

Spreadsheets are also very capable of calculating summary statistics, carrying out hypothesis tests and constructing confidence intervals. Dedicated statistical packages are better, but spreadsheets are generally good enough. I would also teach pivot tables as soon as possible, but that is a topic for another day.

# Using spreadsheets to teach mathematics

Spreadsheets are so versatile! Spreadsheets help students to understand the concept of a variable. When you write a formula in a cell, you are creating an algebraic formula. Spreadsheets illustrate the need for sensible rounding and numeric display. Use of order of operations and brackets is essential. They can be used for exploring patterns and developing number sense. I have used spreadsheets to teach algebraic graphing and to compare it with line fitting. Spreadsheets can solve algebraic problems. Spreadsheets make clear the concept of mathematics as a model. Combinatorics and graph theory are also enabled through spreadsheets. For people using a screenreader, the linear nature of formulas in spreadsheets makes them easier to read.

# Using spreadsheets to teach computational thinking

In New Zealand we are rolling out a new curriculum for information technology, including computational thinking. At primary school level, computational thinking includes “[students] develop and debug simple programs that use inputs, outputs, sequence and iteration” (Progress outcome 3, which is signposted to be reached at about Year 7). Later the curriculum includes branching.

In most cases the materials include unplugged activities, and coding using languages such as Scratch or JavaScript. Robots such as Sphero and Lego make it all rather exciting.

All of these ideas can also be taught using a spreadsheet. Good spreadsheet design has clear inputs and outputs. The operations need to be performed in sequence, and iteration occurs when we have multiple rows in a spreadsheet. Spreadsheets need to be correct, robust and easy to use and modify. These are all important principles in coding. Unfortunately too many people have never had the background in coding and program design and thus their spreadsheets are messy, fragile, oblique and error-prone.
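To make the parallel with coding concrete, here is a minimal sketch of a small spreadsheet written as a Python program. The fundraising scenario, the names and the price are all made up for illustration. The point is only the shape: inputs kept together, one "formula" applied to every row in sequence, and outputs that are calculated rather than typed in.

```python
# Hypothetical example: a class fundraising sheet, written as a small program.

# Inputs: kept together in one place, like a clearly labelled input area.
PRICE_PER_ITEM = 2.50  # assumed price in dollars
items_sold = {"Aroha": 12, "Ben": 7, "Mei": 15}  # made-up data, one entry per row


def row_total(quantity, price):
    """The 'formula' for one row: the same rule is used on every row."""
    return quantity * price


def build_sheet(sales, price):
    """Iteration: apply the row formula to each row in turn."""
    return {name: row_total(quantity, price) for name, quantity in sales.items()}


# Outputs: derived only from the inputs above, never typed in by hand.
sheet = build_sheet(items_sold, PRICE_PER_ITEM)
grand_total = sum(sheet.values())

for name, total in sheet.items():
    print(f"{name:<6} {total:>7.2f}")
print(f"{'Total':<6} {grand_total:>7.2f}")
```

Changing the price in one place updates every row and the total, which is exactly the robustness we want students to build into a spreadsheet.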

When we teach spreadsheets well to our students we are giving them a gift that will be useful for their life.

# Experience teaching spreadsheets

I designed and taught a course in quantitative methods for business, heavily centred on spreadsheets. The students were required to use spreadsheets for mathematical and statistical tasks. Many students have since expressed their gratitude that they are capable of creating and using spreadsheets, a skill that has proved useful in employment.

# Statistical software for worried students

Statistical software for worried students: Appearances matter

Let’s be honest. Most students of statistics are taking statistics because they have to. I asked my class of 100 business students who would choose to take the quantitative methods course if they did not have to. Two hands went up.

Face it – statistics is necessary but not often embraced.

But actually it is worse than that. For many people statistics is the most dreaded course they are required to take. It can be the barrier to achieving their career goals as a psychologist, marketer or physician. (And it should be required for many other careers, such as journalism, law and sports commentator.)

## Choice of software

Consequently, we have worried students in our statistics courses. We want them to succeed, and to do that we need to reduce their worry. One decision that will affect their engagement and success is the choice of computer package. This decision rightly causes consternation to instructors. It is telling that one of the most frequently and consistently accessed posts on this blog is Excel, SPSS, Minitab or R. It has been  viewed 55,000 times in the last five years.

The problem of which package to use is no easier to solve than it was five years ago when I wrote the post. I am helping a tertiary institution to re-develop their on-line course in statistics. This is really fun – applying all the great advice and ideas from the “Guidelines for Assessment and Instruction in Statistics”, or GAISE. They asked for advice on what statistics package to use. And I am torn.

## Requirements

Here is what I want from a statistical teaching package:

• Easy to use
• Attractive to look at (See “Appearances Matter” below)
• Good instructional materials with videos etc (as this is an online course)
• Supports good pedagogy

If I’m honest I also want it to have the following characteristics:

• Guidance for students as to what is sensible
• Only the tests and options I want them to use in my course – not too many choices
• An interpretation of the output
• Data handling capabilities, including missing values
• A pop up saying “Are you sure you want to make a three dimensional pie-chart?”

Is this too much to ask?

Possibly.

## Overlapping objectives

Here is the thing. There are two objectives for introductory statistics courses that partly overlap and partly conflict. We want students to

• Learn what statistics is all about
• Learn how to do statistics.

They probably should not conflict, but they require different things from your software. If all we want the students to do is perform the statistical tests, then something like Excel is not a bad choice, as they get to learn Excel as well, which could be handy for c.v. expansion and job-getting. If we are more concerned about learning what statistics is all about, then an exploratory package like Tinkerplots or iNZight could be useful.

Ideally I would like students to learn both what statistics is all about and how to do it. But most of all, I want them to feel happy about doing statistical analysis.

## Appearances matter

Eye-appeal is important for overcoming fear. I am confident in mathematics, but a journal article with a page of Greek letters and mathematical symbols makes me anxious. The LaTeX font makes me nervous. And an ugly logo puts me off a package. I know it is shallow. But it is a thing, and I suspect I am far from alone. Marketing people know that the choice of colour, word, placement – all sorts of superficial things affect whether a product sells. We need to sell our product, statistics, and to do that, it needs to be attractive. It may well be that the people who design software are less affected by appearance, but they are not the consumers.

## Terminal or continuing?

This is important: Most of our students will never do another statistical analysis.

Think about it:

Most of our students will never do another statistical analysis.

Here are the implications: It is important for the students to learn what statistics is about, where it is needed, potential problems and good communication and critique of statistical results. It is not important for students to learn how to program or use a complex package.

Students need to experience statistical analysis, to understand the process. They may also discover the excitement of a new set of data to explore, and the anticipation of an interesting result. These students may decide to study more statistics, at which time they will need to learn to operate a more comprehensive package. They will also be motivated to do so because they have chosen to continue to learn statistics.

## Excel

In my previous post I talked about Excel, SPSS, Minitab and R. I used to teach with Excel, and I know many of my past students have been grateful they learned it. But now I know better, and cannot, hand on heart, recommend Excel as the main software. Students need to be able to play with the data, to look at various graphs, and get a feel for variation and structure. Excel’s graphing and data-handling capabilities, particularly with regard to missing values, are not helpful. The histograms are disastrous. Excel is useful for teaching students how to do statistics, but not what statistics is all about.

## SPSS and Minitab

SPSS was a personal favourite, but it has been a while since I used it. It is fairly expensive, and chances are the students will never use it again. I’m not sure how well it does data exploration. Minitab is another nice little package. Both of these are probably overkill for an introductory statistics course.

## R and R Commander

R is a useful and versatile statistical language for higher level statistical analysis and learning but it is not suitable for worried students. It is unattractive.

R Commander is a graphical user interface for R. It is free, and potentially friendlier than R. It comes with a book. I am told it is a helpful introduction to R. R Commander is also unattractive. The book was formatted in LaTeX. The installation guide looks daunting. That is enough to make me reluctant – and I like statistics!

The screenshot displayed on the front page of R Commander

## iNZight and iNZight Lite

I have used iNZight a lot. It was developed at the University of Auckland for use in their statistics course and in New Zealand schools. The full version is free and can be installed on PC and Mac computers, though there may be issues with running it on a Mac. iNZight Lite, the web-based version, is fine. It is free and works on any platform. I really like how easy it is to generate various plots to explore the data. You put in the data, and the graphs appear almost instantly. iNZight encourages engagement with the data, rather than doing things to data.

For a face-to-face course I would choose iNZight Lite. For an online course I would be a little concerned about the level of support material available. The newer versions of iNZight and iNZight Lite have benefitted from some graphic design input. I like the colours and the new logo.

## Genstat

I’ve heard about Genstat for some time, as an alternative to iNZight for New Zealand schools, particularly as it does bootstrapping. So I requested an inspection copy. It has a friendly vibe. I like the dialog box suggesting the graph you might like to try. It lacks the immediacy of iNZight Lite. It has the multiple window thing going on, which can be tricky to navigate. I was pleased at the number of sample data sets.

## NZGrapher

NZGrapher is popular in New Zealand schools. It was created by a high school teacher in his spare time, and is attractive and lean. It is free, funded by donations and advertisements. You enter a data set, and it creates a wide range of graphs. It does not have the traditional tests that you would want in an introductory statistics course, as it is aimed at the NZ school curriculum requirements.

## Statcrunch

Statcrunch is a more attractive, polished package, with a wide range of supporting materials. I think this would give confidence to the students. It is specifically designed for teaching and learning and is almost conversational in approach. I have not had the opportunity to try out Statcrunch. It looks inviting, and was created by Webster West, a respected statistics educator. It is now distributed by Pearson.

## Jasp

I recently had my attention drawn to this new package. It is free, well-supported and has a clean, attractive interface. It has a vibe similar to SPSS. I like the immediate response as you begin your analysis. I was able to download it easily. It is not as graphical as iNZight, but is more traditional in its approach. For a course emphasising doing statistics, I like the look of this.

Data, controls and output from Jasp

# Conclusion

So there you have it. I have mentioned only a few packages, but I hope my musings have got you thinking about what to look for in a package. If I were teaching an introductory statistics course, I would use iNZight Lite, Jasp, and possibly Excel. I would use iNZight Lite for data exploration. I might use Jasp for hypothesis tests, confidence intervals and model fitting. And if possible I would teach Pivot Tables in Excel, and use it for any probability calculations.

This is a very important topic and I would appreciate input. Have I missed an important contender? What do you look for in a statistical package for an introductory statistics course? As a student, how important is it to you for the software to be attractive?

# Trade stands and cautious teachers

It is interesting to provide a trade stand at a teachers’ conference. Some teachers are keen to find out about new things, and come to see how we can help them. Others studiously avoid eye-contact in the fear that we might try to sell them something. Trade stand holders regularly put sweets and chocolate out as “bait” so that teachers will approach close enough to engage. Maybe it gives the teachers an excuse to come closer? Either way it is representative of the uneasy relationship that “trade” has with salaried educators.

# Money and education

Money and education have an uneasy relationship. For schools to function, they need considerable funding – always more than what they get. In New Zealand, and in many countries, education is predominantly funded by the state. Schools are built and equipped, teachers are paid and resources are purchased with money provided by the taxpayer. Extras are raised through donations from parents and fund-raising efforts. However, because it is not apparent that money is changing hands, schools are perceived as virtuous establishments, existing only because of the goodness of the teachers. This contrasts with the attitude to resource providers, who are sometimes treated as parasitic with their motives being all about the money. It is possible that some resource providers are in it just for the money, but it seems to me that there are richer seams to mine in health, sport, retail etc.

# Statistics Learning Centre is a social enterprise

Statistics Learning Centre is a social enterprise. We fit in the fuzzy area between “not-for-profit” and commercial enterprise. We measure our success by the impact we are having in empowering teachers to teach statistics and all people to understand statistics. We need money in order to continue to make an impact. Statistics Learning Centre has made considerable contributions to the teaching and learning of statistics in New Zealand and beyond for several years. This post lists just some of the impact we have had.  We believe in what we are doing, and work hard so that our social enterprise is on a solid financial footing.

# StatsLC empowers teachers

Soon after the change to the NCEA Statistics standards, there was a shortage of good quality practice external exams. Even the ones provided as official exemplars did not really fit the curriculum. Teachers approached us, requesting that we create practice exams that they could trust were correct and aligned to the curriculum. We did so in 2015 and 2016, at considerable personal effort and only marginal financial recompense. We see that as helping statistics to be better understood in schools and the wider community.

We, at Statistics Learning Centre, grasp at opportunities to teach teachers how to teach statistics better, to empower all teachers to teach statistics. Our workshops are well received, and we have regular attenders who know they will get value for their time. We use an inclusive, engaging approach, and participants have a good time. I believe in our resources – the videos, the quizzes, the data cards, the activities, the professional development. I believe that they are among the best you can get. So when I give workshops, I do talk about the resources. It would seem counter-productive for all concerned, not to mention contrived, to do otherwise. They are part of a full professional development session. Many mathematical associations have no trouble with this, and I love to go to conferences, and contribute.

I am aware that there are some commercial enterprises who wish to give commercial presentations at conferences. If their materials are not of a high standard, this can put the organisers in a difficult position. Consequently some organisations have a blanket ban on any presentations that reference any paid product. I feel this is a little unfortunate, as teachers miss out on worthwhile contributions. But I understand the problem.

# The Open Market model – supply and demand

I believe that there is value in a market model for resources.  People have suggested that we should get the Government to fund access to Statistics Learning Centre resources for all schools. That would be delightful, and give us the freedom and time to create even better resources. But that would make it almost impossible for any other new provider, who may have an even better product, to get a look in. When such a monopoly occurs, it reduces the incentives for providers to keep improving.

# Saving work for the teachers, and building on a product

Teachers want the best for their students, and have limited budgets. They may spend considerable amounts of time printing, cutting and laminating in order to provide teaching resources at a low cost. This was one of the drivers for producing our Dragonistics data cards – to provide at a reasonable cost, some ready-made, robust resources, so that teachers did not have to make their own. As it turned out we were able to provide interesting data with clear relationships, and engaging graphics so that we provide something more than just data turned into datacards.

# Free resources

There are free resources available on the internet. Other resources are provided by teachers who are sharing what they have done while teaching their own students. Resources provided for free can be of a high pedagogical standard. Having a high production standard, however, can be prohibitively expensive for individual producers who are working in their spare time.  It can also be tricky for another teacher to know what is suitable, and a lot of time can be spent trying to find high quality, reliable resources.

# Teachers and resource providers – a symbiotic relationship

Teachers need good resource providers. It makes sense for experts to create high quality resources, drawing on current thinking with regard to content specific pedagogy. These can support teachers, particularly in areas in which they are less confident, such as statistics. And they do need to be paid for their work.

It helps when people recognise that our materials are sound and innovative, when they give us opportunities to contribute and when they include us at the decision-making table. Let us know how we can help you, and in partnership we can become better bed-fellows.

What do you think?

(Note that this post is also being published on our blog, Building a Statistics Learning Community, as I felt it was important.)

# Summarising with Box and Whisker plots

In the Northern Hemisphere, it is the start of the school year, and thousands of eager students are beginning their study of statistics. I know this because this is the time of year when lots of people watch my video, Types of Data. On 23rd August the hits on the video bounced up out of their holiday slumber, just as they do every year. They gradually dwindle away until the end of January when they have a second jump in popularity, I suspect at the start of the second semester.

One of the first topics in many statistics courses is summary statistics. The greatest hits of summary statistics tend to be the mean and the standard deviation. I’ve written previously about what a difficult concept a mean is, and then another post about why the median is often preferable to the mean. In that one I promised a video. Over two years ago – oops. But we have now put these ideas into a video on summary statistics. Enjoy! In 5 minutes you can get a conceptual explanation of summary measures of position (also known as location or central tendency).

I was going to follow up with a video on spread and started to think about range, interquartile range, mean absolute deviation, variance and standard deviation. So I decided instead to make a video on the wonderful boxplot, again comparing the shoe-owning habits of male and female students in a university in New Zealand.

Boxplots are great. When you combine them with dotplots as done in iNZight and various other packages, they provide a wonderful way to get an overview of the distribution of a sample. More importantly, they provide a wonderful way to compare two samples or two groups within a sample. A distribution on its own has little meaning.

John Tukey was the first to make a box and whisker plot out of the 5-number summary way back in 1969. This was not long before I went to High School, so I never really heard about them until many years later. Drawing them by hand is less tedious than drawing a dotplot by hand, but still time consuming. We are SO lucky to have computers to make it possible to create graphs at the click of a mouse.

Sample distributions and summaries are not enormously interesting on their own, so I would suggest introducing boxplots as a way to compare two samples. Their worth then is apparent.

A colleague recently pointed out an interesting confusion and distinction. The interquartile range is the distance between the upper quartile and the lower quartile. The box in the box plot contains the middle 50% of the values in the sample. It is tempting for people to point this out and miss the point that the interquartile range is a good resistant measure of spread for the WHOLE sample. (Resistant means that it is not unduly affected by extreme values.) The range is a poor summary statistic as it is so easily affected by extreme values.
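A small calculation shows what "resistant" means in practice. This is only a sketch: the shoe counts are invented, and the quartiles use the `inclusive` method of Python's `statistics.quantiles`, which is one of several quartile conventions and may differ slightly from what your package reports.

```python
from statistics import quantiles


def spread(values):
    """Return the range and interquartile range of a sample."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return {"range": max(values) - min(values), "iqr": q3 - q1}


# Made-up numbers of pairs of shoes owned by nine students.
shoes = [2, 3, 3, 4, 5, 5, 6, 7, 8]

# The same sample, but the keenest shoe owner now has 60 pairs.
shoes_with_outlier = shoes[:-1] + [60]

print(spread(shoes))               # {'range': 6, 'iqr': 3.0}
print(spread(shoes_with_outlier))  # {'range': 58, 'iqr': 3.0}
```

One extreme value multiplies the range nearly tenfold, while the interquartile range does not move at all.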

And now we come to our latest video, about the boxplot. This one is four and a half minutes long, and also uses the shoe sample as an example. I hope you and your students find it helpful. We have produced over 40 statistics videos, some of which are available for free on YouTube. If you are interested in using our videos in your teaching, do let us know and we will arrange access to the remainder of them.

# Engaging students in learning statistics using The Islands.

## Three Problems and a Solution

Modern teaching methods for statistics have gone beyond the mathematical calculation of trivial problems. Computers can enable large size studies, bringing reality to the subject, but this is not without its own problems.

# Problem 1: Giving students experience of the whole statistical process

There are many reasons for students to learn statistics through running their own projects, following the complete statistical enquiry process, posing a problem, planning the data collection, collecting and cleaning the data, analysing the data and drawing conclusions that relate back to the original problem. Individual projects can be both time consuming and risky, as the quality of the report, and the resultant grade can be dependent on the quality of the data collected, which may be beyond the control of the student.

The Statistical Enquiry Cycle, which underpins the NZ statistics curriculum.

# Problem 2: Giving students experience of different types of sampling

If students are given an existing database and then asked to sample from it, this can be confusing for students and sends the misleading message that we would not want to use all the data available. But physically performing a sample, based on a sampling frame, can be prohibitively time consuming.

# Problem 3: Giving students experience conducting human experiments

The problem here is obvious. It is not ethical to perform experiments on humans simply to learn about performing experiments.

# An innovative solution: The Islands virtual world.

I recently ran an exciting workshop for teachers on using The Islands. My main difficulty was getting the participants to stop doing the assigned tasks long enough to discuss how we might implement this in their own classrooms. They were too busy clicking around different villages and people, finding subjects of the right age and getting them to run down a 15-degree slope – all without leaving the classroom.

The Island was developed by Dr Michael Bulmer from the University of Queensland and is a synthetic learning environment. The Islands, the second version, is a free, online, virtual human population created for simulating data collection.

The synthetic learning environment overcomes practical and ethical issues with applied human research, and is used for teaching students at many different levels. For a login, email james.baglin @ rmit.edu.au (without the spaces in the email address).

There are now approximately 34,000 inhabitants of the Islands, who are born, have families (or not) and die in a speeded up time frame where 1 Island year is equivalent to about 28 earth days. They each carry a genetic code that affects their health etc. The database is dynamic, so every student will get different results from it.

Some of the Islanders

## Two magnificent features

To me, one of the two best features is the difficulty of acquiring data on individuals. It takes time for students to collect samples, as each subject must be asked individually, and the results recorded in a database. There is no easy access to the population. This is still much quicker than asking people in real life (or “irl” as it is known on social media). It is obvious that you need to sample and to have a good sampling plan, and you need to work out how to record and deal with your data.

The other outstanding feature is the ability to run experiments. You can get a group of subjects and split them randomly into treatment and control groups. Then you can perform interventions, such as making them sit quietly or run about, or drink something, and then evaluate their performance on some other task. This is without requiring real-life ethical approval and informed consent. However, in a touch of reality the people of the Islands sometimes lie, and they don’t always give consent.
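The random split into treatment and control groups is easy to do with a few lines of code once students have their list of consenting subjects. This is a minimal sketch, not something built into The Islands: the subject labels are placeholders, and the `randomise` helper and its `seed` argument are my own names.

```python
import random


def randomise(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = list(subjects)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Placeholder labels standing in for 20 Islanders who gave consent.
islanders = [f"Islander {i}" for i in range(1, 21)]

treatment, control = randomise(islanders, seed=1)
print("Treatment:", treatment)
print("Control:  ", control)
```

Every subject lands in exactly one group, and chance alone decides which, so any difference in later performance is not down to who chose to drink the coffee.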

There are over 200 tasks that you can assign to your people, covering a wide range of topics. They include blood tests, urine tests, physiology, food and drinks, injections, tablets, mental tasks, coordination, exercise, music, environment etc. The tasks occur in real (reduced) time, so you are not inclined to include more tasks than are necessary. There is also the opportunity to survey your Islanders, with more than fifty possible questions. These also take time to answer, which encourages judicious choice of questions.

## Uses

In the workshop we used the Islands to learn about sampling distributions. First each teacher took a sample of one male and one female and timed them running down a hill. We made (fairly awful) dotplots on the whiteboard using sticky notes with the individual times on them. Then each teacher took a sample and found the median time. We used very small samples of 7 each as we were constrained by time, but larger samples would be preferable. We then looked at the distributions of the medians and compared that with the distribution of our first sample. The lesson was far from polished, but the message was clear, and it gave a really good feel for what a sampling distribution is.
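The same comparison can be simulated. The sketch below invents a population of running times (normally distributed, with a mean of 20 seconds and a standard deviation of 4, which are assumptions of mine and not values from The Islands), then compares the spread of individual times with the spread of medians from samples of 7.

```python
import random
from statistics import median, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up population: one running time (in seconds) per inhabitant.
population = [rng.gauss(20, 4) for _ in range(34_000)]


def sample_medians(pop, sample_size, n_samples, rng):
    """Take n_samples random samples and return the median of each."""
    return [median(rng.sample(pop, sample_size)) for _ in range(n_samples)]


# The first whiteboard plot: individual times.
individuals = rng.sample(population, 200)

# The second whiteboard plot: medians of samples of 7.
medians = sample_medians(population, sample_size=7, n_samples=200, rng=rng)

print(f"Spread of individual times: {stdev(individuals):.2f}")
print(f"Spread of sample medians:   {stdev(medians):.2f}")
```

The medians cluster much more tightly than the individual times, which is the message the sticky notes gave us. Increasing `sample_size` tightens them further.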

Within the New Zealand curriculum, we could also use The Islands to learn about bivariate relationships, sampling methods and randomised experiments.

In my workshop I had educators from across the age groups, and a primary teacher assured me that Year 4 students would be able to make use of this. Fortunately there is a maturity filter so that you can remove options relating to drugs and sexual activity.

James Baglin from RMIT University has successfully trialled the Island with high school students and psychology research methods students. The owners of the Island generously allow free access to it. Thanks to James Baglin, who helped me prepare this post.

Here are links to some interesting papers that have been written about the use of The Islands in teaching. We are excited about the potential of this teaching tool.

Huynh, Baglin, Bedford (2014) Improving the attitudes of high school students towards statistics: An Island-based approach. ICOTS9

Bulmer, M. (2010). Technologies for enhancing project assessment in large classes. In C. Reading (Ed.), Proceedings of the Eighth International Conference on Teaching Statistics, July 2010. Ljubljana, Slovenia. Retrieved from http://www.stat.auckland.ac.nz/~iase/publications/icots8/ICOTS8_5D3_BULMER.pdf

Bulmer, M., & Haladyn, J. K. (2011). Life on an Island: A simulated population to support student projects in statistics. Technology Innovations in Statistics Education, 5. Retrieved from http://escholarship.org/uc/item/2q0740hv

Baglin, J., Bedford, A., & Bulmer, M. (2013). Students’ experiences and perceptions of using a virtual environment for project-based assessment in an online introductory statistics course. Technology Innovations in Statistics Education, 7(2), 1–15. Retrieved from http://www.escholarship.org/uc/item/137120mt

# Learning to teach statistics, in a MOOC

I am participating in a MOOC, Teaching statistics through data investigations. A MOOC is a fancy name for an online, free, correspondence course. The letters stand for Massive Open Online Course. I decided to enrol for several reasons. First, I am always keen to learn new things. Second, I wanted to experience what it is like to be a student in a MOOC. And third, I wanted to see what materials we could produce that might help teachers or learners of statistics in the US. We are doing well in the NZ market, but it isn’t really big enough to earn us enough money to do some of the really cool things we want to do in teaching statistics to the masses.

I am now up to Unit 4, and here is what I have learned so far:

# Motivation and persistence

It is really difficult to stay motivated even in the best possible MOOC. Life gets in the way and there is always something more pressing than reading the materials, taking part in discussions and watching the videos. I looked up the rate of completion for MOOCs, and this article from IEEE gives the completion rate at 5%. Obviously it will differ between MOOCs, depending on the content, the style, the reward. I have found I am best to schedule time to apply to the MOOC each week, or it just doesn’t happen.

# I know more than I thought I did

It is reassuring to find out that I really do have some expertise. (This may be a bit of a worry to those of you who regularly read my blog and think I am an expert in teaching statistics.) My efforts to read and ponder, to discuss and to experiment have meant that I do know more than teachers who are just beginning to teach statistics. Phew!

# The investigative process matters

I finally get the importance of the Statistical Enquiry Cycle (PPDAC in New Zealand) or Statistical Investigation Cycle (Pose, Collect, Analyse, Interpret in the US). I sort of got it before, but now it is falling into place. In the old-fashioned approach to teaching statistics, almost all the emphasis was on the calculations. There would be questions asking students to find the mean of a set of numbers, with no context. This is not statistics, but an arithmetic exercise. Unless a question is embedded in the statistical process, it is not statistics. There needs to be a reason, a question to answer, real data and a conclusion to draw. Every time we develop a teaching exercise for students, we need to think about where it sits in the process, and provide the context.

# Brilliant questions

I was happy to participate in the LOCUS quiz to evaluate my own statistical understanding. I was relieved to get 100%. But I was SO impressed with the questions, which reflected the work and thinking that went into producing them. I understand how difficult it is to write questions to teach and assess statistical understanding, as I have written hundreds of them myself. The LOCUS questions are great questions. I will be writing some of my own following their style. I loved the ones that asked what would be the best way to improve an experimental design. Inspired!

# It’s easier to teach the number stuff

I’m sure I knew this, but to see so many teachers say it, cemented it in. Teacher after teacher commented that teaching procedure is so much easier than teaching concepts. Testing knowledge of procedure is so much easier than assessing conceptual understanding. Maths teachers are really good at procedure. That fluffy, hand-waving meaning stuff is just…difficult. And it all depends. Every answer depends! The implication of this is that we need to help teachers become more confident in helping students to learn the concepts of statistics. We need to develop materials that focus on the concepts. I’m pretty happy that most of my videos do just that – my “Understanding Confidence Intervals” is possibly the only video on confidence intervals that does not include a calculation or procedure.

# You learn from other participants

I’ve never been keen on group work. I suspect this is true of most over-achievers. We don’t like to work with other people on assignments as they might freeload, or worse – drag our grade down. Over the years I’ve forced students to do group assignments, as they learn so much more in the process. And I hate to admit that I have also learned more when forced to do group assignments. It isn’t just about reducing the marking load. In this MOOC we are encouraged to engage with other participants through the discussion forums. This is an important part of on-line learning, particularly in a solely on-line platform (as opposed to blended learning). I just love reading what other people say. I get ideas, and I understand better where other people are coming from.

# I have something to offer

It was pretty exciting to see my own video used as a resource in the course, and to hear from the instructor how she loves our Statistics Learning Centre videos.

# What now?

I still have a few weeks to run on the MOOC and I will report back on what else I learn. And then in late May I am going to USCOTS (US Conference on Teaching Statistics). It’s going to cost me a bit to get there, living as I do in the middle of nowhere in Middle Earth. But I am thrilled to be able to meet with the movers and shakers in US teaching of statistics. I’ll keep you posted!

# Everyone wants to learn about ordinal data!

I have a video channel with about 40 videos about statistics, and I love watching to see which videos are getting the most viewing each day. As the Fall term has recently started in the northern hemisphere, the most popular video over the last month is “Types of Data: Nominal, Ordinal, Interval/Ratio.” Similarly one of the most consistently viewed posts in this blog is one I wrote over a year ago, entitled, “Oh Ordinal Data, what do we do with you?”. Understanding about the different levels of data, and what we do with them, is obviously an important introductory topic in many statistical courses. In this post I’m going to look at why this is, as it may prove useful to learner and teacher alike.

And I’m happy to announce the launch of our new Snack-size course: Types of Data. For \$2.50US, anyone can sign up and get access to video, notes, quizzes and activities that will help them, in about an hour, gain a thorough understanding of types of data.

Costing no more than a box of popcorn, our snack-size course will help you learn all you need to know about types of data.

# The Big Deal

Data is essential to statistical analysis. Without data there is no investigative process. Data can be generated through experiments, through observational studies, or dug out from historic sources. I get quite excited at the thought of the wonderful insights that good statistical analysis can produce, and the stories it can tell. A new database to play with is like Christmas morning!

But not all data is the same. We need to categorise the data to decide what to do with it for analysis, and what graphs are most appropriate. There are many good and not-so-good statistical tools available, thanks to the wonders of computer power, but they need to be driven by someone with some idea of what is sensible or meaningful.

A video that becomes popular later in the semester is entitled, “Choosing the test”. This video gives a procedure for deciding which of seven common statistical tests is most appropriate for a given analysis. It lists three things to think about – the level of data, the number of samples, and the purpose of the analysis. We developed this procedure over several years with introductory quantitative methods students. A more sophisticated approach may be necessary at higher levels, but for a terminal course in statistics, this helped students to put their new learning into a structure. Being able to discern what level of data is involved is pivotal to deciding on the appropriate test.

# Categorical Data

In many textbooks and courses, the types of data are split into two – categorical and measurement. Most state that nominal and ordinal data are categorical. With categorical data we can only count the responses to a category, rather than collect up values that are measurements or counts themselves. Examples of categorical data are colour of car, ethnicity, choice of vegetable, or type of chocolate.

With nominal data, we report frequencies or percentages, and display our data with a bar chart, or occasionally a pie chart. We can’t find a mean of nominal data. However, if the different responses are coded as numbers for ease of use in a database, it is technically possible to calculate the mean and standard deviation of those numbers. A novice analyst may do so and produce nonsense output.
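To make the "nonsense output" concrete, here is a minimal sketch in Python. The car colours and both numeric codings are made up for illustration; the point is only that the "mean" changes when the arbitrary codes change, while the frequencies do not.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: colour of car (nominal data).
colours = ["red", "blue", "blue", "white", "red", "blue", "black", "white"]

# Sensible summary for nominal data: frequencies and percentages.
counts = Counter(colours)
percentages = {colour: 100 * n / len(colours) for colour, n in counts.items()}

# Two equally arbitrary numeric codings (both are assumptions for this example).
coding_a = {"red": 1, "blue": 2, "white": 3, "black": 4}
coding_b = {"black": 1, "blue": 2, "red": 3, "white": 4}

# Technically computable, but meaningless: the result depends on the coding.
mean_a = mean(coding_a[c] for c in colours)
mean_b = mean(coding_b[c] for c in colours)

print(counts.most_common())   # frequencies are the same whatever the coding
print(mean_a, mean_b)         # the two "means" disagree
```

The frequencies tell us something about the cars. The two means tell us only about the order in which someone happened to number the colours.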

The very first data most children will deal with is nominal data. They collect counts of objects and draw pictograms or bar charts of them. They ask questions such as “How many children have a cat at home?” or “Do more boys than girls like Lego as their favourite toy?” In each of these cases the data is nominal, probably collected by a survey asking questions like “What pets do you have?” and “What is your favourite toy?”

# Ordinal data

Another category of data is ordinal, and this is the one that causes the most problems in understanding. My earlier post, “Oh Ordinal Data, what do we do with you?”, discusses this. Ordinal data has order, and numbers assigned to responses are meaningful, in that each level is “more” than the previous level. We are frequently exposed to ordinal data in opinion polls, asking whether we strongly disagree, disagree, agree or strongly agree with something. It would be acceptable to put the responses in the opposite order, but it would be confusing to list them in alphabetical order: agree, disagree, strongly agree, strongly disagree. What stops ordinal data from being measurement data is that we can’t be sure about how far apart the different levels on the scale are. Sometimes it is obvious that we can’t tell how far apart they are. An example of this might be the scale assigned by a movie reviewer. It is clear that a 4 star movie is better than a 3 star movie, but we can’t say how much better. Other times, when a scale is well defined and the circumstances are right, ordinal data is appropriately, but cautiously, treated as interval data.

# Measurement Data

The most versatile data is measurement data, which can be split into interval or ratio, depending on whether ratios of numbers have meaning. For example, temperature is interval data, as it makes no sense to say that 70 degrees is twice as hot as 35 degrees. Weight, on the other hand, is ratio data, as it is true to say that 70 kg is twice as heavy as 35 kg.
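One way to see the difference is to change the units and check whether the ratio survives. The sketch below assumes the 70 and 35 degrees are in Fahrenheit (the text does not say), and uses the usual conversion formulas; the function names are my own.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

def kg_to_pounds(kg):
    """Convert kilograms to pounds (approximate conversion factor)."""
    return kg * 2.20462

# Interval data: the ratio depends on the units, so it has no real meaning.
ratio_fahrenheit = 70 / 35
ratio_celsius = fahrenheit_to_celsius(70) / fahrenheit_to_celsius(35)

# Ratio data: the ratio is the same whatever units we use.
ratio_kg = 70 / 35
ratio_pounds = kg_to_pounds(70) / kg_to_pounds(35)

print(ratio_fahrenheit, round(ratio_celsius, 2))  # 2.0 versus about 12.67
print(ratio_kg, round(ratio_pounds, 2))           # 2.0 both times
```

The temperature "ratio" jumps from 2 to nearly 13 just by switching scales, because neither scale has a true zero. The weight ratio stays at 2.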

A more useful way to split up measurement data, for statistical analysis purposes, is into discrete or continuous data. I had always explained that discrete data was counts, and recorded as whole numbers, and that continuous data was measurements, and could take any values within a range. This definition works to a certain degree, but I recently found a better way of looking at it in the textbook published by Wiley, Chance Encounters, by Wild and Seber.

“In analyzing data, the main criterion for deciding whether to treat a variable as discrete or continuous is whether the data on that variable contains a large number of different values that are seldom repeated or a relatively small number of distinct values that keep reappearing. Variables with few repeated values are treated as continuous. Variables with many repeated values are treated as discrete.”

An example of this is the price of apps in the App store. There are only about twenty prices that can be charged – 0.99, 1.99, 2.99 etc. These are neither whole numbers, nor counts, but as you cannot have a price in between the given numbers, and there is only a small number of possibilities, this is best treated as discrete data. Conversely, the number of people attending a rock concert is a count, and you cannot get fractions of people. However, as there is a wide range of possible values, and it is unlikely that you will get exactly the same number of people at more than one concert, this data is best treated as continuous.
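The Wild and Seber criterion can be turned into a rough rule of thumb. This is only a sketch: the function name, the 0.5 cut-off and the two small data sets are my own assumptions, not anything from the textbook.

```python
def treat_as(values, threshold=0.5):
    """Suggest 'discrete' or 'continuous' from how often values repeat.

    threshold is a hypothetical cut-off on the share of distinct values;
    in practice this is a judgement call, not a fixed rule.
    """
    distinct_share = len(set(values)) / len(values)
    return "continuous" if distinct_share > threshold else "discrete"

# Made-up data for illustration only.
app_prices = [0.99, 1.99, 0.99, 2.99, 0.99, 1.99, 4.99, 0.99, 2.99, 1.99]
concert_attendance = [18250, 20113, 17984, 22406, 19671, 21038]

print(treat_as(app_prices))          # few distinct values, often repeated
print(treat_as(concert_attendance))  # every value different
```

With these numbers the prices come out as discrete even though they are not whole numbers, and the attendances come out as continuous even though they are counts.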

Maybe I need to redo my video now, in light of this!

And please take a look at our new course. If you are an instructor, you might like to recommend it for your students.

# Calculus is the wrong summit of the pyramid.

“The mathematics curriculum that we have is based on a foundation of arithmetic and algebra. And everything we learn after that is building up towards one subject. And at top of that pyramid, it’s calculus. And I’m here to say that I think that that is the wrong summit of the pyramid … that the correct summit — that all of our students, every high school graduate should know — should be statistics: probability and statistics.”

TED talk by Arthur Benjamin in February 2009. Watch it – it’s only 3 minutes long.

He’s right, you know.

And New Zealand would be the place to start. In New Zealand, statistics is the second most popular subject in our final year of schooling, with a cohort of 12,606. By comparison, the cohort for English is 16,445, and calculus has a final-year cohort of 8,392, similar in size to Biology (9,038), Chemistry (8,183) and Physics (7,533).

Some might argue that statistics is already the summit of our curriculum pyramid, but I would see it more as an overly large branch that threatens to unbalance the mathematics tree. I suspect many maths teachers would see it more as a parasite that threatens to suck the life out of their beloved calculus tree. The pyramid needs some reconstruction if we are really to have a statistics-centric curriculum. (Or the tree needs pruning and reshaping – I think I have too many metaphors!)

# Statistics-centric curriculum

So, to use a popular phrase, what would a statistics-centric curriculum look like? And what would be the advantages and disadvantages of such a curriculum? I will deal with implementation issues later.

To start with, the base of the pyramid would look little different from the calculus-pinnacled pyramid. In the early years of schooling the emphasis would be on number skills (arithmetic), measurement and other practical and concrete aspects. There would also be a small but increased emphasis on data collection and uncertainty. This is in fact present in the NZ curriculum. Algebra would be introduced, but as a part of the curriculum, rather than the central idea. There would be much more data collection, and probability-based experimentation. Uncertainty would be embraced, rather than ignored.

In the early years of high school, probability and statistics would take a more central place in the curriculum, so that students develop important skills ready for their pinnacle course in the final two years. They would know about the statistical enquiry cycle, how to plan and collect data and write questionnaires. They would perform their own experiments, preferably in tandem with other curriculum areas such as biology, food-tech or economics. They would understand randomness and modelling. They would be able to make critical comments about reports in the media. They would use computers to create graphs and perform analyses.

As they approach the summit, most students would focus on statistics, while those who were planning to pursue a career in engineering would also take calculus. In the final two years students would be ready to build their own probabilistic models to simulate real-world situations and solve problems. They would analyse real data and write coherent reports. They would truly understand the concept of inference, and why confidence intervals are needed, rather than calculating them by hand or deriving formulas.
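As a taste of the kind of simulation such students might build, here is a minimal sketch of why a single sample estimate needs an interval around it. Everything in it is an assumption for illustration: a population where 30% have some characteristic, samples of 50, and 1000 repeated samples.

```python
import random

rng = random.Random(2024)   # fixed seed so the run is repeatable

TRUE_PROPORTION = 0.3       # assumed population proportion
SAMPLE_SIZE = 50            # assumed sample size
N_SAMPLES = 1000            # number of simulated samples

def sample_proportion():
    """Simulate one random sample and return the proportion observed in it."""
    hits = sum(rng.random() < TRUE_PROPORTION for _ in range(SAMPLE_SIZE))
    return hits / SAMPLE_SIZE

estimates = [sample_proportion() for _ in range(N_SAMPLES)]
lowest, highest = min(estimates), max(estimates)

print(f"True proportion: {TRUE_PROPORTION}")
print(f"Sample estimates ranged from {lowest:.2f} to {highest:.2f}")
```

Each sample gives a different answer even though the population never changes. Seeing that spread is, I think, a better route to understanding confidence intervals than deriving the formula.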

There is always a trade-off. Here is my take on the skills developed in each of the curricula.

| Calculus-centric curriculum | Statistics-centric curriculum |
| --- | --- |
| Logical thinking | Communication |
| Abstract thinking | Dealing with uncertainty and ambiguity |
| Problem-solving | Probabilistic models |
| Modelling (mainly deterministic) | Argumentation, deduction |
| Proof, induction | Critical thinking |
| Plotting deterministic graphs from formulas | Reading and creating tables and graphs from data |

I actually think you also learn many of the calc-centric skills in the stats-centric curriculum, but I wanted to look even-handed.

## Implementation issues

Benjamin suggests, with charming optimism, that the new focus would be “easy to implement and inexpensive.” I have been a very interested observer in the implementation of the new statistics curriculum in New Zealand. It has not happened easily, being inexpensive has been costly, and there has been fallout. Teachers from other countries (of which there are many in mathematics teaching in NZ) have expressed amazement at how much the NZ teachers accept with only murmurs of complaint. We are a nation with a “can do” attitude, who, by virtue of small population and a one-tier government, can be very flexible. So long as we have refrained from following the follies of our big siblings, the UK, US and Australia, NZ has managed to have a world-class education system. And when a new curriculum is implemented, though there is unrest and stress, there is seldom outright rebellion.

In my business, I get the joy of visiting many schools and talking with teachers of mathematics and statistics. I am fascinated by the difference between schools, which is very much a function of the head of mathematics and principal. Some have embraced the changes in focus, and are proactively developing pathways to help all students and teachers to succeed. Others are struggling to accept that statistics has a place in the mathematics curriculum, and put the teachers of statistics into a ghetto where they are punished with excessive marking demands.

The problem is that the curriculum change has been done “on the cheap”. As well as being small and nimble, NZ is not exactly rich. The curriculum change needed more advisors, more release time for teachers to develop and more computer power. These all cost. And then you have the problem of “me too” from other subjects who have had what they feel are similar changes.

And this is not really embracing a full stats-centric curriculum. Primary school teachers need training in probability and statistics if we are really to implement Benjamin’s idea fully. The cost here is much greater as there are so many more primary school teachers. It may well take a generation of students to go through the curriculum and enter back as teachers with an improved understanding.

## Computers make it possible

Without computers the only statistical analysis that was possible in the classroom was trivial. Statistics was reduced to mechanistic and boring hand calculation of light-weight statistics and time-filling graph construction. With computers, graphs and analysis can be performed at the click of a mouse, making graphs a tool, rather than an endpoint. With computing power available real data can be used, and real problems can be addressed. High level thinking is needed to make sense and judgements and to avoid wrong conclusions.

Conversely, the computer has made much of calculus superfluous. With programs that can bash their way happily through millions of iterations of a heuristic algorithm, the need for analytic methods is seriously reduced. When even simple apps on an iPad can solve an algebraic equation, and Excel can use “What if” to find solutions, the need for algebra is also questionable.
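To show what "bashing through iterations" looks like, here is a minimal sketch using bisection, one simple iterative method that I have chosen for illustration. The equation x³ − 2x − 5 = 0 and the starting interval are my own example.

```python
def bisect(f, lo, hi, tol=1e-9):
    """Find a root of f between lo and hi by repeatedly halving the interval.

    Assumes f is continuous and that f(lo) and f(hi) have opposite signs.
    """
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid   # the sign change, and so the root, is in the lower half
        else:
            lo = mid   # otherwise it is in the upper half
    return (lo + hi) / 2

def cubic(x):
    return x**3 - 2 * x - 5

root = bisect(cubic, 2, 3)
print(round(root, 6))
```

No algebra is needed beyond being able to evaluate the expression. The computer simply narrows the interval until the answer is as precise as we asked for.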

## Efficient citizens

In H.G. Wells’ popular but misquoted words, efficient citizenry calls for the ability to make sense of data. As the science fiction-writer that he was, he foresaw the masses of data that would be collected and available to the great unwashed. The levelling nature of the web has made everyone a potential statistician.

According to the engaging new site from the ASA, “This is statistics”, statisticians make a difference, have fun, satisfy curiosity and make money. And these days they don’t all need to be good at calculus.

Let’s start redesigning our pyramid.

# Support Dr Nic and Statistics Learning Centre videos

This is a short post, sometimes called e-begging!
I had been toying with the idea of a Kickstarter project, as a way for supporters of my work to help us keep going. Kickstarter is a form of crowd-sourcing, which lets a whole lot of people each contribute a little bit to get a project off the ground.

We don’t really have one big project, but rather a stream of videos and web-posts to support the teaching and learning of statistics. Patreon provides a more incremental way for appreciative fans to support the work of content creators.

You can see a video about it here:

And here is a link to the Patreon page: Link to Patreon

Rather than producing for one big publishing company, who then hold the rights to our material, we would love to keep making our content freely available to all. You can help, with just a few dollars per video.

# Those who can, teach statistics

The phrase I despise more than any in popular use (and believe me there are many contenders) is “Those who can, do, and those who can’t, teach.” I like many of the sayings of George Bernard Shaw, but this one is dismissive, ignorant and born of jealousy. To me, the ability to teach something is a step higher than being able to do it. The PhD, the highest qualification in academia, is a doctorate. The word “doctor” comes from the Latin word for teacher.

Teaching is a noble profession, on which all other noble professions rest. Teachers are generally motivated by altruism, and often go well beyond the requirements of their job-description to help students. Teachers are derided for their lack of importance, and the easiness of their job. Yet at the same time teachers are expected to undo the ills of society. Everyone “knows” what teachers should do better. Teachers are judged on their output, as if they were the only factor in the mix. Yet how many people really believe their success or failure is due only to the efforts of their teacher?

For some people, teaching comes naturally. But even then, there is the need for pedagogical content knowledge. Teaching is not a generic skill that transfers seamlessly between disciplines. You must be a thinker to be a good teacher. It is not enough to perpetuate the methods you were taught with. Reflection is a necessary part of developing as a teacher. I wrote in an earlier post, “You’re teaching it wrong”, about the process of reflection. Teachers need to know their material, and keep up-to-date with ways of teaching it. They need to be aware of ways that students will have difficulties. Teachers, by sharing ideas and research, can be part of a communal endeavour to increase both content knowledge and pedagogical content knowledge.

There is a difference between being an explainer and being a teacher. Sal Khan, maker of the Khan Academy videos, is a very good explainer. Consequently many students who view the videos are happy that elements of maths and physics that they couldn’t do, have been explained in such a way that they can solve homework problems. This is great. Explaining is an important element in teaching. My own videos aim to explain in such a way that students make sense of difficult concepts, though some videos also illustrate procedure.

Teaching is much more than explaining. Teaching includes awakening a desire to learn and providing the experiences that will help a student to learn.  In these days of ever-expanding knowledge, a content-driven approach to learning and teaching will not serve our citizens well in the long run. Students need to be empowered to seek learning, to criticize, to integrate their knowledge with their life experiences. Learning should be a transformative experience. For this to take place, the teachers need to employ a variety of learner-focussed approaches, as well as explaining.

It cracks me up, the way sugary cereals are advertised as “part of a healthy breakfast”. It isn’t exactly lying, but the healthy breakfast would do pretty well without the sugar-filled cereal. Explanations really are part of a good learning experience, but need to be complemented by discussion, participation, practice and critique.  Explanations are like porridge – healthy, but not a complete breakfast on their own.

## Why statistics is so hard to teach

“I’m taking statistics in college next year, and I can’t wait!” said nobody ever!

Not many people actually want to study statistics. Fortunately many people have no choice but to study statistics, as they need it. How much nicer it would be to think that people were studying your subject because they wanted to, rather than because it is necessary for psychology/medicine/biology etc.

In New Zealand, with the changed school curriculum that gives greater focus to statistics, there is a possibility that one day students will be excited to study stats. I am impressed at the way so many teachers have embraced the changed curriculum, despite limited resources, and late changes to assessment specifications. In a few years as teachers become more familiar with and start to specialise in statistics, the change will really take hold, and the rest of the world will watch in awe.

In the meantime, though, let us look at why statistics is difficult to teach.

1. Students generally take statistics out of necessity.
2. Statistics is a mixture of quantitative and communication skills.
3. It is not clear which are right and wrong answers.
4. Statistical terminology is both vague and specific.
5. It is difficult to get good resources, using real data in meaningful contexts.
6. One of the basic procedures, hypothesis testing, is counter-intuitive.
7. Because the teaching of statistics is comparatively recent, there is little developed pedagogical content knowledge. (Though this is growing.)
8. Technology is forever advancing, requiring regular updating of materials and teaching approaches.

On the other hand, statistics is also a fantastic subject to teach.

1. Statistics is immediately applicable to life.
2. It links in with interesting and diverse contexts, including subjects students themselves take.
3. Studying statistics enables class discussion and debate.
4. Statistics is necessary and does good.
5. The study of data and chance can change the way people see the world.
6. Technological advances have put the power for real statistical analysis into the hands of students.
7. Because the teaching of statistics is new, individuals can make a difference in the way statistics is viewed and taught.

I love to teach. These days many of my students are scattered over the world, watching my videos (for free) on YouTube. It warms my heart when they thank me for making clear something that had been confusing. I realise that my efforts are small compared to what their teacher is doing, but it is great to be a part of it.