Teaching statistical report-writing

Teaching how to write statistical reports

It is difficult to write statistical reports and it is difficult to teach how to write statistical reports.

When statistics is taught in the traditional way, with emphasis on the underlying mathematics, the process of statistics is truncated at both ends. When we concentrate on the sterile analysis, the messy “writing stuff” is avoided. Students do not devise their own investigative questions, and they do not write up the results.

Here’s the thing though – in reality, the analysis step of a statistical investigation is a very small part of the whole, and performed at the click of a button or two.

Ultimately the embedding of the analysis back into an investigation should not be a problem. The really interesting part of statistics happens all around the analysis. Understanding the context enriches the learning, transforming the discipline from mathematics to statistics. We can help students embrace the excitement of a true statistical investigation. But in this time of transition, the report-writing aspects are a problem. They are a problem for the learner and for the teacher.

The new New Zealand curriculum for statistics requires report-writing as an essential component of the majority of assessment, particularly in the final year of high school. This is causing understandable concern among teachers, who come predominantly from a mathematical background. I can imagine myself a few years ago saying, “I became a maths teacher so I wouldn’t have to teach and mark essays!” In addition, the results from the students are less than stellar, even from capable students. Teachers do not like their students to perform poorly.

All statistics courses should have a component of report-writing, unless they are courses in the mathematics of statistics. The problem here is that, like the secondary school teachers in New Zealand, many statistics instructors deal with the mathematics more than the application of statistics, and are not confident of their own ability at report-writing. Normal human behaviour is to avoid it. Having taught service statistics courses in a business school for two decades, I have gradually made the transition to more emphasis on report-writing and am convinced that statistical report-writing needs to be taught explicitly, and taught well.

Report-writing is a fundamental and useful skill

For teachers who are uncomfortable with teaching and marking reports, it would be nice to dismiss the process of report-writing  as “not important”. Much of statistics teaching is in a service course, as discussed in my previous blog. It is unlikely that any of these students will ever have to write a report on a statistical analysis, other than as part of the assessment for the course.  So why do we put them and ourselves through this?

You don’t realise whether you understand or not until you try to write it down.

The written word requires a higher level of precision than a thought or a spoken explanation. Your sentences look at you from the page and mock you with their vagueness and ambiguity. I find this out time and again as I blog. What seems like a well-thought-out argument in my head as I do my morning run falls to shreds on paper, before being mustered into some semblance of order. It is in writing that we identify the flaws in our understanding. As we try to write our findings we become more aware of fuzzy thinking and gaps in reasoning. As we write we are required to organise our thoughts.

Better critics of other reports

A student who has been required to produce a report of a good standard will be exposed to examples of good and bad reports and will be better able to identify incorrect thinking in reports they read themselves. This is perhaps the most important purpose of a terminal course in statistics. Having said that, it is both heart-warming and alarming to hear from past students the wonderful things they are doing with the statistics they learned in my one-semester course.

Useful skill for employment

Students need to be able to read and write as part of empowered citizenship. The skill of writing a coherent report in good English is highly sought after by employers, and of great use at university in just about every discipline. It is a transferable skill to many endeavours.

Reports are needed for assessment

On a practical level, if the teacher is going to evaluate understanding they need evidence to work from. A written report provides one form of evidence of understanding.

Report-writing is difficult to teach

Some maths teachers may feel inadequate in teaching “English”, as they see report-writing. They do not have the pedagogical content knowledge in teaching writing that they do for teaching algebra or percentages, for instance. Pedagogical content knowledge is more than the intersection of knowing a subject, and being able to teach in a general sort of way. It is the knowledge of how to teach a certain discipline, what is difficult to learners, and how to help them learn.

Some basic ideas for teaching report-writing

To write a good report you need to understand what is going on, have the appropriate vocabulary, and use a clear structure. Good teaching will emphasise understanding. Getting students to write sentences about output, and sharing them with their peers, is a great way to identify misunderstandings. As these sentences are shared, the teacher can model the use of correct technical language. They can say, for instance, “You have the essence correct here, but there are some more precise terms you could use, such as …” Teachers can either give students outlines for reports, or they can give them several good reports and get the students to identify the underlying structure. I am a firm believer in the generous use of headings within a report. They provide signposts for writer and reader alike.

You can see this in my video, Writing up a Time Series Report.

Report-writing requires practice. The assessment report should not be the first report of that type that a student writes. In the world of motivated students with no other demands on their time, it would be great to have them write up one assignment for the practice and then learn from that to produce a better one. I am aware that students tend not to do the work unless there is a grade attached to it, so it can be difficult to get a student to do a “practice report” ahead of the “real assessment.”  There are other alternatives that approximate this, however, which require less input from the teacher. One of these, the use of templates, is explained in an earlier post, Templates for statistical reports – spoon-feeding?

There is nothing wrong with using templates and “sensible sentences”. (Not to be confused with “sensible sentencing”, which seems devoid of sense.) There are only so many ways to say that “the median number of pairs of shoes owned by women is ten.” It is also a difficult sentence to make sound elegant. Good reports will look similar. This is not creative-writing – it is report-writing. Sure, the marking may be boring when all the reports seem very similar, but it is a small price to pay when you avoid banging your head against the desk at the bizarre and disorganised offerings.
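As a rough illustration of how little variety a “sensible sentence” needs, here is a minimal Python sketch of a sentence template. The function name median_sentence and the shoe counts are made up for this example; they are not taken from any real survey or from the templates post.

```python
from statistics import median


def median_sentence(values, what, who):
    """Fill a fixed report sentence with the median of the supplied values."""
    m = median(values)
    # ':g' drops a trailing '.0' so whole-number medians read naturally.
    return f"The median number of {what} owned by {who} is {m:g}."


if __name__ == "__main__":
    # Hypothetical data, invented purely for illustration.
    shoe_counts = [4, 8, 10, 12, 30]
    print(median_sentence(shoe_counts, "pairs of shoes", "women"))
```

The sentence is dull by design: the structure stays fixed and only the context and the number change.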

This is but a musing on the teaching of report-writing. Glenda Francis, in  “An approach to report writing in statistics courses” identifies similar issues, and provides a fuller background to the problem. She also indicates that there is much to be done in developing this area of teaching and research. I will be providing professional development in this area over the next month to at least three groups of teachers, and I look forward to learning a great deal from them, as we explore these issues together.

Is statistical enquiry a cycle?

What is the statistical enquiry cycle and why is it a cycle? Is it really a cycle?

The New Zealand curriculum for Mathematics and Statistics was recently held up as an example of good practice with regard to statistics. Yay us! In New Zealand the learning of statistics starts at the beginning of schooling and is part of the curriculum right through the school years. Statistics is developed as a discipline alongside mathematics, rather than as a subset of it. There are mathematics teachers who view this as an aberration, and believe that when this particular fad is over statistics will go back where it belongs, tucked quietly behind measurement, algebra and arithmetic. But the statisticians rejoice that the rich and exciting world of real data and detective work is being opened up to early learners. The outcome for mathematics and statistics remains to be seen.

A quick look over the Australian curriculum shows ostensibly a similar emphasis with regard to content at most levels.  The big difference (at first perusal) is that the New Zealand curriculum has two strands of statistics – statistical investigation, and statistical literacy, whereas the Australian curriculum has the more mathematical approach of “Data representation and interpretation”.  Both include probability as another strand.

Data Detective Cycle

In the New Zealand curriculum, the statistical investigation strand at every level refers to the “Statistical enquiry cycle”, shown here, which is also known as the PPDAC cycle. This is a unifying theme and organising framework for teachers and learners.

The data detective poster


This link takes you to a fuller explanation of the statistical enquiry cycle and its role at the different levels of the school curriculum. Note that the levels do not correspond to years. Click here to see the correspondence. The first five levels correspond to about 2 years each, whereas levels 6, 7 and 8 correspond to the final three years of high school. So a child working on level 3 is generally aged about 10 or 11.

As I provide resources to support teaching and learning within the NZ curriculum I have become more aware of this framework, and have some questions and suggestions. I have made a table from which I hope to develop another diagram that students at higher levels can engage with, particularly with regard to the reporting aspects. As this is a work in progress you will have to wait!

Origins

Let’s look at the origins of the diagram and terminology. Maxine Pfannkuch (an educator) worked with Chris Wild (a statistician) to articulate what it is that statisticians do. They published their results in the International Statistical Review in 1999 and contributed the chapter “Towards an understanding of statistical thinking” in “The Challenge of Developing Statistical Literacy, Reasoning and Thinking”, edited by Dani Ben-Zvi and Joan Garfield. The statistical enquiry cycle has consequently been promulgated in the diagram and description referred to above. There is sound research behind this, and it makes good sense as a way of explaining what statisticians do.

Diagrams

I love diagrams. Anyone who has viewed my videos will know this. I spend a great deal of mental energy (usually while running) trying to work out ways to convey ideas in a visual way that will help people to learn, understand and remember. I also do NOT believe in the fad of learning styles, but rather I believe that all learners will gain from different presentations of concepts. I also believe that it is a useful discipline for a teacher to create different ways of expressing concepts. I am rather fussy about diagrams, however, as our Honours students would attest. I have a particular problem with arrows which mean different things in different places. If an arrow denotes passage of time in one instance it should do so in all instances, or a different style of arrow should be employed.

No way in or out

A problem I have with the PPDAC “Cycle” being a cycle is that it seems to imply that we can come in at any point and that there is no escape. If there is a logical starting point, and the link back to it is not one of process, then that should be indicated. Because the arrows are all the same style in the PPDAC diagram, it is also difficult to see a way out of the cycle. As a learner I would find it a little daunting to think that I could never escape! I also struggle to see in what way a Conclusion leads to a Problem. Surely the whole point of the word “Conclusion” is that it concludes or ends something?

To me there are at least three linkages between the Problem and the Conclusion. First of all, while in the Problem stage, we need to think about what we want to be able to say in the future Conclusion stage.  We may not know which way our conclusion will go, though we will probably have an opinion, or even a hope! (I am too post-modern in my thinking to believe in the objectivity of the researcher.) For instance we may want to be able to say – There is (or is not) evidence that women own more pairs of shoes than men. Another linkage is that when we write up our conclusion we must refer back to the original problem. And the third linkage comes from a comment Jean Thompson made on my blog about teaching time series without many computers. “Often the answer from a good statistical analysis is more questions”.  One conclusion can lead to a new problem.

I found a similar diagram online which is more sequential, starting with the problem and working vertically through the steps, with a link at the end going back to the beginning. I like this, because it does give an idea of conclusion and moving on, rather than being caught in some endless cycle. The reality for students is that they will generally do some project, which will start with a problem and end with a conclusion. Then they will move on to an unrelated project. It has also been my experience as a practitioner.

In my experience the cyclical behaviour which this diagram portrays is generally more within the cycle than over the whole cycle. For instance one may be part way through the data collection and realise that it isn’t going to work, and go back to the “Plan” stage. Some of these extra loops are suggested in my table.

Reporting

For students at a higher level who are required to write reports, it is difficult to see how the report fits in with the cycle. The “Conclusion” step includes “communication”, which could imply a report. However reports often include most of the steps, particularly when their purpose is to satisfy an assessment requirement.

Existing datasets

It is also difficult to apply the cycle in a non-cynical way to work with existing datasets. Often, in the interests of time and quality control, students are given a dataset. In reality they start, not at the Problem step, but somewhere between the Data step and the Analysis step. In their assessments they are required to read around the topic and use their imaginations to come up with the problem, look at how the data was collected, and move on from there.  This is not always the case, but it is for NCEA level 3 Bivariate Investigation, Time Series analysis and Formal Inference areas (called ‘standards’). The only area where they really do plan and collect the data is in the Experimental Design standard. Might it not be helpful to provide an adapted plan that takes into account these exigencies? Let us be explicit about it rather than coyly pretend that the data wasn’t driving everything?

In general I like the concept of the statistical enquiry cycle, and I am happy that it is providing a unifying theme to the curriculum. However, particularly at higher levels, I think it needs a bit of tweaking, taking into account the experience of teachers and learners.  If it is to hold such an important place in a curriculum that is leading the world, it deserves on-going attention.

Disclaimer

This is a blog and not an academic journal. The ideas I have contemplated need a lot more thought and background reading, but I do not have the time or the university salary to support such a luxury right now. Maybe someone else does!

Shibboleth, Mixolydian, Heteroscedasticity – and Kipling

All areas of human endeavour have specific language. Cricket commentators, art critics and wine buffs make this very obvious.

Mixolydian

My son, who is blind, autistic and plays the piano like an angel, is studying Jazz, and I’m helping him. You can read more about this in my other blog Never Ordinary. There is a specific language around Jazz, and I’m not talking about ‘scat’. (Hmm, just realised the other meaning for that word!) In the Jazz course they use words like Mixolydian, Chromaticism, Quartal Harmony… I nod and smile. This language expresses ideas clearly and uniquely and is outside my comprehension. (Mixolydian is based on the Major scale, but with a flat 7 – clearer now?)

Trumpetty yellow, Daffodils, Narcissus

This week there was a statistics list discussion about the meaning of the term “multivariate”. As part of the ongoing discussion, someone suggested that using exact terminology exactly avoids a situation such as saying “I have yellow flowers in my garden with trumpetty bits, that come out in spring and have oniony looking bits in the ground.” This can also be said as “I have daffodils in my garden”.  However it can also be said as “I have Narcissus pseudonarcissus  in my garden”. Each of those phrases expresses the same idea, but with differing clarity or exclusiveness depending on the audience.

Hagley Park Daffodils

Shibboleth

Language can be used to exclude, as well as to inform or communicate. The term “shibboleth” comes from the book of Judges. When the Gileadites wished to find out if people crossing the river were Ephraimites, they would ask them to say the word “shibboleth”. If they said it as sibboleth, they killed them. The Old Testament can be a bit like that. The word “shibboleth” is now used to mean a code word, or knowledge that only a certain culture or group will know. Sometimes it can seem that statistical terms are used so only the initiated will be able to understand.

Virtue and Common Touch

As statisticians, operations researchers and teachers of statisticians and operations researchers we have many different opportunities to select the language we use. We must always be aware of our audience. In the poem “If”, Kipling encourages people to be able to “…talk with crowds and keep your virtue, Or walk with Kings – nor lose the common touch”. Academics “walk with kings” when they write academic papers, using highly specialised and exclusive language. We need to make sure we do not lose the common touch. At the same time we should “keep our virtue”, and use the correct statistical term when the circumstances arise, making sure that we retain the common touch so that all understand.

Heteroscedasticity

When I use the term heteroscedasticity I am usually doing so for one of two reasons. First, that the data in question has non-constant variance, and I am explaining the concept and technical term to a client, student or colleague. Second, because I really like the word. “Heteroscedasticity” is eight syllables of tongue-twisting goodness! But, really, “non-constant variance” says exactly the same thing, has only six syllables and is easier to understand. I suspect a degree of linguistic snobbery appearing.
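For readers who would rather see non-constant variance than pronounce it, here is a minimal Python sketch. It simulates residuals whose spread either stays fixed or grows with the position in the series, then compares the variance of the second half with that of the first. The function names, the seed and the spread values are my own assumptions for illustration, and the half-against-half comparison is only a crude check, not a formal test.

```python
import random
from statistics import variance


def simulate_residuals(n, spread_grows, seed=1):
    """Simulate n residuals; the standard deviation either grows with x or stays constant."""
    rng = random.Random(seed)
    residuals = []
    for x in range(1, n + 1):
        # Assumed spreads, chosen only to make the contrast visible.
        sd = 0.1 * x if spread_grows else 5.0
        residuals.append(rng.gauss(0, sd))
    return residuals


def variance_ratio(residuals):
    """Variance of the second half divided by variance of the first half."""
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])


if __name__ == "__main__":
    print("constant spread:", variance_ratio(simulate_residuals(200, False)))
    print("growing spread: ", variance_ratio(simulate_residuals(200, True)))
```

A ratio near 1 is what constant variance looks like; a ratio well above 1 is the eight-syllable word in action.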

Communicating Statistics

Greenfield wrote a paper in 1993, which is still disappointingly relevant today. In “Communicating Statistics” (http://greenfieldresearch.co.uk/papers/Communicating%20stats.pdf) he suggests that statisticians have a great deal to offer the world, and that we aren’t doing a good job of making people aware of that. He was damning of the type of language used in academic publications, which ensures that any potentially useful results are obscured by “prolix and pseudo-objective style”.

This flows over into our consulting endeavours, where the aim should be to communicate rather than exclude. Greenfield gives the example fictionalised in this comic:

Depiction of almost true event.

Greenfield’s parting provocative statement was to suggest that statisticians produce more cookery-books and more easy-to-use programs, and encourage their use by everybody who can benefit. These books and programs can carry the message that if they want to do better they should study more and seek the guidance of statisticians.

In closing he says “Our audience, our customers are out there. They need us, even if they do not realise it. We must change our culture, our philosophy, our public relations and our use of language to reach them.”

Greenfield Challenge

I’m not sure I want to be telling you about the Greenfield challenge, as I’m thinking of entering it, and would really like a trip to Ankara for the ENBIS conference. But in pursuit of the greater good, I am putting a link here: The Greenfield Challenge. The blurb explains:

“We would like to encourage you to report immediately whenever you’ve had dealings with non-statisticians – in whichever form (face-to-face, in writing, in form of an audio or video recording, in interactive social media … ) or context (interactions with students, educators, managers and employees of organizations in private and public sectors … ).”

Greenfield even suggests “You might even write a short story or a play.”

Still thinking about that one. I guess there is always “The Goal” to look to for an example. In the meantime I’ll stick to this work of mostly non-fiction, interspersed with opinion and anecdote.

Choose our words

When we use very specific technical terms we need to make sure that they are really necessary. Is there a simpler, and just as accurate way of saying the same thing? If our audience is statisticians, then really we can indulge in specific technical language. But if the audience includes students, non-statisticians and the general public, then we should probably use simpler terms, or at least “gloss”, or say what the word means along with its use. (There was an example of glossing right there!)

I have written earlier about the minefield of statistical terminology, particularly when the statistical word also has an everyday meaning which is not quite the same. Examples of this are “significant”, “random” and “relationship”. The post includes some suggestions for teaching statistical language.

But as well as teachers, we are also communicators, and need to get our message across in the best way possible. It is vital to determine our audience, and make sure we bring them along with us.

I contemplate the new New Zealand curriculum with excitement. Through the efforts of a group of statisticians we are able to inculcate a greater understanding of the essentials of statistics from an early age to much of the population. The role of the statisticians is to help the teachers feel at home in the world of statistics, so that they can invite their students along. These are exciting times. The rest of the world is watching.

Statistical Story-telling with time series data

Statistics is about story-telling.

For people who understand them, graphs tell a story. To the initiated, even a p-value, and some summary statistics can help to tell a story. Part of the role of a statistician is to extract the story from the data. The role of a statistics teacher is to enable students first to recognise that there is a story, then to enable them to tell the story through the tools of analysis and communication.

This idea of statistics as story-telling is explained in an award-winning paper by Pfannkuch, Regan, Wild and Horton, Telling Data Stories: Essential Dialogues for Comparative Reasoning, which won the inaugural Journal of Statistics Education Best Paper Award.

Time series data, especially seasonal time series data, yields its story abundantly. For this reason I changed my mind about the teaching of time series analysis at high school. I used to think that it was far too complex for high school students and should be left to higher education. In a way that is true, but if you stick to the basic concepts, it is a contextually rich area of study.

Time series data is full of little hazards, not the least being auto-correlation. We can use moving averages to take out the bumps and exponential smoothing to be more responsive to more recent data. We can deseasonalise and fit a trend line, predict and then put the seasonality back in. There are weighty (in more ways than one) volumes dedicated to time series analysis and the various discoveries and inventions that have helped us draw meaning from the past and forecast the future.
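To make the sequence in the paragraph above concrete (smooth, deseasonalise, fit a trend line, predict, put the seasonality back in), here is a minimal Python sketch of an additive decomposition. It is one simple way of doing it under stated assumptions, not the method used by any particular package: the seasonal period is assumed to be even (quarterly here), the trend is assumed to be a straight line, and the function names and the sales figures are invented for the example.

```python
from statistics import mean


def centred_moving_average(values, period):
    """Centred moving average for an even seasonal period (e.g. 4 quarters)."""
    half = period // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        # Averaging two adjacent windows centres an even-length average on point i.
        first = mean(values[i - half:i + half])
        second = mean(values[i - half + 1:i + half + 1])
        smoothed[i] = (first + second) / 2
    return smoothed


def seasonal_effects(values, smoothed, period):
    """Average gap between raw and smoothed values per season, adjusted to sum to zero."""
    gaps = [[] for _ in range(period)]
    for i, (v, s) in enumerate(zip(values, smoothed)):
        if s is not None:
            gaps[i % period].append(v - s)
    raw = [mean(g) for g in gaps]
    centre = mean(raw)
    return [r - centre for r in raw]


def fit_line(ys):
    """Least-squares straight line through (0, ys[0]), (1, ys[1]), ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def forecast(values, period, steps):
    """Deseasonalise, fit a trend line, project it, then add the seasonality back."""
    smoothed = centred_moving_average(values, period)
    effects = seasonal_effects(values, smoothed, period)
    deseasonalised = [v - effects[i % period] for i, v in enumerate(values)]
    slope, intercept = fit_line(deseasonalised)
    n = len(values)
    return [intercept + slope * t + effects[t % period] for t in range(n, n + steps)]


if __name__ == "__main__":
    # Hypothetical quarterly sales, invented purely for illustration.
    sales = [112, 98, 90, 118, 121, 105, 99, 127, 128, 114, 106, 135]
    print(forecast(sales, period=4, steps=4))
```

The seasonal effects that fall out of the middle step are where much of the story lives: they say which quarters run high or low once the trend is set aside.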

Because of the inherent complexity of time series analysis, I used to think that time series was not an appropriate part of the high school curriculum.

However, if a storytelling approach is used, backed up by appropriate software, then time series is a wonderful introduction to statistics. It is a good example of modelling, it has clear purpose, and the contexts can be fascinating.

Time series analysis is a clear example of the concept of a model, as there are so many different ways that it is possible to model a set of time series data. In contrast, when you teach linear regression with only one possible predictor variable, on data that is nicely behaved, there is generally one sensible model to use. This gives students the idea that you are trying to find “the right model”. This is not the case with time series, as models change, depending on how we choose to define the model.
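One small way to show students that there is no single “right model” is simple exponential smoothing, where one number (usually called alpha) controls how responsive the smoothed series is to recent data. The Python sketch below is a minimal version under my own assumptions: the first smoothed value is set to the first observation, and the visitor counts are invented for the example.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: each level blends the newest value with the previous level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be greater than 0 and at most 1")
    level = values[0]  # assumed starting point: the first observation
    levels = [level]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Hypothetical monthly visitor counts, invented purely for illustration.
    visitors = [120, 132, 101, 134, 190, 170, 140, 160]
    print("alpha = 0.2:", exponential_smoothing(visitors, 0.2))
    print("alpha = 0.8:", exponential_smoothing(visitors, 0.8))
```

The same data gives two different smoothed series: a small alpha irons out the bumps, a large alpha chases the most recent values. Neither is “the” model; each is a choice that has to be justified in context.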

Another selling-point for time series analysis is that its main function is forecasting. We all want to have crystal balls that can predict the future. The main reason we study a time series is to understand the patterns of data so that we can project into the future, usually for economic reasons. There is no question of “Why are we doing this, Miss?”, as the purpose of the analysis is self-evident.

There are numerous economic time series available from official statistics sites. In New Zealand I went to Infoshare and in the US there is Economagic.  Some of the series are fascinating. (I like the three peaks per year in jewellery sales in the US – December, February and May.)

Analysis can be difficult, and Excel is hideous for time series graphing and deseasonalising. There has been a free front end for R set up, called iNZight, which enables straight-forward time series analysis. One drawback is that it only allows for one model, which I fear perpetuates the “there is one model” mindset.

But the opportunities for storytelling are there. You can talk about trend, seasonality, variation, the relative contribution of each. As teachers and students are exposed to more and more time series graphs, they are better able to tell stories. The graphs of the seasonal shape are rich with story-telling potential.

To support this we have made four videos about time series analysis, and an app, which is still in the pipeline. We hope that these will help develop the confidence of teachers and students to tell stories about time series data. We also have further quizzes and a step-by-step guide to writing up a time series analysis. You can get much of this for free from our Free Resources page on StatsLC.com.

For teachers where there is limited access to computer resources, I have an earlier post with some ideas of how to overcome this problem and emphasise the story in time series data: Teaching Time Series with Limited Computer access.

Understanding Time series analysis

Time Series analysis using iNZight:


How to write up a time series report:

and an example of a time series report (aimed at Year 13 students in New Zealand, but a good general framework for report writing.)

Statistics or Calculus? Do both!

This post is prompted by two 17-year-old boys, Cam and Thomas, who are about to enter year 13, the final year of high school in New Zealand. They are both academically capable, with highly educated parents. And both boys are struggling with a dilemma – should they take Calculus or Statistics at school this year? I suspect their maths teachers are pushing for calculus, whereas their parents appreciate the value of statistics.

Let’s take a look at the alternatives and see if we can help. (This makes no pretense of being a balanced view – that’s what comments are for!) Note that this is based on the New Zealand curriculum, which has a recently introduced strong emphasis on statistics. The assessment structure for this includes a full statistics subject in the final year for the first time in 2013. New Zealand is in the somewhat lonely position of leading the world in this area; statistical societies in other countries are watching. (And for you in the Northern Hemisphere who may be feeling confused, it is currently our summer holidays, and school starts back in early February.)

Take Calculus

Calculus is “proper” mathematics. It is elegant, and neat, and you get right answers. You don’t have to write sentences. Ever! Most of the problems are nice and theoretical, so you don’t have to deal with “word problems”. The teachers like Calculus, and fight over who gets to teach it. They feel confident in what they are doing. They have taught it for years and don’t need to do anything new. There are oceans of on-line videos, games and resources to help students. Khan Academy videos are useful. But you don’t need to have access to the computer room to do calculus. Parents are more likely to know calculus (though well forgotten) than statistics. Calculus is needed for important subjects such as engineering, physics and… Hmm, can’t think what else! Oh yes – more calculus. It is a good mental discipline that helps with problem-solving skills. It can be pretty fun if well taught. Besides, people tell me that statistics is the easy option for people who can’t do calculus.

Take Statistics

Statistics relates to life. It is messy and often the answers aren’t clear, so interpretation and thinking are important. You will need to write reports and express yourself on paper. This will help you develop your critical thinking skills and communication skills. You have to understand contextual material such as biology, economics or sport.  Innovative teachers are excited about the changes in the curriculum, and are embracing the new material as an opportunity to learn and develop themselves as well as you.  As New Zealand is leading the world by introducing resampling, randomisation, bootstrapping and time series analysis at high school level, the on-line resources are few, but those extant (and in our pipeline) are focussed for your use.  Parents are not familiar with statistics, but will find what you are doing interesting.  You get to do most of your calculations on the computer, just as real statisticians do.  You will never find yourself asking “Why do we need to learn this?” because it is obvious how it is a part of your life. You will be better able to discern truth from lies on the internet. You will find yourself looking at the world differently.

Statistics is needed for many subjects: psychology, biology, engineering, management, marketing, medicine, sociology, education, geography, geology, law and journalism. It also widens the possibilities in the study of arts subjects such as History and English.

So which should Cam and Thomas take?

Here is our advice – all students who possibly can, should take statistics. Those who are planning to be engineers, physicists, maths teachers or statisticians (yes!) should take calculus as well. Simple really!

What about my own sons – the jazz pianist and the movie maker – what would I have advised them at this point? Statistics all the way. Neither one had use for calculus, nor the aptitude, but both would have benefited from statistics.

I’ll let you know what Cam and Thomas decided.

The Sound of Music meets Linear Programming

“Let’s start at the very beginning – a very good place to start. When you read you begin with A, B, C!” When you do statistics you begin with…probability? the mean? graphs?

Begin at the end

But really, is the beginning a very good place to start? Sometimes, we need to begin at the end. And sometimes we need to go back before the beginning. Always we need to think about where to begin, because it is seldom obvious, and copying what other teachers and textbooks have done is often a bad idea.

Linear programming

Take Linear Programming, the flagship technique of Operations Research. Most textbooks start with a simple two-variable example, one that can be drawn on a Cartesian plane. They begin by defining the decision variables and the objective function. Next they formulate the constraints and explain the non-negativity conditions. Then finally they get around to solving the problem, often through a graphical approach, and applying it to the trivial, imaginary “real-life” example they started with.

Here is a better approach, with linear programming as the example:

First, ensure all the class members have the prerequisite mathematical skills for what you propose to teach. If they are not good at drawing equations on a plane, you will need to teach them again, or use a different approach such as using Excel Solver. If students are not sure which way around > and < signs go, you will need to go over it. If English is their second language you will need to make sure you explain words like constraint, objective and optimum. This won’t hurt the native English speakers either.

Second, think about your destination. When children learn to read, they generally know what the outcome is going to be. They will be able to look at words on a page and make sense of them. When you learn to drive, you know the outcome – you will be able to get safely from one place to another behind the wheel of a car. When we learn to bake cakes, we like to have pictures of the finished product so that we can see where we are headed. Yet somehow we try to teach as if it is a voyage of discovery with no vision of the end. Now discovery is good, if it pertains to how we get to or understand a process, but students need to know what they are learning. It also helps to have a purpose. Reading, driving and baking are all purposeful, with a clear outcome. The same should be true of linear programming (or confidence intervals or decision trees or fitted lines or just about anything else we are learning).

You give the students an illustration of the completed LP model of the problem, preferably complex enough to be realistic. You show them how it can be useful, and give them a chance to explore the model. This is SO much easier now that we have Excel and Solver to look after the solving. Let students find out all about one model and then another and another, before you begin to show how to formulate. When people know what they are trying to produce, the reasoning behind the steps is more obvious.
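As an illustration of what a “completed model” might look like when handed to students, here is a minimal sketch in Python. The objective and constraint numbers are invented for the example, and the solving step simply checks the corner points of the feasible region (the graphical method in code form); it is not a description of how Excel Solver works internally.

```python
from itertools import combinations

# Hypothetical model: maximise profit = 3x + 5y
# subject to  x <= 4,  2y <= 12,  3x + 2y <= 18,  x >= 0,  y >= 0.
# Each constraint is stored as (a, b, c), meaning a*x + b*y <= c.
CONSTRAINTS = [
    (1, 0, 4),
    (0, 2, 12),
    (3, 2, 18),
    (-1, 0, 0),  # x >= 0
    (0, -1, 0),  # y >= 0
]
OBJECTIVE = (3, 5)


def corner_points(constraints, tol=1e-9):
    """Intersect every pair of constraint lines and keep the feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines never meet
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            points.append((x, y))
    return points


def solve(constraints, objective):
    """Return the best corner point and the objective value there."""
    px, py = objective
    best = max(corner_points(constraints), key=lambda p: px * p[0] + py * p[1])
    return best, px * best[0] + py * best[1]


best_point, best_value = solve(CONSTRAINTS, OBJECTIVE)
print("Best plan:", best_point, "profit:", best_value)
```

Students can change a constraint limit or an objective coefficient and rerun it to see how the recommended plan moves, before they are ever asked to formulate a model themselves.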

Linear Regression

The same approach can be applied to teaching Linear Regression analysis. First we need to make sure that students understand what a fitted line on a graph is. Get them to interpret several fitted graphs, and use them to make predictions and write statements about the nature of the relationships modelled. Then show how to make the fitted graphs once they know why they need to.
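For instructors who want a fitted line to hand over for interpretation, the following is a small sketch in Python. The data (hours of study and test scores) are invented, and the function names are my own; any statistics package would produce the same line.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Read a prediction off the fitted line."""
    return intercept + slope * x


# Invented data for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

intercept, slope = fit_line(hours, scores)
print(f"Each extra hour of study is associated with about {slope:.1f} more marks.")
print(f"Predicted score after 6 hours: about {predict(intercept, slope, 6):.0f}.")
```

The printed sentences are the point: students practise reading the slope and making a prediction in words before they learn where the numbers come from.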

In last week’s post I talked about histograms. Students should learn to interpret histograms and other graphs before they are required to make their own. Having to read off pie charts should help immunise them against their use.

I was in a computer lab with some students from another first year statistics course, and noticed that the first thing they were taught was how to calculate the mean and standard deviation, including the finite population correction. Was this really the most interesting way to get them introduced to the joys of data analysis and interpretation? Why start with the mean, one of the most difficult concepts in statistics?

Work backwards from the end

There is an interesting technique used for teaching skills to children with special needs. When you teach a blind child to tie shoelaces, you start at the end. You do all but the last part, and let them finish it off. This gives a sense of success and purpose. Then gradually you add the steps backwards, so that they start earlier on in the process. This also means that the part of the skill that is getting the most repetition is the new part, not the part already mastered. The same is true of memorisation. Memorise the last line first, then the last two lines etc. I suspect the same approach may well apply to more abstract skills. Maybe we should teach how to read and critique a statistical analysis, then how to write one, then finally how to do the analysis.

The spiral approach is popular, in which topics are revisited each year and built on. I would like to incorporate principles of mastery learning along with that. Mastery learning is based on the premise that you must master a skill before moving on to the next one. This is difficult to implement in a classroom with mixed levels of ability, but is more easily enacted with the help of a Learning Management System.

New math had odd beginnings

I was born in the early 1960s and was in the first cohort of children to learn “new math(s)”, devised in the US as a reaction to the humiliation of seeing the Russians put Sputnik into space before them. Even in New Zealand we were not immune to the influence of the Cold War on education! I loved our bright new textbooks, which started with Set Theory – even at age 6. Every year the first page of the textbook had diagrams of herds of sheep, prides of lions and other sundry collections. I loved the Venn diagrams and the intersections – even cardinal numbers, but to this day I’m not sure how that connected with mathematics, and learning to add and subtract. And to this day I ask, “What were they thinking?” It appears that set theory is the foundation of all mathematics, and thus these mathematicians decided to start there, baffling teachers and parents alike, who were alienated by these words and symbols.

I have no doubt that the intention was to improve learning, but it seems ill-advised now. I wonder how our attempts will be viewed with the benefit of 40 years of hindsight. These days constructivism is a popular, though not universal, theory and approach to learning. The idea is that we create knowledge through adding new ideas and experiences onto our current knowledge. Sometimes that involves undoing erroneous or primitive knowledge.

Sometimes a good approach is historical – to imitate in the learner (in an accelerated form) the learning process through which mankind has progressed, preferably missing out the stupid bits. (Roman numerals are fun for some children, but pretty pointless once you realise the power of zero). It is certainly worth contemplating as an alternative approach.

This post has touched on ideas regarding the sequencing of a learning/teaching approach. There are many considerations and serious thought needs to go into where we start. Sometimes we need to start at the end.

Teaching experimental design

Teaching Experimental Design – a cross-curricular opportunity

The elements that make up a statistics, operations research or quantitative methods course cover three different dimensions (and more). There are:

  • techniques we wish students to master,
  • concepts we wish students to internalise, and
  • attitudes and emotions we wish the students to adopt.

Techniques, concepts and attitudes interact in how a student learns and perceives the subject. Sadly it is possible (and not uncommon) for students to master techniques, while staying oblivious to many of the concepts, and with an attitude of resignation or even antipathy towards the discipline.

Techniques

Often, and less than ideally, course design begins with techniques. The backbone is a list of tests, graphs and procedures that students need to master in order to pass the course. The course outline includes statements like:

  • Students will be able to calculate a confidence interval for a mean.
  • Students will be able to formulate a linear programming model from data.
  • Students will use Excel to make correct histograms. (Good luck with this one!)

Textbooks are organised around techniques, which usually appear in a given sequence, relying on the authors’ perception of how difficult each technique is. Textbooks within a given field are remarkably similar in the techniques they cover in an introductory course.

Concepts

Concepts are more difficult to articulate. In a first course in statistics we wish students to gain an appreciation of the effects of variation. They need to understand how data from a sample differs from population data. In all of the mathematical decision sciences students struggle to understand the nature of a model. The concept of a mathematical model is far from intuitive, but essential.

Attitudes

You can’t explicitly teach attitudes. “Today class, you are going to learn to love statistics!” These are absorbed and formed and reformed as part of the learning process, as a result of prior experiences and attitudes. I have written a post on Anxiety, fear and antipathy for maths, stats and OR, which describes the importance of perseverance, relevance, borrowed self-efficacy and love in the teaching of these subjects. Content and problem context choices can go a long way towards improving attitudes. The instructor should know whether his or her class is more interested in the trajectories of gummy bears, or the more serious topics of cancer screening and crime prevention. Classes in business schools will use different examples than classes in psychology or forestry. Whatever the context, the data should be real, so that students can really engage with it.

I was both amused and a little saddened at this quote from a very good book, “Succeed – how we can reach our goals”. The author (Heidi Grant Halvorson) has described the outcomes of some interesting experiments regarding motivation. She then says, “At this point, you may be wondering if social psychologists get a particular pleasure out of asking people to do really odd things, like eating Cheerios with chopsticks, or eating raw radishes, or not laughing at Robin Williams. The short answer is yes, we do. It makes up for all those hours spent learning statistics.” Hmmm

Experimental Design

So what does this have to do with experimental design?

I have a little confession. I’ve never taught experimental design. I wish I had. I didn’t know as much then as I do now about teaching statistics, and I also taught business students. That’s my excuse, but I regret it. My reasoning was that businesses usually use observational data, not experimental data. And it’s true, except perhaps in marketing research, and process control and possibly several other areas. Oh.

George Cobb, whom I have quoted in several previous posts, proposed that experimental design is a mechanism by which students may learn important concepts. The technique is experimental design, but taught well, it is a way to convey important concepts in statistics and decision science. The pivotal concept is that of variation. If there were no variation, there would be no need for statistics or experimentation. It would be a sad, boring deterministic world. But variation exists, some of which is explainable, and some of which is natural, some of which is due to sampling and some of which is due to bad sampling or experimental practices. I have a YouTube video that explains these four sources of variation. Because variation exists, experiments need to be designed in such a way that we can uncover as best we can the explainable variation, without confounding it with the other types of variation.

The new New Zealand curriculum for Mathematics and Statistics includes experimental design at levels 2 and 3 of the National Certificate of Educational Achievement (the last two years of secondary school). The assessments are internal, and teachers help students set up, execute and analyse small experiments. At level two (implemented this year) the experiments generally involve two groups which are given two treatments, or a treatment and a control. The analysis involves boxplots and informal inference. Some schools used paired samples, but found the type of analysis to be limited as a result. At level three (to be implemented in 2013) this is taken a step further, but I haven’t been able to work out what this step is from the curriculum documents. I was hoping it might be things like randomised block design, or even Taguchi methods, but I don’t think so.
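A two-group experiment of this kind can be sketched in a few lines of Python. This is only an illustration: the participant labels and task times are invented, the function names are my own, and the five-number summary stands in for the boxplot a student would draw.

```python
import random
import statistics


def allocate(participants, seed=None):
    """Randomly split participants into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum: what a boxplot shows."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical class of ten students, split into treatment and control
students = [f"student_{i}" for i in range(1, 11)]
treatment, control = allocate(students, seed=2012)

# Invented task-completion times in seconds, for illustration only
treatment_times = [41, 38, 45, 50, 36]
control_times = [47, 52, 44, 58, 49]

print("Treatment:", five_number_summary(treatment_times))
print("Control:  ", five_number_summary(control_times))
```

Letting the computer do the random allocation removes the temptation for students to put their friends in the same group, which is one of the “bad experimental practices” that adds unwanted variation.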

Subjects for Experimentation

Bearing in mind the number of students, many of whom wish to use other members of the class, there can be issues of time and fatigue. Here are some possibilities. It would be great if other suggestions could be added as comments to this post.

Behavioural

Some teachers are reluctant to use psychological experiments as it can be a bit worrying to use our students as guinea pigs. However, this is probably the easiest option, and provided informed and parental consent is received, it should be acceptable. All sorts have been suggested such as effects of various distractions (and legal stimulants) on task completion. There are possible experiments in Physical Education (Evaluate the effectiveness of a performance enhancing programme). Or in Music – how do people respond to different music?

I’d love to see some experiments done on the time taken to solve Rogo puzzles, and on what effect route length, number choice, size or age has.

Biology

Anything that involves growing things takes a while and can be fraught. (My own recollection of High School biology is that all my plants died.) But things like water uptake could be possible. Use sticks of celery of different lengths and see how much water they take up in a given time. Germination times or strike rates under different circumstances using cress or mustard?  Talk to the Biology teacher. There are assessment standards in NZ NCEA at levels 2 and 3 which mesh well with the statistics standards.

Technology

Baking. There are various ingredients that could have two or three levels of inclusion – making muffins with and without egg – does it affect the height? Pretty tricky to control, but fun – maybe use uniform amounts of mixture. Talk to the Food tech teacher.

Barbie bungee jumping. How does Barbie’s weight affect how far she falls? By having Barbie with and without a backpack, you get the two treatments. The bungee cords can be made out of rubber bands or elastic.

Things flying through the air from catapults. This has been shown to work as a teaching example. There are a number of variables to alter, such as the weight of the object, the slope of the launchpad, and the person firing.

Inject statistical ideas in application areas

John Maindonald from ANU made the following comment on a previous post: “I am increasingly attracted to the idea that the place to start injecting statistical ideas is in application areas of the curriculum.  This will however work only if the teaching and learning model changes, in ways that are arguably anyway necessary in order to make effective use of those teachers who have really good and effective mathematics and statistics and computing skills.”

How exciting is that? Teachers from different discipline areas work together! There may well be logistical issues and even problems of “turf”. But wouldn’t it be great for mathematics teachers to help students with experiments and analysis in other areas of the curriculum. The students will gain from the removal of “compartments” in their learning, which will help them to integrate their knowledge. The worth of what they are doing would be obvious.

(Note for teachers in NZ. A quick look through the “assessment matrices” for other subjects uncovered a multitude of possibilities for curricular integration if the logistics and NZQA allow.)

What Mathematics teachers need to know about statistics

My post suggesting that statistics is more vital for efficient citizens than algebra has led to some interesting discussions on Twitter and elsewhere. Currently I am beginning an exciting venture to provide support materials for teachers and students of statistics, starting with New Zealand. These two circumstances have led me to ponder about why maths teachers think that statistics is a subset of mathematics, and what knowledge and attitudes will help them make the transition to teaching statistics as a subject.

An earlier post called for mathematics to leave statistics alone. This post builds on that by providing some ways of thinking that might be helpful to mathematics teachers who have no choice but to teach statistics.

Statistics is not a subset of mathematics

Let me quote a forum post from a teacher of mathematics in New Zealand:

  • “It seems strange to me that Statistics is a small part of Mathematics (which also includes Trigonometry, Algebra, Geometry, Calculus … but for our year 13s and now our year 12s, it’s attained equal parity as one area with all the other branches of maths put together as another area.”

This is very helpful as it lets us see where the writer is coming from. To him, statistics is a subset of mathematics – a small part, and somehow it has managed to push its way to become on equal footing with “mathematics.”

I disagree.

I think we need to take a look at the role of compulsory schooling. It is popular among people who go to university, and even more so among those who never leave (having become academics themselves) to think that the main, if not only role of school is to prepare students for university. If the students somehow have not gained the skills and knowledge that the university lecturers believe are necessary, then the schools have failed – or worse still, the system has failed. Again I disagree.

The vision for the young people of New Zealand is stated in the official curriculum.

“Our vision is for young people:

  • who will be creative, energetic, and enterprising
  • who will seize the opportunities offered by new knowledge and technologies to secure a sustainable social, cultural, economic, and environmental future for our country
  • who will work to create an Aotearoa New Zealand in which Māori and Pākehā recognise each other as full Treaty partners, and in which all cultures are valued for the contributions they bring
  • who, in their school years, will continue to develop the values, knowledge, and competencies that will enable them to live full and satisfying lives
  • who will be confident, connected, actively involved, and lifelong learners.”

It doesn’t actually mention preparing people for university.

My view is that school is about preparing young people for life, while helping them to enjoy the journey.

What teachers need to know about statistics

Statistics for life can be summed up as C, D, E, standing for Chance, Data and Evidence.

Chance

Students need to understand about the variability in their world. Probability is a mathematical way of modelling the inherent uncertainty around us. The mathematical part of probability includes combinatorics and the ability to manipulate tables. You can use Venn diagrams and trees if you like, and tables can be really useful too. The most difficult part for many students is converting the ideas into mathematical terminology and making sense of that.

Bear in mind that perceptions of chance have cultural implications. Some cultures play board games and other games of chance from a young age, and gain an inherent understanding of the nature of uncertainty as provided by dice. However there are other cultures for whom all things are decided by God, and nothing is by chance. There are many philosophical discussions which can be had regarding the nature of uncertainty and variability. The work of Tversky and Kahneman and others has alerted us to the misconceptions we all have about chance.

An area where the understanding of probabilities and relative risk is vital is that of medical screening. Studies among medical practitioners have shown that many of them cannot correctly estimate the probability of a false positive, or the probability of a true positive, given that the result of a test is positive. This is easily conveyed through contingency tables, which are now part of the NZ curriculum.
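To show how a contingency table makes the screening calculation almost mechanical, here is a small sketch in Python. The prevalence, sensitivity and false positive rate are invented round numbers chosen for illustration, not figures for any real test.

```python
def screening_table(population, prevalence, sensitivity, false_positive_rate):
    """Build the four cells of a disease-by-test-result contingency table."""
    diseased = population * prevalence
    healthy = population - diseased
    return {
        "true_positive": diseased * sensitivity,
        "false_negative": diseased * (1 - sensitivity),
        "false_positive": healthy * false_positive_rate,
        "true_negative": healthy * (1 - false_positive_rate),
    }


def prob_disease_given_positive(table):
    """Of everyone who tests positive, what proportion actually has the condition?"""
    positives = table["true_positive"] + table["false_positive"]
    return table["true_positive"] / positives


# Hypothetical screening programme: 10,000 people, 1% have the condition,
# the test detects 90% of true cases and wrongly flags 9% of healthy people.
table = screening_table(10_000, 0.01, 0.90, 0.09)
p = prob_disease_given_positive(table)
print(f"Of those who test positive, about {p:.0%} have the condition.")
```

With these made-up numbers, 90 of the 981 positive results are true positives, so a positive test is far less conclusive than the 90% sensitivity might suggest.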

Data

When people talk about “statistics”, more often they are talking about data and information than the discipline of statistical analysis. Just about everyone is interested in some area of statistics. Note the obsession of the media for reporting the road toll and comparing with previous years or holiday periods. Sports statistics occupy many people’s thoughts, and can fill in the (long) gaps between the action in a cricket commentary. Weather statistics are vital to farmers, planners, environmentalists. Hospitals are now required to report various statistics. The web is full of statistics. It is difficult to think of an area of life which does not use statistics. The second thing we want to know about a new-born baby is its weight.

Just because data contains numbers does not make it mathematics. There are arithmetic skills, such as adding and dividing, which can be practised using data. But that’s about it when it comes to mathematics and data. These days we have computer packages which can calculate all sorts of summary values, and create graphs for better or worse, so the need for mathematical or numeracy skills is much diminished. What is needed is the ability to communicate ideas using numbers and diagrams; by communication I mean production and interpretation of reports and diagrams.

The area of data also includes the collection of data. This is taught at all levels of the NZ curriculum. Students are taught to think about measurement, both physical and through questionnaires. Eventually students learn to design experiments to explore new ideas. Some might see this as science or biology, social studies or psychology, technology and business. There are even applications in music where students explore people’s music preferences. Data occurs in all subjects, and really the skills of data analysis should be taught in context. But until the current generation of students become the teachers, we may need to rely on the teachers of statistics to provide support. There are wonderful opportunities for collaboration between disciplines, if our compartmentalised school system would allow them.

Evidence

Much data is population data and conclusions can easily be drawn from it. However we also use samples to draw conclusions about populations. Inferential statistics has been developed using theoretical probability distributions to help us use samples to draw conclusions about populations. Unfortunately the most popular form of inference, hypothesis testing, is counter-intuitive at best. Many teachers do not truly understand the application of inferential statistics – and why should they – they may never have performed a real statistical analysis. It is only through repeated application of techniques to multiple contexts that most people can start to feel comfortable and get some understanding of what is happening. The beauty is that today the technology makes it possible for students to perform multiple analyses so that they can learn the general from the specific.

The New Zealand school system has taken the courageous* step to introduce the use of resampling, also known as bootstrapping or randomisation, for the generation of confidence intervals. This is contentious and is causing teachers concern. I will dedicate a whole post to the ideas of resampling and why they may be preferable to more traditional approaches. I empathise with the teachers who are feeling out of their depth, and hope that our materials, along with the excellent ones provided by “Census at School” can be of help.
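For readers who have not met resampling before, here is a minimal sketch in Python of one common version, the percentile bootstrap for a median. The sample values are invented, the function name and defaults are my own, and this is not claimed to be the exact procedure used in the NZ assessment materials.

```python
import random
import statistics


def bootstrap_interval(sample, statistic=statistics.median,
                       n_resamples=1000, level=0.95, seed=None):
    """Percentile bootstrap interval: resample with replacement, recompute, sort."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Cut off the same number of resampled estimates from each tail
    cut = int((1 - level) / 2 * n_resamples)
    return estimates[cut], estimates[n_resamples - 1 - cut]


# Invented sample of 15 measurements, for illustration only
sample = [12, 15, 9, 22, 17, 14, 30, 11, 16, 19, 13, 25, 18, 10, 21]
lower, upper = bootstrap_interval(sample, seed=42)
print(f"The population median is plausibly somewhere between {lower} and {upper}.")
```

The appeal for teaching is that every step is visible: students can watch the resampled medians pile up and see the interval emerge, with no theoretical distribution required.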

I have no doubt that educators all over the world are watching to see how this goes before attempting similar moves in their own countries. Yet again New Zealand gets to lead the world. Watch this space!

*In the popular British television show, “Yes Minister”, the public servant, Sir Humphrey, would use the term “courageous” to describe a proposal which was probably right, but also likely to lose votes.

Rounding is about communication

Rounding is more difficult than it first appears. It appears straightforward. To round a number you decide how many decimal places or significant figures you need, then you look one digit further to see whether the final digit stays the same or goes up. Presto – there is rounding in a nutshell. Yet my university students struggle with rounding to a surprising degree. I did a YouTube search on rounding for a video to help them, but to no avail.

I wrote a script for such a video. I’m afraid it won’t be appearing any time soon as I now have to work for my living (as opposed to being an academic ;) ) but the exercise was interesting. What I realised is that rounding is about communication. It has nothing to do with mathematics and everything to do with expression.

The problem is one of judgment. There is no black and white answer. Firstly the rule is that you DON’T round during a calculation, and you DO round at the end. Too many students manage to either round early in the calculation and increase error in their calculations, or keep every decimal place, forever! Whatever number of decimal places their calculator gives them, that is what is reported.
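The cost of rounding too early is easy to demonstrate. The sketch below uses Python’s decimal module so that halves always round up, as in the classroom rule; the growth rate and the base figure are invented for the example.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round a Decimal to a number of decimal places, with halves going up."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 gives 0.01
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


# Hypothetical calculation: a rate applied to a base amount
rate = Decimal("0.0349")
base = Decimal("12000")

# Rounding the rate to two decimal places first, then calculating
rounded_early = round_half_up(round_half_up(rate, 2) * base, 0)

# Keeping full precision throughout and rounding only the final answer
rounded_at_end = round_half_up(rate * base, 0)

print("Rounded early: ", rounded_early)
print("Rounded at end:", rounded_at_end)
```

Here the early-rounded answer is 360 while the properly rounded answer is 419, a difference created entirely by when the rounding happened.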

Even at honours level we find students reporting with spurious precision: “the savings from this approach will be $45,923.” My colleague (the one who has a thing about titles) likes to ask “Are you sure about that? Could it be $45,922?” She is, of course, not serious, but trying to point out the unnecessary non-zero digits in their estimation.

But then my mathematically-minded colleague points out that there is nothing inherently wrong with saying $45,923. In fact by rounding to $46,000 they are moving away from the central value of their estimate, and making it worse. And mathematically that is true. But rounding isn’t about the mathematics – it is about communication. When we state $46,000 everyone knows we don’t mean exactly $46,000. There is an implied level of variation of about $500 either way. Or it could be $250 either way, because maybe they would round $45,523 to $45,500. It is this horrible greyness that abstract mathematicians have escaped, which pervades real-life subjects like statistics and operations research.

This is where the judgment call comes in. In science there are rules about the number of decimal places used, depending on the number of decimal places in the values or measurements used in the calculation. But in statistics the rules are fuzzier. And when we are dealing with money there are certain unwritten rules. In New Zealand we usually state money values to either two decimal places or none. Even though we have divested ourselves of coinage smaller than 10 cents, most prices are given to the nearest cent. Some restaurants are starting to give prices to one decimal place, but that is the exception.

Here is an example that intrigues me. On our bathroom wall we have a card teaching about CPR. It was obviously originally written in “Imperial land”. The distance of chest compression is given as 2.54cm. Do you think it used to say one inch? And did they really mean exactly one inch? I doubt it. This should have been changed to 2cm or 3cm. Maybe 2.5cm if they really think someone can estimate that accurately when trying to keep their loved one alive by pressing on their chest.

These days with spreadsheets, the mechanical aspect of rounding is simpler. We teach the students to use the little decimal place button in Excel to do the rounding. That way the number will be correctly rounded. We also try to teach them that the underlying number in the cell remains at a higher level of precision, and that only the appearance has changed.

However, the spreadsheet does not remove the need for the decision about how many decimal places to use. What are we communicating? The role of a confidence interval is to express an estimation, so stating a confidence interval to high precision is laughable. Stating, for example, that we are 95% confident that the mean of the population lies between 22.478 and 35.721 indicates lack of  understanding of the nature of a confidence interval. I would use one decimal place at most and give the confidence interval as (22.5, 35.7). Somehow to use no decimal places seems cavalier, though really it would be more sensible.
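The same separation between the stored value and the reported value can be shown in code. This is a small Python sketch with a helper name of my own; the interval limits are the ones from the example above.

```python
def format_interval(lower, upper, places=1):
    """Format interval limits for a report without altering the stored values."""
    return f"({lower:.{places}f}, {upper:.{places}f})"


# Limits as they come out of the analysis, at full precision
lower, upper = 22.478, 35.721

print("One decimal place:", format_interval(lower, upper))
print("No decimal places:", format_interval(lower, upper, places=0))
```

The number of decimal places is a parameter because it is a judgment about what to communicate, not something the calculation can decide.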

Recently I received a report about student progress, given in percentages to three decimal places. To me this undermined the validity of the report, as the person who wrote it clearly did not know about sensible rounding. This made me wonder about the validity of the rest of the report. Rounding is about communication, and this lack of sensible rounding communicated a message I am sure the author did not intend.

Statistics Textbooks suck out all the fun.

Do the textbook writers like the students?

In 1987 George Cobb published a paper evaluating statistics textbooks. I am very grateful for it, as it alerted me to the problems with textbooks, and introduced me to the man himself, whose work I greatly admire. Cobb explains that statistics is an inherently interesting and practical subject, but that many textbooks seem to have missed that, or concealed it from the students.

The discipline of statistics is inherently fascinating, applied and important. So why do so many textbooks make it seem mechanistic and abstract? I have been examining textbooks, and wonder if the writers even like their subject matter, or the students they are supposed to be reaching.

I am particularly interested in textbooks for non-mathematicians. The majority of students of statistics are not mathematicians, and are not planning to take any more statistics than they are required to. These students don’t like mathematics. They feel uneasy about taking the course. They are required to take a statistics course as part of their business, psychology or health sciences major. They aren’t even sure why they need to take the course, and hope to get it over and done with and forget about the experience as soon as possible. A previous post talks about how to help students who are feeling negatively towards the course. A textbook for these students needs to get the tone and content right.

Tone

A friendly, but authoritative tone is important. Some go too far and become corny in their chattiness. It’s nice to be friendly, but it can be a bit tiresome and the examples can be too cute. But most are just too dry – and have too many words. And far too many equations and algorithms. They seem bent on protectionism rather than empowerment.

Content

Even more important is the choice of content, and I find this fascinating. I wonder what course some textbooks are designed for. A telling chapter is regression. Regression is an important statistical technique. But what do we tell them about regression? Here is how I have recently seen it done. Provide an example of real data taken from the web. Introduce the problem, then let them wait until the end to find out where you are going. Give the mathematical way of expressing a line, using Greek letters. Derive the least squares method of line fitting. Calculate the line by hand. Interpret the slope and the intercept. Calculate the coefficient of determination by hand. Interpret it. Define the residuals, and calculate them. Calculate the F-statistic and t-statistics. Interpret them. Then finish off the story you started at the beginning of the chapter (not that anyone cares anymore).

Some of you may be wondering what is wrong with that. Good – it means I am not preaching to the choir.

Students need to see the whole picture from the beginning. If you absolutely MUST do the mathematics, put it at the end of the chapter for the keen students, but don’t do the maths in the body of the text and scare the others. Do not assume the readers know how to interpret a line. Most don’t. Start with some examples that explain the context, show the line, and explain and apply the model equation. Next work through one example thoroughly, using computer output. Explain the different values and talk about what applies to the sample, and what helps us to generalize to the population. Then provide some more examples, making sure many of them are not statistically significant, some have negative slopes, and all are solving a problem using a sufficiently large sample of real data. Then give them a template for writing up a regression, explaining the different parts. Finally, if you must, you can give them the mathematics. This may keep the instructors happy so that they will buy your book.


Another telling bit of content is a textbook’s approach to ordinal data. In my video about types of data, two instructors argue over whether it is permissible to calculate the mean for ordinal data. It ends with them calling each other “nit-picking mathematician” and “sloppy social scientist”. My approach is to take the middle ground. It is not ideal mathematically to calculate a mean for ordinal data, but much of the time people do, so it is best to know that there is an issue and why it may cause problems, rather than pretending that it never happens. Look in the textbook. I would be wary of any text that states categorically that you cannot find the mean for ordinal data.
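One way to show students why the mean can cause problems for ordinal data is the small sketch below. It is an illustration under my own assumptions, not something taken from the video. The responses are invented placeholders, and `recode` and `summarise` are hypothetical names. The point is that the numbers attached to ordered categories are arbitrary as long as they keep the order. Two codings that both respect the order can disagree about which group has the higher mean, while the comparison of medians stays the same.

```python
from statistics import mean, median

# Invented responses on a five-point scale
# (1 = strongly disagree ... 5 = strongly agree). Not real data.
group_a = [2, 2, 2, 5, 5]
group_b = [3, 3, 3, 3, 3]

# A second coding that keeps the same order but spaces the categories unevenly.
recode = {1: 1, 2: 2, 3: 4, 4: 5, 5: 6}


def summarise(responses, coding=None):
    """Return (mean, median) of the responses under an optional recoding."""
    values = [coding[r] for r in responses] if coding else list(responses)
    return mean(values), median(values)


a_mean, a_median = summarise(group_a)
b_mean, b_median = summarise(group_b)
a_mean_re, a_median_re = summarise(group_a, recode)
b_mean_re, b_median_re = summarise(group_b, recode)

print("Original coding: A mean higher than B?", a_mean > b_mean)
print("Recoded:         A mean higher than B?", a_mean_re > b_mean_re)
print("Original coding: A median higher than B?", a_median > b_median)
print("Recoded:         A median higher than B?", a_median_re > b_median_re)
```

This does not say the mean is forbidden. It shows what a student should be aware of when they see one reported for a rating scale.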

There is also the issue of the purpose of the text, both in the course and in the lives of the students. Textbooks can take different roles in courses, largely as a function of the confidence and competence of the instructor. A novice instructor, unsure of the material, is well-advised to stick closely to the textbook. But an experienced and engaged instructor will find the text less and less important, and more a peripheral second opinion and source of homework exercises. The internet and Wikipedia have replaced the textbook as the source of background knowledge. We suspect a textbook is used more as an expensive combination of talisman and doorstop by the students.

“Judge a book by its exercises and you cannot go far wrong,” said George Cobb. All exercises in statistics should have context. There is no place for fitting a line by hand calculation to a set of five points with no context. Leave that to mathematics courses. Statistics is about context, and all examples need to reflect that. The data should be real data, so that an interesting result is authentic, not just something dreamed up by the instructor. The data should even be dirty occasionally (but not too early in the course, and not without warning). And there should be enough data. Don’t perpetuate bad habits by using too few data.

Having said all this, I do wonder what the role of textbooks will be in the education of the future. Online materials, which can be frequently updated, and crowd-sourced explanations such as those found on Wikipedia and elsewhere can fill the place of a textbook.

Or there is always our app – AtMyPace: statistics, which uses video and interactive lessons to teach some important concepts. We are now working to bring this to the web so all can use it. And then maybe I should write a textbook. ;)