Statistical software for worried students: Appearances matter

Let’s be honest. Most students of statistics are taking statistics because they have to. I asked my class of 100 business students who would choose to take the quantitative methods course if they did not have to. Two hands went up.

Face it – statistics is necessary but not often embraced.

But actually it is worse than that. For many people statistics is the most dreaded course they are required to take. It can be the barrier to achieving their career goals as a psychologist, marketer or physician. (And it should be required for many other careers, such as journalism, law and sports commentary.)

Choice of software

Consequently, we have worried students in our statistics courses. We want them to succeed, and to do that we need to reduce their worry. One decision that will affect their engagement and success is the choice of computer package. This decision rightly causes consternation to instructors. It is telling that one of the most frequently and consistently accessed posts on this blog is “Excel, SPSS, Minitab or R?”. It has been viewed 55,000 times in the last five years.

The problem of which package to use is no easier to solve than it was five years ago when I wrote the post. I am helping a tertiary institution to re-develop their online course in statistics. This is really fun – applying all the great advice and ideas from the “Guidelines for Assessment and Instruction in Statistics Education”, or GAISE. They asked for advice on what statistics package to use. And I am torn.

Requirements

Here is what I want from a statistical teaching package:

  • Easy to use
  • Attractive to look at (See “Appearances Matter” below)
  • Helpful output
  • Good instructional materials with videos etc (as this is an online course)
  • Supports good pedagogy

If I’m honest I also want it to have the following characteristics:

  • Guidance for students as to what is sensible
  • Only the tests and options I want them to use in my course – not too many choices
  • An interpretation of the output
  • Data handling capabilities, including missing values
  • A pop up saying “Are you sure you want to make a three dimensional pie-chart?”

Is this too much to ask?

Possibly.

Overlapping objectives

Here is the thing. There are two objectives for introductory statistics courses that partly overlap and partly conflict. We want students to

  • Learn what statistics is all about
  • Learn how to do statistics.

They probably should not conflict, but they require different things from your software. If all we want the students to do is perform the statistical tests, then something like Excel is not a bad choice, as they get to learn Excel as well, which could be handy for CV expansion and job-getting. If we are more concerned about learning what statistics is all about, then an exploratory package like TinkerPlots or iNZight could be useful.

Ideally I would like students to learn both what statistics is all about and how to do it. But most of all, I want them to feel happy about doing statistical analysis.

Appearances matter

Eye-appeal is important for overcoming fear. I am confident in mathematics, but a journal article with a page of Greek letters and mathematical symbols makes me anxious. The LaTeX font makes me nervous. And an ugly logo puts me off a package. I know it is shallow. But it is a thing, and I suspect I am far from alone. Marketing people know that the choice of colour, word, placement – all sorts of superficial things – affects whether a product sells. We need to sell our product, statistics, and to do that, it needs to be attractive. It may well be that the people who design software are less affected by appearance, but they are not the consumers.

Terminal or continuing?

This is important: Most of our students will never do another statistical analysis.

Think about it:

Most of our students will never do another statistical analysis.

Here are the implications: It is important for the students to learn what statistics is about, where it is needed, potential problems and good communication and critique of statistical results. It is not important for students to learn how to program or use a complex package.

Students need to experience statistical analysis, to understand the process. They may also discover the excitement of a new set of data to explore, and the anticipation of an interesting result. These students may decide to study more statistics, at which time they will need to learn to operate a more comprehensive package. They will also be motivated to do so because they have chosen to continue to learn statistics.

Excel

In my previous post I talked about Excel, SPSS, Minitab and R. I used to teach with Excel, and I know many of my past students have been grateful they learned it. But now I know better, and cannot, hand on heart, recommend Excel as the main software. Students need to be able to play with the data, to look at various graphs, and get a feel for variation and structure. Excel’s graphing and data-handling capabilities, particularly with regard to missing values, are not helpful. The histograms are disastrous. Excel is useful for teaching students how to do statistics, but not what statistics is all about.

SPSS and Minitab

SPSS was a personal favourite, but it has been a while since I used it. It is fairly expensive, and chances are the students will never use it again. I’m not sure how well it does data exploration. Minitab is another nice little package. Both of these are probably overkill for an introductory statistics course.

R and R Commander

R is a useful and versatile statistical language for higher level statistical analysis and learning but it is not suitable for worried students. It is unattractive.

R Commander is a graphical user interface for R. It is free, and potentially friendlier than R. It comes with a book. I am told it is a helpful introduction to R. R Commander is also unattractive. The book was formatted in LaTeX. The installation guide looks daunting. That is enough to make me reluctant – and I like statistics!

The screenshot displayed on the front page of R Commander

iNZight and iNZight Lite

I have used iNZight a lot. It was developed at the University of Auckland for use in their statistics course and in New Zealand schools. The full version is free and can be installed on PC and Mac computers, though there may be issues with running it on a Mac. The web-based version, iNZight Lite, is fine. It is free and works on any platform. I really like how easy it is to generate various plots to explore the data. You put in the data, and the graphs appear almost instantly. iNZight encourages engagement with the data, rather than doing things to data.

For a face-to-face course I would choose iNZight Lite. For an online course I would be a little concerned about the level of support material available. The newer versions of iNZight and iNZight Lite have benefitted from some graphic design input. I like the colours and the new logo.

Genstat

I’ve heard about Genstat for some time, as an alternative to iNZight for New Zealand schools, particularly as it does bootstrapping. So I requested an inspection copy. It has a friendly vibe. I like the dialog box suggesting the graph you might like to try. It lacks the immediacy of iNZight Lite. It has the multiple window thing going on, which can be tricky to navigate. I was pleased at the number of sample data sets.

NZGrapher

NZGrapher is popular in New Zealand schools. It was created by a high school teacher in his spare time, and is attractive and lean. It is free, funded by donations and advertisements. You enter a data set, and it creates a wide range of graphs. It does not have the traditional tests that you would want in an introductory statistics course, as it is aimed at the NZ school curriculum requirements.

Statcrunch

Statcrunch is a more attractive, polished package, with a wide range of supporting materials. I think this would give confidence to the students. It is specifically designed for teaching and learning and is almost conversational in approach. I have not had the opportunity to try out Statcrunch. It looks inviting, and was created by Webster West, a respected statistics educator. It is now distributed by Pearson.

Jasp

I recently had my attention drawn to this new package. It is free, well-supported and has a clean, attractive interface. It has a vibe similar to SPSS. I was able to download it easily, and I like the immediate response as you begin your analysis. It is not as graphical as iNZight, but is more traditional in its approach. For a course emphasising doing statistics, I like the look of this.

Data, controls and output from Jasp

Conclusion

So there you have it. I have mentioned only a few packages, but I hope my musings have got you thinking about what to look for in a package. If I were teaching an introductory statistics course, I would use iNZight Lite, Jasp, and possibly Excel. I would use iNZight Lite for data exploration. I might use Jasp for hypothesis tests, confidence intervals and model fitting. And if possible I would teach PivotTables in Excel, and use Excel for any probability calculations.

Your thoughts

This is a very important topic and I would appreciate input. Have I missed an important contender? What do you look for in a statistical package for an introductory statistics course? As a student, how important is it to you for the software to be attractive?

Excel, SPSS, Minitab or R?

I often hear this question: Should I use Excel to teach my class? Or should I use R? Which package is the best?

Update in April 2018: I have written a further post, covering other aspects and other packages.

It depends on the class

The short answer is: It depends on your class. You have to ask yourself, what are the attitudes, skills and knowledge that you wish the students to gain in the course? What is it that you want them to feel and do and understand?

If the students are never likely to do any more statistics, what matters most is that they understand the elementary ideas, feel happy about what they have done, and recognise the power of statistical analysis, so they can later employ a statistician.

If the students are strong in programming, such as engineering or computer science students, then they are less likely to find the programming a barrier, and will want to explore the versatility of the package.

If they are research students and need to take the course as part of a research methods paper, then they should be taught on the package they are most likely to use in their research.

Over the years I have taught statistics using Excel, Minitab and SPSS. These days I am preparing materials for courses using iNZight, which is a specifically designed user interface with an R engine. I have dabbled in R, but never had students who are suitable to be taught using R.

Here are my pros and cons for each of these, and when they are most suitable.

Excel

I have already written somewhat about the good and bad aspects of Excel, and the evils of Excel histograms. There are many problems with statistical analysis with Excel. I am told there are parts of the Analysis ToolPak which are wrong, though I’ve never found them myself. There is no straightforward way to do a hypothesis test for a mean. The data-handling capabilities of the spreadsheet are fantastic, but the ToolPak cannot even deal well with missing values. The output is idiosyncratic, and not at all intuitive. There are programming quirks which should have been eliminated many years ago. For example when you click on a radio button to say where you wish the output to go, the entry box for the data is activated, rather than the one for the output. It would take only elementary Visual Basic to correct this, but it has never happened. Each time Excel upgrades I look for this small fix, and have repeatedly been disappointed.
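To make the complaint about the test for a mean concrete, here is a minimal sketch in Python (not Excel) of what a one-sample t statistic involves once missing values are dropped explicitly. The function name `one_sample_t`, the use of `None` for a missing response, and the study-hours data are all assumptions made for illustration. A p-value would still need a t distribution table or a statistics package.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom, n used) for H0: mean == mu0.

    Missing observations, represented here as None, are dropped explicitly
    so it is clear how many values the test actually used.
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two non-missing observations")
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (divides by n - 1)
    if sd == 0:
        raise ValueError("all observations are identical, so t is undefined")
    standard_error = sd / math.sqrt(n)
    t = (mean - mu0) / standard_error
    return t, n - 1, n


# Hypothetical data: weekly study hours, with two missing responses.
hours = [12, 15, None, 9, 14, 11, None, 13]
t, df, n = one_sample_t(hours, mu0=10)
print(f"n = {n}, df = {df}, t = {t:.2f}")
```

Reporting n alongside the statistic is the point: the student sees that two responses were left out rather than silently treated as zeros.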

So, given these shortcomings, why would you use Excel? Because it is there, because you are helping students gain other skills in spreadsheeting at the same time, because it is less daunting to use a familiar interface. These reasons may not apply to all students. Excel is the best package for first year business students for so many reasons.

PivotTables in Excel are nasty to get your head around, but once you do, they are fantastic. I resisted teaching PivotTables for some years, but I was wrong. They may well be one of the most useful things I have ever taught at university. I made my students create comparative bar charts in Excel, using PivotTables. One day Helen and I will make a video about PivotTables.
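For readers who have not met a PivotTable, the core idea is a two-way table of counts. The sketch below shows that idea in Python rather than Excel. The helper `pivot_counts` and the survey records are hypothetical, invented purely for illustration.

```python
from collections import Counter

# Hypothetical survey records: (degree, preferred package).
records = [
    ("Business", "Excel"), ("Business", "Excel"), ("Business", "SPSS"),
    ("Psychology", "SPSS"), ("Psychology", "SPSS"), ("Psychology", "Excel"),
    ("Engineering", "R"), ("Engineering", "R"), ("Engineering", "Excel"),
]


def pivot_counts(rows):
    """Count records for each (row label, column label) pair."""
    counts = Counter(rows)
    row_labels = sorted({r for r, _ in rows})
    col_labels = sorted({c for _, c in rows})
    # A Counter returns 0 for combinations that never occur.
    table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
    return table, row_labels, col_labels


table, row_labels, col_labels = pivot_counts(records)

# Print the table; each row would be one cluster of bars in a comparative bar chart.
print(f"{'':12}" + "".join(f"{c:>8}" for c in col_labels))
for r in row_labels:
    print(f"{r:12}" + "".join(f"{table[r][c]:>8}" for c in col_labels))
```

Each row of the printed table corresponds to one group in a comparative bar chart, which is roughly what the students were building.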

Minitab

Minitab is a lovely little package, and has very nice output. Its roots as a teaching package are obvious from the user-friendly presentation of results. It has been some years since I taught with Minitab. The main reason for this is that the students are unlikely ever to have access to Minitab again, and there is a lot of extra learning required in order to make it run.

SPSS

Most of my teaching at second year undergraduate and MBA and Masters of Education level has been with SPSS. Much of the analysis for my PhD research was done on SPSS. It’s a useful package, with its own peculiarities. I really like the data-handling in terms of excluding data, transforming variables and dealing with missing values. It has a much larger suite of analysis tools, including factor analysis, discriminant analysis, clustering and multi-dimensional scaling, which I taught to second year business students and research students. SPSS shows its origins as a suite of barely related packages, in the way it does things differently between different areas. But it’s pretty good really.

R

R is what you expect from a command-line open-source program. It is extremely versatile, and pretty daunting for an arts or business major. I can see that R is brilliant for second-level and up in statistics, preferably for students who have already mastered similar packages/languages like MATLAB or Maple. It is probably also a good introduction to high-level programming for Operations Research students.

iNZight

This brings us to iNZight, which is a suite of routines using R, set in a semi-friendly user interface. It was specifically written to support the innovative New Zealand school curriculum in statistics, and has a strong emphasis on visual representation of data and results. It includes alternatives that use bootstrapping as well as traditional hypothesis testing. The time series package allows only one kind of seasonal model. I like iNZight. If I were teaching at university still, I would think very hard about using it. I certainly would use it for Time Series analysis at first year level. For high school teachers in New Zealand, there is nothing to beat it.
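For anyone unfamiliar with bootstrapping, the idea is to resample the data with replacement many times and look at how the statistic varies. Here is a minimal sketch of a percentile bootstrap interval for a median, in Python. It is not how iNZight implements it. The function `bootstrap_ci`, its defaults, and the commute-time data are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for `stat`, using resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Each resample has the same size as the original sample.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical sample: commute times in minutes for 12 students.
commute = [12, 25, 8, 40, 15, 22, 30, 18, 55, 10, 27, 20]
lower, upper = bootstrap_ci(commute)
print(f"sample median = {statistics.median(commute)}")
print(f"95% bootstrap interval for the median: {lower} to {upper}")
```

The appeal for teaching is that the interval comes straight from the data, with no formula for a standard error to memorise.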

It has some issues. The interface is clunky and takes a long time to unzip if you have a dodgy computer (as I do). The graphics are unattractive. Sorry guys, I HATE the eyeball, and the colours don’t do it for me either. I think they need to employ a professional designer. SOON! The data has to be just right before the interface will accept it. It is a little bit buggy in a non-disastrous sort of way. It can have dimensionality/rounding issues. (I got a zero slope coefficient for a linear regression with an r of 0.07 the other day.)
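I do not know what caused that zero slope, but one common way to see a "zero" coefficient alongside a non-zero r is a matter of units and display rounding. The slope equals r multiplied by the ratio of the standard deviations of y and x, so when x is measured in large units the slope can be tiny. The Python sketch below uses made-up income and satisfaction data. The helper `fit_line` is hypothetical, and this is an illustration of the rounding effect, not a diagnosis of iNZight.

```python
import math


def fit_line(x, y):
    """Return (least-squares slope, correlation r) for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical data: annual income in dollars and a 1-to-5 satisfaction score.
income = [32000, 45000, 51000, 60000, 72000, 88000]
score = [3, 2, 4, 3, 2, 4]

slope, r = fit_line(income, score)
print(f"slope per dollar, two decimals: {slope:.2f}")
print(f"correlation r: {r:.2f}")

# Rescaling x to thousands of dollars leaves r unchanged but makes the slope visible.
income_thousands = [v / 1000 for v in income]
slope_k, r_k = fit_line(income_thousands, score)
print(f"slope per $1000, three decimals: {slope_k:.3f}")
```

If this is what happened, the fix is to rescale the explanatory variable or show more decimal places, which is a useful teaching point in itself.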

But – iNZight does exactly what you want it to do, with lots of great graphics and routines to help with understanding. It is FREE. It isn’t crowded with all the extras that you don’t really need. It covers all of the New Zealand statistics curriculum, so the students need to learn only one interface.

There are other packages such as Genstat, Fathom and TinkerPlots, aimed at different purposes. My university did not have any of these, so I didn’t learn them. They may well be fantastic, but I haven’t the time to do a critique just now. Feel free to add one as a comment below!