Statistical software for worried students: Appearances matter

Let’s be honest. Most students of statistics are taking statistics because they have to. I asked my class of 100 business students who would have chosen to take the quantitative methods course if they did not have to. Two hands went up.

Face it – statistics is necessary but not often embraced.

But actually it is worse than that. For many people statistics is the most dreaded course they are required to take. It can be the barrier to achieving their career goals as a psychologist, marketer or physician. (And it should be required for many other careers, such as journalism, law and sports commentary.)

Choice of software

Consequently, we have worried students in our statistics courses. We want them to succeed, and to do that we need to reduce their worry. One decision that will affect their engagement and success is the choice of computer package. This decision rightly causes consternation to instructors. It is telling that one of the most frequently and consistently accessed posts on this blog is “Excel, SPSS, Minitab or R?”. It has been viewed 55,000 times in the last five years.

The problem of which package to use is no easier to solve than it was five years ago when I wrote that post. I am helping a tertiary institution to re-develop their online course in statistics. This is really fun – applying all the great advice and ideas from the “Guidelines for Assessment and Instruction in Statistics Education” (GAISE). They asked for advice on what statistics package to use. And I am torn.

Requirements

Here is what I want from a statistical teaching package:

  • Easy to use
  • Attractive to look at (See “Appearances Matter” below)
  • Helpful output
  • Good instructional materials, with videos etc. (as this is an online course)
  • Supports good pedagogy

If I’m honest I also want it to have the following characteristics:

  • Guidance for students as to what is sensible
  • Only the tests and options I want them to use in my course – not too many choices
  • An interpretation of the output
  • Data handling capabilities, including missing values
  • A pop-up saying “Are you sure you want to make a three-dimensional pie chart?”

Is this too much to ask?

Possibly.

Overlapping objectives

Here is the thing. There are two objectives for introductory statistics courses that partly overlap and partly conflict. We want students to

  • Learn what statistics is all about
  • Learn how to do statistics.

They probably should not conflict, but they require different things from your software. If all we want the students to do is perform the statistical tests, then something like Excel is not a bad choice, as they get to learn Excel as well, which could be handy for their CVs and for getting jobs. If we are more concerned about learning what statistics is all about, then an exploratory package like Tinkerplots or iNZight could be useful.

Ideally I would like students to learn both what statistics is all about and how to do it. But most of all, I want them to feel happy about doing statistical analysis.

Appearances matter

Eye-appeal is important for overcoming fear. I am confident in mathematics, but a journal article with a page of Greek letters and mathematical symbols makes me anxious. The LaTeX font makes me nervous. And an ugly logo puts me off a package. I know it is shallow. But it is a thing, and I suspect I am far from alone. Marketing people know that the choice of colour, wording and placement – all sorts of superficial things – affects whether a product sells. We need to sell our product, statistics, and to do that, it needs to be attractive. It may well be that the people who design software are less affected by appearance, but they are not the consumers.

Terminal or continuing?

This is important: Most of our students will never do another statistical analysis.

Think about it:

Most of our students will never do another statistical analysis.

Here are the implications: It is important for the students to learn what statistics is about, where it is needed, its potential problems, and how to communicate and critique statistical results. It is not important for students to learn how to program or use a complex package.

Students need to experience statistical analysis, to understand the process. They may also discover the excitement of a new set of data to explore, and the anticipation of an interesting result. These students may decide to study more statistics, at which time they will need to learn to operate a more comprehensive package. They will also be motivated to do so because they have chosen to continue to learn statistics.

Excel

In my previous post I talked about Excel, SPSS, Minitab and R. I used to teach with Excel, and I know many of my past students have been grateful they learned it. But now I know better, and cannot, hand on heart, recommend Excel as the main software. Students need to be able to play with the data, to look at various graphs, and get a feel for variation and structure. Excel’s graphing and data-handling capabilities, particularly with regard to missing values, are not helpful. The histograms are disastrous. Excel is useful for teaching students how to do statistics, but not what statistics is all about.

SPSS and Minitab

SPSS was a personal favourite, but it has been a while since I used it. It is fairly expensive, and chances are the students will never use it again. I’m not sure how well it does data exploration. Minitab is another nice little package. Both of these are probably overkill for an introductory statistics course.

R and R Commander

R is a useful and versatile statistical language for higher level statistical analysis and learning but it is not suitable for worried students. It is unattractive.

R Commander is a graphical user interface for R. It is free, and potentially friendlier than R. It comes with a book, and I am told it is a helpful introduction to R. R Commander is also unattractive. The book was formatted in LaTeX. The installation guide looks daunting. That is enough to make me reluctant – and I like statistics!

The screenshot displayed on the front page of R Commander

iNZight and iNZight Lite

I have used iNZight a lot. It was developed at the University of Auckland for use in their statistics course and in New Zealand schools. The full version is free and can be installed on PC and Mac computers, though there may be issues with running it on a Mac. The web-based version, iNZight Lite, is free and works on any platform. I really like how easy it is to generate various plots to explore the data. You put in the data, and the graphs appear almost instantly. iNZight encourages engagement with the data, rather than doing things to data.

For a face-to-face course I would choose iNZight Lite. For an online course I would be a little concerned about the level of support material available. The newer versions of iNZight and iNZight Lite have benefited from some graphic design input. I like the colours and the new logo.

Genstat

I had heard about Genstat for some time, as an alternative to iNZight for New Zealand schools, particularly as it does bootstrapping, so I requested an inspection copy. It has a friendly vibe, and I like the dialog box suggesting the graph you might like to try. It lacks the immediacy of iNZight Lite, and it has the multiple-window thing going on, which can be tricky to navigate. I was pleased at the number of sample data sets.

NZGrapher

NZGrapher is popular in New Zealand schools. It was created by a high school teacher in his spare time, and is attractive and lean. It is free, funded by donations and advertisements. You enter a data set, and it creates a wide range of graphs. It does not have the traditional tests that you would want in an introductory statistics course, as it is aimed at the NZ school curriculum requirements.

StatCrunch

StatCrunch is a more attractive, polished package, with a wide range of supporting materials, which I think would give confidence to the students. It is specifically designed for teaching and learning, and is almost conversational in approach. I have not had the opportunity to try out StatCrunch, but it looks inviting, and it was created by Webster West, a respected statistics educator. It is now distributed by Pearson.

JASP

I recently had my attention drawn to this new package. It is well supported and has a clean, attractive interface, with a vibe similar to SPSS. I like the immediate response as you begin your analysis. JASP is free, and I was able to download it easily. It is not as graphical as iNZight, but is more traditional in its approach. For a course emphasising doing statistics, I like the look of this.

Data, controls and output from JASP

Conclusion

So there you have it. I have mentioned only a few packages, but I hope my musings have got you thinking about what to look for in a package. If I were teaching an introductory statistics course, I would use iNZight Lite, Jasp, and possibly Excel. I would use iNZight Lite for data exploration. I might use Jasp for hypothesis tests, confidence intervals and model fitting. And if possible I would teach Pivot Tables in Excel, and use it for any probability calculations.

Your thoughts

This is a very important topic and I would appreciate input. Have I missed an important contender? What do you look for in a statistical package for an introductory statistics course? As a student, how important is it to you for the software to be attractive?

Teaching a service course in statistics

Most students who enrol in an initial course in statistics at university level do so because they have to. I did some research on attitudes to statistics in my entry level quantitative methods course, and fewer than 1% of the students had chosen to be in that course. This is a little demoralising, if you happen to think that statistics is worthwhile and interesting.

Teaching a service course in statistics is one of the great challenges of teaching. A “service course” is a course in statistics for students who are majoring in some other subject, such as Marketing or Medicine or Education. For some students it is a terminating course – they will never have to look at a p-value again (they hope). For others it is the precursor to further applied statistics such as marketing research or biological research. Having said that, statistics for citizens is important, interesting and engaging if taught that way. And we might encourage some students to carry on.

Yet the teachers and textbook writers seem to do their best to remove the joy. Statistics is a difficult subject to understand. Often the way the instructor thinks is at odds with the way the students think and learn. The mathematical nature of the subject is invested with all sorts of emotional baggage.

Here are some of the challenges of teaching a statistics service course.

Limited mathematical ability

It is important to appreciate how limited the mathematical understanding of some students in service courses can be. In my first-year quantitative methods course, I made sure my students knew basic algebra, including rearranging and solving equations, all done within a business context. Even elementary algebra was quite a stumbling block to some students, for whom algebra had been a bridge too far at school. There were students in a postgrad course I taught who were not sure which was larger out of 0.05 and 0.1, and who talked about crocodiles with regard to greater-than and less-than signs. And these were schoolteachers! Another senior maths teacher in that group had been teaching the calculation of confidence intervals without actually understanding what they were.

The students are not like statisticians. Methods that worked to teach statisticians and mathematicians are unlikely to work for them. I wrote about this in my post about the Golden Rule, and how it applies at a higher level for teaching.

I realised a few years ago that I am not a mathematician. I do not have the ability to think in the abstract that is part of a true mathematician. Operations Research was my thing, because I was good at mathematics, but my understanding was concrete. This has been a surprising gift for me as a teacher, as it has meant that I can understand better what the students find difficult. Formulas do not tell them anything. Calculating by hand does not lead to understanding. It is from this philosophy that I approach the production of my videos. I am particularly pleased with my recent video about confidence intervals, which explains the ideas, with nary a formula in sight, but plenty of memorable images.

Software

One of my most consistently accessed posts is “Excel, SPSS, Minitab or R?”. This ongoing interest indicates that the choice of software is a universal problem. People are very quick to say how evil Excel is, and I am under no illusions as to its many shortcomings. The main point of my post was, however, that it depends on the class you are teaching.

As I have taught mainly business students, I still hold that for them, Excel is ideal. Not so much for the statistical aspects, but because they learn to use Excel. Last Saturday the ideas for today’s posts were just forming in my mind when the phone rang, and despite my realising it was probably a telemarketer (we have caller ID on our phone) I answered it. It was a nice young woman asking me to take part in a short survey about employment opportunities for women in the Christchurch Rebuild. After I’d answered the questions, explaining that I was redundant from the university because of the earthquakes and that I had taught statistics, she realised that I had taught her. (This is a pretty common occurrence for me in our small town-city – even when I buy sushi I am served by ex-students).

So I asked her about her experience in my course, and she related how she would never have taken the course, but enjoyed it and passed. I asked about Excel, and she told me that she had never realised what you could do with Excel before, and now still used it. This is not an isolated incident. When students are taught Excel as a tool, they use it as a tool, and continue to do so after the course has ended.

When business students learn using Excel, it has the appearance of relevance. They are aware that spreadsheets are used in business. It doesn’t seem like time wasted. So I stand by my choice to use Excel. However if I were still teaching at University, I would also be using iNZight. And if I taught higher levels I would continue to use SPSS, and learn more about R.

Textbooks

As I said in a previous post, statistics textbooks suck out all the fun. Very few textbooks do no harm. I wonder if this site could provide a database of statistics texts and reviews; I would be happy to review textbooks and include them here. My favourite elementary textbook is, sadly, out of print. It is called “Taking the Fear out of Data Analysis”, by the fabulously named Adamantios Diamantopoulos and Bodo Schlegelmilch. It takes a practical approach and has a warm, nurturing style, though it lacks exercises. I have used extracts from it over the years. The choice of textbook, like the choice of software, is “horses for courses”, but I think there are some horses that should not be put anywhere near a course. I do wonder how many students use textbooks as anything other than a combination lucky charm and paperweight.

In comparison with the plethora of college texts of varying value, at high-school level the pickings for textbooks are thin. This probably reflects the newness of the teaching of statistics at high-school level.  A major problem with textbooks is that they are so quickly out of date, and at school level it is not practical to replace class sets too often.

Perhaps the answer is online resources, which can be updated as needed, and are flexible and give immediate feedback.  😉

Emotional baggage

I was less than gentle with a new acquaintance in the weekend. When asked about my business, I told him that I make online materials to help people teach and learn statistics. He proceeded to relate a story of a misplaced use of a percentage as a reason why he never takes any notice of statistics. I have tired of the “Lies, damned lies, and statistics” jibe and decided not to take it lying down. I explained that the world is a better place because of statistical analysis. Much research, including medical research, would not be possible in the absence of methods for statistical analysis. An understanding of the concepts of statistics is a vital part of intelligent citizenship, especially in these days of big and ubiquitous data.

I stopped at that point, but have pondered since. What is it that makes people so quick to denigrate the worth of statistics? I suspect it is ignorance and fear. They make themselves feel better about their inadequacies by devaluing the things they lack. Just a thought.

This is not an isolated instance. In fact I was so surprised when a lighthouse keeper said that statistics sounded interesting and wanted to know more, that I didn’t really know what to say next! You can read about that in a previous post. Statistics is an interesting subject – really!

But the students in a service course in statistics may well be in the rather large subset of humanity who have yet to appreciate the worth of the subject. They may even have fear and antipathy towards it, as I wrote about previously in “Anxiety, fear and antipathy for maths, stats and OR”.

People are less likely to learn if they have negative attitudes towards the subject. And when they do learn, it may well be “learning to pass” rather than actual learning that is internalised.

So what?

Keep the faith! Statistics is an important subject. Keep trying new things. If you never have a bad moment in your teaching, you are not trying enough new things. And when you hear from someone whose life was changed because of your teaching, there is nothing like it!

Confidence Intervals: informal, traditional, bootstrap

Confidence Intervals

Confidence intervals are needed because there is variation in the world. Nearly all natural, human or technological processes result in outputs which vary to a greater or lesser extent. Examples of this are people’s heights, students’ scores in a well written test and weights of loaves of bread. Sometimes our inability or lack of desire to measure something down to the last microgram will leave us thinking that there is no variation, but it is there. For example we would check the weights of chocolate bars to the nearest gram, and may well find that there is no variation. However if we were to weigh them to the nearest milligram, there would be variation. Drug doses have a much smaller range of variation, but it is there all the same.

You can see a video about some of the main sources of variation – natural, explainable, sampling and due to bias.

When we wish to find out about a phenomenon, the ideal would be to measure all instances. For example we can find out the heights of all students in one class at a given time. However it is impossible to find out the heights of all people in the world at a given time. It is even impossible to know how many people there are in the world at a given time. Whenever it is impossible or too expensive or too destructive or dangerous to measure all instances in a population, we need to take a sample. Ideally we will take a sample that gives each object in the population an equal likelihood of being chosen.
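
As a tiny illustration of that last idea, here is a sketch in Python (the class heights are invented) of a simple random sample, where every member of the population has the same chance of being chosen:

```python
import random

# An invented population: heights, in cm, of every student in one class.
population = [162, 171, 155, 180, 168, 175, 159, 166, 173, 181]

# A simple random sample of four, drawn without replacement.
# random.sample gives every student the same chance of selection.
heights_sample = random.sample(population, k=4)
print(heights_sample)
```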

You can see a video here about ways of taking a sample.

When we take a sample there will always be error. It is called sampling error. We may, by chance, get exactly the same value for our sample statistic as the “true” value that exists in the population. However, even if we do, we won’t know that we have.

The sample mean is the best estimate for the population mean, but we need to say how well it is estimating the population mean. For example, say we wish to know the mean (or average) weight of apples in an orchard. We take a sample and find that the mean weight of the apples in the sample is 153g. If we took only a few apples, this gives just a rough idea, and we might say we are pretty sure the mean weight of the apples in the orchard is between 143g and 163g. If someone else took a bigger sample, they might be able to say that they are pretty sure that the mean weight of apples in the orchard is between 158g and 166g. You can tell that the second confidence interval is giving us better information, as the range of the confidence interval is smaller.

There are two things that affect the width of a confidence interval. The first is the sample size. If we take a really large sample we are getting a lot more information about the population, so our confidence interval will be more exact, or narrower. It is not a one-to-one relationship, but a square-root relationship. If we wish to reduce the width of the confidence interval by a factor of two, we will need to increase our sample size by a factor of four.

The second thing to affect the width of a confidence interval is the amount of variation in the population. If all the apples in the orchard are about the same weight, then we will be able to estimate that weight quite accurately. However, if the apples are all different sizes, then it will be harder to be sure that the sample represents the population, and we will have a larger confidence interval as a result.
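
Both effects drop out of the usual standard-error arithmetic: the width of a confidence interval for a mean is roughly 2 × t × s ÷ √n. A quick sketch in Python (invented numbers, with the t multiplier rounded to 2 for simplicity) shows the square-root relationship and the effect of spread:

```python
import math

def ci_width(s, n, t=2.0):
    # Approximate width of a 95% confidence interval for a mean:
    # 2 * t * s / sqrt(n), with t rounded to 2 for simplicity.
    return 2 * t * s / math.sqrt(n)

# Quadrupling the sample size halves the width (square-root relationship).
print(ci_width(s=30, n=25))    # 24.0
print(ci_width(s=30, n=100))   # 12.0

# Doubling the spread in the population doubles the width.
print(ci_width(s=60, n=100))   # 24.0
```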

Three ways to find confidence intervals

Traditional (old-fashioned?) Approach

The standard way of calculating confidence intervals is to use formulas developed from the assumptions of normality and the Central Limit Theorem. These formulas are used to calculate the confidence intervals of means, proportions and slopes, but not of medians or standard deviations, because there are no nice straightforward formulas for those. The formulas were developed when there were no computers, and analytical methods were needed in the absence of computational power.

In terms of teaching, these formulas are straightforward, and they also include the concept of the level of confidence, which is part of the paradigm. You can see a video teaching the traditional approach to confidence intervals, using Excel to calculate the confidence interval for a mean.
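
For those who want to see the calculation laid bare, here is a sketch of the traditional t-based interval for a mean, done in Python with scipy rather than Excel (the apple weights are invented):

```python
import numpy as np
from scipy import stats

# An invented sample of apple weights, in grams.
weights = np.array([148, 156, 151, 160, 149, 155, 158, 147, 153, 152])

n = len(weights)
mean = weights.mean()
se = weights.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t = stats.t.ppf(0.975, df=n - 1)        # t multiplier for 95% confidence

# Traditional confidence interval: mean plus or minus t times the standard error.
print(f"95% CI for the mean: {mean - t*se:.1f}g to {mean + t*se:.1f}g")
```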

Rule of Thumb

In the New Zealand curriculum at Year 12, students are introduced to the concept of inference using an informal method for calculating a confidence interval. The formula is the median ± 1.5 times the interquartile range divided by the square root of the sample size. There is a similar formula for proportions.
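
That informal interval is simple enough to compute directly. A sketch in Python, with invented data:

```python
import numpy as np

data = np.array([148, 156, 151, 160, 149, 155, 158, 147, 153, 152])

median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                # interquartile range
margin = 1.5 * iqr / np.sqrt(len(data))      # 1.5 x IQR / sqrt(n)

print(f"Informal interval: {median - margin:.1f} to {median + margin:.1f}")
```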

Bootstrapping

Bootstrapping is a very versatile way to find a confidence interval. It has three strengths:

  1. It can be used to calculate confidence intervals for a large range of different parameters.
  2. It uses ALL the information the sample gives us, rather than just the summary values.
  3. It has been found to aid understanding of the concepts of inference better than the traditional methods do.

There are also some disadvantages:

  1. Old fogeys don’t like it. (Just kidding.) What I mean is that teachers who have always taught using the traditional approach find it difficult to trust what seems like a hit-and-miss method without the familiar theoretical underpinning.
  2. Universities don’t teach bootstrapping as much as the traditional methods.
  3. The common software packages do not include bootstrap confidence intervals.

The idea behind a bootstrap confidence interval is that we use the whole sample to represent the population. We take lots and lots of samples of the same size from the original sample. Obviously we need to sample with replacement, or the samples would all be identical. Then we use these repeated samples to get an idea of the distribution of the estimates of the population parameter. We chop the tails off at the appropriate percentiles – 2.5% at each end for a 95% interval – and what remains is the confidence interval. Voilà!
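
Here is a minimal sketch of that percentile bootstrap in Python, for the median of an invented sample of apple weights. The resample-with-replacement and chop-the-tails steps are all there; a real analysis might use a more refined interval:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sample = np.array([148, 156, 151, 160, 149, 155, 158, 147, 153, 152])

# Resample WITH replacement from the original sample, many times,
# recording the statistic of interest (here, the median) each time.
boot_medians = [
    np.median(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(10_000)
]

# Chop 2.5% off each tail: the middle 95% of the bootstrap
# distribution is the confidence interval.
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print(f"95% bootstrap CI for the median: {lower:.1f}g to {upper:.1f}g")
```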

Answers to the disadvantages (burn the straw man?)

  1. There is a sound theoretical underpinning for bootstrap confidence intervals. A good place to start is a previous blog post about George Cobb’s work. Either that or – “Trust me, I’m a Doctor!” (This would also include trusting far more knowledgeable people such as Chris Wild and Maxine Pfannkuch, and the team of statistical educators led by Joan Garfield.)
  2. We have to start somewhere. Bootstrap methods aren’t used at universities because of inertia. As an academic of twenty years I can say that there is NO PAY-OFF for teaching new stuff. It takes up valuable research time, you don’t get promoted, and sometimes you even get made redundant. If students understand what confidence intervals are, and the concept of inference, then learning to use the traditional formulas is trivial. Eventually the universities will shift. I am aware that the University of Auckland now teaches the bootstrap approach.
  3. There are ways to deal with the software package problem. There is a free software interface called “iNZight” that you can download. I believe Fathom also uses bootstrapping. There may be other software. Please let me know of any and I will add them to this post.

In Summary

Confidence intervals involve the concepts of variation, sampling and inference. They are a great way to teach these really important concepts, and to help students be critical of single-value estimates. They can be taught informally, traditionally or using bootstrapping methods. Any of the approaches can lead to rote use of formulas or algorithms, and it is up to teachers to aim for understanding. I’m working on a set of videos around this topic. Watch this space.

Statistical Story-telling with time series data

Statistics is about story-telling.

For people who understand them, graphs tell a story. To the initiated, even a p-value, and some summary statistics can help to tell a story. Part of the role of a statistician is to extract the story from the data. The role of a statistics teacher is to enable students first to recognise that there is a story, then to enable them to tell the story through the tools of analysis and communication.

This idea of statistics as story-telling is explained in an award-winning paper by Pfannkuch, Regan, Wild and Horton, “Telling Data Stories: Essential Dialogues for Comparative Reasoning”, which won the inaugural Journal of Statistics Education Best Paper Award.

Time series data, especially seasonal time series data, yields its story abundantly. For this reason I changed my mind about the teaching of time series analysis at high school. I used to think that it was far too complex for high school students and should be left to higher education. In a way that is true, but if you stick to the basic concepts, it is a contextually rich area of study.

Time series data is full of little hazards, not the least being auto-correlation. We can use moving averages to take out the bumps and exponential smoothing to be more responsive to more recent data. We can deseasonalise and fit a trend line, predict and then put the seasonality back in. There are weighty (in more ways than one) volumes dedicated to time series analysis and the various discoveries and inventions that have helped us draw meaning from the past and forecast the future.
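
To make the moving-average idea concrete, here is a rough sketch in Python with pandas, on an invented quarterly series. (A textbook treatment of quarterly data would use a centred 2×4 moving average; pandas’ centred four-term average is close enough to show the idea.)

```python
import pandas as pd

# An invented quarterly sales series with a strong seasonal pattern.
sales = pd.Series(
    [10, 14, 22, 12, 11, 15, 24, 13, 12, 17, 26, 14],
    index=pd.period_range("2020Q1", periods=12, freq="Q"),
)

# A centred four-quarter moving average smooths out the seasonal
# bumps, leaving an estimate of the trend.
trend = sales.rolling(window=4, center=True).mean()

# The ratio of the data to the trend exposes the seasonal effect;
# averaging these ratios by quarter gives seasonal indices, which
# can be divided out to deseasonalise the series.
ratio = sales / trend
print(pd.DataFrame({"sales": sales, "trend": trend, "ratio": ratio}))
```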

However, if a storytelling approach is used, backed up by appropriate software, then time series is a wonderful introduction to statistics. It is a good example of modelling, it has a clear purpose, and the contexts can be fascinating.

Time series analysis is a clear example of the concept of a model, as there are so many different ways to model a set of time series data. In contrast, when you teach linear regression with only one possible predictor variable, on data that is nicely behaved, there is generally one sensible model to use. This gives students the idea that you are trying to find “the right model”. This is not the case with time series, where different choices lead to different, defensible models.

Another selling-point for time series analysis is that its main function is forecasting. We all want to have crystal balls that can predict the future. The main reason we study a time series is to understand the patterns of data so that we can project into the future, usually for economic reasons. There is no question of “Why are we doing this, Miss?”, as the purpose of the analysis is self-evident.

There are numerous economic time series available from official statistics sites. In New Zealand I went to Infoshare, and in the US there is Economagic. Some of the series are fascinating. (I like the three peaks per year in jewellery sales in the US – December, February and May.)

Analysis can be difficult, and Excel is hideous for time series graphing and deseasonalising. A free front end for R, called iNZight, has been set up, which enables straightforward time series analysis. One drawback is that it only allows for one model, which I fear perpetuates the “there is one model” mindset.

But the opportunities for storytelling are there. You can talk about trend, seasonality, variation, the relative contribution of each. As teachers and students are exposed to more and more time series graphs, they are better able to tell stories. The graphs of the seasonal shape are rich with story-telling potential.

To support this we have made four videos about time series analysis, and an app, which is still in the pipeline. We hope that these will help develop the confidence of teachers and students to tell stories about time series data. We also have further quizzes and a step-by-step guide to writing up a time series analysis.

For teachers with limited access to computer resources, I have an earlier post with some ideas for overcoming this problem while emphasising the story in time series data: Teaching Time Series with Limited Computer Access.

Excel, SPSS, Minitab or R?

I often hear this question: Should I use Excel to teach my class? Or should I use R? Which package is the best?

Update in April 2018: I have written a further post, covering other aspects and other packages.

It depends on the class

The short answer is: it depends on your class. You have to ask yourself what attitudes, skills and knowledge you wish the students to gain from the course. What is it that you want them to feel and do and understand?

If the students are never likely to do any more statistics, what matters most is that they understand the elementary ideas, feel happy about what they have done, and recognise the power of statistical analysis, so they can later employ a statistician.

If the students are strong in programming, such as engineering or computer science students, then they are less likely to find the programming a barrier, and will want to explore the versatility of the package.

If they are research students and need to take the course as part of a research methods paper, then they should be taught on the package they are most likely to use in their research.

Over the years I have taught statistics using Excel, Minitab and SPSS. These days I am preparing materials for courses using iNZight, which is a specifically designed user interface with an R engine. I have dabbled in R, but have never had students for whom R would be a suitable teaching language.

Here are my pros and cons for each of these, and when each is most suitable.

Excel

I have already written somewhat about the good and bad aspects of Excel, and the evils of Excel histograms. There are many problems with statistical analysis in Excel. I am told there are parts of the Analysis ToolPak which are wrong, though I’ve never found them myself. There is no straightforward way to do a hypothesis test for a mean. The data-handling capabilities of the spreadsheet are fantastic, but the ToolPak cannot even deal well with missing values. The output is idiosyncratic, and not at all intuitive. There are programming quirks which should have been eliminated many years ago. For example, when you click on a radio button to say where you wish the output to go, the entry box for the data is activated, rather than the one for the output. It would take only elementary Visual Basic to correct this, but it has never happened. Each time Excel upgrades I look for this small fix, and have repeatedly been disappointed.

So, given these shortcomings, why would you use Excel? Because it is there, because you are helping students gain other skills in spreadsheeting at the same time, because it is less daunting to use a familiar interface. These reasons may not apply to all students. Excel is the best package for first year business students for so many reasons.

PivotTables in Excel are nasty to get your head around, but once you do, they are fantastic. I resisted teaching PivotTables for some years, but I was wrong. They may well be one of the most useful things I have ever taught at university. I made my students create comparative bar charts in Excel using PivotTables. One day Helen and I will make a video about PivotTables.
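
For the curious, the same idea exists outside Excel. Here is a rough analogue of a PivotTable in Python with pandas (the sales figures are invented): transaction-level rows summarised by region and product, which is exactly the kind of table that sits behind a comparative bar chart.

```python
import pandas as pd

# Invented transaction-level data, the sort of thing a PivotTable summarises.
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "sales":   [120, 80, 95, 130, 110, 90],
})

# Rows = region, columns = product, values = total sales.
pivot = df.pivot_table(index="region", columns="product",
                       values="sales", aggfunc="sum")
print(pivot)
# pivot.plot(kind="bar") would draw the comparative bar chart.
```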

Minitab

Minitab is a lovely little package with very nice output. Its roots as a teaching package are obvious from the user-friendly presentation of results. It has been some years since I taught with Minitab, mainly because the students are unlikely ever to have access to Minitab again, and there is a lot of extra learning required to make it run.

SPSS

Most of my teaching at second year undergraduate and MBA and Masters of Education level has been with SPSS. Much of the analysis for my PhD research was done on SPSS. It’s a useful package, with its own peculiarities. I really like the data-handling in terms of excluding data, transforming variables and dealing with missing values. It has a much larger suite of analysis tools, including factor analysis, discriminant analysis, clustering and multi-dimensional scaling, which I taught to second year business students and research students.  SPSS shows its origins as a suite of barely related packages, in the way it does things differently between different areas. But it’s pretty good really.

R

R is what you expect from a command-line open-source program. It is extremely versatile, and pretty daunting for an arts or business major. I can see that R is brilliant for second-level and higher statistics courses, preferably for students who have already mastered similar packages/languages like MATLAB or Maple. It is probably also a good introduction to high-level programming for Operations Research students.

iNZight

This brings us to iNZight, which is a suite of routines using R, set in a semi-friendly user interface. It was specifically written to support the innovative New Zealand school curriculum in statistics, and has a strong emphasis on visual representation of data and results. It includes alternatives that use bootstrapping as well as traditional hypothesis testing. The time series package allows only one kind of seasonal model. I like iNZight. If I were teaching at university still, I would think very hard about using it. I certainly would use it for Time Series analysis at first year level. For high school teachers in New Zealand, there is nothing to beat it.

It has some issues. The interface is clunky and takes a long time to unzip if you have a dodgy computer (as I do). The graphics are unattractive. Sorry guys, I HATE the eyeball, and the colours don’t do it for me either. I think they need to employ a professional designer. SOON! The data has to be just right before the interface will accept it. It is a little bit buggy in a non-disastrous sort of way. It can have dimensionality/rounding issues. (I got a zero slope coefficient for a linear regression with an r of 0.07 the other day.)

But – iNZight does exactly what you want it to do, with lots of great graphics and routines to help with understanding. It is FREE. It isn’t crowded with all the extras that you don’t really need. It covers all of the New Zealand statistics curriculum, so the students need only to learn one interface.

There are other packages such as Genstat, Fathom and TinkerPlots, aimed at different purposes. My university did not have any of these, so I didn’t learn them. They may well be fantastic, but I haven’t the time to do a critique just now. Feel free to add one as a comment below!