I often hear this question: Should I use Excel to teach my class? Or should I use R? Which package is the best?

# It depends on the class

The short answer is: It depends on your class. You have to ask yourself what attitudes, skills and knowledge you wish the students to gain in the course. What is it that you want them to feel and do and understand?

If the students are never likely to do any more statistics, what matters most is that they understand the elementary ideas, feel happy about what they have done, and recognise the power of statistical analysis, so they can later employ a statistician.

If the students are strong in programming, such as engineering or computer science students, then they are less likely to find the programming a barrier, and will want to explore the versatility of the package.

If they are research students and need to take the course as part of a research methods paper, then they should be taught on the package they are most likely to use in their research.

Over the years I have taught statistics using Excel, Minitab and SPSS. These days I am preparing materials for courses using iNZight, which is a specifically designed user interface with an R engine. I have dabbled in R, but never had students who are suitable to be taught using R.

Here are my pros and cons for each of these, and when they are most suitable.

# Excel

I have already written somewhat about the good and bad aspects of Excel, and the evils of Excel histograms. There are many problems with statistical analysis with Excel. I am told there are parts of the analysis toolpak which are wrong, though I’ve never found them myself. There is no straightforward way to do a hypothesis test for a mean. The data-handling capabilities of the spreadsheet are fantastic, but the toolpak cannot even deal well with missing values. The output is idiosyncratic, and not at all intuitive. There are programming quirks which should have been eliminated many years ago. For example, when you click on a radio button to say where you wish the output to go, the entry box for the data is activated, rather than the one for the output. It would take only elementary Visual Basic to correct this, but it has never happened. Each time Excel upgrades I look for this small fix, and have repeatedly been disappointed.

So, given these shortcomings, why would you use Excel? Because it is there, because you are helping students gain other skills in spreadsheeting at the same time, because it is less daunting to use a familiar interface. These reasons may not apply to all students. Excel is the best package for first year business students for so many reasons.

PivotTables in Excel are nasty to get your head around, but once you do, they are fantastic. I resisted teaching PivotTables for some years, but I was wrong. They may well be one of the most useful things I have ever taught at university. I made my students create comparative bar charts in Excel, using PivotTables. One day Helen and I will make a video about PivotTables.
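For readers who have not met them, the counts behind a comparative bar chart are just a cross-tabulation of two categorical variables. Here is a minimal sketch of that idea in Python rather than Excel; the survey rows and variable names are made up for illustration, and this is not how Excel builds a PivotTable internally.

```python
from collections import Counter

# Hypothetical survey rows: (year of study, preferred package).
survey = [
    ("First year", "Excel"), ("First year", "Excel"), ("First year", "SPSS"),
    ("Second year", "SPSS"), ("Second year", "SPSS"), ("Second year", "Excel"),
    ("First year", "Excel"), ("Second year", "R"),
]

# Count how often each (row, column) combination occurs.
counts = Counter(survey)
row_labels = sorted({year for year, _ in survey})
col_labels = sorted({package for _, package in survey})

# Print the table: one row per year of study, one column per package.
print(f"{'':12}" + "".join(f"{c:>8}" for c in col_labels))
for row in row_labels:
    print(f"{row:12}" + "".join(f"{counts[(row, c)]:>8}" for c in col_labels))
```

Each row of the printed table corresponds to one cluster of bars in a comparative bar chart.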

# Minitab

Minitab is a lovely little package, and has very nice output. Its roots as a teaching package are obvious from the user-friendly presentation of results. It has been some years since I taught with Minitab. The main reason for this is that the students are unlikely ever to have access to Minitab again, and there is a lot of extra learning required in order to make it run.

# SPSS

Most of my teaching at second year undergraduate and MBA and Masters of Education level has been with SPSS. Much of the analysis for my PhD research was done on SPSS. It’s a useful package, with its own peculiarities. I really like the data-handling in terms of excluding data, transforming variables and dealing with missing values. It has a much larger suite of analysis tools, including factor analysis, discriminant analysis, clustering and multi-dimensional scaling, which I taught to second year business students and research students. SPSS shows its origins as a suite of barely related packages, in the way it does things differently between different areas. But it’s pretty good really.

# R

R is what you expect from a command-line open-source program. It is extremely versatile, and pretty daunting for an arts or business major. I can see that R is brilliant for second-level and up in statistics, preferably for students who have already mastered similar packages/languages like MATLAB or Maple. It is probably also a good introduction to high-level programming for Operations Research students.

# iNZight

This brings us to iNZight, which is a suite of routines using R, set in a semi-friendly user interface. It was specifically written to support the innovative New Zealand school curriculum in statistics, and has a strong emphasis on visual representation of data and results. It includes alternatives that use bootstrapping as well as traditional hypothesis testing. The time series package allows only one kind of seasonal model. I like iNZight. If I were teaching at university still, I would think very hard about using it. I certainly would use it for Time Series analysis at first year level. For high school teachers in New Zealand, there is nothing to beat it.
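To give a flavour of the bootstrapping alternative to a traditional hypothesis test, here is a minimal sketch of a percentile bootstrap interval for a difference in means. It is written in Python with made-up data; the function name and numbers are assumptions for illustration, and it is not iNZight's code.

```python
import random
import statistics

def bootstrap_ci_diff_means(a, b, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    # Take the middle `level` share of the sorted resampled differences.
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return diffs[lower_index], diffs[upper_index]

# Hypothetical measurements for two groups.
group_a = [12.1, 14.3, 13.8, 15.2, 11.9, 14.7, 13.1, 15.8]
group_b = [10.4, 11.8, 12.9, 10.1, 11.2, 12.5, 10.9, 11.6]

low, high = bootstrap_ci_diff_means(group_a, group_b)
print(f"95% bootstrap interval for the difference in means: ({low:.2f}, {high:.2f})")
```

If the interval excludes zero, that plays roughly the role of a significant two-sample test, and students can watch the resampling happen rather than take a formula on trust.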

It has some issues. The interface is clunky and takes a long time to unzip if you have a dodgy computer (as I do). The graphics are unattractive. Sorry guys, I HATE the eyeball, and the colours don’t do it for me either. I think they need to employ a professional designer. SOON! The data has to be just right before the interface will accept it. It is a little bit buggy in a non-disastrous sort of way. It can have dimensionality/rounding issues. (I got a zero slope coefficient for a linear regression with an r of 0.07 the other day.)
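One possible explanation for a zero slope alongside a non-zero r, offered as an assumption rather than a diagnosis of iNZight: the slope equals r times sd(y) / sd(x), so when x is measured in much larger units than y the slope can be tiny and display as zero at a fixed number of decimal places. A small Python sketch with made-up standard deviations (not the data from that regression):

```python
def slope_from_r(r, sd_x, sd_y):
    """Least-squares slope of y on x, from the correlation and the two SDs."""
    return r * sd_y / sd_x

r = 0.07
sd_y = 2.0        # hypothetical: response on a small scale
sd_x = 50_000.0   # hypothetical: predictor in large units, e.g. dollars

slope = slope_from_r(r, sd_x, sd_y)
print(f"{slope:.2f}")  # two decimal places: shows 0.00
print(f"{slope:.2e}")  # scientific notation: shows 2.80e-06

# Measuring x in units of 100,000 leaves r unchanged but rescales the slope.
rescaled_slope = slope_from_r(r, sd_x / 100_000, sd_y)
print(f"{rescaled_slope:.2f}")  # shows 0.28
```

The correlation is the same in both cases; only the units of x, and so the displayed slope, have changed.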

But – iNZight does exactly what you want it to do, with lots of great graphics and routines to help with understanding. It is FREE. It isn’t crowded with all the extras that you don’t really need. It covers all of the New Zealand statistics curriculum, so the students need only to learn one interface.

There are other packages such as Genstat, Fathom and TinkerPlots, aimed at different purposes. My university did not have any of these, so I didn’t learn them. They may well be fantastic, but I haven’t the time to do a critique just now. Feel free to add one as a comment below!

Stata is a good option as well. Has a GUI which is very helpful when you first use it, as well as programming code when you are getting into more serious work. Not too hard to learn.

Thanks – what sort of class are you recommending Stata for, and how much does it cost?

I think student price is about $US180 for the IC version which is fine for a majority of general users. Can be used for a variety of purposes as it is a flexible package and does some great graphs and figures, as well as some of the more substantial analyses.

WEKA and Python are also very good options for data analysis.

Hi Nicola. Another element to think about is where your students will be employed. For example, biostatisticians and epidemiologists seem to learn Stata, while a number of government jobs in Wellington will expect SAS skills. Thanks for writing this blog, long time lurker, first time poster. 🙂

Hi Michelle. Thanks for writing. That is a really good point. My colleague cynically referred to “CV expansion”. It looks good to know (or be familiar with) a number of different packages.

A lot of biostatisticians use SAS as well. Epidemiologists and social scientists use SPSS too.

As Peta writes above, Stata is very good: easy to learn, cheap for students to purchase, and students can learn a lot as they go about using it. I am biased towards R, and R itself now has a programming interface included with it; when version 3 comes out, it’s going to be even more versatile. I think by far the best interface, hands-down, for R is RStudio (rstudio.org), look no further.

Hi Arin, thanks for writing. What kind of students are you talking about? I find it hard to believe there is one package suitable for every class. I taught people with very little maths, often many years ago. They are obviously different from students of mathematics or engineering.

It may be useful to add that R is free as well as open source, which students (and others) find an attractive feature. It should also be seen not as a “stats package” but as a “software environment” in which you can make things. In particular, there are several front-ends that many people find useful in giving a more familiar statistical interface, such as R-commander and deduceR for beginners and RStudio for others who like, ah, that kind of thing…

Thanks Bill – I said iNZight was free, but forgot to mention that R was as well. Software Environment has a nice sound to it. It’s difficult to know what to call things.

Are the front ends you list also free?

Yes, R-commander and deduceR are both R packages (and there are others) and RStudio is a free external software device. All three run on Windows, Mac OS and Linux, by the way.

I will first of all admit that I am favourably biased towards R. I would however never inflict the level of programming skills required for R on inexperienced high school students while trying to teach statistics. I could see it being a neat tool to be able to generate sample data, plots of distributions, and get concepts across, but preferably with the teacher driving the program and using it as a visual aid.

“[R is]…pretty daunting for an arts or business major”

Well, my background is in psychology and R is the first real programming language I learnt. It was *far* easier for me to pick up than SPSS’s frankly bizarre macro language. Learning R will also teach you basic programming concepts (indexing, arrays, functions) that you can translate to other languages.

Regarding Excel, I think it is really quite a good spreadsheet tool. Using it to eyeball data, as an interactive system to enter and clean up data, or to produce some basic line/bar plots – great. For any graphing not explicitly pre-defined (cough.. histograms) or statistical analysis, I just wouldn’t bother. Pivot tables for analysis are just asking for mistakes like forgetting to expand the ‘data source’ range when adding cases. It was never designed in the first place to be a statistical analysis tool, so in my opinion it shouldn’t be shoehorned into being used as one.

SPSS – not really my bag, but I used it for several years and it is alright. The “verbose output + pick out relevant bits” model of analysis never particularly thrilled me.

The iNZight system looks quite reasonable. One of its strong points by the look of it is that it is not ‘fancy’ and seems to focus on giving basic results and clear graphs not cluttered by ‘chart junk.’ I hope they don’t let a designer near it and end up with drop shadows and 3D effects.

I’ve formed the impression that SPSS is pushing people to learn Python, for example in data cleaning/preparation. Did you have a sense of what is happening with respect to expectations around SPSS users also needing to learn Python?

I’ve been out of the world of regular SPSS users for the last two or three years, so I can’t comment with much authority. I know SPSS has added some interfaces to languages like Python and R. Last time I used the R integration it was fairly clunky though, and didn’t allow things like adding a new column/variable without passing the whole dataset/data.frame back and forth (things may have improved).

I get the impression that a large chunk of SPSS’s userbase rarely use the in-built ‘syntax’ and that Python integration would be only used by a further subset of that subgroup. I have however heard good things about NumPy and SciPy in regards to stand-alone use of Python for statistical analysis.

In general, I think Excel is extremely limited even for first year statistics. It does not encourage exploratory data analysis looking at many variables at a time, it copes poorly with missing values and it provides no way to track how things were done (poor repeatability). Despite this, we use Excel in some quantitative courses, but not statistics, because it’s such a common package in industry.

I’d say that it would be easier to teach at an entry level using SPSS or Stata. The latter is a good one in econometrics and biostatistics; and as an advantage one can start using the menus and learning the code generated by them (a similar approach is used by Genstat). Both of them are commonly available in universities, but I don’t know how often they are used in industry.

The common problem with R and SAS is that one is simultaneously teaching programming and statistics, which can be confusing for many (often more than half of the) students. Students that will end up heavily using stats should probably be acquainted with both of them. Employers like Stats NZ provide training in SAS if required; so understanding concepts is seen as more important than the specifics of a given language.

One of the main advantages of R (even with front ends like R-Commander or iNZight) is that students can get a copy and install it on their own computers. This is particularly good for students coming from poor countries, where buying software is often beyond reach (here Genstat is great, as it has a free version for developing countries). Finally, if you don’t like the colors or default graphs in R (or derived front-ends) you can just change them.

In Canterbury we use R (using R-Commander / RStudio, depending on the student) for basic experimental design and regression analysis courses (STAT201/202). It’s sometimes a struggle, but most students will get there.

I second every positive thing that has been said about Stata. I teach it at postgrad level to classes of political science students that have no background in math or stats; 24 hours later, they can all write up a fairly decent regression analysis based on 300 lines or so of code. It works for (almost) everyone, from those who never want to do stats again to those who will continue dabbling with it in the advanced class, where they learn more complex things than OLS.

As someone recently emailed me, Stata is easier than R, just like three-wheel bicycles are easier than real ones. I am also teaching an R class at the moment, with a selected group of ambitious undergrads who know quite a lot of math. I therefore endorse your general view: different software for different needs and levels of proficiency. I would even consider teaching Excel because it’s relevant in some places (1), but its numerical inaccuracy bothers me.

(1) https://news.ycombinator.com/item?id=5198187

No mention of SAS?

I do not agree that we should use Excel as a statistical package just because it is already there on Windows machines. It really is not suitable for any meaningful statistical analysis. It is better to use a real statistical package in your course to help teach statistical concepts. If a student never needs statistics later, OK, no loss. However, if some student finds the need to do statistics later in his/her career, he/she will know that there is a real software package available. I find Minitab very easy for students to use, with pull-down menus and dialog boxes. R has too steep a learning curve to be used in an introductory course.

Bill

I can’t think of any other human activity where one would justify teaching poor tools or bad practice “because they are there”. IT also seems to be the last area where someone with a basic and sketchy knowledge is left to make their own way. Much of the blame I put on senior executives and politicians, who themselves have no knowledge and probably a fear of technology. So they don’t ask the basic questions (eg, you say you’re setting up a database – what is a “database”?), but puff themselves up by signing a big (blank) cheque. We need to be far more clear whether we are teaching (in any particular setting): concepts of statistics with some illustrative calculations; or statistical methods with tools to apply them; or computer programming as a generic system for solving problems that may be statistical. Most of all, we need to teach everyone (students, executives, politicians) that whatever they learn today will need to be revised (and maybe unlearned) several times during their working lives. “Preparing students for the workplace” still means training them to learn, not sending them out with exact skills to “hit the ground running”. Where do I pick up these cliches? (And why can’t I type cliche with the accent on this interface – rhetorical)

I second this sentiment. Part of the job of teaching should be to exert pressure on the workplace to improve rather than just prepare new fodder for a bad system.

If Excel is used, then the students should certainly be warned about its dangers: http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/

I taught statistics for a few years in a large US public university, with class sizes of about 30-40 students who were willing to get their college diploma, but frankly were not the highest ability students. I have used Stata for business majors, and it went reasonably well. I developed the system that would generate and semi-automatically grade students’ individual assignments, which made them work together explaining the concepts to one another, rendering verbatim copying pointless at the same time. The students liked that I tried mimicking the work environment with a little bit of data cleaning, little bit of data analysis, and some report writing. I tried using R for engineering students, but it did not click with them. I am not sure how psychology students would survive the stress of learning R, although I can’t rule out that a very enthusiastic and talented teacher could manage it. I am far more enthusiastic about Stata, as I found it to be easier to learn due to a combination of a point-and-click interface (which I have never used, and my students discovered and utilized it on their own account) and enforcement of the common syntax (which is a problem for the statistical packages that are built through piling up the modules… and that’s how essentially things are done in every other package mentioned here — SPSS, SAS and R).

I don’t rule out that NZ folks really have better resources for learning R than anybody else, though, with 20% of Ph.D. statisticians in the country working on R development. So there may be some geography biases in what’s the best package to use in the classroom.

I am also deeply convinced that R or Stata or (God forbid) SAS should not be the first programming environment for a student to learn. Programming should be taught by computer scientists, not by statisticians (and I likewise refuse to re-teach calculus when students in my class cannot do \int x^2 \exp(-x) dx). Learn Python or Ruby on codeacademy.com; learn Scratch from MIT, for God’s sake. But mounting the challenge of figuring out what loops and regular expressions are on top of learning when to use the normal distribution vs. Student distribution: that’s simply not fair on the students.

GenStat has a schools/university version that is free (GenStat for Teaching and Learning – GTL). This is menu driven like Minitab and is being successfully used in lots of NZ schools. Jeanette Chapman at Otago Girls HS has developed notes for the NZ Year 13 Mathematics with Statistics course, and the school are also using it for Year 13 Biology and classes down to year 9 level. The schools version has a simplified menu system, removing menus outside the syllabus, but a switch allows the full university menus to be displayed. You can request a copy of GTL at http://www.vsni.co.uk/software/free-to-use/teaching/genstat-teaching. John Harraway at the University of Otago has put together 19 video resources on applications using statistics, most with a GTL lesson plan (http://www.maths.otago.ac.nz/videos/statistics/).

Coming back to the programming/coding point which has been raised a couple of times, surely loops and regular expressions are used to clean the data prior to analysis? So if you’re teaching with datasets that are designed to show a particular outcome (Fisher’s iris data anyone?), then the only code the students need to learn is:

1. how to get the data into the package (and this can be helped by giving them native data) and

2. the code for the statistical analysis they’re being taught?

Most of my programming IRL is shaping data up ahead of analysis (I have a 90/10 heuristic for how much code I need to do for preparation vs. analysis). Limiting student coding to only undertaking the statistical queries of interest reduces the amount of coding complexity down very dramatically. Stata, SPSS, SAS, Genstat, Statistica, Minitab all have dropdown menus to use, which means the coding requirement can be reduced to 0, but given that the syntax for many simpler statistical analyses is very short (e.g. box plots, linear regression, ANOVA) I would have assumed that the coding requirements for students would not be very daunting.

Doesn’t it boil down to what do you want to teach: how to code (btw this is some statistics) or how to use/interpret statistics (btw this is some syntax)?

(1) I’ve heard good things about JMP, but interested to note that nobody here has mentioned it.

(2) I’m interested in opinions about what the best GUI environment is for using R. Several are mentioned here, but not much detail and no real recommendations. I think R has a lot to recommend it if the ease-of-use issue can be solved for non-technical students.

(3) One issue with Excel is the general difficulty of auditing models to ensure that the instantiation is correct and the data is clean. That problem arises with spreadsheet optimization models as well. It seems that it might be worth the effort to teach some programming skills to get students thinking about model validity and verification. (That said, I’ve been teaching with Minitab and the Web-based program StatCrunch, mainly because our mainline course covers so much ground there’s no time to spend on programming skills. Our business stat course uses Excel at the insistence of the B-school.)

We use R Commander as the GUI because it covers most material for basic courses and it’s cross-platform; we have an increasing number of students using Macs, although Windows users are still the majority. It isn’t pretty but does the job. I’ve tried other GUIs but they often have problems working across multiple operating systems.

When coding I’m trying to move students to RStudio, which provides decent syntax highlighting and project management.

I find it interesting that the B-school insists on Excel; does that mean that students are limited in their learning by Excel’s features?

I haven’t actually taught that version of the course, but now that I look at the syllabus, it almost seems unfair to call it a statistics course. It uses McClave, Sinchich, and Mendenhall and covers the first six chapters (almost all descriptive stats and probability up through sampling distributions and central limit) and Chapter 10 (ANOVA). So I’d guess the answer is yes (but my guess is not definitive).

You have missed the best of all!

I have taught Stats with JMP and it is great.

It actually can cover business stats to masters level. It supports a way of working that makes teaching more logical – why?

Because all the y vs x(s) methods are unified as a platform which covers for example linear models and generalised LMs as options.

And it has a great, modern scripting language which treats everything as an object – even reports.

There are also some great teaching demos, e.g. the linear regression one.

I recommend exploring it further.

The state university I taught at in the U.S. had Minitab available for students free on campus computers, so it was used in the service statistics courses. I used it also in teaching Quality Management because it was available to students (also a limited-day (30?) trial was free and that was long enough for our project, so many put it on their laptops). I was taught stats in industry Six Sigma courses using Minitab, and it’s my understanding that many companies, at least in the U.S., use this software, so at least here, it’s something they have a good chance to use again, depending on the company. (Note: I worked in a development engineering department of a manufacturing company.)

My exposure to SPSS is limited. I think I used it a bit for my master’s work, and I’ve been tutoring a friend through her quantitative research methods courses in a college of education setting this year. They use it, and I’ve heard it’s more common for behavioral sciences and education research than for the hard sciences. Some of my business teaching colleagues used this for their research analyses, but I don’t know of anyone using it to teach business-related stats or operations, even at the masters level.

I agree with one of the commenters that JMP is a powerful tool. I was exposed to this as a Ph.D. student, and I know one of the developers. It continues to advance in usability and the ability to handle difficult problems. Some U.S. industries choose this as their stats tool. (A semiconductor company I worked for had this instead of Minitab.) I think it can do more than is needed for undergrad courses or even MBAs, so while I like it, I wouldn’t likely use it much as a teaching tool unless I was teaching graduate-level engineers or statisticians.

I appreciate your review and comments!

I love SPSS! I can help you with it. Check my blog for further details http://www.attyguideblog.wordpress.com

They use RStudio with the psych students here at my uni. Most of them are skeptical at first. But there are some serious advantages they like once I point them out:

1. It’s free and works on whatever kind of computer they happen to have, which means they can take it home and practise. This appeals especially to the students who can’t come to campus during normal hours.

2. You can put specific commands in the actual lecture notes/ prac instructions, and all that is required to see what they do is cut and paste. Things like SPSS and Excel have a problem where the instructions have a whole lot of menus + dialogue box information and therefore don’t make much sense on paper.

3. You can put your own commands in a simple text file to save for later. Then all you do is highlight and run. No need to type everything again (and no need to run through a whole lot of menus to do it all again!). And if you want to change a variable, just change one bit of the text. Once you show them how to do this, the students are very impressed and see the time-saving benefits of a text-based system.

Those are the three main benefits that tend to win over the psych students, and the majority of them are VERY maths-averse!

Of course no computer program will save you if you don’t actually teach them how to use it effectively. With R, you have to take the time to explain how R thinks about its data and results as objects that you can pull pieces out of. And explain how each bit of a command is telling R to do something in particular.

And NO computer program will save you if you don’t take the time to help the students learn what the stats is trying to achieve and how to tell if you have achieved it!

I am a demand planner who generates forecasts for a living. My question is: which software do you all feel is best for generating forecasts, or can they all perform this task? I currently use enterprise systems like SAP APO at my current employer, but I am doing a great deal of consulting work with Solver and was wondering if there are any better alternatives like SPSS, SAS, R, etc. I would prefer the software that is easiest to use to generate future monthly/weekly forecasts, and also for inventory optimization, leading indicators, and intermittent demand.

Beware. If you are using StatCrunch for a National University class, they will not help you understand how to use it and expect you to figure out all the problems on your own. The StatCrunch instructional videos are horrible. They do not help people who have no experience using StatCrunch.

It just depends on the type of subject you are teaching. If you are teaching statistics then R is better than Excel, because R is free of cost, R is well documented, and it will create better graphics than Excel.

So I would recommend R instead of Excel.