What can you do with ordinal data? Or more to the point, what shouldn’t you do with ordinal data?

First of all, let’s look at what ordinal data is.

It is usual in statistics and other sciences to classify types of data in a number of ways. In 1946, Stanley Smith Stevens suggested a theory of levels of measurement, in which all measurements are classified into four categories, Nominal, Ordinal, Interval and Ratio. This categorisation is used extensively, and I have a popular video explaining them. (Though I group Interval and Ratio together as there is not much difference in their behaviour for most statistical analysis.)

Nominal is pretty straightforward. This category includes any data that is put into groups, in which there is no inherent order. Examples of nominal data are country of origin, sex, type of cake, or sport. Similarly, it is pretty easy to explain interval/ratio data. It is something that is measured, by length, weight, time (duration), cost and similar. These two categorisations can also be given as qualitative and quantitative, or non-parametric and parametric.

## Ordinal data

But then we come to the ordinal level of measurement. This is used to describe data that has a sense of order, but for which we cannot be sure that the distances between the consecutive values are equal. For example, level of qualification has a sense of order:

- A postgraduate degree is higher than
- a Bachelor’s degree, which is higher than
- a high-school qualification, which is higher than
- no qualification.

There are four steps on the scale, and it is clear that there is a logical sense of order. However, we cannot sensibly say that the difference between no qualification and a high-school qualification is equivalent to the difference between the high-school qualification and a bachelor’s degree, even though both of those are represented by one step up the scale.

Another example of ordinal level of measurement is used extensively in psychological, educational and marketing research, known as a Likert scale. (Though I believe the correct term is actually Likert item – and according to Wikipedia, the pronunciation should be Lick it, not Like it, as I have used for some decades!). A statement is given, and the response is given as a value, often from 1 to 5, showing agreement to the statement. Often the words “Strongly agree, agree, neutral, disagree, strongly disagree” are used. There is clearly an order in the five possible responses. Sometimes a seven point scale is used, and sometimes the “neutral” response is eliminated in an attempt to force the respondent to commit one way or the other.

The question at the start of this post has an ordinal response, which could be perceived as indicating how quantitative the respondent believes ordinal data to be.

What prompted this post was a question from Nancy under the YouTube video above, asking:

“Dr Nic could you please clarify which kinds of statistical techniques can be applied to ordinal data (e.g. Likert-scale). Is it true that only non-parametric statistics are possible to apply?”

## Well!

As shown in the video, there are the purists, who are adamant that ordinal data is qualitative. There is no way that a mean should ever be calculated for ordinal data, and the most mathematical thing you can do with it is find the median. At the other pole are the practical types, who happily calculate means for any ordinal data, without any concern for the meaning (no pun intended).

So the answer to Nancy would depend on what school of thought you belong to.

## Here’s what I think:

Not all ordinal data is the same. There is a continuum of “ordinality” if you like.

There are some instances of ordinal data which are pretty much nominal, with a little bit of order thrown in. These should be distinguished from nominal data, only in that they should always be graphed as a bar chart (rather than a pie-chart)* because there is inherent order. The mode is probably the only sensible summary value other than frequencies. In the examples above, I would say that “level of qualification” is only barely ordinal. I would not support calculating a mean for the level of qualification. It is clear that the gaps are not equal, and additionally any non-integer result would have doubtful interpretation.

Then there are other instances of ordinal data for which it is reasonable to treat it as interval data and calculate the mean and median. It might even be supportable to use it in a correlation or regression. This should always be done with caution, and an awareness that the intervals are not equal.

Here is an example for which I believe it is acceptable to use the mean of an ordinal scale. At the beginning and the end of a university statistics course, the class of 200 students is asked the following question: How useful do you think a knowledge of statistics will be to you in your future career? Very useful, useful, not useful.

Now this is not even a very good Likert question, as the positive and negative elements are not balanced. There are only three choices. There is no evidence that the gaps between the elements are equal. However, if we score the elements as 3, 2 and 1, respectively, and find that the mean for the 200 students is 1.5 before the course and 2.5 after the course, I would say that there is meaning in what we are reporting. There are specific tests to use for this, and we could also look at how many students changed their minds positively or negatively. But even without the specific test, we are treating this ordinal data as something more than qualitative. What also strengthens the case for doing this is that the question is put to the same students, who will probably perceive the scale in the same way each time, making the comparison more valid.
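To make the arithmetic concrete, here is a minimal Python sketch. The paired counts in `transitions` are invented for illustration (the example above gives only the two means, 1.5 and 2.5) and are chosen so that the means match. The exact sign test shown is just one possible choice of "specific test", not necessarily the one a particular course or package would use.

```python
from math import comb

# Hypothetical paired responses: (score before, score after) -> number of students.
# Scores: 3 = very useful, 2 = useful, 1 = not useful.
transitions = {
    (1, 1): 15, (1, 2): 50, (1, 3): 55,
    (2, 1): 5,  (2, 2): 10, (2, 3): 45,
    (3, 3): 20,
}

n_students = sum(transitions.values())
mean_before = sum(b * n for (b, _), n in transitions.items()) / n_students
mean_after = sum(a * n for (_, a), n in transitions.items()) / n_students

# How many students changed their minds positively or negatively.
improved = sum(n for (b, a), n in transitions.items() if a > b)
worsened = sum(n for (b, a), n in transitions.items() if a < b)


def sign_test_p(n_up, n_down):
    """Two-sided exact sign test p-value; unchanged responses are dropped."""
    n = n_up + n_down
    k = min(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = sign_test_p(improved, worsened)
print(mean_before, mean_after, improved, worsened, p_value)
```

Note that the sign test uses only the direction of each student's change, so it never relies on the 3, 2, 1 scoring; the means do.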

So what I’m saying is that it is wrong to make a blanket statement that ordinal data can or can’t be treated like interval data. It depends on meaning and number of elements in the scale.

## What do we teach?

And again the answer is that it depends! For my classes in business statistics I told them that it depends. If you are teaching a mathematical statistics class, then a more hard-line approach is justified. However, at the same time as saying, “you should never calculate the mean of ordinal data”, it would be worthwhile to point out that it is done all the time! Similarly, if you teach that it is okay to find the mean of some ordinal data, I would also point out that there are issues with regard to interpretation and mathematical correctness.

## Please comment!

### Footnote on pie charts

*Yes, I too eschew pie-charts, but for two or three categories of nominal data, where there are marked differences in frequency, if you really insist, I guess you could possibly use them, so long as they are not 3D and definitely not exploding. But even then, a bar chart is better. Perhaps that is a post for another day, but so many have done this.

I would like to mention a couple of downloadable articles relevant to this post.

Warren Sarle has a very comprehensive FAQ on “Measurement Theory”.

ftp://ftp.sas.com/pub/neural/measurement.html

A more sceptical view of the area is given by Paul Velleman and Leland Wilkinson, each a statistical package developer, in the article “Nominal, Ordinal, Interval, and Ratio Typologies are Misleading”, an expansion of a 1993 American Statistician article available from

http://www.cs.uic.edu/~wilkinson/Publications/stevens.pdf

Sarle confronts some of Velleman and Wilkinson’s 1993 objections in his article.

Dr Nic suggests that there are not many statistical consequences of a difference between the Interval and Ratio levels. Maybe one might be that a log transformation is often appropriate with Ratio level data, but with merely Interval data y = log(x + alpha), with alpha estimated either formally or informally, might be required.

Thanks Murray. I will read the articles.

At the level I generally aim at, which is introductory statistics, the students are unlikely to need to know the difference, but it is worth bearing in mind.

Sure Nic – yes the level of these articles is well ahead of what a student could cope with directly, but it helps for a teacher to be a step or two ahead of the class. I read both articles in earlier versions some time ago. It looks like both articles have been updated in response to feedback over the years. The kind of reader that they will help most is the statistician who consults with social scientists, especially psychologists. Despite (or because of) the fact that the articles take opposing positions, I remember coming away feeling that I understood the main issues fairly well as a result of reading them.

Murray

It’s a shame the dot-plot doesn’t get a look in in the video, or your comments. The authoritative tone of the video in particular gives the impression that the bar chart is the acknowledged king for ordinal data, but all statistical graphics gurus I have come across prefer the dot plot.

Hi Peter

That is a very good point. I really like dot-plots too and used them all the time when I taught using Minitab.

I suspect the problem was that I was writing the video for a course that used Excel for producing graphs, and of course Excel doesn’t do dot-plots. I will rectify that if I redo the video!

Another common example of ordinal data at the “high” end of your scale is grouped interval data, i.e. where interval data has been grouped into a frequency table. This data can be displayed as a histogram, and numerical summaries such as the grouped mean and grouped standard deviation can be calculated. In the pre-computer age this was a common exercise for students, and there are still introductory Statistics textbooks around that contain these formulae (the early editions of Black’s Business Statistics spring to mind). It can be argued that with today’s computing power the need for discussing grouped data has gone, but nowadays a lot of “real” data is only published in frequency tables.
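As a reminder of what those textbook formulae do, here is a small Python sketch of the grouped mean and grouped standard deviation. The frequency table is made up for illustration, each class is represented by its midpoint, and the sample (n − 1) form of the variance is assumed; some textbooks divide by n instead.

```python
from math import sqrt

# Hypothetical frequency table: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 30, 8)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Grouped mean: every observation in a class is treated as sitting at the midpoint.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped sample variance and standard deviation (n - 1 divisor assumed).
grouped_var = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
grouped_sd = sqrt(grouped_var)

print(grouped_mean, grouped_sd)
```

The midpoint substitution is exactly where the "ordinal" flavour comes in: the order of the classes is known, but the positions of the values inside each class are not.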

“qualitative and quantitative, or non-parametric and parametric” – PLEASE don’t repeat an old canard. There are no such things as parametric and non-parametric data. Those terms apply to the model applied to the data.

On ordinal data, it depends on whether you impute an underlying scale or have a set of nominal categories that are ordered but not on any measured dimension – defining the person with higher qualifications as cleverer seems a circular definition. There is skill and experience in this data analysis game – it’s not just do the sums and out pops THE answer.

Pie charts are popular and have a place as presentation devices. Like any rhetorical device, they can be used to give emphasis so that the message is clarified, or they can be abused to distort the plain message from the data. Hence I agree in loathing 3-D pies, because they introduce an uncontrolled distortion, but I have no issue with exploded segments: “among the categories shown we highlight … .” Graphs are misused because people are not taught to draw or read them; it’s assumed to be innate.

Dear Dr Nic

You ask what do we teach? I’d like to quote from Statistics at Square One 11th ed (Campbell and Swinscow, 2009, p22)

“… the mean from ordered categorical variables can be more useful than the median, if the ordered categories can be given meaningful scores. For example, a lecture might be rated as 1 (poor) to 5 (excellent). The usual statistic for summarising the result would be the mean.”

Our research (into quality of life measures) shows that in many cases the mean of a set of ordinal data gives a ‘good enough’ result, although obviously care is needed!

Mike

Thanx Dr. Nic for the wonderful lesson, I am just curious if the gaps between ordinal data are equal, should we still avoid the mean?

It depends on the context. If you can guarantee that the gaps between the ordinal data are equal, you in fact have interval data, and it is fine to calculate the mean. The trick is that the gaps can look equal, but if you think hard about what they are actually measuring, we cannot guarantee that they are.

Likert is considered an interval scale rather than an ordinal scale. Prof. Naresh Malhotra in marketing research is also of the same opinion.

So far, as I am beginning in this stats world, I have found your explanation the most helpful! What I am struggling with is how to make customer satisfaction data more meaningful than “you have a 96% satisfaction rate.” (on a scale of 1-5).

Let’s not dumb it down and perpetuate bad practice. Here is what I say:

If the data is ordinal, one is usually interested in the proportion of respondents giving a response ‘at least as high as some value’. E.g. if the responses are like a lot, like, neutral, dislike, dislike a lot, then one would like to know how many chose like or like a lot. These choices may be labelled 1 to 5, but never take the mean.

So, descriptively, show a CUMULATIVE graph of the proportion choosing: 1; 1 or 2; 1, 2 or 3; 1, 2, 3 or 4.

There is a procedure for comparing groups on ordinal data called ordinal regression. It is available in all commonly used statistical packages. What is compared is a transformation of cumulative proportion for each group or treatment. The transformation is either logistic or normal. The reason for the transformation is that raw proportions may have floor or ceiling effects.

This is within the comprehension of business and psychology students, in my experience.

If non-parametric comparison means rank methods, e.g. Mann-Whitney, Wilcoxon etc., then these are as inappropriate as normal-based methods. They assume that all groups have the SAME shape of distribution, which is impossible for scales with few options, e.g. typical Likert ITEMS, and unlikely for scales with floor or ceiling effects.

However, normal-based methods and means are routinely used for Likert SCALES composed by summing scores on many Likert items. This will not be too misleading if the sum scores are reasonably normally distributed. It will be misleading if there are strong floor or ceiling effects, as occurs for many diagnostic scales. For example, the Beck depression score has more than 20 items, but the majority of the ‘non-depressed’ have scores below 5.

Cumulative proportion is the best summary statistic for any ordinal scale. Ordinal regression is the appropriate analysis. It is now available in most statistical packages for within-group as well as between-group comparisons.

These methods are rarely described in introductory text, of course.

E.g. Dr. Nic does not mention cumulative proportions or ordinal or logistic regression. Why?

This needs to CHANGE. The concepts are the same as for normal-based methods, so it is a small extension to ordinal methods, as the packages are available.
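The descriptive step suggested in this comment can be sketched in a few lines of Python. The response counts for the two groups are hypothetical, and the code only computes cumulative proportions and their logistic (logit) transform; it does not fit an ordinal regression model, which would need a statistical package.

```python
from itertools import accumulate
from math import log

# Hypothetical counts for responses labelled 1 to 5, for two groups of 100 respondents.
group_a = [10, 20, 30, 25, 15]
group_b = [5, 10, 25, 35, 25]


def cumulative_proportions(counts):
    """Proportion choosing 1; 1 or 2; 1, 2 or 3; 1, 2, 3 or 4 (the last is always 1, so it is dropped)."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)][:-1]


def logit(p):
    """Logistic transform of a proportion strictly between 0 and 1."""
    return log(p / (1 - p))


cum_a = cumulative_proportions(group_a)
cum_b = cumulative_proportions(group_b)

# Difference between the groups on the logit scale at each cut-point.
logit_gaps = [logit(a) - logit(b) for a, b in zip(cum_a, cum_b)]
print(cum_a, cum_b, logit_gaps)
```

If the gaps on the logit scale are roughly constant across the cut-points, that is the pattern a proportional-odds style of ordinal regression assumes; here that is only an informal check on invented numbers.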

Nice article. People seem to dislike dealing with this sticky issue. And even among those who say they are “purists”, you commonly see them treat types of data which are ordinal (like IQ, which is really more of a ranking and certainly not quantitative) as if they were quantitative.

I’m running into this issue right now dealing with a statistical problem involving the average of numerical rankings for a group of institutions. It’s not obvious which way to correctly consider the data, and I guess in the end, statistics being the empirical science that it is, the answer is “whatever works best”.

You said “… if we score the elements as 3, 2 and 1, respectively and find that the mean for the 200 students is 1.5 before the course, and 2.5 after the course, I would say that there is meaning in what we are reporting.”

Some questions about your statement are:

• How can we rate “very useful = 3”, “useful = 2”, and “not useful = 1”?

• Why not “very useful = 1000”, “useful = 100”, and “not useful = 10”? Or “very useful = A”, “useful = B”, and “not useful = C” (assuming that A is better than B, and B is better than C)?

How can we calculate an average for 200 students when 1, 2 and 3 are not numbers but just codes or labels with an increasing order, like c, b and a? In all my study of mathematics and statistics I have never come across such a thing as an algebra of labels. If calculating the average is justified mathematically, and the scores 1 to 3 form a continuum from “not useful” to “very useful”, can you explain the unit of measurement of this measure? Is “useful” twice “not useful”? Is “useful” plus “not useful” the same as “very useful”? If your goal is just to see the development of student achievement (performance), a sign test of the change from “before” to “after” the intervention is enough, and there is no need to justify breaking the rules for ordinal data. Your reasoning and justification may not only mislead people who do not understand measurement theory, but also invite people to justify the wrong rules in the life sciences.

Thank you.

Budi Hari Priyanto

Statistician

Hi Budi

Thank you for your contribution.

I just voted for ‘never’ quoting a mean for ordinal data. Then I read the article and yes, we do occasionally average things that look ordinal. But … what have we (and in your example, you) averaged? Usually, it is not the raw values, except by coincidence; ordinal raw values are not really numbers at all. ‘not useful’, ‘useful’, ‘very useful’ clearly do not have an arithmetic mean. But what the example did was, consciously or otherwise, rank them and then average the ranks. Rank is an interval scale; and we can do a reasonable collection of traditional statistics on ranks. It’s not hard to defend tests on mean rank as an indication of improved ranking. We can’t easily put the mean rank back on the original scale – is 2.4 really closer to ‘useful’ than ‘very useful’? – but we can test pretty reliably for an improvement between two treatments. Quite a lot of ‘score’ averaging used on ordinal data is just proxy rank averaging.

One other comment; be wary of lumping ratio and interval scale data together. Not long ago I caught a lab staff member, based on a moderately defensible habit developed from concentration data, calculating the relative standard deviation for a cold room temperature thermometer near 4 degrees Celsius, then using that to infer the dispersion at room temperature. Lab thermometer readings do _not_ change in precision by a factor of five over a 16 Celsius range….
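A quick numerical sketch of why the relative standard deviation misbehaves on an interval scale. The thermometer standard deviation of 0.1 degrees is an invented figure, and room temperature is taken as 20 °C (4 °C plus the 16 degree range mentioned above); both are assumptions for illustration only.

```python
def relative_sd(sd, mean):
    """Relative standard deviation (coefficient of variation)."""
    return sd / mean


sd = 0.1                      # hypothetical thermometer standard deviation, in degrees
cold_c, room_c = 4.0, 20.0    # cold room and assumed room temperature, in Celsius
kelvin_offset = 273.15

# On the Celsius scale the zero is arbitrary, so the same absolute precision
# appears to change by a factor of about five between the two temperatures.
ratio_celsius = relative_sd(sd, cold_c) / relative_sd(sd, room_c)

# On the Kelvin scale, which has a true zero, the apparent change almost disappears.
ratio_kelvin = relative_sd(sd, cold_c + kelvin_offset) / relative_sd(sd, room_c + kelvin_offset)

print(ratio_celsius, ratio_kelvin)
```

The standard deviation itself is identical in both cases; only the choice of zero changes the ratio, which is why this statistic only makes sense for ratio-level data.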

Thanks Steve. As is usual in Statistics, the answer is ‘it depends’. I see your point about Interval/Ratio. Something to be aware of.

I strongly agree with using non-parametric methods to analyse ordinal data, because a rating or rank scale such as 1, 5, 10, 20, 40 or 1, 10, 100, 1000, 10000 has the same meaning as the scale 1, 2, 3, 4, 5. A rank transformation will change the ordinal scale into a “relative” interval scale. I say “relative” because it depends on the sample size. As an alternative, ordinal data can be analysed using frequency comparisons. Whatever the reason, ordinal data cannot be treated as interval/ratio scales (see Fraenkel et al. (2012): How to Design and Evaluate Research in Education, 8th edition; Glass & Hopkins (1996): Statistical Methods in Education and Psychology, 3rd edition; Sheskin, D. J. (2004): Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edition; Pagano, R. R. (2013): Understanding Statistics in the Behavioral Sciences, 10th edition; etc.).
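The recoding argument made in several of the comments above can be checked with a small Python sketch. The two groups of responses are invented; the `stretched` coding is the 1, 5, 10, 20, 40 scale from the comment. Both codings preserve the order of the five categories, yet the comparison of means flips while the comparison of medians does not.

```python
from statistics import mean, median

# Two order-preserving codings of the same five ordered categories.
linear = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched = {1: 1, 2: 5, 3: 10, 4: 20, 5: 40}

# Hypothetical responses (category numbers) from two groups of five people.
group_x = [3, 3, 3, 3, 3]
group_y = [1, 1, 2, 5, 5]


def summarise(group, coding):
    """Return (mean, median) of a group's responses under a given coding."""
    scores = [coding[c] for c in group]
    return mean(scores), median(scores)


for name, coding in (("linear", linear), ("stretched", stretched)):
    mean_x, median_x = summarise(group_x, coding)
    mean_y, median_y = summarise(group_y, coding)
    print(name, "mean X > mean Y:", mean_x > mean_y, "| median X > median Y:", median_x > median_y)
```

This does not settle the argument either way. It only shows that a conclusion based on means depends on the spacing chosen for the scores, so anyone averaging ordinal scores is implicitly defending that spacing.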

Thanks for this very informative article. I think those practising statistics should read this one first before doing any analysis.

Thanks! They could watch the video too.

I am in a dilemma when it comes to voting on ordinal data (scores):

Scoring options for an event

Score 1 = no error was made

Score 2 = possible error but need to see trend (track and trend)

Score 3 = an error was made

10 voters with the following scores:

Score of 1 – 3 votes

Score of 2 – 3 votes

Score of 3 – 4 votes

There is no majority. To me you would continue to discuss and vote until you get a majority, or, to save time, you could use the median. Using a mode (plurality) does not make sense to me in this situation. I can’t find any literature on this.

Thank you

Hi Doug

That is a really interesting problem. I agree that mode makes no sense. If you had 11 voters, 5 of whom voted 1 and 6 of whom voted 3, the mode would be 3, which does not summarise what is happening. In that case the median would also be 3, so that doesn’t make much sense either. I agree discussing is important. I hate to admit it, but there is a case for a mean. (waits for people to throw fruit!).
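For anyone who wants to check the numbers, here is a short Python sketch of the two voting examples in this exchange, using only the standard library. The vote counts are taken from the comments above; nothing else is assumed.

```python
from statistics import mean, median, mode

# Doug's example: 10 voters (3 votes for score 1, 3 for score 2, 4 for score 3).
doug_votes = [1] * 3 + [2] * 3 + [3] * 4

# The 11-voter example from the reply: 5 votes for score 1, 6 for score 3.
split_votes = [1] * 5 + [3] * 6

for label, votes in (("10 voters", doug_votes), ("11 voters", split_votes)):
    print(label, "mode:", mode(votes), "median:", median(votes), "mean:", round(mean(votes), 2))
```

In the split example the mode and median both land on 3 and hide the five voters who saw no error, while the mean of about 2.09 at least signals that opinion is divided.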