# The median outclasses the mean

## The median suffers from poor marketing.

Throughout my time at school, the “average” was always calculated as the arithmetic mean: add up all the scores, then divide by the number of scores. When we were taught about the median, it seemed like an inferior version of the mean. It was the thing you worked out when you weren’t smart enough to add and divide. It was used for house prices, and that was about it. Of course the mean was the superior product! Why wouldn’t you use the mean?

I’ve been preparing resources for teaching the fabulous new New Zealand curriculum, and have been brought face-to-face with my prejudices. It strikes me that the median has had very poor representation.

# Public opinion of the median and mean

I put a question on Facebook and Twitter to see what people felt about the mean and the median. I briefly explained what each was, then asked which one they thought was better. Some people had no idea what I was talking about, but most felt that the mean was the superior statistic. The following is a selection of responses:

The mean, but I don’t know why.. maybe that’s just what we were taught to use when I was back in school (a long time ago!) lol

When I think of “average” I always think of the mean. I don’t know if it’s actually better though

well the median is a real pain to work out. you have to make a list of all the numbers, in order, and then count how many they are and then go to the middle. PAIN IN THE BUM. the average… well that is somewhat quicker to do, no? and i don’t see the point in the median at all. unless well no, there is just no need for it. who cares what the15th person in the class got on a test? the lowes and highes is much more interesting. As i remember it, the mode is the most commonly occuring number out of a set of numbers… i think of this as the “mode” or in English (not French), the ‘fashionable” number. oh and it stresses me how all 3 start with Ms cos that is confusing. which is why i like to use the word average.

The mean, which I’m guessing is the same as the average? When the media refer to real estate stats they always use median price, which can distort reality, we would prefer the average price. (From a real estate agent)

I don’t really think it’s a case of which is better. They’re two different things aren’t they? I think it’s usually easier to work out the average.

A number of my Facebook friends did know about statistics, and responded in favour of the median in most cases. This was an interesting comment:

“It depends. Everyone who proofread my thesis was like, why on earth are you using the median – no one uses it. And most of the other similar primate studies I’ve read use the mean (except one, that was published by my associate supervisor). But my means were off their rocker, and I’m pretty sure my medians were a much better representation of reality in this case. It makes making comparisons between studies a little awkward though.”

# Why NOT use the median all the time?

I am hard-pressed to find an instance where the mean is actually a better measure of central tendency than the median. The purpose of the mean or median (or mode) is to provide a one-number summary of a set of data. The whole idea of the mean is actually quite tricky, as you can read in one of my early posts about explaining what the mean is. Generally the summary value is used to compare one sample or population with another.

In my lectures I often illustrated times when the median is a better summary measure of a sample or population than the mean. This is quite common in notes and YouTube videos. Never once did I show where the mean was preferred to the median! So why were/are we so loyal to the mean, bringing out the median for special occasions and real estate?

I think there are two answers, both of them no longer valid. It is a question of legacy.

# Time and ease to calculate

Despite first appearances, for anything larger than a trivial sample the mean is actually easier to calculate by hand than the median. Putting a set of 100 values in order by hand is no easy task. (Pain in the bum, as my friend so elegantly expressed it.) Adding up scores and dividing by 100 is a walk in the park in comparison. In the early 1980s when I learned programming (in Fortran, Pascal and Cobol), writing a sorting program was far from trivial and a large set of numbers would take a large amount of time to sort. Only in later years, as computing power has expanded, has it become practical to get a computer to calculate a median.

# Formulas for confidence intervals

Means behave nicely and give nice mathematical results when manipulated. Because of this we can calculate confidence intervals using a nifty little formula and statistical tables. Until bootstrapping by computer became do-able on a large and small scale, there was no practical way to perform inference on a number of very useful statistics, including the median and the inter-quartile range.
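To make the bootstrapping idea concrete, here is a minimal sketch of a percentile bootstrap interval for the median and the inter-quartile range, using only Python's standard library. The `scores` data, the helper names `bootstrap_ci` and `iqr`, and the choice of 2000 resamples are all assumptions made for illustration. The plain percentile method shown is only one of several bootstrap variants, and its accuracy for the median depends on the data.

```python
import random
from statistics import median, quantiles


def bootstrap_ci(data, statistic, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for any statistic of one sample."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = estimates[round(tail * n_resamples)]
    upper = estimates[round((1 - tail) * n_resamples) - 1]
    return lower, upper


def iqr(values):
    """Inter-quartile range, using the statistics module's default quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical test scores, invented for this example.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 67, 68,
          70, 71, 73, 75, 78, 80, 82, 85, 91, 97]

median_ci = bootstrap_ci(scores, median)
iqr_ci = bootstrap_ci(scores, iqr)
print("median:", median(scores), "interval:", median_ci)
print("IQR:", iqr(scores), "interval:", iqr_ci)
```

No formula or statistical table is needed: the same function works for any statistic that can be computed from a sample.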

# Conclusion: the median is better

A median is intrinsically understandable. It is the middle number when the values are put in order. End of story. Well, not quite: you do have that slightly tricky thing where the sample size is even and you have to average the middle two values, but apart from that it is easy!
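The definition, including the even-sized case, fits in a few lines of Python. This is a sketch for illustration (the function name `middle_value` is an assumption); in practice `statistics.median` from the standard library does the same job.

```python
def middle_value(values):
    """Median: the middle value of the sorted data, or the average of the
    two middle values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = middle_value([7, 1, 5])       # sorted: 1, 5, 7
even_result = middle_value([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_result, even_result)
```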

A median is not affected by outliers. I learned a new term for this when I was reading up in preparation for writing this post. The term is “resistant” and I learned it from one of Mr Tarrou’s videos for AP Statistics. I found these videos after my tirade against videos on confidence intervals. Tarrou’s videos are long and a bit more mathematical than I would like. (He can’t help it – he is a maths teacher and the AP Statistics syllabus seems to have been devised by mathematical statisticians trying to put students off ever taking the subject again.) But they are GOOD. Tarrou’s videos are sound, and interesting and well put together. I will be recommending them as complementary to my own offerings. (Because I sure as heck don’t want to have to do all that icky mathsy stuff).

But I digress. The median is “resistant” because it is not at the mercy of outliers. There are lots of great examples, including in Mr Tarrou’s video. If you have a median of 5 and then add another observation of 80, the median is unlikely to stray far from the 5. However, a mean is a fickle beast, and easily swayed by a flashy outlier.
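A small numerical sketch of that behaviour, with made-up values chosen so that the mean and median both start at 5:

```python
from statistics import mean, median

# Hypothetical data, invented for this example.
values = [3, 4, 5, 5, 5, 6, 7]
with_outlier = values + [80]

print("before:", mean(values), median(values))
print("after: ", mean(with_outlier), median(with_outlier))
```

Adding the single observation of 80 leaves the median where it was, while the mean is dragged up to a value larger than every observation except the outlier itself.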

The main disadvantage I can see for the median is that it can be a bit jumpy in small samples made up of discrete values. I guess if you have two well-behaved populations that are very similar and you want to see precise differences then the means might just be better – but even then you would possibly be over-interpreting small differences.

I have found it very interesting to observe the behaviour of confidence intervals for the difference of two medians, compared with confidence intervals for the difference of two means. While I was preparing materials for our online resource, I performed nine such tests on different real data taken from students at university. The scores are very jumpy, and the differences between the medians are often exactly zero. Consequently the confidence intervals for the difference of two medians quite often have zero as their lower bound. This provides a challenge in interpretation, as I had not met this often when looking at the differences between means. However, it also illuminates the odd relationship we have with zero. When a confidence interval for a difference of two means is (-0.13, 3.98) and so includes zero, it is tempting to conclude that there is no significant difference. But is -0.13 really any different from zero in practical terms? The other point is that we should be leaving the confidence interval as it is, rather than stretching it into further inference.
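The jumpiness can be seen by bootstrapping the difference between two groups of discrete scores. The sketch below uses two invented groups of whole-number scores (not the student data mentioned above) and a hypothetical helper `bootstrap_differences`. It counts how many distinct values the resampled difference takes, and how often the difference in medians is exactly zero.

```python
import random
from statistics import mean, median


def bootstrap_differences(a, b, statistic, n_resamples=2000, seed=1):
    """Resample each group with replacement and record statistic(a) - statistic(b)."""
    rng = random.Random(seed)
    return [
        statistic(rng.choices(a, k=len(a))) - statistic(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    ]


# Hypothetical whole-number scores for two groups of 16 students.
group_a = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10]
group_b = [3, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9]

median_diffs = bootstrap_differences(group_a, group_b, median)
mean_diffs = bootstrap_differences(group_a, group_b, mean)

distinct_median = len(set(median_diffs))
distinct_mean = len(set(mean_diffs))
share_zero = median_diffs.count(0) / len(median_diffs)

print("distinct values, difference in medians:", distinct_median)
print("distinct values, difference in means:  ", distinct_mean)
print("share of median differences exactly zero:", share_zero)
```

With discrete scores the difference in medians can only land on a handful of values, so interval endpoints tend to sit on round numbers such as zero, whereas the difference in means moves in much finer steps.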

# Word on the web

I did a little surfing to see what the word on the web was. To find out who said what, drop the entire phrase into Google. (Ah, ‘tis a wonderful age we live in, indeed.)

• “The mean is the one to use with symmetrically distributed data; otherwise, use the median.” Hmm – but if the data is symmetric, surely the mean = the median?
• “An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.” OK – hard to argue with that.
• “Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.” Totally!
• “However, when the sample size is large and does not include outliers, the mean score usually provides a better measure of central tendency.” (Then goes on to give an example of when the median is better.)
• “Use the median to describe the middle of a set of data that does have an outlier. Advantages of the median: Extreme values (outliers) do not affect the median as strongly as they do the mean, useful when comparing sets of data, it is unique – there is only one answer. Disadvantages of the median: Not as popular as mean.” (Not as popular??!)
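The zero-sum property quoted in the second bullet is a one-line consequence of the definition of the mean. Writing $\bar{x}$ for the mean of the values $x_1, \dots, x_n$ (notation assumed here, not taken from the quoted source), so that $\sum_{i=1}^{n} x_i = n\bar{x}$:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$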

Sorry median – you do not win X-Factor for summary statistics. You may be more robust, and less fickle, not to mention easier to understand, but you just aren’t as popular!

I can feel a video coming on – the median has been relegated to the periphery long enough!

## Update in 2018

Here is our video about different summary statistics, which also addresses the relative merits of mean and median, and why they even matter!

## 32 thoughts on “The median outclasses the mean”

1. Nice article. As a statistician, I’m a huge fan of the median. I’ve worked on a large simulation study, and even small departures from normality result in the median being a better estimate of central tendency (even in quite large sample sizes).

Most software gives a p-value for non-parametric tests, such as the Wilcoxon Rank Sum Test (WRST). What a lot of people don’t know is a neat trick to work out a confidence interval. If you have two samples, say two treatments in a clinical trial, add a constant to all the values in one sample. If the p-value from the WRST is then non-significant, that constant is within the confidence interval for the shift between the two samples. By either playing around, or adding an increasing range of constants to one sample, you can EASILY get the confidence interval.
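A rough sketch of this trick in Python, using only the standard library. Everything here is an assumption made for illustration: the two treatment samples are invented, the rank sum test uses the large-sample normal approximation without a tie correction (a statistics package would normally use exact or corrected p-values), and the grid of candidate constants is arbitrary.

```python
from statistics import NormalDist


def rank_sum_p_value(x, y):
    """Two-sided p-value for the rank sum (Mann-Whitney) test,
    normal approximation, no correction for ties."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value beats the y value (ties count half).
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def shift_confidence_interval(x, y, shifts, alpha=0.05):
    """Keep every constant d for which x versus (y + d) is NOT significant."""
    kept = [
        d for d in shifts
        if rank_sum_p_value(x, [yj + d for yj in y]) > alpha
    ]
    return (min(kept), max(kept)) if kept else None


# Hypothetical outcomes for two treatments.
treatment_a = [12.1, 14.3, 15.2, 16.8, 17.5, 18.9, 20.4, 21.0]
treatment_b = [9.5, 10.2, 11.8, 12.9, 13.4, 14.1, 15.7, 16.3]

candidate_shifts = [i / 10 for i in range(-50, 151)]  # -5.0 to 15.0 in steps of 0.1
interval = shift_confidence_interval(treatment_a, treatment_b, candidate_shifts)
print("approximate 95% interval for the shift (A minus B):", interval)
```

The width of the grid step limits the precision of the endpoints, so a finer grid or a bisection search could be used once the rough interval is known.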

Lastly, there’s one case where I do think it makes sense to use the mean, even when distributions aren’t normal. That’s where you are analysing amounts of money (say the cost of an illness). Quite often, distributions of cash are highly skewed, but there is interest in TOTAL spend. In these cases, the mean can be more relevant.
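The link between the mean and the total can be shown with a few invented cost figures (the numbers below are hypothetical and deliberately skewed):

```python
from statistics import mean, median

# Hypothetical cost per patient, with one very expensive case.
costs = [120, 150, 180, 200, 250, 300, 400, 9000]

total_spend = sum(costs)
from_mean = mean(costs) * len(costs)      # recovers the total exactly
from_median = median(costs) * len(costs)  # badly understates the total

print(total_spend, from_mean, from_median)
```

Multiplying the mean by the number of cases gives back the total spend, which is why the mean stays relevant for budgeting even when the distribution is highly skewed. The median multiplied by the number of cases has no such interpretation.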

• Thanks Kevin – great to hear from a practitioner. That is a really good point about the total.
Someone on Twitter pointed out “Depends on the application. Median is good for giving a “typical” value, but median speed won’t help me predict my travel time”
This is another case where what we really want is related to the total, rather than the average. Or something like that.

2. The mean does have a smaller sampling error than the median and it is important to the calculation of the variance and standard deviation.

3. One number (mean, median etc) is seldom enough to describe a set of numbers; a standard deviation helps. But a plot of the distribution (cumulative?) is often the best answer.

• So true, but the sad fact is that often only one number is given. Box plots and dotplots are emphasised in the new NZ curriculum, over single value summaries. In the world of politics and economics, however, we are usually fed only the mean. I wish we were told the standard deviation more often. (As in ever!)

4. Regression is the main tool in the statistician’s tool box and that is based all around averages, e.g. if Y is height and X is an indicator 0,1 for men and women respectively, then the estimates in a regression give you the average male height and the difference between the average male height and average female height. Since regression (and its generalisations) are pervasive in the applied literature, it’s quite hard to change. (I quite like quantile regression but it doesn’t give unique solutions.)

I think it also revolves around this theorem (whose name I’ve forgotten) – “The most powerful test of size alpha is the Likelihood Ratio Test” and most estimators coming out of maximising Likelihoods are means or functions of means.

• Thanks for that. Makes sense.

5. Hi Dr. Nic,

To the best of my knowledge, bootstrap DOES NOT WORK for the median. There are asymptotic methods that involve estimating the density and there are non-parametric methods based on the median – basically inverting the sign test, which leads to intervals whose endpoints are quantiles (as I am sure you know).
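The sign-test inversion mentioned above can be sketched in a few lines: the interval runs between two order statistics, chosen so that the binomial probability of the true median falling outside them is at most the chosen error rate. The function name and the example data are assumptions for illustration, and the sketch assumes a continuous distribution so that ties with the median can be ignored.

```python
from math import comb


def sign_test_median_ci(data, confidence=0.95):
    """Distribution-free interval for the median from order statistics.

    Returns (lower, upper, achieved_coverage). The achieved coverage is at
    least `confidence` because only whole order statistics can be used.
    """
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - confidence
    cumulative = 0.0
    k = 0  # the interval will run from the k-th smallest to the k-th largest value
    for j in range(n // 2):
        cumulative += comb(n, j) / 2 ** n   # P(Binomial(n, 1/2) <= j)
        if cumulative > alpha / 2:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * sum(comb(n, j) for j in range(k)) / 2 ** n
    return xs[k - 1], xs[n - k], coverage


# Hypothetical sample of ten observations.
data = [9, 3, 7, 1, 10, 4, 6, 2, 8, 5]
lower, upper, coverage = sign_test_median_ci(data)
print(lower, upper, round(coverage, 4))
```

With ten observations the interval runs from the second smallest to the second largest value, and its actual coverage is somewhat above the nominal 95%.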

In business, the mean is more “mean”ingful than the median! Would any business person really care about median monthly profit? Mean monthly profit means much more, because if you multiply it by 12 you get the total profit. Ditto for sport. The Aussie cricket team members might have a higher batting median than another team, but this would say little about the probability of winning. Batting averages on the other hand predict long-run team score.

Paul Swank: No, the mean does not always have a smaller sampling error than the median. It depends on the underlying distribution. For normal data you are correct: the median is only about 64% as efficient as the mean. For heavy-tailed distributions, the median gets better. And for Laplace errors, it is more efficient than any other estimator: it is the MLE!
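A small simulation can illustrate how the comparison flips with the distribution. The sketch below estimates the ratio Var(sample mean) / Var(sample median) for standard normal and for Laplace data; values below 1 favour the mean and values above 1 favour the median. The sample size, number of replications and helper names are assumptions for illustration. The large-sample values of this ratio are 2/π (about 0.64) for normal data and 2 for Laplace data, so simulated results at moderate sample sizes should land in that neighbourhood rather than match exactly.

```python
import random
from statistics import median, pvariance


def variance_ratio(draw, n=51, reps=2000, seed=0):
    """Estimate Var(sample mean) / Var(sample median) by simulation."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return pvariance(means) / pvariance(medians)


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_laplace(rng):
    # The difference of two independent Exp(1) variables is Laplace(0, 1).
    return rng.expovariate(1.0) - rng.expovariate(1.0)


efficiency_normal = variance_ratio(draw_normal)
efficiency_laplace = variance_ratio(draw_laplace)
print("normal: ", round(efficiency_normal, 2))
print("Laplace:", round(efficiency_laplace, 2))
```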

• “To the best of my knowledge, bootstrap DOES NOT WORK for the median.” Isn’t that an over-statement? See for example Biometrika (2001) 88 (2): 519-534. (Brown, Hall and Young). “Even in one dimension the sample median exhibits very poor performance when used in conjunction with the bootstrap. For example, both the percentile‐t bootstrap and the calibrated percentile method fail to give second‐order accuracy when applied to the median.” Much depends on the distribution.

The bootstrap is certainly unsatisfactory for extreme quantiles.

I agree the business wouldn’t, but because the business actually cares about the total itself. Calculating a mean provides a cosmetic benefit over the total. The mean (“average monthly”) reveals nothing new. The mean is merely a more memorable/familiar unit.
Similarly, measuring in a metric unit rather than a less familiar unit doesn’t change the “true” quantity.

6. Hi Nic
The mean has the property of being the best linear unbiased predictor, as long as the distribution of the data is reasonably well behaved. A lot of stats analysis, as opposed to description, is geared towards being able to predict things, so the mean is therefore preferable. The distribution issue is not usually a problem when using the mean, as long as data is not really sparse, because of the Central Limit Theorem.
I have never personally found a use for the mode. However, I guess it must have a place in descriptive stats applied in some areas.

• Thanks for that. I appreciate getting the balance of the argument.
I’ve always wondered about the point of the mode.

• Hi Nic,

I thought the mode was only useful for ordinal data but otherwise was rather pointless as a measure of ‘central tendency’.

7. It is instructive that most of the comments relate to the purpose to which the estimate of mean/median will be used. Real statistical applications rarely have as their purpose the estimation of such a parameter, it is merely a step in the process. Unfortunately much teaching – high and low level – ignores this context.

This issue extends beyond the mean/median debate. For example, skewed data from many applications such as geochemistry is best approximated by the log normal. However it may not make sense to consider means on a log scale since they lose the additivity that may be fundamental to the application.

Orthogonal to this is the mathematical context. Medians are ugly mathematically and complex analyses based on them can be even more ugly (think of Tukey’s median polish, essentially iterative proportional scaling with medians – easy to describe, messy to do, impossible to really understand). While we should not over constrain any analysis to match our mathematical limitations, we should not ignore the enormous benefit we can get from applying mathematical understanding.

• Thanks for that really helpful comment. As my interest is very much at the beginner/consumer level of statistical education it is great to have people provide a more advanced perspective.

8. Technically and as a Statistician, I prefer the Median to the Mean. The median is robust and resistant to outliers in the dataset, unlike the mean, which is highly affected/influenced by extreme observations in the dataset.
Thanks

9. Perhaps it’s worth noting that in survival analysis, although things like “mean survival” are defined, they are very rarely used – the median, and other quantiles, reign supreme. Censoring of survival times means that the calculation of a mean involves extrapolation; the highly skew nature of survival times removes much of the interpretable meaning of the mean.

10. Please see MDST242 “Statistics in Society” – an Open University course. That uses median + quartiles + deciles + extremes. You can select from these and get measures of dispersion, skewness, kurtosis etc. Confidence interval on the median is also a cinch.

11. Hi Nic,

In slight defense of the mean (ie, when you’re forced to present only 1 number – such as on a balance sheet)… generally it is better to present the mean in respect of liabilities rather than the median. Typically liabilities are skewed (if things go bad, they go very bad), so an estimate that responds to outliers is actually handy here. 🙂

For instance, when estimating the ‘long-tail liability losses’ for insurance companies the total losses are very skewed. Although the median is a better estimate of the centre of the distribution, the mean is a better choice for presentation in a balance sheet as it is much more conservative (and as a slight bonus, everyone ‘understands’ a mean so they tend to request it). In practice, you need to ensure that much more capital is available than the mean – assuming you want to stay in business.

It comes back, as several respondents have mentioned, to the purpose of the statistic. Generally I want at least three numbers: mean, standard deviation, and skewness.

Incidentally, another reason the mean may be preferred computationally over the median is that you can calculate it while only keeping track of three numbers: SUM(X) [ie, the sum of x to n-1], N, x. The median is harder to restrict in this way.
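A sketch of that bookkeeping: the running sum and the count are the only state that needs to be kept between observations. The class name `RunningMean` is an assumption for illustration.

```python
class RunningMean:
    """Keeps only the running sum and the count, never the full data set."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        """Fold in one new observation and return the updated mean."""
        self.total += x
        self.count += 1
        return self.total / self.count


tracker = RunningMean()
for observation in [2, 4, 9]:
    latest_mean = tracker.add(observation)
print(latest_mean)
```

An exact median, by contrast, generally requires keeping the observations (or at least a sorted structure of them), because any past value might become the middle one later.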

12. I think no-one has yet mentioned the question of multidimensional data, and defining a “location” for these. Suppose data consist of locations of lightning strikes, expressed as 2 dimensional coordinates. Then “nice” properties for a location might include that the selected point should not depend on the coordinate system used (invariance to rotation). In more general cases this might be extended to invariance to affine transformations. A co-ordinatewise median point does not satisfy these. Of course there is a multivariate version of the median that can cope with the rotation invariance, and there are others, some based on counting shells. But still, these seem to fail the requirement, which might be either an extension or a reversal of the above, in that one might not want the per-coordinate location of the multidimensional location to depend on what other dimensions have been measured or are being considered.

Thus, in a multivariate setting, the mean is invariant to linear transformations, while the co-ordinatewise median is not. Neither the mean nor the co-ordinatewise median is affected by dropping variables that are not of interest. Still, stating the co-ordinatewise medians may be preferable to the mean, or not, depending on the purpose to which you think the summary might be put. For example, suppose data consist of daily values of (weight of) sediment transported past a point in a river … then the mean daily value is immediately informative about the total weight transported in, say, a month, while the median daily value is not.
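The rotation point can be checked with three made-up strike locations. The sketch below compares "summarise, then rotate" with "rotate, then summarise" for the mean and for the co-ordinatewise median; the helper names and the 45 degree angle are assumptions for illustration.

```python
import math
from statistics import mean, median


def rotate(point, angle):
    """Rotate a 2-D point anticlockwise about the origin."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def coordinatewise(points, statistic):
    """Apply a one-dimensional statistic to each coordinate separately."""
    xs, ys = zip(*points)
    return (statistic(xs), statistic(ys))


# Hypothetical lightning strike locations.
strikes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
angle = math.pi / 4
rotated = [rotate(p, angle) for p in strikes]

mean_then_rotate = rotate(coordinatewise(strikes, mean), angle)
rotate_then_mean = coordinatewise(rotated, mean)

median_then_rotate = rotate(coordinatewise(strikes, median), angle)
rotate_then_median = coordinatewise(rotated, median)

print("mean:  ", mean_then_rotate, rotate_then_mean)
print("median:", median_then_rotate, rotate_then_median)
```

The two routes agree for the mean (up to rounding) but give different points for the co-ordinatewise median, so the median location depends on how the axes happen to be oriented.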

Other comments have emphasized describing more of the distribution than just the location. This applies even more to multivariate data.

13. In your post you said:

“The mean is the one to use with symmetrically distributed data; otherwise, use the median.” Hmm – but if the data is symmetric, surely the mean = the median?

The point here should be that: while “the mean = the median”, the properties of the sample mean and the sample median are different (and of course their values are usually different). So your question really splits into two parts… what thing should you be trying to estimate as a summary of the data, and then how should you estimate that quantity.

14. I don’t see why you want to introduce “resistance” in place of “robustness”. The latter term is standard statistical terminology.

As pointed out previously, calculating any statistic is either a step in a wider inference or is a quick summary. You are right that given any collection of numbers most people’s instinct is to add them up! The mean may not be a “meaningful” summary. Apart from skewed data, multimodal data may deliver a mean that has no relevance to the measured variable. The mean salary in a company is bloated by the CEO’s plundering. Regarding modes, tailors seem to plan for the modal number of legs, not the mean.

Someone mentioned lognormal distributions, and I work with these a lot. When any distribution can be transformed to symmetry, the median and the mode of the transformed values become similar, so the log-mean or geomean of a skewed distribution is reasonably estimated by the median.
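A quick simulated check of that claim, using a lognormal sample generated with Python's standard library. The parameters, sample size and seed are arbitrary choices for illustration, not values from the comment above.

```python
import random
from statistics import geometric_mean, mean, median

rng = random.Random(42)
# Simulated lognormal data: the logs are normal with mean 0 and SD 1.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(1001)]

sample_median = median(sample)
sample_geomean = geometric_mean(sample)   # exp of the mean of the logs
sample_mean = mean(sample)

print("median:         ", round(sample_median, 3))
print("geometric mean: ", round(sample_geomean, 3))
print("arithmetic mean:", round(sample_mean, 3))
```

The median and the geometric mean should come out close to each other, while the arithmetic mean is pulled well above both by the long right tail.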

I too am a fan of the median, but as one of Tukey’s five-value summary, hence the median rather than the mean is usually shown on boxplots. Means are sometimes added as symbols.

Someone else mentioned SD. Mean and SD are sufficient statistics to describe a normal distribution. If you *assume* your sample comes from a normal distribution, Mean and SD are sensible summary values.

Sorting is of course a well-studied computing problem (cf Knuth). It’s very odd to have people writing as if we were still in the 1950s when computer power was scarce and expensive. For any non-trivial stats calculations, get a proper stats package. Then you type in the values (or scan, or download), and get the whole slew of summary statistics (mean, median, mode, quartiles, min max, SD, trimmed mean, …) to examine for data cleaning before deciding which to report as *information*.

Allan

15. Several respondents have mentioned the difficulty of working with the median for purposes of statistical theory. Classically, expectations have been central; they are theoretical means. The theory is a good approximation, in modest sized samples, only if data is on the scale that is not badly asymmetric. Where a monotonic transformation is used, often log(), that leads to a roughly symmetric distribution, does one need to apply a correction to correct for the bias induced on the original scale? Not if the medians (which are unaffected by monotone transformation) are the appropriate measures.
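The remark about monotone transformations can be illustrated with a handful of invented values. For an odd-sized sample, taking logs, finding the median and transforming back returns the median of the original values, whereas doing the same with the mean returns the geometric mean, which is not the mean on the original scale.

```python
import math
from statistics import mean, median

# Hypothetical skewed data with an odd number of observations.
values = [1, 2, 4, 8, 1000]
logs = [math.log(v) for v in values]

median_back = math.exp(median(logs))  # back-transformed median of the logs
mean_back = math.exp(mean(logs))      # back-transformed mean of the logs

print("median on the original scale:", median(values), "back-transformed:", median_back)
print("mean on the original scale:  ", mean(values), "back-transformed:", mean_back)
```

With an even-sized sample the match for the median is only approximate, because averaging the two middle values does not commute with the log.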

Modern computational abilities free us somewhat from the constraints of an expectation-tied theory. ‘Somewhat’ is the key word here. As an aside, the limitations of that theory, and the common role of recourse to empirical approaches that may involve heavy computation, become even more important when dealing with the dependence that is often present in the observational data with which most statisticians work most of the time.

16. “The median is ‘resistant’ because it is not at the mercy of outliers.”
I love it! I’ll try it out on my students tomorrow.

17. I was surprised not to see a mention of the trimmed mean given that the median is the 50% trimmed mean and the mean is the 0% trimmed mean. Most statistical packages have a trimmed mean function.
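A minimal sketch of a trimmed mean, showing how it slides from the mean towards the median as the trimming proportion grows. The function below is an illustration written for this purpose, with invented data; statistical packages differ in how they round the number of observations to trim.

```python
def trimmed_mean(values, proportion):
    """Drop `proportion` of the observations from each end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # observations removed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value.
data = [2, 3, 4, 5, 5, 6, 7, 8, 95]

no_trim = trimmed_mean(data, 0.0)      # the ordinary mean
some_trim = trimmed_mean(data, 0.2)    # drops one value from each end
heavy_trim = trimmed_mean(data, 0.49)  # only the middle value survives

print(no_trim, round(some_trim, 3), heavy_trim)
```

Trimming just under 50% from each end leaves only the middle observation (or the middle two for an even-sized sample), which is the median.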
The median and the trimmed mean lead on to the general area of Robust Statistics, which has been studied extensively since the 70s. Robust Statistics have not made much of an impact in applied statistics, at least beyond the area of descriptive statistics. Why is this? Well much of applied statistics is built around the linear model and the ANOVA decomposition, which is built on the properties of the L2 norm which underlies the mean and variance.
Another chunk of applied statistics is based on statistical models and the likelihood function. Many of the MLEs lead to statistics that are not robust. One way to keep both the likelihood function and robustness is to mix the statistical model with a parameter-free heavy-tailed component intended to diminish the influence of outliers on model parameters. Rohan Maheswaran studied this approach in his thesis.

18. This is a great post, Dr. Nic, and I appreciate everyone’s discussions about it.

1) In summarizing or describing a data set, it’s always a good idea to use multiple summary statistics and visualizations. I like both the mean and the median, so I like to use both. In fact, I like the 5-number summary, the mean, the variance, and a plot of the data (histogram, bar chart, scatter plot).

2) The mean often shows up as a sufficient statistic for the parameters of many distributions. Thus, it is often used to find an estimator with a lower variance using the Rao-Blackwell Theorem.

3) The mean is often the maximum likelihood estimator for the parameters of many distributions. Beyond its use as a point estimator, it also has many nice large-sample properties that can be used for inference.

4) Echoing Peter Lane’s comment, a nice thing about the mean is the ease of using it for inference; its sampling distribution can be easily found based on the Central Limit Theorem!

Eric Cai – The Chemical Statistician
http://chemicalstatistician.wordpress.com