# The Ninther — Approximating Medians

The main benefit of using the median, as opposed to other measures of central tendency such as the mean, is that it is far less skewed by extremely large or small values. It therefore offers a better approximation of what is called a typical value of the data.

To compute the median of a dataset exactly, you must read all the elements into memory, sort them, and then find the middle value; a process which is inefficient and often impractical if the dataset is very large.

Ideally, we want an estimate of the median that uses as few comparisons and arithmetic operations as possible.

I recently came across a simple and effective median estimation technique, described by John Tukey in 1978, called “median of medians”.

It works as follows. Given a list of data points **S**, we partition **S** into buckets and calculate the median of each bucket. We then take those medians and calculate the median of them (in his paper, Tukey calls this value the *ninther*).
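A minimal sketch of this procedure in Python might look like the following. The `ninther` name and the contiguous-chunk partitioning are my own choices for illustration; how you split **S** into buckets is up to you.

```python
import statistics

def ninther(data, buckets=3):
    """Estimate the median of `data`: partition it into `buckets`
    roughly equal contiguous chunks, take the median of each chunk,
    then return the median of those medians."""
    size = len(data) // buckets
    chunks = [data[i * size:(i + 1) * size] for i in range(buckets - 1)]
    chunks.append(data[(buckets - 1) * size:])  # last chunk absorbs any remainder
    return statistics.median(statistics.median(c) for c in chunks)
```

Each call touches every element once to form the bucket medians, but the per-bucket sorts are over much smaller lists than a full sort of the data.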

If **S** is small, as in the following demonstration, then finding the median is simple, but as the number of elements in **S** gets very large, the technique becomes more powerful.

Suppose **S** = {4, 7, 1, 3, 20, 8, 3, 9, 2}, and we split **S** into 3 equal(ish) buckets. We might end up with the following sets:

- **Sa** = {1, 3, 8}; median = 3
- **Sb** = {4, 7, 20}; median = 7
- **Sc** = {2, 3, 9}; median = 3

The median of the bucket medians {3, 7, 3} is 3. The actual median of **S** is 4.
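Using Python's standard-library `statistics.median`, the worked example above can be checked directly:

```python
import statistics

S = [4, 7, 1, 3, 20, 8, 3, 9, 2]
Sa, Sb, Sc = [1, 3, 8], [4, 7, 20], [2, 3, 9]  # the buckets from the text

bucket_medians = [statistics.median(b) for b in (Sa, Sb, Sc)]
print(bucket_medians)                     # [3, 7, 3]
print(statistics.median(bucket_medians))  # 3  (the ninther estimate)
print(statistics.median(S))               # 4  (the true median)
```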

Note that if the data were already ordered, then the median of the medians (taken over equal-sized contiguous buckets) is also the median. In practice, this is generally not the case.
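We can see this with the same data: sorting **S** first and splitting it into three contiguous buckets of three makes the ninther agree with the true median.

```python
import statistics

S_sorted = sorted([4, 7, 1, 3, 20, 8, 3, 9, 2])  # [1, 2, 3, 3, 4, 7, 8, 9, 20]
buckets = [S_sorted[0:3], S_sorted[3:6], S_sorted[6:9]]
medians = [statistics.median(b) for b in buckets]  # [2, 4, 9]

print(statistics.median(medians))   # 4
print(statistics.median(S_sorted))  # 4, the two agree on ordered data
```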

There you have it: an efficient, powerful, robust and above all simple technique to estimate the median of your dataset.

For an analysis of its performance and robustness, check out Tukey’s original paper, “*The Ninther, a Technique for Low-Effort Robust (Resistant) Location in Large Samples*”.