Breakdown Point
The breakdown point of an estimator (or of a related procedure such as
a test or a confidence interval) is the largest fraction of the data that can
be complete junk without destroying the estimator. More precisely, it is the
largest fraction of the data that can be dragged to infinity
while the estimator remains bounded.
Breakdown point is one measure of robustness.
A high breakdown point means the estimator is highly robust (which is good).
For the one-sample location estimators we have the following breakdown points:
estimator | breakdown point |
---|---|
sample mean | 0% |
sample median of Walsh averages | 29.3% |
sample median | 50% |
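The breakdown points in the table can be checked on a toy example. The following is a minimal sketch, not part of the original notes: the clean data (the numbers 1 to 100), the junk value `1e12`, and the helper names `hodges_lehmann` and `contaminate` are all hypothetical choices made for illustration.

```python
import statistics
from itertools import combinations_with_replacement


def hodges_lehmann(xs):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]
    return statistics.median(walsh)


def contaminate(xs, k, junk=1e12):
    """Replace the first k observations by a huge junk value (an assumption)."""
    return [junk] * k + list(xs[k:])


# Hypothetical clean sample: the numbers 1, ..., 100.
clean = [float(i) for i in range(1, 101)]

for k in (0, 1, 29, 30, 49, 50):
    bad = contaminate(clean, k)
    print(
        f"junk fraction {k:3d}%:",
        f"mean={statistics.mean(bad):.3g}",
        f"Hodges-Lehmann={hodges_lehmann(bad):.3g}",
        f"median={statistics.median(bad):.3g}",
    )
```

With 100 observations, a single junk value already ruins the mean, the median of Walsh averages survives 29 junk values but not 30, and the median survives 49 junk values but not 50, which is in line with the 0%, 29.3%, and 50% figures above.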
Efficiency
The asymptotic relative efficiency (ARE) of one estimator relative to another is the ratio of sample sizes needed to get equal accuracy. It equals the reciprocal of the ratio of their asymptotic variances. Associated tests and confidence intervals have the same ARE as the estimators.
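To make the sample-size interpretation concrete, here is a small sketch. It assumes the standard result that, for normal data, the ARE of the sample median relative to the sample mean is 2/π; the function name `equivalent_sample_size` is hypothetical.

```python
import math


def equivalent_sample_size(n, are):
    """Sample size the less efficient estimator needs to match the accuracy
    that the reference estimator achieves with n observations."""
    return math.ceil(n / are)


# ARE of the sample median relative to the sample mean for normal data.
are_median_normal = 2 / math.pi

print(round(are_median_normal, 3))                     # 0.637
print(equivalent_sample_size(100, are_median_normal))  # 158
```

So for normal data the median needs roughly 158 observations to do as well as the mean does with 100.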
Efficiency depends on the true unknown distribution of the data. Thus we never know in practice what the efficiency is.
One interesting family of distributions to consider consists of the Student t distributions and their limit as the degrees of freedom go to infinity, which is the normal distribution.
The table below gives the ARE for various estimators and various true population distributions against the maximum likelihood estimator, which is the most efficient asymptotically.
estimator | Normal | t(30) | t(20) | t(10) | t(5) | t(4) | t(3) | t(2) | t(1) |
---|---|---|---|---|---|---|---|---|---|
sample mean | 1.000 | 0.993 | 0.986 | 0.945 | 0.800 | 0.700 | 0.500 | 0.000 | 0.000 |
sample median of Walsh averages | 0.955 | 0.975 | 0.983 | 0.996 | 0.993 | 0.981 | 0.950 | 0.867 | 0.608 |
sample median | 0.637 | 0.666 | 0.680 | 0.716 | 0.769 | 0.788 | 0.811 | 0.833 | 0.811 |
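The t columns of this table can be reproduced from closed-form expressions. The sketch below is not from the original notes; it assumes the standard large-sample variance formulas 1/I for the MLE, σ² for the sample mean, 1/(4 f(0)²) for the sample median, and 1/(12 (∫f²)²) for the median of Walsh averages, where f is the t(ν) density and I = (ν + 1)/(ν + 3) is its Fisher information for location. All function names are hypothetical.

```python
import math


def fisher_info_t(nu):
    """Fisher information for location in the t(nu) family (unit scale)."""
    return (nu + 1) / (nu + 3)


def density_at_zero_t(nu):
    """Value of the t(nu) density at its center."""
    return math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))


def integral_density_squared_t(nu):
    """Integral of the squared t(nu) density over the real line."""
    c = density_at_zero_t(nu)
    return c * c * math.sqrt(nu * math.pi) * math.gamma(nu + 0.5) / math.gamma(nu + 1)


def are_mean_t(nu):
    """ARE of the sample mean vs. the MLE; zero when the variance is infinite."""
    if nu <= 2:
        return 0.0
    variance = nu / (nu - 2)
    return 1 / (variance * fisher_info_t(nu))


def are_hodges_lehmann_t(nu):
    """ARE of the median of Walsh averages vs. the MLE."""
    return 12 * integral_density_squared_t(nu) ** 2 / fisher_info_t(nu)


def are_median_t(nu):
    """ARE of the sample median vs. the MLE."""
    return 4 * density_at_zero_t(nu) ** 2 / fisher_info_t(nu)


for nu in (30, 20, 10, 5, 4, 3, 2, 1):
    print(
        f"t({nu}):",
        f"mean={are_mean_t(nu):.3f}",
        f"Hodges-Lehmann={are_hodges_lehmann_t(nu):.3f}",
        f"median={are_median_t(nu):.3f}",
    )
```

The zeros for the sample mean at t(2) and t(1) reflect the infinite variance of those distributions: the mean is not merely inefficient there, its asymptotic variance does not exist.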
Another interesting comparison is to choose population distributions for which the various estimators are fully efficient.
- The sample mean is fully efficient (it is the MLE) for the normal distribution.
- The sample median is fully efficient (it is the MLE) for the Laplace (also called double exponential) distribution.
- The Hodges-Lehmann estimator (sample median of Walsh averages) is fully efficient for the logistic distribution. Interestingly, it is not the MLE, although asymptotically equivalent to the MLE (almost the same for large sample sizes).
- The hyperbolic secant distribution is thrown in as an interesting example where the sample mean and median are equally efficient (neither fully efficient).
We won't give the exact formulas (the curious can follow the links above);
rather, we just note that the normal distribution has very light tails
[proportional to exp(−x²/2)] and the other three have
the same moderately light tails
[proportional to exp(−|x|)].
So the differences among the other three are not in tail behavior
but in the precise details of their densities.
estimator | Normal | Laplace | Logistic | Hyperbolic Secant |
---|---|---|---|---|
sample mean | 1.000 | 0.500 | 0.912 | 0.811 |
sample median of Walsh averages | 0.955 | 0.750 | 1.000 | 0.986 |
sample median | 0.637 | 1.000 | 0.750 | 0.811 |
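The entries in this table follow from four numbers per distribution: the variance, the Fisher information for location, the density at the center, and the integral of the squared density. The sketch below is an illustration rather than the method used in the notes; it assumes the same standard asymptotic variance formulas as for the t table, and the constants are the usual closed forms for the unit-scale parametrizations whose tails are exp(−x²/2) or exp(−|x|). The names `FACTS` and `are_row` are hypothetical.

```python
import math

# name: (variance, Fisher information for location, f(0), integral of f^2)
# Assumed unit-scale parametrizations: Laplace f(x) = exp(-|x|) / 2,
# logistic f(x) = exp(-x) / (1 + exp(-x))^2, hyperbolic secant
# f(x) = 1 / (pi * cosh(x)).
FACTS = {
    "Normal": (1.0, 1.0, 1 / math.sqrt(2 * math.pi), 1 / (2 * math.sqrt(math.pi))),
    "Laplace": (2.0, 1.0, 0.5, 0.25),
    "Logistic": (math.pi ** 2 / 3, 1 / 3, 0.25, 1 / 6),
    "Hyperbolic Secant": (math.pi ** 2 / 4, 0.5, 1 / math.pi, 2 / math.pi ** 2),
}


def are_row(variance, info, f0, int_f2):
    """AREs relative to the MLE, whose asymptotic variance is 1 / info."""
    return {
        "sample mean": 1 / (variance * info),
        "sample median of Walsh averages": 12 * int_f2 ** 2 / info,
        "sample median": 4 * f0 ** 2 / info,
    }


for name, facts in FACTS.items():
    row = are_row(*facts)
    print(name, {est: round(val, 3) for est, val in row.items()})
```

Because ARE is unchanged by rescaling the data, any other scale choice for these families gives the same numbers.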
The Rweb below graphs the densities of the distributions in the table above.
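As a rough numerical companion, the four densities can also be written down directly in Python. This is a minimal sketch under assumptions not stated in the notes: location 0 and the unit-scale parametrizations whose tails are exp(−x²/2) for the normal and exp(−|x|) for the other three. The helper names `integrate` and `log_tail_slope` are hypothetical.

```python
import math

# Assumed unit-scale, center-zero parametrizations of the four densities.
DENSITIES = {
    "Normal": lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi),
    "Laplace": lambda x: math.exp(-abs(x)) / 2,
    # Written in terms of |x| (the density is symmetric) to avoid overflow.
    "Logistic": lambda x: math.exp(-abs(x)) / (1 + math.exp(-abs(x))) ** 2,
    "Hyperbolic Secant": lambda x: 1 / (math.pi * math.cosh(x)),
}


def integrate(f, lo=-40.0, hi=40.0, n=8000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def log_tail_slope(f, x=10.0):
    """Change in log density from x to x + 1, far out in the right tail."""
    return math.log(f(x + 1)) - math.log(f(x))


for name, f in DENSITIES.items():
    print(
        f"{name}: total mass={integrate(f):.4f},",
        f"f(0)={f(0.0):.4f},",
        f"log-tail slope={log_tail_slope(f):.3f}",
    )
```

The log-tail slope is close to −1 for the Laplace, logistic, and hyperbolic secant densities, while the normal density falls off much faster, which matches the remark about tail behavior above.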