This page is still under construction. Don't look yet. Or you can look, but be warned it's unfinished.
The breakdown point of an estimator (or a related procedure like
a test or a confidence interval) is the fraction of the data that can
be complete junk without destroying the estimator. More precisely, it is the
fraction of data that can be dragged to infinity with the estimator
remaining bounded.
Breakdown point is one measure of robustness.
High breakdown point means highly robust (which is good).
For the one-sample location estimators we have the following breakdown points.
estimator | breakdown point |
---|---|
sample mean | 0% |
sample median of Walsh averages | 29.3% |
sample median | 50% |
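The breakdown points in the table can be seen in a small experiment. The sketch below is only an illustration, not part of the original page: the helper names `hodges_lehmann` and `contaminate`, the sample of 100 evenly spaced points, and the junk value `1e12` are all assumptions chosen for the demonstration. It replaces `k` of the 100 observations with junk and prints each estimator.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def hodges_lehmann(xs):
    """Median of the Walsh averages (x_i + x_j) / 2 taken over all i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))


def contaminate(xs, k, junk=1e12):
    """Return a copy of xs with the last k observations replaced by junk."""
    return xs[: len(xs) - k] + [junk] * k


# Hypothetical clean sample: 100 well-behaved points in (0, 1].
clean = [i / 100 for i in range(1, 101)]

# k is the number (here also the percentage) of junk observations.
for k in (0, 1, 29, 30, 49):
    xs = contaminate(clean, k)
    print(k, mean(xs), hodges_lehmann(xs), median(xs))
```

With this sample a single junk point already ruins the mean, the median of Walsh averages stays near the clean data at 29 junk points but not at 30, and the median is still bounded at 49. The jump between 29 and 30 happens because the median of Walsh averages stays bounded only while more than half of the 5050 Walsh averages involve clean points alone.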
The asymptotic relative efficiency (ARE) of two estimators is the ratio of sample sizes needed to get equal accuracy. It equals the reciprocal of the ratio of their asymptotic variances, so an estimator with half the asymptotic variance of another needs half the sample size. Associated tests and confidence intervals have the same ARE as the estimators.
Efficiency depends on the true unknown distribution of the data. Thus we never know in practice what the efficiency is.
One interesting family of distributions to consider consists of the Student t distributions and their limit as the degrees of freedom go to infinity, which is the normal distribution.
The table below gives the ARE for various estimators and various true population distributions against the maximum likelihood estimator, which is the most efficient asymptotically.
estimator | Normal | t(30) | t(20) | t(10) | t(5) | t(4) | t(3) | t(2) | t(1) |
---|---|---|---|---|---|---|---|---|---|
sample mean | 1.000 | 0.993 | 0.986 | 0.945 | 0.800 | 0.700 | 0.500 | 0.000 | 0.000 |
sample median of Walsh averages | 0.955 | 0.975 | 0.983 | 0.996 | 0.993 | 0.981 | 0.950 | 0.867 | 0.608 |
sample median | 0.637 | 0.666 | 0.680 | 0.716 | 0.769 | 0.788 | 0.811 | 0.833 | 0.811 |
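The t columns of the table can be recomputed from standard asymptotic variance formulas. The sketch below is not from the original page and rests on stated assumptions: the asymptotic variance of the maximum likelihood estimator is 1 / I, where I = (ν + 1) / (ν + 3) is the Fisher information for location of the t(ν) distribution; the sample mean has variance ν / (ν − 2) when ν > 2; the sample median has asymptotic variance 1 / (4 f(0)²); and the median of Walsh averages has asymptotic variance 1 / (12 (∫ f²)²). All function names are made up for the illustration.

```python
from math import gamma, pi, sqrt


def t_density_at_zero(nu):
    """Value f(0) of the Student t density with nu degrees of freedom."""
    return gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))


def t_integral_of_squared_density(nu):
    """Integral of f(x)^2 over the real line for the t(nu) density."""
    c = t_density_at_zero(nu)
    # Integral of (1 + x^2 / nu)^-(nu + 1) is sqrt(nu) * B(1/2, nu + 1/2).
    return c * c * sqrt(nu * pi) * gamma(nu + 0.5) / gamma(nu + 1)


def t_fisher_information(nu):
    """Fisher information for the location parameter of t(nu)."""
    return (nu + 1) / (nu + 3)


def are_mean(nu):
    """ARE of the sample mean relative to the MLE (zero if variance is infinite)."""
    if nu <= 2:
        return 0.0
    variance = nu / (nu - 2)
    return 1 / (variance * t_fisher_information(nu))


def are_walsh_median(nu):
    """ARE of the median of Walsh averages relative to the MLE."""
    return 12 * t_integral_of_squared_density(nu) ** 2 / t_fisher_information(nu)


def are_median(nu):
    """ARE of the sample median relative to the MLE."""
    return 4 * t_density_at_zero(nu) ** 2 / t_fisher_information(nu)


for nu in (30, 20, 10, 5, 4, 3, 2, 1):
    print(f"t({nu}): {are_mean(nu):.3f} {are_walsh_median(nu):.3f} {are_median(nu):.3f}")
```

For the Cauchy case, t(1), these formulas reduce to 6 / π² for the median of Walsh averages and 8 / π² for the sample median.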
Another interesting comparison is to choose population distributions for which the various estimators are fully efficient.
We won't give the exact formulas (the curious can follow the links above).
Rather, we just note that the normal distribution has very light tails
[proportional to exp(- x² / 2)] and the other three have
the same moderately light tails
[proportional to exp(- |x|)].
So the differences among the other three are not in tail behavior
but in the precise details of their densities.
estimator | Normal | Laplace | Logistic | Hyperbolic Secant |
---|---|---|---|---|
sample mean | 1.000 | 0.500 | 0.912 | 0.811 |
sample median of Walsh averages | 0.955 | 0.750 | 1.000 | 0.986 |
sample median | 0.637 | 1.000 | 0.750 | 0.811 |
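The entries of this table follow from four numbers per distribution: the variance, the density at the center f(0), the integral of the squared density, and the Fisher information for location. The sketch below is an illustration under assumptions, not the page's own computation: the constants are closed forms worked out for the standard parameterizations (Laplace density exp(−|x|) / 2, standard logistic, hyperbolic secant density sech(πx / 2) / 2), and the dictionary layout and function name are hypothetical.

```python
from math import pi, sqrt

# Assumed closed-form constants for the standard form of each distribution:
# (variance, f(0), integral of f^2, Fisher information for location).
CONSTANTS = {
    "Normal": (1.0, 1 / sqrt(2 * pi), 1 / (2 * sqrt(pi)), 1.0),
    "Laplace": (2.0, 1 / 2, 1 / 4, 1.0),
    "Logistic": (pi ** 2 / 3, 1 / 4, 1 / 6, 1 / 3),
    "Hyperbolic Secant": (1.0, 1 / 2, 1 / pi, pi ** 2 / 8),
}


def are_row(variance, f0, int_f2, info):
    """AREs of (mean, median of Walsh averages, median) relative to the MLE."""
    mle_variance = 1 / info  # asymptotic variance of the MLE
    return (
        mle_variance / variance,             # sample mean
        mle_variance * 12 * int_f2 ** 2,     # median of Walsh averages
        mle_variance * 4 * f0 ** 2,          # sample median
    )


ARE = {name: are_row(*consts) for name, consts in CONSTANTS.items()}

for name, (mean_are, walsh_are, median_are) in ARE.items():
    print(f"{name}: {mean_are:.3f} {walsh_are:.3f} {median_are:.3f}")
```

Each estimator reaches an ARE of 1 in exactly one column: the mean for the normal, the median for the Laplace, and the median of Walsh averages for the logistic.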
The Rweb below graphs the densities of the distributions in the table above.