Statistics 5601 (Geyer, Spring 2006) Examples: Efficiency and Breakdown Point

Breakdown Point

The breakdown point of an estimator (or a related procedure like a test or a confidence interval) is the fraction of the data that can be complete junk without destroying the estimator. More precisely, it is the fraction of data that can be dragged to infinity with the estimator remaining bounded.

Breakdown point is one measure of robustness.

High breakdown point means highly robust (which is good).

For the one-sample location estimators we have the following

estimator                          breakdown point
sample mean                        0%
sample median of Walsh averages    29.3%
sample median                      50%
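
The breakdown points listed above can be illustrated on a small sample. The following is a minimal Python sketch (standard library only), not part of the course materials: it replaces k of n = 20 observations with a huge value and reports each estimate. The data, the junk value 1e9, and the helper names contaminate and hodges_lehmann are illustrative assumptions, and Walsh averages are assumed to include the pairs with i = j. The 29.3% figure is the large-sample limit 1 - 1/sqrt(2), so the finite-sample cutoff for n = 20 is only close to it.

```python
from itertools import combinations_with_replacement
from statistics import fmean, median


def hodges_lehmann(xs):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))


def contaminate(xs, k, junk=1e9):
    """Return a copy of xs with its last k observations replaced by junk."""
    return xs[: len(xs) - k] + [junk] * k


# Hypothetical clean sample: the values 1, 2, ..., 20.
clean = [float(i) for i in range(1, 21)]

estimators = [
    ("mean", fmean),
    ("Walsh median", hodges_lehmann),
    ("median", median),
]

for name, estimator in estimators:
    for k in (0, 1, 5, 7, 9, 10):
        estimate = estimator(contaminate(clean, k))
        print(f"{name:>12}  junk points = {k:2d}  estimate = {estimate:.4g}")
```

With this sample the mean is ruined by a single junk point, the median of Walsh averages stays within the range of the clean data for 5 junk points (25%) but not for 7 (35%), and the median stays within that range for 9 junk points (45%) but not for 10 (50%).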

Efficiency

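The asymptotic relative efficiency (ARE) of two estimators is the ratio of sample sizes needed to get equal accuracy. It equals the inverse ratio of their asymptotic variances: if one estimator has twice the asymptotic variance of another, its ARE relative to the other is 0.5, and it needs twice the sample size for the same accuracy. Associated tests and confidence intervals have the same ARE as the estimators.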

Efficiency depends on the true unknown distribution of the data. Thus we never know in practice what the efficiency is.

One interesting family of distributions to consider consists of the Student t distributions and their limit as the degrees of freedom go to infinity, which is the normal distribution.

The table below gives the ARE for various estimators and various true population distributions against the maximum likelihood estimator, which is the most efficient asymptotically.

estimator                        Normal  t(30)   t(20)   t(10)   t(5)    t(4)    t(3)    t(2)    t(1)
sample mean                      1.000   0.993   0.986   0.945   0.800   0.700   0.500   0.000   0.000
sample median of Walsh averages  0.955   0.975   0.983   0.996   0.993   0.981   0.950   0.867   0.608
sample median                    0.637   0.666   0.680   0.716   0.769   0.788   0.811   0.833   0.811
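
The t columns of this table can be reproduced from closed-form asymptotic variances. The sketch below is one way to do it in Python, not necessarily how the table was computed. It assumes these standard results, none of which is stated in the notes: the Fisher information for location in the t(v) family is (v + 1) / (v + 3); the sample mean has asymptotic variance v / (v - 2), which is infinite for v <= 2; the sample median has asymptotic variance 1 / (4 f(0)^2); and the median of Walsh averages has asymptotic variance 1 / (12 J^2), where J is the integral of the squared density. All function names are hypothetical.

```python
import math


def t_density_at_zero(v):
    """Value of the t(v) density at 0."""
    log_ratio = math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
    return math.exp(log_ratio) / math.sqrt(v * math.pi)


def t_integral_density_squared(v):
    """Integral of the squared t(v) density over the real line."""
    c = t_density_at_zero(v)
    # Integral of (1 + x^2 / v)^(-(v + 1)) dx equals
    # sqrt(v * pi) * Gamma(v + 1/2) / Gamma(v + 1).
    log_ratio = math.lgamma(v + 0.5) - math.lgamma(v + 1)
    return c * c * math.sqrt(v * math.pi) * math.exp(log_ratio)


def fisher_information(v):
    """Fisher information for the location parameter of t(v) (assumed result)."""
    return (v + 1) / (v + 3)


def are_mean(v):
    """ARE of the sample mean against the MLE; 0 when the variance is infinite."""
    if v <= 2:
        return 0.0
    variance = v / (v - 2)
    return 1.0 / (fisher_information(v) * variance)


def are_median(v):
    """ARE of the sample median against the MLE."""
    return 4 * t_density_at_zero(v) ** 2 / fisher_information(v)


def are_walsh_median(v):
    """ARE of the median of Walsh averages against the MLE."""
    return 12 * t_integral_density_squared(v) ** 2 / fisher_information(v)


for v in (30, 20, 10, 5, 4, 3, 2, 1):
    print(
        f"t({v:2d})  mean = {are_mean(v):.3f}  "
        f"Walsh median = {are_walsh_median(v):.3f}  "
        f"median = {are_median(v):.3f}"
    )
```

The normal column is the limit as v goes to infinity, where the same three formulas give 1, 3/pi, and 2/pi.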

Another interesting comparison is to choose population distributions for which the various estimators are fully efficient: the normal (for the sample mean), the Laplace (for the sample median), and the logistic (for the sample median of Walsh averages). The table below also includes the hyperbolic secant.

We won't give the exact formulas (the curious can follow the links above), but just note that the normal distribution has very light tails [proportional to exp(-x^2 / 2)] and the other three have the same moderately light tails [proportional to exp(-|x|)]. So the differences among the other three are not in tail behavior but in the precise details of their densities.

estimator                          Normal    Laplace    Logistic    Hyperbolic Secant
sample mean                        1.000     0.500      0.912       0.811
sample median of Walsh averages    0.955     0.750      1.000       0.986
sample median                      0.637     1.000      0.750       0.811
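
Each entry in this table follows from four constants of the population distribution: its variance, its Fisher information for location, its density at the center, and the integral of its squared density. The Python sketch below is a hedged check rather than the notes' own computation. The specific forms are assumptions: the standard normal, the Laplace density exp(-|x|) / 2, the logistic density exp(-x) / (1 + exp(-x))^2, and the hyperbolic secant density sech(x) / pi. The constants are the standard values for those forms, and the names CONSTANTS, efficiencies, and TABLE are hypothetical. ARE does not change under rescaling, so the choice of scale does not affect the result.

```python
import math

# For each distribution:
# (variance, Fisher information for location, density at 0, integral of density^2)
CONSTANTS = {
    "Normal": (1.0, 1.0, 1 / math.sqrt(2 * math.pi), 1 / (2 * math.sqrt(math.pi))),
    "Laplace": (2.0, 1.0, 0.5, 0.25),
    "Logistic": (math.pi ** 2 / 3, 1 / 3, 0.25, 1 / 6),
    "Hyperbolic Secant": (math.pi ** 2 / 4, 0.5, 1 / math.pi, 2 / math.pi ** 2),
}


def efficiencies(variance, information, f0, int_f2):
    """ARE of each estimator against the MLE, whose asymptotic variance is 1 / information."""
    mle_variance = 1 / information
    return {
        # Asymptotic variance of the mean is the population variance.
        "sample mean": mle_variance / variance,
        # Asymptotic variance of the Walsh median is 1 / (12 * int_f2^2).
        "sample median of Walsh averages": mle_variance * 12 * int_f2 ** 2,
        # Asymptotic variance of the median is 1 / (4 * f0^2).
        "sample median": mle_variance * 4 * f0 ** 2,
    }


TABLE = {name: efficiencies(*constants) for name, constants in CONSTANTS.items()}

for name, row in TABLE.items():
    cells = "  ".join(f"{estimator}: {are:.3f}" for estimator, are in row.items())
    print(f"{name:>17}  {cells}")
```

Under these assumptions the sample mean and sample median have exactly the same efficiency, 8 / pi^2, for the hyperbolic secant, which is why no estimator in the table is fully efficient for it.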

The Rweb below graphs the densities of the distributions in the table above.
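
As a rough numerical companion to the density plots, the tail behavior described earlier can be checked by dividing each density by exp(-|x|) far from the center. This is a small Python sketch, not the Rweb code. The density forms are assumptions: Laplace exp(-|x|) / 2, logistic exp(-x) / (1 + exp(-x))^2, hyperbolic secant sech(x) / pi, and the standard normal. The function names are hypothetical.

```python
import math


def normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def laplace(x):
    return 0.5 * math.exp(-abs(x))


def logistic(x):
    # The logistic density is symmetric, so evaluate at |x| to avoid overflow.
    e = math.exp(-abs(x))
    return e / (1 + e) ** 2


def hyperbolic_secant(x):
    return 1 / (math.pi * math.cosh(x))


def tail_ratio(density, x):
    """Density divided by exp(-|x|); levels off if the tail is proportional to exp(-|x|)."""
    return density(x) / math.exp(-abs(x))


for density in (normal, laplace, logistic, hyperbolic_secant):
    ratios = "  ".join(f"x = {x}: {tail_ratio(density, x):.4g}" for x in (5, 10, 20))
    print(f"{density.__name__:>17}  {ratios}")
```

For the Laplace, logistic, and hyperbolic secant the ratio settles at a constant (1/2, 1, and 2/pi under these assumed forms), while for the normal it falls rapidly toward zero, which reflects its much lighter tails.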