Next: rotate() Up: MacAnova Help File Previous: rnorm()   Contents

robust()

Usage:
robust([Model] [,print:F or silent:T, trunc:c, maxiter:m, epsilon:eps, \
  fstats:T, pvals:T, marginal:T]), Model a CHARACTER scalar,
  c and eps positive REAL scalars, m a positive integer.



Keywords: glm, anova, regression
robust(Model) computes a robust linear fit of the model specified in the
quoted string or CHARACTER variable Model.  An approximate analysis of
variance (ANOVA) table is printed, as well as a robust estimate of
sigma.  In the case of normal errors with possible outliers arising from
contamination, the square of the estimated sigma is approximately
unbiased for the variance of the uncontaminated errors.

See topic 'models' for information on specifying Model.

When Model is omitted (robust() or robust(,...), the most recent GLM
model is assumed, or, if there have not been previous GLM commands, the
model specified in variable STRMODEL, if any.

Side effect variables created by robust() are DF, SS, HII, RESIDUALS,
and WTDRESIDUALS.  WTDRESIDUALS is not really weighted residuals;
instead it is set to the vector of modified residuals
sigmahat*psi(residuals/sigmahat).  See below for details.

Inference based on the ANOVA table should be used with caution.
Especially when there are cases with high leverage (large value of HII),
or when the ratio of model degrees of freedom to error degrees of
freedom is too large, the approximation can be misleading.  Interpret
with similar caution the results of secoefs() and contrast() after
robust().

Keyword phrase  Default  Meaning
  trunc:c         .75    Positive REAL c is used as a "truncation point"
                         in the fitting algorithm.  See discussion
                         below.  The larger the value, the less
                         "robustifying."

  maxiter:m       50     Positive integer m is the maximum number of
                         iterations that will be allowed in fitting

  epsilon:eps    1e-6    Small positive REAL specifying relative error
                         in objective function required to end iteration

See topic 'glm_keys' for information on keyword phrases 'print:F',
'silent:T', 'fstats:T', 'pvals:T' and 'marginal:T'.  Any P values
printed by fstats:T and pvals:T are only approximate.  Keyword phrase
'coefs:F' is not legal.

The default values for fstats and pvals can be changed by
setoptions(fstats:T) and/or setoptions(pvals:T).  See setoptions(),
options.

The algorithm used is based on Sec. 7.8 of "Robust Statistics" (Wiley
1981) and Sec. 14 of "Robust Statistical Procedures" (SIAM 1977), both
by Peter J. Huber.  Coefficients and scale sigma are estimated by
minimizing a certain function of sigma and the residuals r = y - fit
(Huber 1981, eq. 7.7.9 and 7.7.14).

Effectively the algorithm deemphasizes cases whose apparent residual
r[i] satisfies abs(r[i]) > c * sigma, where c is a constant "truncation
point".  The default value of c is .75 but it can be set by keyword
'trunc'.

After running robust(), WTDRESIDUALS contains the modified residuals
rmod = sigmahat*psi(rhat/sigmahat), where sigmahat is the estimated
scale, rhat is the vector of estimated residuals, and psi(z) = z when
abs(z) < c, psi(z) = c for z >=c, psi(z) = -c, z <= -c.

Following a suggestion of Huber, the ANOVA table is computed from
pseudo-data obtained as fithat + K*rmod, where fithat = y - rhat is the
vector of estimated predicted values and K is a constant (see p. 40 of
Huber 1977).  K = (1 + (k/n)(1-mu)/mu)/mu, where n is the sample size, k
is the number of model degrees of freedom, and mu = m/n where m is the
number of non-truncated rhat[i], that is, which satisfy
abs(rhat[i]/sigmathat) < c.  The robust estimate of sigma is
sqrt(mse/beta)/K, where mse is the error mean square from the ANOVA
table and beta is a constant computable by beta = cumchi(c^2,3) +
2*c^2*(1 - cumnor(c)) = E[psi(z^2)] for N(0,1) z.

If you want ANOVA results with a different order of terms, you do not
need to redo robust(); you can use anova() with the pseudo-data in the
preceding paragraph as dependent variable.  You can determine m needed
to compute K as the number of cases for which WTDRESIDUALS and RESIDUALS
are equal except for rounding error.


Gary Oehlert 2003-01-15