robust([Model] [,print:F or silent:T, trunc:c, maxiter:m, epsilon:eps, \ fstats:T, pvals:T, marginal:T]), Model a CHARACTER scalar, c and eps positive REAL scalars, m a positive integer. |

robust(Model) computes a robust linear fit of the model specified in the quoted string or CHARACTER variable Model. An approximate analysis of variance (ANOVA) table is printed, as well as a robust estimate of sigma. In the case of normal errors with possible outliers arising from contamination, the square of the estimated sigma is approximately unbiased for the variance of the uncontaminated errors. See topic 'models' for information on specifying Model. When Model is omitted (robust() or robust(,...), the most recent GLM model is assumed, or, if there have not been previous GLM commands, the model specified in variable STRMODEL, if any. Side effect variables created by robust() are DF, SS, HII, RESIDUALS, and WTDRESIDUALS. WTDRESIDUALS is not really weighted residuals; instead it is set to the vector of modified residuals sigmahat*psi(residuals/sigmahat). See below for details. Inference based on the ANOVA table should be used with caution. Especially when there are cases with high leverage (large value of HII), or when the ratio of model degrees of freedom to error degrees of freedom is too large, the approximation can be misleading. Interpret with similar caution the results of secoefs() and contrast() after robust(). Keyword phrase Default Meaning trunc:c .75 Positive REAL c is used as a "truncation point" in the fitting algorithm. See discussion below. The larger the value, the less "robustifying." maxiter:m 50 Positive integer m is the maximum number of iterations that will be allowed in fitting epsilon:eps 1e-6 Small positive REAL specifying relative error in objective function required to end iteration See topic 'glm_keys' for information on keyword phrases 'print:F', 'silent:T', 'fstats:T', 'pvals:T' and 'marginal:T'. Any P values printed by fstats:T and pvals:T are only approximate. Keyword phrase 'coefs:F' is not legal. The default values for fstats and pvals can be changed by setoptions(fstats:T) and/or setoptions(pvals:T). See setoptions(), options. The algorithm used is based on Sec. 7.8 of "Robust Statistics" (Wiley 1981) and Sec. 14 of "Robust Statistical Procedures" (SIAM 1977), both by Peter J. Huber. Coefficients and scale sigma are estimated by minimizing a certain function of sigma and the residuals r = y - fit (Huber 1981, eq. 7.7.9 and 7.7.14). Effectively the algorithm deemphasizes cases whose apparent residual r[i] satisfies abs(r[i]) > c * sigma, where c is a constant "truncation point". The default value of c is .75 but it can be set by keyword 'trunc'. After running robust(), WTDRESIDUALS contains the modified residuals rmod = sigmahat*psi(rhat/sigmahat), where sigmahat is the estimated scale, rhat is the vector of estimated residuals, and psi(z) = z when abs(z) < c, psi(z) = c for z >=c, psi(z) = -c, z <= -c. Following a suggestion of Huber, the ANOVA table is computed from pseudo-data obtained as fithat + K*rmod, where fithat = y - rhat is the vector of estimated predicted values and K is a constant (see p. 40 of Huber 1977). K = (1 + (k/n)(1-mu)/mu)/mu, where n is the sample size, k is the number of model degrees of freedom, and mu = m/n where m is the number of non-truncated rhat[i], that is, which satisfy abs(rhat[i]/sigmathat) < c. The robust estimate of sigma is sqrt(mse/beta)/K, where mse is the error mean square from the ANOVA table and beta is a constant computable by beta = cumchi(c^2,3) + 2*c^2*(1 - cumnor(c)) = E[psi(z^2)] for N(0,1) z. If you want ANOVA results with a different order of terms, you do not need to redo robust(); you can use anova() with the pseudo-data in the preceding paragraph as dependent variable. You can determine m needed to compute K as the number of cases for which WTDRESIDUALS and RESIDUALS are equal except for rounding error.

Gary Oehlert 2003-01-15