Fall Seminar Series, November 2, 2006
University of Minnesota
School of Statistics
College of Liberal Arts
Model Evaluation and Model Selection Based on Prediction Error for Various Outcomes
Tianxi Cai
Department of Biostatistics
Harvard University
Thursday, November 2, 2006
3:30 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall
Abstract
The construction of a reliable, practically useful prediction rule for
future responses depends heavily on the "adequacy" of the fitted
regression model. In this research, we consider the absolute prediction
error, the expected value of the absolute difference between the future
and predicted responses, as the model evaluation criterion and as a
basis for evaluating the accuracy of a given prediction rule. This
prediction error is on the same scale as the observed outcome and is
therefore easier to interpret than the average squared error or
R-squared. When the outcome is binary, the absolute prediction error is
equivalent to the misclassification error. When the outcome is a
censored event time, we propose classification rules for predicting the
t-year survival status and compare the classification accuracy of
prediction rules constructed based on various working models. We show
that the distributions of the apparent-error-type estimators and their
cross-validation counterparts are approximately normal even under a
misspecified fitted model. When the prediction rule is "unsmooth", the
variance of this normal approximation can be estimated well via a
perturbation-resampling method. We also show how to approximate the
distribution of the difference of the estimated prediction errors from
two competing models. Through real data examples and simulation
studies, we demonstrate that the resulting interval estimates for
prediction errors provide much more information about model adequacy
than the point estimates alone.
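As a rough illustration of the quantities named above, the sketch below computes the apparent absolute prediction error (fit and evaluate on the same data) and its K-fold cross-validation counterpart for a simple least-squares line, and checks that for 0/1 outcomes the mean absolute error is just the misclassification rate. The working model (one-covariate linear regression), the fold scheme, the simulated data, and all function names (`fit_line`, `apparent_ape`, `cv_ape`, `binary_ape`) are illustrative assumptions, not the speaker's implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns the prediction rule."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def apparent_ape(xs, ys):
    """Apparent absolute prediction error: fit and evaluate on the same data."""
    rule = fit_line(xs, ys)
    return sum(abs(y - rule(x)) for x, y in zip(xs, ys)) / len(xs)


def cv_ape(xs, ys, k=5, seed=0):
    """K-fold cross-validated absolute prediction error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the other K-1 folds, then score only the held-out fold.
        rule = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += sum(abs(ys[i] - rule(xs[i])) for i in fold)
    return total / len(xs)


def binary_ape(y, yhat):
    """Mean |Y - Yhat| for 0/1 labels; equals the misclassification rate."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


# Hypothetical simulated data: y = 1 + 2x + N(0, 0.5^2) noise.
rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]
print("apparent:", apparent_ape(xs, ys), "5-fold CV:", cv_ape(xs, ys))
print("binary:", binary_ape([0, 1, 1, 0], [0, 0, 1, 1]))
```

Because the apparent estimate reuses the fitting data, it tends to be optimistic; the cross-validated version scores each observation with a rule that never saw it.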
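A perturbation-resampling scheme of the general kind mentioned in the abstract can be sketched as follows: draw i.i.d. positive weights with mean 1 and variance 1 (here Exp(1), an assumption), refit each working model with those weights, recompute the weighted error, and use the spread of the perturbed values as a standard-error estimate. Using the same weights for both models yields an interval for the difference in estimated absolute prediction error between two competing models. The two models (a line versus an intercept-only fit), the number of perturbations, the normal-approximation interval, and all function names are hypothetical choices for illustration, not the procedure presented in the talk.

```python
import random
import statistics


def fit_weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns the prediction rule."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return lambda x: (my - b * mx) + b * x


def fit_weighted_mean(xs, ys, ws):
    """Intercept-only competitor: predicts the weighted mean of y for every x."""
    m = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return lambda x: m


def weighted_ape(rule, xs, ys, ws):
    """Weighted apparent absolute prediction error."""
    return sum(w * abs(y - rule(x)) for w, x, y in zip(ws, xs, ys)) / sum(ws)


def ape_difference(xs, ys, ws):
    """APE(intercept-only) minus APE(line) under one set of weights."""
    line = weighted_ape(fit_weighted_line(xs, ys, ws), xs, ys, ws)
    null = weighted_ape(fit_weighted_mean(xs, ys, ws), xs, ys, ws)
    return null - line


def perturbation_interval(xs, ys, n_perturb=500, seed=0):
    """Point estimate, perturbation SE, and a normal-approximation 95% interval."""
    rng = random.Random(seed)
    observed = ape_difference(xs, ys, [1.0] * len(xs))
    draws = []
    for _ in range(n_perturb):
        ws = [rng.expovariate(1.0) for _ in xs]  # Exp(1): mean 1, variance 1
        draws.append(ape_difference(xs, ys, ws))
    se = statistics.stdev(draws)
    return observed, se, (observed - 1.96 * se, observed + 1.96 * se)


# Hypothetical simulated data: y = 1 + 2x + N(0, 0.5^2) noise.
rng = random.Random(2)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]
obs, se, (lo, hi) = perturbation_interval(xs, ys)
print(f"difference = {obs:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Sharing one weight vector across both fits is what lets the perturbed differences capture the correlation between the two error estimates, which separate resampling of each model would ignore.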