Fall Seminar Series  November 2, 2006
University of Minnesota
School of Statistics
College of Liberal Arts 

Model Evaluation and Model Selection Based on Prediction Error for Various Outcomes

Tianxi Cai
Department of Biostatistics
Harvard University

Thursday, November 2, 2006
3:30 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall

 

Abstract

The construction of a reliable, practically useful prediction rule for future responses depends heavily on the "adequacy" of the fitted regression model. In this research, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion and as a basis for evaluating the accuracy of a given prediction rule. This prediction error is on the same scale as the observed outcome and is thus easier to interpret than the average squared error and the R-squared. When the outcome is binary, the absolute prediction error is equivalent to the misclassification error. When the outcome is a censored event time, we propose classification rules for predicting the t-year survival status and compare the classification accuracy of prediction rules constructed from various working models. We show that the distributions of the apparent-error-type estimators and their cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is "unsmooth", the variance of this normal distribution can be estimated well via a perturbation-resampling method. We also show how to approximate the distribution of the difference between the estimated prediction errors from two competing models. Through real data examples and simulation studies, we demonstrate that the resulting interval estimates for prediction errors provide much more information about model adequacy than the point estimates alone.
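
As a rough illustration of the quantities named in the abstract, the following sketch computes an apparent absolute prediction error and a K-fold cross-validated counterpart for a working linear model, and checks that the absolute error reduces to the misclassification rate for a binary outcome. It is not the speaker's implementation: the function names, the simulated data, the least-squares fit, and the choice of five folds are all assumptions made for illustration, and the perturbation-resampling variance estimate is not attempted here.

```python
import numpy as np

def fit_linear(X, y):
    # Least-squares fit of a working linear model with an intercept
    # (an assumed choice of working model, for illustration only).
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_linear(beta, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta

def apparent_abs_error(X, y):
    # Apparent error: fit and evaluate on the same observations.
    beta = fit_linear(X, y)
    return float(np.mean(np.abs(y - predict_linear(beta, X))))

def cv_abs_error(X, y, n_folds=5, seed=0):
    # K-fold cross-validated absolute prediction error.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        beta = fit_linear(X[train_mask], y[train_mask])
        errors[test_idx] = np.abs(y[test_idx] - predict_linear(beta, X[test_idx]))
    return float(np.mean(errors))

# Simulated data (hypothetical): the true mean is nonlinear in the second
# covariate, so the working linear model is deliberately misspecified.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(size=200)

apparent = apparent_abs_error(X, y)
cross_validated = cv_abs_error(X, y)

# Binary outcome with 0/1 predictions: the mean absolute error
# coincides with the misclassification rate.
y_bin = np.array([0, 1, 1, 0, 1])
pred_bin = np.array([0, 1, 0, 0, 0])
abs_err_bin = float(np.mean(np.abs(y_bin - pred_bin)))
misclass = float(np.mean(y_bin != pred_bin))
```

Both estimates are on the scale of the outcome `y`, which is the interpretability point the abstract makes; interval estimates around them would require a variance estimate, which this sketch leaves out.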