Student Seminar Series - December 14, 2006
University of Minnesota
School of Statistics
College of Liberal Arts



Model Combining and Its Applications: Longitudinal and Semiparametric Models


Song Liu


Thursday, December 14, 2006
9:00 AM, 300 Ford Hall
Minneapolis, East Bank Campus

Refreshments at 8:30 AM
300 Ford Hall


Abstract


In this dissertation, I present my work on model selection diagnostics and model combining in the context of longitudinal data analysis and semiparametric regression.

When it is reliable, model selection delivers both simple interpretability and accurate information. However, if there is a large amount of uncertainty involved in model selection, insisting on simple interpretability is not appropriate because it usually leads to overly optimistic or misleading results. In such a situation, a more complicated approach such as model averaging/combining should be considered. Although model selection uncertainty has been well recognized, proper diagnostics of model selection, a key for choosing between model selection and model combining, have not been seriously studied.

In this work we propose measures that characterize uncertainty in model selection from different perspectives. BIS (bootstrap instability in selection) measures selection instability using bootstrap resampling, and PIS (perturbation instability in selection) measures the same quantity using a data perturbation technique. We also observe that some quantities of interest are sensitive to the model selection process while others are not. BIE and PIE (bootstrap/perturbation instability in estimation) are sensible measures of the estimation instability due to selection. SDVS and SDES (variable selection standard deviation and estimation standard deviation) are the corresponding measures of uncertainty in variable selection and in estimation with respect to a certain weighting distribution on the candidate models. In general, a high value of any of these measures indicates severe uncertainty, and in such a situation model combining can help to obtain more reliable results.
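To make the idea of a bootstrap-based selection diagnostic concrete, the following is a minimal sketch rather than the dissertation's actual definition of BIS, which the abstract does not spell out. It assumes ordinary linear regression, all-subsets selection by BIC, and an instability score defined as the fraction of bootstrap resamples on which the selected variable set differs from the one chosen on the original data. The function names (bic_linear, select_model, bootstrap_selection_instability) are hypothetical and introduced only for illustration.

```python
import itertools
import numpy as np

def bic_linear(X, y, cols):
    """BIC of an OLS fit using an intercept plus the columns in `cols`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)

def select_model(X, y):
    """Return the column subset (as a tuple) with the smallest BIC."""
    p = X.shape[1]
    subsets = [c for k in range(p + 1)
               for c in itertools.combinations(range(p), k)]
    return min(subsets, key=lambda c: bic_linear(X, y, c))

def bootstrap_selection_instability(X, y, n_boot=200, seed=0):
    """Assumed score: share of bootstrap resamples whose selected model
    differs from the model selected on the original data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    original = select_model(X, y)
    changed = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if select_model(X[idx], y[idx]) != original:
            changed += 1
    return changed / n_boot
```

Under these assumptions, a score near 0 suggests that the selected model is stable under resampling, while a score near 1 suggests that a single selected model should not be trusted and that combining may be worth considering. The all-subsets search is only practical for a small number of candidate variables.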

We also propose new methods for combining models. One is suitable for longitudinal data, where observations within the same subject are correlated, so the covariance structure must be accounted for when combining models. The other is appropriate for semiparametric models in which the parametric part of the model is of primary interest. We investigate the properties of our combining strategy both empirically and theoretically. Our theorems guarantee that the combined estimator achieves the optimal rate of convergence without knowing which individual model works best. The empirical studies confirm that the combined estimator has better prediction performance than the single selected model when model selection uncertainty is severe. Based on our results, we suggest that model selection diagnostics be performed whenever model selection is involved and that model combining be applied only when it is necessary.
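As a generic illustration of what combining models means, the sketch below averages the predictions of several candidate linear models using weights proportional to exp(-BIC/2). This is a common textbook weighting used here as a stand-in, not the combining method proposed in the dissertation. In particular, it assumes independent observations and so ignores the within-subject covariance structure that the longitudinal method is said to handle. The names fit_ols and combine_predictions are hypothetical.

```python
import numpy as np

def fit_ols(X, y, cols):
    """Fit OLS with an intercept plus `cols`; return coefficients and BIC."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
    return beta, bic

def combine_predictions(X, y, X_new, candidates):
    """Weighted average of candidate-model predictions at X_new.

    Weights are proportional to exp(-BIC/2), an assumption made for
    illustration only."""
    fits = [fit_ols(X, y, cols) for cols in candidates]
    bics = np.array([bic for _, bic in fits])
    # Subtract the minimum BIC before exponentiating for numerical stability.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    m = X_new.shape[0]
    preds = np.column_stack([
        np.column_stack([np.ones(m)] + [X_new[:, j] for j in cols]) @ beta
        for (beta, _), cols in zip(fits, candidates)
    ])
    return preds @ weights, weights
```

For example, calling combine_predictions(X, y, X_new, [(), (0,), (0, 1)]) would blend an intercept-only model with two nested regressions. When one candidate clearly dominates, its weight approaches 1 and the combined prediction is close to that of the selected model, which is consistent with the suggestion that combining matters mainly when selection uncertainty is severe.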