Student Seminar Series - December 14, 2006
University of Minnesota
School of Statistics
College of Liberal Arts
Model Combining and Its Applications: Longitudinal and Semiparametric Models
Song Liu
Thursday, December 14, 2006
9:00 AM, 300 Ford Hall
Minneapolis, East Bank Campus
Refreshments at 8:30 AM, 300 Ford Hall
Abstract
In this dissertation, I present my work on model selection diagnostics and model combining in the context of longitudinal data analysis and semiparametric regression.
Of course, model selection delivers both simple interpretability and accurate information if it is reliable. However, if there is a large amount of uncertainty involved in model selection, insisting on simple interpretability is not appropriate because it usually leads to overly optimistic or misleading results. In such a situation, a more complicated approach such as model averaging/combining should be considered.
Although model selection uncertainty has been well recognized, proper diagnostics of model selection, a key for choosing between model selection and model combining, have not been seriously studied. In this work we propose measures that characterize uncertainty in model selection from different perspectives. BIS (bootstrap instability in selection) measures selection instability using bootstrap resampling, and PIS (perturbation instability in selection) measures the same quantity using a data perturbation technique. We also note that some quantities of interest are sensitive to the model selection process while others are not. BIE and PIE (bootstrap/perturbation instability in estimation) are sensible measures of the estimation instability due to selection. SDVS and SDES (variable selection standard deviation and estimation standard deviation) are the corresponding measures of uncertainty in variable selection and in estimation with respect to a certain weighting distribution on the candidate models. In general, a high value indicates severe uncertainty, and in such a situation model combining can help to obtain more reliable results.
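To make the idea of bootstrap-based selection instability concrete, here is a minimal Python sketch. It is not the dissertation's definition of BIS, which the abstract does not spell out. It assumes, purely for illustration, that candidate models are all subsets of the predictors in a linear regression, that selection is by BIC, and that instability is the fraction of bootstrap resamples whose selected subset differs from the one selected on the original data. The function names `bic_select` and `bootstrap_selection_instability` are hypothetical.

```python
import itertools
import numpy as np


def bic_select(X, y):
    """Return the predictor subset (tuple of column indices) with the lowest BIC.

    Assumption: ordinary least squares with an intercept, all subsets searched.
    """
    n, p = X.shape
    best_subset, best_bic = (), np.inf
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept column plus the chosen predictors.
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
            if bic < best_bic:
                best_subset, best_bic = subset, bic
    return best_subset


def bootstrap_selection_instability(X, y, n_boot=200, seed=0):
    """Fraction of bootstrap resamples whose selected subset differs from
    the subset selected on the original data (an assumed, illustrative measure)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    original = bic_select(X, y)
    changed = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if bic_select(X[idx], y[idx]) != original:
            changed += 1
    return changed / n_boot


# Example: one strong predictor and two weak ones (simulated data).
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 3))
y = 2.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(size=80)
print("selected subset:", bic_select(X, y))
print("instability:", bootstrap_selection_instability(X, y, n_boot=100))
```

A value near 0 means the same model is chosen in almost every resample; a value near 1 means the choice rarely survives resampling, which is the kind of situation in which the abstract recommends considering model combining.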
We propose new methods for combining models. One is suitable for longitudinal data where observations within the same subject are correlated, which requires us to take care of the covariance structure when combining models. The other is appropriate for semiparametric models where the parametric part of the model is of much interest. We investigate the properties of our combining strategy both empirically and theoretically. Our theorems guarantee that the combined estimator achieves the optimal rate of convergence without knowing which individual model works the best. The empirical studies confirm that the combined estimator has better prediction performance than the single selected model when model selection uncertainty is severe.
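As a generic illustration of what combining models means, the sketch below averages the predictions of several candidate models using weights derived from their BIC values. This is a standard textbook weighting scheme swapped in for illustration, not the combining method proposed in the dissertation, and it ignores the within-subject covariance structure that the longitudinal method is designed to handle. The function names and the example numbers are hypothetical.

```python
import numpy as np


def bic_weights(bics):
    """Turn BIC values into weights proportional to exp(-BIC / 2).

    Assumption: BIC-based weighting, used here only as a simple stand-in.
    """
    bics = np.asarray(bics, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability.
    w = np.exp(-0.5 * (bics - bics.min()))
    return w / w.sum()


def combine_predictions(predictions, weights):
    """Weighted average of per-model predictions.

    predictions: array of shape (n_models, n_points)
    weights:     array of shape (n_models,), summing to 1
    """
    return np.asarray(weights, dtype=float) @ np.asarray(predictions, dtype=float)


# Example: three candidate models predicting at two points (made-up numbers).
bics = [100.0, 101.0, 110.0]
preds = [[1.0, 2.0], [1.2, 2.1], [3.0, 0.5]]
w = bic_weights(bics)
print("weights:", w)
print("combined prediction:", combine_predictions(preds, w))
```

When one model clearly dominates, its weight approaches 1 and the combined prediction is close to that of the selected model; when several models are nearly tied, the combined prediction spreads the weight across them instead of committing to one.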
Based on our results, we suggest that model selection diagnostics should be performed whenever model selection is involved, and that model combining should be applied only when it is necessary.