Consistent Model Selection and Data-driven Smooth Tests for Clustered Data
An important problem in marginal regression analysis of clustered data,
as in the method of generalized estimating equations (GEE), is how to
choose a marginal regression model from a number of candidate models.
Although several methods have been suggested in the literature for
practical use, a theoretical investigation of their large-sample
properties is still lacking. We propose a new BIC-type model selection
criterion and prove that, with probability approaching one, it selects
the most parsimonious correct model. The criterion uses a recently
proposed quadratic inference function and does not require specification
of the full likelihood or quasi-likelihood. This model selection
procedure also motivates a data-driven Neyman-type smooth test for
checking the goodness-of-fit of a conjectured model. Unlike classical
tests that require the specification of an alternative, such as the GEE
Z-test, the new test selects a data-driven alternative based on model
selection and generally achieves higher power. Numerical simulations and
data analysis will be discussed to illustrate the application.
(Joint work with Annie Qu)