Student Seminar Series – December 4, 2007
University of Minnesota
School of Statistics
College of Liberal Arts

 

Consistent Model Selection in High Dimensional but Low-Sample Data



Yongli Zhang


Tuesday, December 4, 2007
9:00 AM,
300 Ford Hall
Minneapolis, East Bank Campus

Refreshments at 8:30 AM
300 Ford Hall

Abstract



It has long been argued that the variation, or uncertainty, introduced by model selection should be taken into account in subsequent statistical inference. We suggest the covariance penalty as a measure of the variation or uncertainty of a modeling procedure. A data perturbation method is proposed to estimate the covariance penalty, and both theoretical results and simulations show that the data perturbation estimates are consistent under some assumptions. Accordingly, we define the risk of a modeling procedure and suggest an adaptive model selection criterion based on this definition. Approximate results for the covariance penalty are derived by studying its asymptotic behavior.
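The abstract does not spell out the estimator, so the following is only a minimal sketch of one common data perturbation scheme, not necessarily the one used in the dissertation. It assumes the covariance penalty is built from the sum of covariances between each fitted value and its observation, and it estimates that sum by adding small Gaussian noise to the response, refitting, and regressing each fitted value on its own perturbation. The names `perturbation_covariance`, `fit`, `tau`, and `n_rep` are hypothetical, and the error variance `sigma2` is assumed known.

```python
import numpy as np

def perturbation_covariance(y, fit, sigma2, tau, n_rep, rng):
    """Monte Carlo estimate of sum_i cov(mu_hat_i, y_i) via data perturbation.

    fit    : hypothetical callable mapping a response vector to fitted values
    sigma2 : error variance, assumed known here
    tau    : standard deviation of the perturbations added to y
    """
    y = np.asarray(y, dtype=float)
    # Draw n_rep perturbation vectors and refit on each perturbed response.
    deltas = rng.normal(0.0, tau, size=(n_rep, y.size))
    fits = np.array([fit(y + d) for d in deltas])
    # Per-observation regression slope of the fitted value on its perturbation
    # approximates the sensitivity of mu_hat_i to y_i.
    dc = deltas - deltas.mean(axis=0)
    fc = fits - fits.mean(axis=0)
    slopes = (dc * fc).sum(axis=0) / (dc ** 2).sum(axis=0)
    return sigma2 * slopes.sum()

# Illustration with ordinary least squares, where the target is sigma2 * p.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
hat = X @ np.linalg.pinv(X)          # OLS hat matrix, trace equals p
ols_fit = lambda v: hat @ v
estimate = perturbation_covariance(y, ols_fit, sigma2=1.0, tau=0.5,
                                   n_rep=2000, rng=rng)
```

For a fixed linear fit such as least squares the estimate should land near the number of predictors times the error variance; the scheme becomes more interesting when `fit` itself includes a selection step, since the refits then reflect the variability of the selection.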

 

In this dissertation, we give a lower bound on the probability of selecting the smallest true model under a very general setting in the parametric case. The lower bound is useful for proving and investigating the consistency of model selection procedures.

 

High dimensionality combined with a sparse true model poses new challenges in model selection, since the excessive number of predictors makes model selection procedures infeasible because of the overwhelming computing workload. The dissertation suggests an adaptive model selection procedure combining the LASSO and the covariance penalty. This adaptive procedure is consistent provided that the LASSO solution path includes the smallest true model. We show, both in theory and in simulations, that the procedure is consistent without heavy computational requirements.
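The idea of restricting the search to models on the LASSO path can be sketched as follows. This is an illustrative sketch under stated assumptions, not the dissertation's procedure: the LASSO is solved by a plain coordinate descent written here for self-containment, the candidate models are the distinct supports found along a grid of penalty values, and BIC is used purely as a stand-in for the adaptive criterion based on the covariance penalty, which the abstract does not specify. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, beta, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, warm-started."""
    n = X.shape[0]
    col_sq = (X ** 2).sum(axis=0) / n
    beta = beta.copy()
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            resid += X[:, j] * beta[j]            # remove predictor j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]            # add updated predictor j back
    return beta

def path_supports(X, y, n_lam=50):
    """Distinct non-empty supports along a decreasing grid of penalties."""
    n = X.shape[0]
    lam_max = np.abs(X.T @ y).max() / n           # smallest lam giving all zeros
    beta = np.zeros(X.shape[1])
    supports = []
    for lam in lam_max * np.logspace(0, -2, n_lam):
        beta = lasso_cd(X, y, lam, beta)
        s = tuple(int(j) for j in np.flatnonzero(beta))
        if s and s not in supports:
            supports.append(s)
    return supports

def bic(X, y, support):
    """BIC of the least-squares refit on a support (stand-in criterion)."""
    n = len(y)
    Xs = X[:, list(support)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = ((y - Xs @ coef) ** 2).sum()
    return n * np.log(rss / n) + len(support) * np.log(n)

# Simulated sparse example: only predictors 0 and 1 are truly active.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)
supports = path_supports(X, y)
best = min(supports, key=lambda s: bic(X, y, s))
```

The computational point is that the criterion is evaluated only on the handful of models visited by the path rather than on all subsets of predictors, which is why the consistency claim hinges on the path containing the smallest true model.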