Student Seminar Series - October 25, 2005
University of Minnesota
School of Statistics
College of Liberal Arts

Penalized Regression Methods and Validation, with Particular Focus on Chemometric Data


Jessica Kraker


Tuesday, October 25, 2005
3:15 PM, 300 Ford Hall
Minneapolis, East Bank Campus

Refreshments at 2:45 PM
300 Ford Hall


Abstract

Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models are used in chemometrics to predict a biological activity or property (such as toxicity) of a compound from $p$ chemical descriptors of various types. The resulting models are underdetermined ($p > n$, where $n$ is the number of observed responses) and may exhibit strong multicollinearity among the descriptors. In this setting there are two basic goals: (1) obtaining a predictive model and (2) quantifying effects. Various regression methods address these goals: methods for coefficient variance reduction (such as PCR, PLS and ridge regression) and predictor thinning methods (such as the LASSO and elastic net). All of these methods require the selection of one or more ``meta-parameters''; I will discuss the use of cross-validation for this purpose. I will present an overview of the methods applicable to underdetermined models, along with a general approach to the penalized regression model (which includes ridge regression, the LASSO and the elastic net as particular cases). I also propose a penalized regression model utilizing the $L_1$-norm loss function, designed to address some particular aspects of common chemical descriptors (in particular, sparsity and/or high multicollinearity).
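
As a rough illustration of meta-parameter selection by cross-validation, the sketch below fits ridge regression in a $p > n$ setting and picks the penalty from a small grid. It is not the speaker's method or code. It assumes one common parameterization of the penalized objective, $\|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$, in which ridge regression is the case $\lambda_1 = 0$ and the LASSO is the case $\lambda_2 = 0$; only the ridge case is implemented here because it has a closed form. The function names (`ridge_fit`, `cv_error`, `select_lambda`), the penalty grid and the simulated data are all hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients for centered data, computed in the dual form.

    For lam > 0, X^T (X X^T + lam I_n)^{-1} y equals
    (X^T X + lam I_p)^{-1} X^T y, but only an n x n system is solved,
    which is cheaper when p > n.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


def cv_error(X, y, lam, k=5, seed=0):
    """Mean squared prediction error of ridge regression over k folds."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    sq_err = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Center with training-fold means only, so the held-out fold
        # does not influence the fit.
        x_mean = X[train_idx].mean(axis=0)
        y_mean = y[train_idx].mean()
        beta = ridge_fit(X[train_idx] - x_mean, y[train_idx] - y_mean, lam)
        pred = (X[test_idx] - x_mean) @ beta + y_mean
        sq_err += np.sum((y[test_idx] - pred) ** 2)
    return sq_err / n


def select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the grid value with the smallest CV error, plus all errors."""
    errors = [cv_error(X, y, lam, k, seed) for lam in lambdas]
    return lambdas[int(np.argmin(errors))], errors


if __name__ == "__main__":
    # Simulated underdetermined problem (assumption): n = 30, p = 100,
    # with only the first five descriptors carrying signal.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(30, 100))
    beta_true = np.zeros(100)
    beta_true[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
    y = X @ beta_true + 0.1 * rng.normal(size=30)

    best, errors = select_lambda(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
    print("selected lambda:", best)
    print("CV errors:", errors)
```

The same fold loop would apply to the LASSO or elastic net by swapping `ridge_fit` for an iterative solver and searching over both penalties; those cases have no closed form and are left out of this sketch.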