Student Seminar Series - October 25, 2005
University of Minnesota
School of Statistics
College of Liberal Arts
Penalized Regression Methods and Validation, with Particular Focus on Chemometric Data
Jessica Kraker
Tuesday, October 25, 2005
3:15 PM, 300 Ford Hall
Minneapolis, East Bank Campus
Refreshments at 2:45 PM, 300 Ford Hall
Abstract
Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models are general methods used in chemometrics to predict a biological activity or property (such as toxicity) of a compound from $p$ chemical descriptors of various types. In this setting the models are underdetermined ($p > n$, where $n$ is the number of observed responses) and the descriptors may be highly multicollinear. There are two basic goals: (1) obtaining a predictive model and (2) quantifying the effects of the descriptors.
Various regression methods address these goals. Some reduce coefficient variance, such as principal components regression (PCR), partial least squares (PLS), and ridge regression. Others thin the set of predictors, such as the LASSO and the elastic net. All of these methods require the selection of one or more "meta-parameters" (for example, the ridge penalty or the number of PLS components). I will discuss the use of cross-validation to select them.
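Below is a minimal sketch of selecting one meta-parameter by cross-validation. It assumes ridge regression as the method, a grid of candidate penalties, and mean squared prediction error as the criterion. The helper names (`ridge_fit`, `cv_select_lambda`), the fold count, and the grid are hypothetical choices for illustration, not the procedure presented in the talk.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept (data centered on this sample)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # X^T X + lam*I is positive definite for lam > 0, even when p > n
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the smallest k-fold CV error (average of per-fold MSEs)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # random, roughly equal folds
    cv_mse = []
    for lam in lambdas:
        fold_errors = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b0, beta = ridge_fit(X[train_idx], y[train_idx], lam)
            pred = b0 + X[test_idx] @ beta
            fold_errors.append(np.mean((y[test_idx] - pred) ** 2))
        cv_mse.append(np.mean(fold_errors))
    best = lambdas[int(np.argmin(cv_mse))]
    return best, np.array(cv_mse)

# Usage on synthetic underdetermined data (p > n)
rng = np.random.default_rng(42)
n, p = 40, 100
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:5] = 2.0                      # hypothetical sparse "truth"
y = X @ true_beta + rng.standard_normal(n)
lambdas = np.logspace(-2, 3, 11)
best_lam, cv_mse = cv_select_lambda(X, y, lambdas, k=5)
print(best_lam, cv_mse.round(2))
```

The same loop works for any meta-parameter. To tune a different one, swap `ridge_fit` for another fitting routine and `lambdas` for its grid (for example, numbers of PLS components).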
I will present an overview of the methods applicable to underdetermined models, along with a general framework for penalized regression that includes ridge regression, the LASSO, and the elastic net as special cases. I also propose a penalized regression model that uses the L1-norm loss function. It is designed to address characteristics common to chemical descriptors, notably sparsity and/or high multicollinearity.
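One common way to write a penalized regression model that has ridge, the LASSO, and the elastic net as special cases is the glmnet parameterization. It minimizes $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\left[\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\right]$. Setting $\alpha = 1$ gives the LASSO and $\alpha = 0$ gives ridge regression. The sketch below fits this objective by cyclic coordinate descent with soft-thresholding. This is one standard algorithm, assumed here for illustration, not necessarily the framework used in the talk. It uses squared-error loss, so it does not implement the proposed L1-loss model. `elastic_net` and `soft_threshold` are hypothetical helper names.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for
    (1/(2n))||y - X b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    alpha=1: LASSO; alpha=0: ridge. X and y are centered, so the intercept is unpenalized."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n  # (1/n)||x_j||^2 for each column
    beta = np.zeros(p)
    resid = yc.copy()                   # residual yc - Xc @ beta (beta starts at 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # constant column: coefficient stays 0
            old = beta[j]
            # (1/n) x_j^T (partial residual that excludes coordinate j)
            z = Xc[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            if new != old:
                resid -= Xc[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:            # stop when no coefficient moves appreciably
            break
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Usage: a LASSO fit on synthetic p > n data sets many coefficients exactly to zero
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 100))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.5 * rng.standard_normal(40)
b0, beta = elastic_net(X, y, lam=0.3, alpha=1.0)
print(np.count_nonzero(beta), "nonzero coefficients out of", beta.size)
```

The $\ell_1$ term is what produces exact zeros (predictor thinning). The $\ell_2$ term keeps the problem strictly convex and spreads weight across groups of correlated descriptors. The elastic net combines both, which is why it is often suggested for highly multicollinear inputs.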