Student Seminar Series - September 15, 2005
University of Minnesota
School of Statistics
College of Liberal Arts
Multi-block Relationships in High Dimensions
Despina Stefan
Thursday, September 15, 2005
10:00 AM, 116 Folwell Hall
Minneapolis, East Bank Campus
Refreshments at 9:30 AM
300 Ford Hall
Abstract
Very large data sets have emerged in numerous contexts, often consisting of
a large number of variables and a relatively small number of
observations. When a dependent variable is present, the underlying
dimensionality of the data needs to be discovered for prediction
purposes. Multivariate data analysis techniques applicable to data
consisting of two groups of variables have been developed, and some are
encompassed by a general method called Continuum Regression
(CR). For some large data sets, additional grouping of the variables
may be inherently present. Thus, methods such as Canonical Correlation
Analysis and Partial Least Squares regression have been extended to
settings with three or more groups of variables. We have now extended
the continuum regression idea to the setting in which the data are
composed of more than two groups of variables, obtaining a new
method, Continuum Regression Partial Least Squares (CR PLS). This
method outperforms multi-block PLS in the sense that, when finding
the latent variables for prediction, it also takes into consideration
the relationships among the blocks of predictors. We also address the
problem of validation of such models.