Student Seminar Series - September 15, 2005
University of Minnesota
School of Statistics
College of Liberal Arts

Multi-block Relationships in High Dimensions


Despina Stefan


Thursday, September 15, 2005
10:00 AM, 116 Folwell Hall
Minneapolis, East Bank Campus

Refreshments at 9:30 AM
300 Ford Hall


Abstract

Very large data sets have emerged in numerous contexts, often consisting of a large number of variables and a relatively small number of observations. When a dependent variable is present, the underlying dimensionality of the data needs to be discovered for prediction purposes. Multivariate data analysis techniques applicable to data consisting of two groups of variables have been developed, and some have been encompassed under a general method called Continuum Regression (CR). For some large data sets, additional grouping of the variables may be inherently present. Accordingly, methods such as Canonical Correlation Analysis and Partial Least Squares (PLS) regression have been extended to settings with three or more groups of variables. We have now extended the continuum regression idea to the setting in which the data are composed of more than two groups of variables, obtaining a new method, Continuum Regression Partial Least Squares (CR PLS). This method outperforms multi-block PLS in the sense that, when finding the latent variables for prediction, it also takes into consideration the relationships among the blocks of predictors. We also address the problem of validating such models.