Student Seminar Series - November 30, 2006
University of Minnesota
School of Statistics
College of Liberal Arts



Multi-block Relationships in High Dimensions


Despina Stefan


Thursday, November 30, 2006
10:00 AM, 300 Ford Hall
Minneapolis, East Bank Campus

Refreshments at 9:30 AM
300 Ford Hall


Abstract


Very large data sets, often consisting of many variables and relatively few observations, have emerged in numerous contexts. When a dependent
variable is present, prediction requires discovering the underlying dimensionality of the data. Multivariate techniques for data consisting of two
groups of variables have been developed and are encompassed by a general method called Continuum Regression (CR), which includes ordinary least
squares, partial least squares and principal components regression as special cases.
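For readers new to CR, here is a minimal sketch of how one criterion spans a continuum of methods. It is not the speaker's method. It assumes the standard single-response formulation, in which the first direction c maximizes T(c) = (c's)^2 (c'Sc)^(alpha/(1-alpha) - 1) over unit vectors, with S = X'X and s = X'y. Under this criterion, alpha = 0 recovers the OLS direction, alpha = 1/2 the PLS direction, and alpha close to 1 approaches the first principal component. The helper name `cr_first_direction`, the two-predictor restriction, the brute-force angle search and the toy data are all hypothetical, chosen only for illustration.

```python
import math


def cr_first_direction(X, y, alpha, grid=20000):
    """Hypothetical helper: first continuum-regression direction, two predictors.

    Maximizes T(c) = (c's)^2 * (c'Sc)^(alpha/(1-alpha) - 1) over unit vectors c,
    with S = X'X and s = X'y, by brute-force search over angles in [0, pi).
    Assumes the columns of X and y are already centered and 0 <= alpha < 1.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    # Cross-product matrix S = X'X (2x2, symmetric) and vector s = X'y
    S = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    s = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    gamma = alpha / (1 - alpha) - 1
    best, best_c = -math.inf, None
    for k in range(grid):
        theta = math.pi * k / grid  # each direction counted once, up to sign
        c = (math.cos(theta), math.sin(theta))
        cs = c[0] * s[0] + c[1] * s[1]
        cSc = c[0] * c[0] * S[0][0] + 2 * c[0] * c[1] * S[0][1] + c[1] * c[1] * S[1][1]
        if abs(cs) < 1e-12 or cSc <= 0:
            continue  # T is zero or undefined here
        # Compare log T to avoid overflow when alpha is close to 1
        log_t = 2 * math.log(abs(cs)) + gamma * math.log(cSc)
        if log_t > best:
            best, best_c = log_t, c
    return best_c


# Toy, centered data (illustrative only)
X = [[-2, -2], [-1, 0], [0, -1], [1, 1], [2, 2]]
y = [-3, -1, 0, 2, 2]
for a in (0.0, 0.5, 0.99):
    c = cr_first_direction(X, y, a)
    print(a, round(math.degrees(math.atan2(c[1], c[0])), 1))
```

On these toy data S = [[10, 9], [9, 10]] and s = (13, 12). As alpha increases, the fitted direction moves from about 7.8 degrees, the OLS direction proportional to S^(-1)s, through about 42.7 degrees, the PLS direction proportional to s, to nearly 45 degrees, the leading principal component of S.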
 
In some large data sets, the variables fall naturally into more than two groups. Canonical correlation (CC) analysis has been extended to settings
with three or more groups of variables, and a multi-block partial least squares (MB PLS) algorithm has also been developed. We extend the continuum
regression idea to data composed of more than two groups of variables and obtain new methods, such as Continuum Regression Canonical Correlation
(CR CC) and Continuum Regression Partial Least Squares (CR PLS).
 
We also address the important problem of validating such models, showing that, under certain circumstances, the proposed models have better
predictive power on new data than existing ones (e.g., MB PLS or CC). In addition, CR CC and CR PLS are shown to improve substantially on
ordinary least squares (OLS) in settings where OLS can be used.