Statistics 8931, Fall 2011

Dimension Reduction

Course Instructor

            R. D. Cook, 397 Ford (5-7732)

            email:  dennis@stat.umn.edu

            Office Hours: 2:30-3:30 MW, Ford 397, and by appointment.

Lectures

            1:25-2:15, Ford 127.  An alternate time for some Friday lectures may be scheduled.

Text

            None, but the following book may be useful for reference: "Regression Graphics: Ideas for Studying Regressions through Graphics" by R. D. Cook.  The web page for the text is at http://www.stat.umn.edu/RegGraph/.  This book is on reserve in the math library and is also available from the statistics library.

 

Course Web Page:  http://www.stat.umn.edu/~dennis/Stat8931F11/.

Homework

            Homework is a required part of the course.  There will be homework assignments throughout the semester, portions of which will be graded.

Grading

            A grade of "B" requires satisfactory completion of the homework problems and reading assignments, along with regular attendance and participation in classroom discussion.  A grade of "A" additionally requires completion of a class project involving detailed study of some aspect of the course material.  Projects, which must be approved in advance, should be underway by mid-November.  Project suggestions will be given in class from time to time.  You should expect to spend about ¼ of your time on the project.

Exam

None planned at present.  Some project presentations might be scheduled during finals week.

Incompletes

            Grades of "I" will be given only in extraordinary circumstances, and then only by written agreement between the instructor and the student. 

Computing

            Matlab will be the primary computing platform for this course.  Some methods are available in R via Weisberg's dr package, but many of the novel methods have been written only in Matlab.  The Matlab code and documentation are available at http://liliana.forzani.googlepages.com/ldr-package.

Coverage

 

            This course will consider both traditional and modern methods of dimension reduction, and attempt to construct a common framework that may suggest new theory and methods.  Traditional methods to be discussed include principal components and partial least squares.  More modern methods include several that fall under the heading of "sufficient dimension reduction".  Emphasis will be placed on contrasting historical and modern foundations.  There will likely be more questions than answers.

 

Reduction of the dimensionality of the predictor vector is the primary goal in regressions with a univariate response.  There are several reasons why dimension reduction may be useful in this context, including the possibilities of mitigating the effects of collinearity, facilitating model specification by allowing visualization of the data in low dimensions, providing a relatively small set of predictors on which to base prediction or interpretation, and dealing usefully with large-p-small-n problems.  When the response is multivariate, reduction of the response vector and the predictor vector may be considered separately or simultaneously.
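
As a rough illustration of reducing a collinear predictor vector, the sketch below projects simulated predictors onto their leading principal component directions.  It is only a minimal example under stated assumptions: Python with NumPy is used purely for illustration (the course itself works in Matlab), the function name principal_component_reduction is hypothetical, and the simulated data (five predictors driven by two latent factors) are invented for the example.  Principal components ignore the response, so this is not one of the sufficient dimension reduction methods.

```python
import numpy as np


def principal_component_reduction(X, d):
    """Project the rows of X (n x p) onto the first d principal component directions.

    Hypothetical helper for illustration only. Returns the reduced predictors
    (n x d), the directions (p x d), and the fraction of total sample variance
    carried by each retained component.
    """
    # Center each predictor at its sample mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the eigenvectors of the sample covariance matrix,
    # ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    directions = Vt[:d].T            # p x d, orthonormal columns
    reduced = Xc @ directions        # n x d principal component scores
    explained = s[:d] ** 2 / np.sum(s ** 2)
    return reduced, directions, explained


# Simulated data (an assumption for this sketch): five highly collinear
# predictors that are noisy linear combinations of two latent factors.
rng = np.random.default_rng(0)
n = 200
factors = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, 5))
X = factors @ loadings + 0.01 * rng.normal(size=(n, 5))

reduced, directions, explained = principal_component_reduction(X, d=2)
print("reduced shape:", reduced.shape)
print("fraction of variance retained:", explained.sum())
```

Here two components retain nearly all of the predictor variation, so a regression on the two reduced predictors avoids the collinearity among the original five.  Whether those components are the ones relevant to the response is a separate question.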

 

Most of classical twentieth-century Fisherian statistics focused on problems where the number of unknowns, p, was small and, in particular, much smaller than the number of observations or experimental units.  However, with advances in computing and the emergence of applications with relatively large p, the practical environment has changed dramatically over the past 20 years.  The statistical community has not yet decided how to deal effectively with the related issues.  This course is intended in part to be a contribution to the discussion.

Assignment 1 

            Reading: C.J.C. Burges, "Dimension Reduction: A Guided Tour", Foundations and Trends in Machine Learning, Vol. 2, No. 4, 275-365, 2010.  Writing: As discussed in class and in Problem 1.1 of the distributed notes.   To avoid overlap, your choice must be approved by the instructor prior to writing (an email will be sufficient).    Due:  Friday, Sept. 23.

 

 

DISABILITY ACCESS STATEMENT: This publication/material is available in alternative formats upon request.  Please contact Dana, School of Statistics, 313 Ford Hall, 625-8046.