Spring Seminar Series - February 23, 2005
University of Minnesota
School of Statistics
College of Liberal Arts

Principal Components Analysis for High Dimensional Data

Debashis Paul
Department of Statistics
Stanford University

Wednesday, February 23, 2005
3:30 PM, B10 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall

Abstract

Suppose we have i.i.d. observations from a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$. We consider the problem of estimating the leading eigenvectors of $\Sigma$ when the dimension $p$ of the observation vectors increases with the sample size $n$. We work in the setting where the covariance matrix is a finite-rank perturbation of the identity. We show that even though ordinary principal components analysis (PCA) may fail to yield a consistent estimator of the eigenvectors, a two-stage scheme gives better estimates if the data can be sparsely represented in some known basis: first select a set of significant coordinates, then apply PCA to the submatrix of the sample covariance matrix corresponding to the selected coordinates. Under suitable sparsity restrictions, we show that the risk of the proposed estimator attains the optimal rate of convergence when measured in a squared-error type loss. We also state some results about the behavior of the eigenvalues and eigenvectors of the sample covariance matrix when $p/n$ converges to a positive constant.
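The select-then-PCA idea can be illustrated with a minimal pure-Python sketch. It assumes a rank-one instance of the model, $\Sigma = I_p + \ell v v^T$ with a sparse unit vector $v$ in the standard basis. For the selection step it uses a simple diagonal-thresholding rule (keep coordinate $j$ if its sample variance exceeds $1 + \alpha\sqrt{\log p / n}$). That rule, the constant $\alpha$, the spike strength `spike` and all function names are hypothetical choices for illustration, not necessarily the estimator analyzed in the talk.

```python
import math
import random


def simulate_spiked(n, p, spike, v, seed=0):
    """Draw n samples from N(0, I_p + spike * v v^T): a rank-one
    perturbation of the identity (v is assumed to have unit norm)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared factor along the spike direction
        data.append([math.sqrt(spike) * z * v[j] + rng.gauss(0.0, 1.0)
                     for j in range(p)])
    return data


def sample_cov_entry(data, means, j, k):
    """(j, k) entry of the sample covariance matrix (divisor n)."""
    n = len(data)
    return sum((x[j] - means[j]) * (x[k] - means[k]) for x in data) / n


def leading_eigvec(mat, iters=1000, tol=1e-12):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    k = len(mat)
    u = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(mat[i][j] * u[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if sum((a - b) ** 2 for a, b in zip(w, u)) < tol:
            return w
        u = w
    return u


def select_then_pca(data, alpha=3.0):
    """Two-stage estimate of the leading eigenvector:
    (1) keep coordinates whose sample variance exceeds
        1 + alpha * sqrt(log p / n)   (hypothetical threshold rule);
    (2) run PCA on the sample-covariance submatrix of those coordinates
        and pad the resulting eigenvector with zeros to length p."""
    n, p = len(data), len(data[0])
    means = [sum(x[j] for x in data) / n for j in range(p)]
    threshold = 1.0 + alpha * math.sqrt(math.log(p) / n)
    selected = [j for j in range(p)
                if sample_cov_entry(data, means, j, j) > threshold]
    v_hat = [0.0] * p
    if not selected:
        return selected, v_hat
    sub = [[sample_cov_entry(data, means, a, b) for b in selected]
           for a in selected]
    for idx, coef in zip(selected, leading_eigvec(sub)):
        v_hat[idx] = coef
    return selected, v_hat


if __name__ == "__main__":
    p, n = 500, 200
    v = [0.5] * 4 + [0.0] * (p - 4)  # unit-norm spike supported on 4 coordinates
    data = simulate_spiked(n, p, spike=10.0, v=v, seed=1)
    selected, v_hat = select_then_pca(data)
    overlap = abs(sum(a * b for a, b in zip(v, v_hat)))
    print("selected:", selected, "overlap |<v, v_hat>|:", round(overlap, 3))
```

Only the selected coordinates enter the eigen-decomposition, so the PCA step works with a small $k \times k$ matrix instead of the full $p \times p$ sample covariance, and coordinates carrying only noise never contribute error to the estimated eigenvector.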