Spring Seminar Series - February 23, 2005
University of Minnesota
School of Statistics
College of Liberal Arts
Principal Components Analysis for High Dimensional Data
Debashis Paul
Department of Statistics
Stanford University
Wednesday, February 23, 2005
3:30 PM, B10 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall
Abstract
Suppose we have i.i.d. observations from a multivariate Gaussian
distribution with mean $\mu$ and covariance matrix $\Sigma$. We consider
the problem of estimating the leading eigenvectors of $\Sigma$ when the
dimension $p$ of the observation vectors increases with the sample
size $n$. We work under the setup where the covariance matrix is a
finite-rank perturbation of the identity. We show that even though
ordinary principal components analysis may fail to yield a consistent
estimator of the eigenvectors, if the data can be sparsely represented in
some known basis, then a two-step scheme gives better estimates: first
select a set of significant coordinates, then apply PCA to the submatrix
of the sample covariance matrix corresponding to the selected coordinates.
Under suitable sparsity restrictions, we show that the risk of the proposed
estimator attains the optimal rate of convergence when measured in a
squared-error type loss. We also state some results about the behavior of
the eigenvalues and eigenvectors of the sample covariance matrix when
$p/n$ converges to a positive constant.
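
The two-step idea in the abstract (select coordinates, then run PCA on the corresponding submatrix of the sample covariance) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the speaker's procedure: the abstract does not say how the significant coordinates are chosen, so the rule used here (keep the `k` coordinates with the largest sample variance) is an assumption, as are the function names, the single-spike model $\Sigma = I + \lambda\theta\theta^T$ with a sparse unit vector $\theta$, and all parameter values.

```python
import numpy as np


def leading_eigvec(S):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix."""
    # eigh returns eigenvalues in ascending order, so take the last column.
    _, vecs = np.linalg.eigh(S)
    return vecs[:, -1]


def selected_pca(X, k):
    """Hypothetical two-step estimator (selection rule is an assumption).

    Step 1: keep the k coordinates with the largest sample variance.
    Step 2: run PCA on the k x k submatrix of the sample covariance
            and embed the result back into R^p with zeros elsewhere.
    """
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    S = Xc.T @ Xc / n
    keep = np.sort(np.argsort(np.diag(S))[-k:])
    v = np.zeros(p)
    v[keep] = leading_eigvec(S[np.ix_(keep, keep)])
    return v, keep


def sq_loss(a, b):
    """Squared-error loss between unit vectors, invariant to sign flips."""
    return min(np.sum((a - b) ** 2), np.sum((a + b) ** 2))


# Assumed illustrative setting: p much larger than n, one sparse spike.
rng = np.random.default_rng(0)
n, p, s, lam = 200, 1000, 10, 10.0
theta = np.zeros(p)
theta[:s] = 1.0 / np.sqrt(s)  # sparse unit-norm leading eigenvector

# Rows have covariance I + lam * theta theta^T.
z = rng.standard_normal((n, 1))
X = np.sqrt(lam) * z * theta + rng.standard_normal((n, p))

# Ordinary PCA on the full p x p sample covariance.
Xc = X - X.mean(axis=0)
v_pca = leading_eigvec(Xc.T @ Xc / n)

# Two-step estimator restricted to k selected coordinates.
v_sel, keep = selected_pca(X, k=s)

loss_pca = sq_loss(v_pca, theta)
loss_sel = sq_loss(v_sel, theta)
print(f"ordinary PCA loss: {loss_pca:.3f}")
print(f"selected PCA loss: {loss_sel:.3f}")
```

In this sketch `k` is set to the true sparsity `s`, which would not be known in practice; a data-driven threshold on the coordinate variances would be needed instead.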