Fall Seminar Series - November 13, 2003
University of Minnesota
School of Statistics
College of Liberal Arts

The benefits of assuming independence in classification when there are many more variables than observations

Liza Levina
Department of Statistics
University of Michigan

Thursday, November 13, 2003
4:00 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:30 PM, 300 Ford Hall

Abstract

  General statistical intuition says that using all the available dependence information is better than ignoring it. We show that the opposite holds when there are many more variables than observations (the "large p, small n" scenario). This phenomenon is well known in machine learning practice and will be demonstrated on examples from texture classification and gene expression data. Analytically, we consider the issue in the classical context of discriminating between two normal populations and prove that the "naive Bayes" classifier, which is based on the independence assumption, greatly outperforms the discriminant rule that attempts to estimate the full covariance structure. We also show how, in practice, shrinkage can further improve on naive Bayes. For the special case of stationary covariance structure, we introduce a class of rules spanning the range between independence and arbitrary dependence and prove that they achieve Bayes optimality for the Gaussian colored noise model.
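
  The comparison described in the abstract can be tried on simulated data. The sketch below is only an illustration under stated assumptions, not the speaker's experiments or proofs: it draws two Gaussian classes with identity covariance and a constant mean shift, fits a linear rule with either the diagonal or the full pooled sample covariance, and averages test error over several replications. The function names (fit_rule, error_rate, simulate), the parameter values, and the use of the Moore-Penrose pseudo-inverse for the singular full covariance are all choices made for this example.

```python
import numpy as np


def fit_rule(X0, X1, full_covariance):
    """Return (w, b) for the linear rule: predict class 1 when x @ w + b > 0."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    dof = centered.shape[0] - 2  # pooled degrees of freedom
    if full_covariance:
        # Full pooled sample covariance. It is singular when p > n, so this
        # sketch uses the Moore-Penrose pseudo-inverse (an assumption).
        S = centered.T @ centered / dof
        w = np.linalg.pinv(S) @ (mu1 - mu0)
    else:
        # Independence ("naive Bayes") rule: keep only per-variable variances.
        variances = (centered ** 2).sum(axis=0) / dof
        w = (mu1 - mu0) / variances
    b = -w @ (mu0 + mu1) / 2.0  # threshold at the midpoint of the two means
    return w, b


def error_rate(w, b, X0, X1):
    """Fraction of test points from both classes that the rule misclassifies."""
    wrong0 = np.sum(X0 @ w + b > 0)   # class 0 points sent to class 1
    wrong1 = np.sum(X1 @ w + b <= 0)  # class 1 points sent to class 0
    return (wrong0 + wrong1) / (len(X0) + len(X1))


def simulate(p=200, n_per_class=20, n_test=1000, shift=0.3, reps=20, seed=0):
    """Average test error of both rules over `reps` simulated training sets.

    Illustrative setup (hypothetical values): identity covariance, class 0
    centered at 0, class 1 centered at `shift` in every coordinate.
    """
    rng = np.random.default_rng(seed)
    nb_errors, full_errors = [], []
    for _ in range(reps):
        train0 = rng.standard_normal((n_per_class, p))
        train1 = rng.standard_normal((n_per_class, p)) + shift
        test0 = rng.standard_normal((n_test, p))
        test1 = rng.standard_normal((n_test, p)) + shift
        w, b = fit_rule(train0, train1, full_covariance=False)
        nb_errors.append(error_rate(w, b, test0, test1))
        w, b = fit_rule(train0, train1, full_covariance=True)
        full_errors.append(error_rate(w, b, test0, test1))
    return float(np.mean(nb_errors)), float(np.mean(full_errors))


if __name__ == "__main__":
    nb_err, full_err = simulate()
    print(f"independence rule error:    {nb_err:.3f}")
    print(f"full-covariance rule error: {full_err:.3f}")
```

  With 200 variables and 40 training observations, the full sample covariance has rank at most 38, so the full-covariance rule can only use a small, noisy subspace of the data, while the independence rule estimates just 200 variances. Varying p and n_per_class in simulate shows how the gap changes as the number of observations grows relative to the number of variables.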