Spring Seminar Series - May 1, 2003
University of Minnesota
School of Statistics
College of Liberal Arts
BUEHLER-MARTIN DISTINGUISHED LECTURER SERIES

Testing for Equality of Distributions in Very High Dimensions

Peter Hall
Australian National University

Thursday, May 1, 2003
4:00 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:30 PM, 300 Ford Hall

Abstract

  Suppose we are given two datasets from respective populations of random functions, and wish to test the null hypothesis that the distributions of the populations are identical, against the complementary alternative. Motivated by applications such as these, as well as by more conventional multivariate settings, we suggest a general permutation test of the hypothesis that two sampled distributions are identical. The test is based on a measure of distance between data; the distance function should be symmetric but need not satisfy the triangle inequality. The test is constructed by counting the number of data from one sample that are among the j closest, in the pooled dataset and in the sense of the distance measure, to a given datum in the other sample, and accumulating, over all possible values of the latter datum and of j, the departures of these counts from their expected values under the null hypothesis. A permutation argument enables a formal test to be constructed with exactly known significance level, conditional on the set of all pairwise distances. The test is able to detect departures from the null hypothesis that are of order only n^{-1/2}, where n denotes either of the two sample sizes.
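
  The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation. The abstract does not say how departures are measured or weighted, so the sketch assumes unweighted absolute deviations, averaged within each sample and then summed. The function names (neighbour_order, count_statistic, permutation_p_value) and the Monte Carlo approximation to the full permutation distribution are likewise choices made here for illustration.

```python
import numpy as np


def neighbour_order(dist):
    """For each datum, indices of all other data sorted from closest to farthest."""
    masked = np.array(dist, dtype=float)
    np.fill_diagonal(masked, np.inf)  # push each datum to the end of its own ranking
    return np.argsort(masked, axis=1, kind="stable")[:, :-1]


def count_statistic(order, labels):
    """Accumulated departures of nearest-neighbour counts from their null expectations.

    labels is a boolean array: True marks the first sample, False the second.
    Assumption: absolute deviations, unweighted, averaged within each sample.
    """
    labels = np.asarray(labels, dtype=bool)
    total = labels.size
    # other[i, j-1] is True when the j-th closest datum to i is from the other sample.
    other = labels[order] != labels[:, None]
    counts = np.cumsum(other, axis=1)  # number from the other sample among the j closest
    size_other = np.where(labels, total - labels.sum(), labels.sum())
    j = np.arange(1, total)
    # Under the null, each of the other total - 1 data is equally likely to be a neighbour.
    expected = size_other[:, None] * j[None, :] / (total - 1)
    per_datum = np.abs(counts - expected).sum(axis=1)
    return per_datum[labels].mean() + per_datum[~labels].mean()


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """Monte Carlo permutation p-value, conditional on the pairwise distances."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    order = neighbour_order(dist)  # distances are fixed; only the labels are permuted
    observed = count_statistic(order, labels)
    exceed = sum(
        count_statistic(order, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

  Only the matrix of pairwise distances enters the calculation, so any symmetric distance between curves or vectors could be supplied. The sketch makes no use of the triangle inequality.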