Spring Seminar Series - May 1, 2003
University of Minnesota
School of Statistics
College of Liberal Arts
BUEHLER-MARTIN DISTINGUISHED LECTURER SERIES
Testing for Equality of Distributions in Very High Dimensions
Peter Hall
Australian National University
Thursday, May 1, 2003
4:00 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:30 PM, 300 Ford Hall
Abstract
Suppose we are given two datasets from respective populations of random
functions, and wish to test the null hypothesis that the distributions of
the populations are identical, against the complementary alternative. Motivated
by applications such as these, as well as by more conventional multivariate
settings, we suggest a general permutation test of the hypothesis that two
sampled distributions are identical. The test is based on a measure of
distance between data; the distance function should be symmetric but need
not satisfy the triangle inequality. The test is constructed by counting
the number of data from one sample that are among the j closest, in the
pooled dataset and in the sense of the distance measure, to a given datum
in the other sample; and accumulating, over all possible values of the latter
datum and of j, the departures of these counts from their expected values
under the null hypothesis. A permutation argument enables a formal test
to be constructed with concisely known significance level, conditional
on the set of all pairwise distances. The test can detect departures from
the null hypothesis that are of order only n^{-1/2}, where
n denotes either of the two sample sizes.
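
The construction described in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the speaker's implementation. The function names rank_count_statistic and permutation_test, the use of absolute departures with equal weights, the symmetric treatment of the two samples, and the Monte Carlo approximation with n_perm random relabellings are all assumptions made for the example. Ties among distances are broken arbitrarily by the sort.

```python
import numpy as np


def rank_count_statistic(dist, labels):
    """Accumulated departures of nearest-neighbour counts from their null means.

    dist   : (N, N) symmetric matrix of pairwise distances for the pooled data
             (finite values assumed; it need not satisfy the triangle inequality).
    labels : length-N boolean array, True for sample 1 and False for sample 2.
    """
    labels = np.asarray(labels, dtype=bool)
    N = labels.size
    m = int(labels.sum())          # size of sample 1
    n = N - m                      # size of sample 2

    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)    # push each datum to the end of its own ranking
    # Row i lists the other N-1 data, ordered from closest to farthest from datum i.
    order = np.argsort(d, axis=1, kind="stable")[:, : N - 1]

    j = np.arange(1, N)                            # j = 1, ..., N-1
    cum_first = np.cumsum(labels[order], axis=1)   # sample-1 data among the j closest
    cum_second = j - cum_first                     # sample-2 data among the j closest

    # Given datum in sample 2: the other N-1 data contain m from sample 1,
    # so the null mean of the count is j * m / (N - 1).
    total = np.abs(cum_first[~labels] - j * m / (N - 1)).sum()
    # Given datum in sample 1: count sample-2 data, null mean j * n / (N - 1).
    total += np.abs(cum_second[labels] - j * n / (N - 1)).sum()
    return float(total)


def permutation_test(dist, labels, n_perm=999, seed=0):
    """Monte Carlo permutation p-value, conditional on the pairwise distances."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = rank_count_statistic(dist, labels)
    # Relabel the pooled data at random; the distance matrix stays fixed.
    perm_stats = np.array(
        [rank_count_statistic(dist, rng.permutation(labels)) for _ in range(n_perm)]
    )
    # Counting the observed labelling itself keeps the p-value strictly positive.
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Usage: two small samples of 5-dimensional vectors with Euclidean distance.
rng = np.random.default_rng(1)
x = rng.normal(size=(12, 5))
y = rng.normal(loc=0.5, size=(15, 5))
pooled = np.vstack([x, y])
dist = np.linalg.norm(pooled[:, None, :] - pooled[None, :, :], axis=2)
labels = np.arange(len(pooled)) < len(x)
stat, p = permutation_test(dist, labels, n_perm=199)
print(f"statistic = {stat:.2f}, permutation p-value = {p:.3f}")
```

Only the ranking of the distances enters the statistic, so any symmetric dissimilarity (for example an L2 distance between discretised curves) can replace the Euclidean distance used here.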