A New Framework for Large-Scale Multiple Testing:
Compound Decision Theory and Data-Driven Procedures
With recent advances in technology, it has become increasingly common
in practice to test a large number of hypotheses simultaneously. In this talk,
I formulate the large-scale multiple testing problem in a compound decision
theoretic framework and discuss oracle and asymptotically optimal data-driven
procedures for false discovery rate (FDR) control. My presentation is divided
into three parts: the first part develops oracle and adaptive compound decision
rules for independent tests, the second part considers large-scale multiple
testing under dependency, and the third part discusses simultaneous testing of
grouped hypotheses. A key goal is to show that conventional FDR procedures,
which are mostly p-value based, can be substantially improved by our new
data-driven procedures that adaptively
exploit the distributional, structural and external information of the sample.
I also discuss results of simulation studies, as well as microarray data
analyses from a human immunodeficiency study and a breast cancer study, to
illustrate our methods and compare them with alternative procedures.
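
To make the contrast between p-value based and distribution-aware procedures
concrete, the following is a minimal sketch and not the procedures presented in
the talk. It assumes a two-group normal mixture with known parameters (an
oracle setting), uses the Benjamini-Hochberg step-up rule as a stand-in for a
conventional p-value based procedure, and uses one common form of a local false
discovery rate (Lfdr) thresholding rule as a stand-in for a procedure that
exploits distributional information. All function names, the mixture
parameters (pi1, mu1), and the simulation design are hypothetical choices made
for illustration.

```python
import math
import random


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def two_sided_pvalue(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2))


def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k is the largest rank i with p_(i) <= alpha * i / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def lfdr_reject(lfdr, alpha):
    """Lfdr thresholding (assumed form): reject the k hypotheses with the
    smallest Lfdr, where k is the largest count whose average Lfdr <= alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    k, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        total += lfdr[i]
        if total / rank <= alpha:
            k = rank
    return set(order[:k])


def simulate(m=2000, pi1=0.2, mu1=2.5, alpha=0.1, seed=0):
    """Draw z-values from (1 - pi1) N(0, 1) + pi1 N(mu1, 1) and apply both rules.
    The mixture parameters are treated as known, so the Lfdr values are oracle."""
    rng = random.Random(seed)
    is_alt = [rng.random() < pi1 for _ in range(m)]
    z = [rng.gauss(mu1 if a else 0.0, 1.0) for a in is_alt]
    pvals = [two_sided_pvalue(v) for v in z]
    lfdr = [
        (1 - pi1) * normal_pdf(v)
        / ((1 - pi1) * normal_pdf(v) + pi1 * normal_pdf(v, mu1))
        for v in z
    ]
    result = {}
    for name, rejected in (("bh", bh_reject(pvals, alpha)),
                           ("lfdr", lfdr_reject(lfdr, alpha))):
        false = sum(1 for i in rejected if not is_alt[i])
        result[name] = {"rejections": len(rejected), "false": false}
    return result


if __name__ == "__main__":
    print(simulate())
```

Calling simulate() with different seeds or mixture parameters gives a rough
sense of how the two rules differ in the number of rejections and false
rejections at the same nominal level. In practice the mixture parameters are
unknown and would have to be estimated from the data, which this sketch does
not attempt.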