Spring Seminar Series March 22, 2007
University of Minnesota
School of Statistics
College of Liberal Arts
Bayesian Mixed Membership Models for Soft Clustering
Stephen E. Fienberg
Department of Statistics
Carnegie Mellon University
Thursday, March 22, 2007
3:30 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall
Abstract
In many problem settings involving clustering and classification, units can conceivably belong to multiple groups. Bayesian mixed
membership models provide a natural way to address such "soft" clustering and classification problems. These models typically
rely on four levels of assumptions: population, subject, latent variable, and sampling scheme. Population level assumptions describe
a general structure of the population that is common to all subjects. Subject level assumptions specify the distribution of observable
responses given the population structure and individual membership scores. Membership scores are usually unknown and hence can
also be viewed as latent variables, which the model can treat as fixed or random. Finally, the sampling scheme level specifies
the number of distinct observed characteristics (attributes) and the number of replications for each characteristic. We describe four
applications of mixed membership modeling, to (i) disability indicators from the National Long Term Care Survey, (ii) abstracts and
bibliographies of research reports in The Proceedings of the National Academy of Sciences, (iii) genetic SAGE libraries, and
(iv) protein-protein interactions in yeast (this involves extensions that incorporate stochastic block-modeling). Our methods include
the computation of full posterior distributions as well as various forms of variational approximations. In the examples, we also discuss
issues of model assessment and specification.
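
As a rough illustration of the four levels of assumptions described in the abstract, the following is a minimal simulation sketch, not the speaker's implementation. It assumes binary responses, membership scores drawn from a Dirichlet distribution, and a response probability formed as the membership-weighted average of the profile probabilities. The names simulate_mixed_membership, sample_dirichlet, profiles, alpha, and n_reps are hypothetical and chosen only for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_mixed_membership(n_subjects, profiles, alpha, n_reps, seed=0):
    """Simulate binary responses from a simple mixed membership model.

    profiles[k][j] is the (assumed) probability of a positive response on
    attribute j for a subject who belongs entirely to profile k.
    """
    rng = random.Random(seed)
    n_profiles = len(profiles)
    n_attrs = len(profiles[0])  # sampling scheme: number of attributes
    scores, responses = [], []
    for _ in range(n_subjects):
        # Latent variable level (assumption): membership scores are random,
        # drawn from a Dirichlet distribution with parameter alpha.
        g = sample_dirichlet(alpha, rng)
        subject_rows = []
        for j in range(n_attrs):
            # Subject level: response probability given the population-level
            # profiles and this subject's membership scores.
            p = sum(g[k] * profiles[k][j] for k in range(n_profiles))
            # Sampling scheme: n_reps replications of each attribute.
            subject_rows.append(
                [1 if rng.random() < p else 0 for _ in range(n_reps)]
            )
        scores.append(g)
        responses.append(subject_rows)
    return scores, responses


if __name__ == "__main__":
    # Two hypothetical profiles over three binary attributes.
    profiles = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.7]]
    scores, responses = simulate_mixed_membership(
        n_subjects=3, profiles=profiles, alpha=[0.5, 0.5], n_reps=2
    )
    for g, rows in zip(scores, responses):
        print([round(x, 3) for x in g], rows)
```

Each subject receives a vector of membership scores that sums to one, so a subject can belong partly to several profiles at once. This is the "soft" clustering idea, in contrast to assigning every subject to exactly one group.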