Spring Seminar Series  March 22, 2007

University of Minnesota
School of Statistics
College of Liberal Arts

Bayesian Mixed Membership Models for Soft Clustering

Stephen E. Fienberg
Department of Statistics
Carnegie Mellon University

Thursday, March 22, 2007
3:30 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall

Abstract

In many problem settings involving clustering and classification, units can conceivably belong to multiple groups.  Bayesian mixed 
membership models provide a natural way to address such "soft" clustering and classification problems.  These models typically
rely on four levels of assumptions: population, subject, latent variable, and sampling scheme.  Population level assumptions describe
a general structure of the population that is common to all subjects. Subject level assumptions specify the distribution of observable
responses given the population structure and individual membership scores. Membership scores are usually unknown and hence can
also be viewed as latent variables, which can be treated as fixed or random in the model. Finally, the sampling scheme level specifies
the number of distinct observed characteristics (attributes) and the number of replications for each characteristic. We describe four
applications of mixed membership modeling: (i) to disability indicators from the National Long Term Care Survey, (ii) to abstracts and
bibliographies of research reports in the Proceedings of the National Academy of Sciences, (iii) to genetic SAGE libraries, and
(iv) to protein-protein interactions in yeast (which involves extensions that incorporate stochastic block modeling). Our methods include
the computation of full posterior distributions as well as various forms of variational approximations.  In the examples, we also discuss
issues of model assessment and specification.
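
As an illustration only, the four levels of assumptions described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the speaker's implementation: the Dirichlet prior on membership scores, the categorical response model, and all names and numbers (alpha, profiles, simulate_subject) are hypothetical choices made for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one membership vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_subject(alpha, profiles, replications, rng):
    """Simulate one subject's responses under an assumed mixed membership model.

    profiles[k][j] holds the category probabilities for attribute j under
    population profile k (the population level assumptions).
    """
    # Latent variable level: membership scores treated as random (assumed Dirichlet).
    g = sample_dirichlet(alpha, rng)
    n_profiles = len(profiles)
    n_attributes = len(profiles[0])
    responses = []
    # Sampling scheme level: number of attributes and replications per attribute.
    for j in range(n_attributes):
        reps = []
        for _ in range(replications):
            # Subject level: each response is drawn from one profile,
            # chosen with probability given by the membership scores.
            k = rng.choices(range(n_profiles), weights=g)[0]
            probs = profiles[k][j]
            reps.append(rng.choices(range(len(probs)), weights=probs)[0])
        responses.append(reps)
    return g, responses


rng = random.Random(0)
alpha = [0.5, 0.5]  # hypothetical Dirichlet parameters for two profiles
profiles = [  # hypothetical response probabilities for three binary attributes
    [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]],
    [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]],
]
g, responses = simulate_subject(alpha, profiles, replications=1, rng=rng)
print("membership scores:", g)
print("responses:", responses)
```

Membership scores near (1, 0) or (0, 1) correspond to subjects who belong almost entirely to one group; scores in between correspond to the "soft" memberships the abstract refers to.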