Student Seminar Series - May 4, 2006
University of Minnesota
School of Statistics
College of Liberal Arts
Variance Estimation in Complex Survey Data: A Comparative Review and a Bayesian Proposal
Jeremy W. Strief
Thursday, May 4, 2006
9:00 AM, 170 Ford Hall
Minneapolis, East Bank Campus
Refreshments at 8:30 AM, 300 Ford Hall
Abstract
In stratified cluster surveys like the U.S. Census Bureau Long Form
Report, the probabilities of each person being included in the sample
are often unequal. Design-based statisticians would argue that these
inclusion probabilities must be considered when estimating population
characteristics, such as means, totals, or regression coefficients.
Performing inference on regression coefficients is especially
challenging for a design-based statistician, since no closed-form
expressions exist for the standard errors of the estimated coefficients.
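The point estimate itself is simple once inclusion probabilities are known: each unit is weighted by the inverse of its inclusion probability. The sketch below shows this for a single-predictor regression. It is an illustration under stated assumptions, not an estimator specified in this abstract; the function name `weighted_slope` and the toy data are hypothetical. The slope is a ratio of weighted sums, which is why its standard error has no closed form and must be approximated.

```python
# Hypothetical sketch: design-weighted least-squares fit of y on x.
# Each unit gets design weight w_i = 1 / pi_i, where pi_i is its
# (assumed known) inclusion probability.

def weighted_slope(x, y, pi):
    """Return (intercept, slope) of the inverse-probability-weighted fit."""
    w = [1.0 / p for p in pi]                      # design weights
    total_w = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-product and weighted sum of squares about the means
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical toy sample: the third unit had half the inclusion probability
# of the others, so it counts twice as much in the fit.
intercept, slope = weighted_slope([0, 1, 2], [0, 1, 3], [0.5, 0.5, 0.25])
```

With equal inclusion probabilities the weights cancel and the fit reduces to ordinary least squares.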
Approximations to the standard errors are commonly calculated with
tools like Taylor series linearization, the bootstrap, the jackknife,
and balanced repeated replication. The model-based perspective
considers the finite population to have been generated from some
statistical model, irrespective of the sampling design. Under this view,
the inclusion probabilities have no effect on the estimate of any
population quantity. In the case of a simple random sample, model-based
standard errors of the regression coefficients may be calculated from
standard linear model theory. Mixed-effects models may be applied to
more complex survey designs.
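To make the contrast concrete, here is a minimal sketch comparing one of the design-based approximations named above, the stratified delete-one-cluster jackknife (often called JKn), with the textbook model-based standard error for a simple random sample. It assumes a single predictor, known design weights w_i = 1/pi_i, and stratum and cluster labels for every sampled unit; all function names and data are hypothetical. The jackknife here centers the replicates on the full-sample estimate, which is one of several common conventions.

```python
import math
from collections import defaultdict


def wls_slope(x, y, w):
    """Slope of the weighted least-squares line of y on x with weights w."""
    tw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / tw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / tw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def jackknife_se(x, y, w, stratum, cluster):
    """Design-based SE of the weighted slope via the stratified JKn jackknife."""
    full = wls_slope(x, y, w)
    clusters = defaultdict(set)              # sampled clusters per stratum
    for h, c in zip(stratum, cluster):
        clusters[h].add(c)
    var = 0.0
    for h, cs in clusters.items():
        n_h = len(cs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled clusters")
        for drop in cs:
            # Delete one cluster; reweight the rest of stratum h by n_h/(n_h-1)
            w_rep = [
                0.0 if (hi == h and ci == drop)
                else wi * n_h / (n_h - 1) if hi == h
                else wi
                for wi, hi, ci in zip(w, stratum, cluster)
            ]
            var += (n_h - 1) / n_h * (wls_slope(x, y, w_rep) - full) ** 2
    return math.sqrt(var)


def ols_se(x, y):
    """Model-based slope SE under y = a + b*x + iid error (simple random sample)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (n - 2) / sxx)


# Hypothetical toy sample: two strata, two clusters per stratum.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.4]
w = [10, 10, 20, 20, 5, 5, 15, 15]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
cluster = [1, 1, 2, 2, 1, 1, 2, 2]
print(jackknife_se(x, y, w, stratum, cluster), ols_se(x, y))
```

Note that the jackknife needs the stratum labels for every unit, which is exactly the information the situation described next does not provide.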
Our discussion is inspired by the Minnesota Population Center (MPC), an
inter-departmental demography research group at the University of
Minnesota. The MPC's databases incorporate subsets of the Census
Bureau's internal data, but privacy concerns prevent the Census Bureau
from releasing stratification information to the MPC. This situation
makes the standard design-based and model-based methods difficult to
use. The Bayesian perspective on survey sampling, however, can
incorporate uncertain stratum membership through the use of prior
distributions. In particular, we propose an extension of the Polya
posterior to the case of stratified cluster surveys. The Polya posterior
will simulate complete copies of the population; by calculating the
regression coefficients (or any other desired quantity) for each copy,
one obtains a simulated distribution from which the variance may be
estimated.
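The simulation idea can be illustrated with a deliberately simplified sketch. It is not the proposal described in this abstract: it assumes stratum membership is known, that the number of population clusters N_h in each stratum is known (the `sizes` input is hypothetical), and that clusters within a stratum were drawn with equal probability. Within each stratum it runs an ordinary Polya urn over whole sampled clusters (draw a cluster uniformly from the urn, return it together with a copy) until the stratum holds N_h clusters, then computes the population regression slope on each simulated copy. All function names and data are hypothetical.

```python
import random
import statistics


def polya_complete(sampled, pop_size, rng):
    """Fill out one stratum by a Polya urn over its sampled clusters.

    sampled:  list of clusters, each a list of (x, y) units
    pop_size: assumed-known number of clusters in the stratum (hypothetical)
    """
    urn = list(sampled)
    while len(urn) < pop_size:
        # Draw a cluster uniformly from the urn; it stays and a copy is added
        urn.append(rng.choice(urn))
    return urn


def ols_slope(units):
    """Ordinary least-squares slope of y on x over (x, y) pairs."""
    n = len(units)
    xbar = sum(u[0] for u in units) / n
    ybar = sum(u[1] for u in units) / n
    sxx = sum((u[0] - xbar) ** 2 for u in units)
    sxy = sum((u[0] - xbar) * (u[1] - ybar) for u in units)
    return sxy / sxx


def polya_slope_draws(strata, sizes, reps=1000, seed=0):
    """Simulate `reps` population copies; return the population slope of each.

    strata: dict stratum -> list of sampled clusters
    sizes:  dict stratum -> total number of clusters N_h (assumed known here)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        population = []
        for h, clusters in strata.items():
            for cluster in polya_complete(clusters, sizes[h], rng):
                population.extend(cluster)
        draws.append(ols_slope(population))
    return draws


# Hypothetical toy sample: two strata with two and three sampled clusters.
strata = {
    "A": [[(1, 1.1), (2, 2.3)], [(3, 2.8), (4, 4.2)]],
    "B": [[(5, 5.3), (6, 5.9)], [(7, 7.4)], [(8, 7.7), (9, 9.2)]],
}
sizes = {"A": 20, "B": 30}
draws = polya_slope_draws(strata, sizes, reps=500)
print(statistics.mean(draws), statistics.variance(draws))
```

Copying whole clusters rather than individual units keeps the within-cluster structure in every simulated population, and the variance of `draws` plays the role of the variance estimate. Handling uncertain stratum membership would require replacing the fixed `strata` dictionary with draws from a prior over stratum assignments, which this sketch does not attempt.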