Efficient Semiparametric Estimation with Surrogate Outcome
This paper considers estimating a parameter β that defines an
estimating function U(y, x; β) for an outcome variable y and its
covariate x when the outcome is missing in some of the
observations. We assume that, in addition to the outcome and the
covariate, a surrogate outcome is available in every observation.
The efficiency of existing estimators for β depends critically on
correctly specifying the conditional expectation of U given the
surrogate and the covariate. When the conditional expectation is not
correctly specified, which is most likely the case in practice,
the estimation efficiency can be severely compromised. We propose an
estimator, constructed via empirical likelihood, that is robust
against the choice of the conditional expectation.
We demonstrate that the proposed estimator achieves an efficiency gain
whether the conditional score is correctly specified or not. When
the conditional score is correctly specified, the estimator reaches
the semiparametric variance bound within the class of estimating
functions generated by U. The practical performance of the
estimator is evaluated using simulation and a dataset based on the
1996 U.S. presidential election.