While (1.31) and similar formulas work well in calculating
log likelihood ratios for two parameter points that are not widely
separated, they fail miserably when the points are far apart,
as may occur when they are the estimates under the null and alternative of
a hypothesis test. Some other method is needed.
Geyer (1991) proposed the method of `reverse logistic regression' in a tech report that has never appeared in print. (The report was revised in 1994, but Theorem 4 added in the revision was wrong.) As applied to the problem of estimating normalizing constants and log likelihood ratios, the method is sound. It has been used to calculate likelihood ratio test statistics by Thompson, Lin, Olshen and Wijsman (1993).
Suppose we have samples from independent runs for $m$ models with unnormalized
densities $h_j$, $j = 1, \ldots, m$, and unknown normalizing constants $c_j$.
We want to estimate these normalizing constants
up to an overall constant of proportionality, or equivalently we want to
estimate $\log c_j$, up to a common additive constant, for all $j$. How this is used in calculating likelihood
ratio tests and how one chooses the densities $h_j$ and the number $m$ is
explained in Section 1.14.3.
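Suppose $X_{ij}$, $i = 1, \ldots, n_j$, are samples from an MCMC sampler
for the distribution with unnormalized density $h_j$, whose normalizing constant is $c_j$. Write
$\psi_j = -\log c_j$, so the normalized densities have the form $h_j(x) e^{\psi_j}$.
Reverse logistic regression involves
the following curious scheme for estimating the vector $\psi = (\psi_1, \ldots, \psi_m)$.
If we are given a point $x$ that is one of the $X_{ij}$ but we are not told
which one, the probability that it belongs to the $j$th sample is
$$ \frac{n_j h_j(x) e^{\psi_j}}{\sum_{k=1}^m n_k h_k(x) e^{\psi_k}}. \tag{1.39} $$
We simplify this by absorbing the factor $n_j$ into the definition of the
`parameter' $\eta_j$, writing $\eta_j = \psi_j + \log n_j$,
so that (1.39) becomes
$$ p_j(x, \eta) = \frac{h_j(x) e^{\eta_j}}{\sum_{k=1}^m h_k(x) e^{\eta_k}}. \tag{1.40} $$
We propose to estimate $\eta$ by maximizing the quasi-likelihood
$$ l_n(\eta) = \sum_{j=1}^m \sum_{i=1}^{n_j} \log p_j(X_{ij}, \eta). \tag{1.41} $$
It is not really a likelihood because in defining (1.40) we pretend
we don't know which sample $x$ belongs to, but in (1.41) we do
use the knowledge of which sample $X_{ij}$ belongs to. Or perhaps we don't
in some sense, because the quasi-score is
$$ \frac{\partial l_n(\eta)}{\partial \eta_r} = n_r - \sum_{j=1}^m \sum_{i=1}^{n_j} p_r(X_{ij}, \eta), \qquad r = 1, \ldots, m. \tag{1.42} $$
For each $r$ we sum over all the samples. Thus (1.42) does not use information about which sample a data point belongs to.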
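The estimating scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code referred to elsewhere in this chapter: the two toy densities, the function names `log_h`, `membership_probs` and `fit_eta`, the use of independent draws in place of MCMC output, and the choice of plain gradient ascent are all hypothetical. The sketch writes $h_j$ for the unnormalized densities, $n_j$ for the sample sizes, and $\eta_j = -\log c_j + \log n_j$ for the parameters.

```python
import math
import random

def log_h(j, x):
    """Log unnormalized densities for a toy pair of models (illustrative choice)."""
    if j == 0:
        return -0.5 * x * x              # N(0, 1) kernel, c = sqrt(2*pi)
    return -0.125 * (x - 1.0) ** 2       # N(1, 2^2) kernel, c = 2*sqrt(2*pi)

def membership_probs(logh_row, eta):
    """Probabilities h_j(x) e^{eta_j} / sum_k h_k(x) e^{eta_k}, with a log-sum-exp shift."""
    a = [lh + e for lh, e in zip(logh_row, eta)]
    top = max(a)
    w = [math.exp(v - top) for v in a]
    total = sum(w)
    return [v / total for v in w]

def fit_eta(logh, counts, steps=200, step_size=1.0):
    """Gradient ascent on the quasi-likelihood, with the score scaled by 1/N."""
    m = len(counts)
    total_n = sum(counts)
    eta = [0.0] * m
    for _ in range(steps):
        expected = [0.0] * m
        for row in logh:                 # pooled sample: labels are not used here
            for r, p in enumerate(membership_probs(row, eta)):
                expected[r] += p
        # quasi-score: n_r minus the sum of p_r over the pooled sample
        score = [counts[r] - expected[r] for r in range(m)]
        eta = [eta[r] + step_size * score[r] / total_n for r in range(m)]
    return eta

random.seed(0)
counts = [1000, 1000]
# Independent draws stand in for MCMC samples in this toy example.
samples = [random.gauss(0.0, 1.0) for _ in range(counts[0])]
samples += [random.gauss(1.0, 2.0) for _ in range(counts[1])]
logh = [[log_h(j, x) for j in range(2)] for x in samples]

eta = fit_eta(logh, counts)
psi = [eta[j] - math.log(counts[j]) for j in range(2)]   # -log c_j up to a constant
log_ratio = psi[0] - psi[1]                              # estimates log(c_2 / c_1)
print(log_ratio, math.log(2.0))                          # estimate and true value
```

In this toy case the true value of $\log(c_2/c_1)$ is $\log 2$, so the printed pair gives a rough check. Only the difference of the fitted parameters is meaningful, since the fit is determined only up to an additive constant.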
We shan't try to explain the philosophy of this method further here,
but merely note that (1.41)
is arithmetically equivalent to the log likelihood for a multivariate logistic
regression (also called a multinomial response model). Statistically, it is
not a logistic regression, because the regression is reversed. This is clear
from (1.42), which has the usual `observed' minus `expected' form
of an exponential family. This makes the sample index $j$
the `response' and $X_{ij}$ the
`predictor' when we think of this as a logistic regression. That makes the
`response' a fixed known quantity and the `predictor' the random quantity.
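The `observed' minus `expected' form can be checked directly. The short derivation below assumes that the quasi-likelihood $l_n(\eta)$ of (1.41) is the sum of $\log p_j(X_{ij}, \eta)$ over all samples, with $p_j(x, \eta) = h_j(x) e^{\eta_j} / \sum_k h_k(x) e^{\eta_k}$. The symbols $h_j$, $n_j$, $X_{ij}$ and $\delta_{jr}$ are notation assumed for this sketch.

```latex
\begin{align*}
\log p_j(x, \eta)
  &= \log h_j(x) + \eta_j - \log \sum_{k=1}^m h_k(x) e^{\eta_k}, \\
\frac{\partial}{\partial \eta_r} \log p_j(x, \eta)
  &= \delta_{jr} - p_r(x, \eta), \\
\frac{\partial l_n(\eta)}{\partial \eta_r}
  &= \sum_{j=1}^m \sum_{i=1}^{n_j} \bigl[ \delta_{jr} - p_r(X_{ij}, \eta) \bigr]
   = n_r - \sum_{j=1}^m \sum_{i=1}^{n_j} p_r(X_{ij}, \eta).
\end{align*}
```

Here $\delta_{jr}$ is 1 when $j = r$ and 0 otherwise. The sample size $n_r$ plays the role of the observed count, and the double sum is the count expected under the fitted probabilities.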
So let us just say that (1.42) defines a system of estimating
equations used to estimate the vector $\eta$. It is clear from (1.40)
that the value of the quasi-likelihood or quasi-score is unchanged by adding
a constant to all of the $\eta_j$. Hence $\eta$ is estimated only up to
an additive constant, as is $\psi$, and the $c_j$ are estimated only up to
a multiplicative constant, but that is all we want.
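The invariance claimed here takes one line to verify. As a sketch, assume the probabilities in (1.40) have the form $p_j(x, \eta) = h_j(x) e^{\eta_j} / \sum_k h_k(x) e^{\eta_k}$, with $h_j$ the unnormalized densities, and write $\mathbf{1}$ for the vector of ones (notation assumed for this illustration). Adding a constant $a$ to every component gives

```latex
\[
p_j(x, \eta + a \mathbf{1})
  = \frac{h_j(x) e^{\eta_j + a}}{\sum_{k=1}^m h_k(x) e^{\eta_k + a}}
  = \frac{e^{a} \, h_j(x) e^{\eta_j}}{e^{a} \sum_{k=1}^m h_k(x) e^{\eta_k}}
  = p_j(x, \eta).
\]
```

Since the quasi-likelihood and quasi-score depend on $\eta$ only through these probabilities, both are unchanged. If each component of $\eta$ is the negative log normalizing constant plus a known offset, the shift by $a$ corresponds to multiplying every normalizing constant by $e^{-a}$.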
We know from the properties of exponential families that the estimator of
$\eta$ will be unique up to an additive constant if $\eta$ is identifiable
up to an additive constant. Geyer (1991) shows this happens if the densities
$h_j$ cannot be divided into two nonempty subsets such that for each data
point $X_{ik}$ we have $h_j(X_{ik}) = 0$ for all $j$ in one of the subsets. The quasi-score is
the average of a bounded function of the $X_{ij}$. Hence it satisfies a
Lyapunov condition, hence a sufficient condition for a CLT (Theorem 2 in
Geyer, 1991) is that the Markov chain samplers be geometrically ergodic.
From the CLT we can obtain a Monte Carlo standard error for
the estimate $\hat{\eta}$. Geyer (1991) gives the details, which we shall
pass over here. The computer code described in Section 1.13
does this calculation.
A possible alternative to the method of reverse logistic regression for calculating log likelihood ratios for widely separated points is the method of umbrella sampling (Torrie and Valleau, 1977; Geyer and Thompson, 1995), but we shall not use it in this chapter.