Next: Fitting the Saturation Model Up: Likelihood Inference for Spatial Previous: Stochastic Approximation

Reverse Logistic Regression

 

While (1.31) and similar formulas work well for calculating log likelihood ratios between two parameter points that are not widely separated, they fail miserably when the two points are far apart, as may happen when they are the estimates under the null and the alternative hypotheses of a test. Some other method is needed.

Geyer (1991) proposed the method of `reverse logistic regression' in a technical report that has never appeared in print. (The report was revised in 1994, but Theorem 4, added in the revision, was wrong.) As applied to the problem of estimating normalizing constants and log likelihood ratios, the method is sound. Thompson, Lin, Olshen and Wijsman (1993) used it to calculate likelihood ratio test statistics.

Suppose we have samples from independent runs for $m$ models with unnormalized densities $h_j$, $j = 1, \ldots, m$, and unknown normalizing constants $c_j$. We want to estimate these normalizing constants up to an overall constant of proportionality or, equivalently, to estimate $\log c_j$ for all $j$ up to a common additive constant. How this is used in calculating likelihood ratio tests, and how one chooses the densities $h_j$ and the number $m$, is explained in Section 1.14.3.

Suppose $X_{ij}$, $i = 1, \ldots, n_j$, are samples from an MCMC sampler for the distribution with unnormalized density $h_j$. Write

$$ \eta_j = -\log c_j, \qquad j = 1, \ldots, m, $$

so that the normalized densities have the form $h_j(x) e^{\eta_j}$. Reverse logistic regression involves the following curious scheme for estimating the vector $\eta = (\eta_1, \ldots, \eta_m)$. If we are given a point $x$ that is one of the $X_{ij}$ but are not told which one, the probability that it belongs to the $j$th sample is

$$ \frac{n_j h_j(x) e^{\eta_j}}{\sum_{k=1}^m n_k h_k(x) e^{\eta_k}}. \tag{1.39} $$

We simplify this by absorbing the factor $n_j$ into the definition of the `parameter' $\eta$, writing

$$ \tilde\eta_j = \eta_j + \log n_j, \qquad j = 1, \ldots, m, $$

so that (1.39) becomes

$$ p_j(x, \tilde\eta) = \frac{h_j(x) e^{\tilde\eta_j}}{\sum_{k=1}^m h_k(x) e^{\tilde\eta_k}}. \tag{1.40} $$
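To make the reparametrization concrete, here is a minimal Python sketch (our own illustration; the names `log_sum_exp` and `membership_probs` and all numbers are hypothetical) that evaluates the probabilities (1.40) on the log scale and checks that they agree with (1.39) once $\log n_j$ has been absorbed into $\eta_j$. Working with $\log h_j(x)$ and a log-sum-exp to avoid overflow is an implementation choice of ours, not something the text prescribes.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values)); needs one finite value."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def membership_probs(log_h, eta_tilde):
    """Probabilities p_j(x, eta_tilde), j = 1..m, in the form (1.40).

    log_h[j]: log h_j(x) at a single point x (may be -inf where h_j(x) = 0)
    eta_tilde[j]: eta_j + log n_j, where eta_j = -log c_j
    """
    terms = [lh + et for lh, et in zip(log_h, eta_tilde)]
    denom = log_sum_exp(terms)
    return [math.exp(t - denom) for t in terms]


# Made-up values for one point x and m = 3 models.
log_h = [-0.5, -1.7, -0.2]   # log h_j(x)
eta = [0.3, -1.1, 0.8]       # eta_j = -log c_j
n = [1000, 250, 4000]        # sample sizes n_j

# (1.40): log n_j absorbed into the parameter.
eta_tilde = [e + math.log(nj) for e, nj in zip(eta, n)]
p_140 = membership_probs(log_h, eta_tilde)

# (1.39): the factor n_j kept explicit.
weights = [nj * math.exp(lh + e) for nj, lh, e in zip(n, log_h, eta)]
p_139 = [w / sum(weights) for w in weights]

print(p_140)
print(p_139)
```

Adding the same constant to every entry of `eta_tilde` leaves the probabilities unchanged, which is the invariance used later in the section.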

We propose to estimate $\tilde\eta$ by maximizing the quasi-likelihood

$$ l_n(\tilde\eta) = \sum_{j=1}^m \sum_{i=1}^{n_j} \log p_j(X_{ij}, \tilde\eta). \tag{1.41} $$
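A sketch of (1.41) in the same spirit, assuming the samples $X_{ij}$ have been pooled into one list with a 0-based label recording which sample each point came from. The function name `quasi_log_likelihood` and the toy data are hypothetical. Only the values $\log h_j(X_{ij})$ enter the formula, so the sketch takes that matrix as its input rather than the points themselves.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def quasi_log_likelihood(log_h, labels, eta_tilde):
    """Quasi-likelihood l_n(eta_tilde) of (1.41).

    The samples X_ij are pooled into one list of points:
    log_h[k][j]: log h_j evaluated at the k-th pooled point
    labels[k]: index j (0-based) of the sample the k-th point came from
    """
    total = 0.0
    for row, j in zip(log_h, labels):
        terms = [lh + et for lh, et in zip(row, eta_tilde)]
        total += terms[j] - log_sum_exp(terms)   # log p_j(X_ij, eta_tilde)
    return total


# Tiny made-up example: m = 2 samples with 3 and 2 points.
log_h = [[-0.1, -0.9], [-0.4, -0.6], [-1.2, -0.3], [-2.0, -0.2], [-0.7, -0.5]]
labels = [0, 0, 0, 1, 1]
print(quasi_log_likelihood(log_h, labels, [0.0, 0.5]))
```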

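It is not really a likelihood because in defining (1.40) we pretend not to know which sample $x$ belongs to, whereas in (1.41) we do use the knowledge of which sample $X_{ij}$ belongs to. Or perhaps, in some sense, we do not, because the quasi-score is

$$ \frac{\partial l_n(\tilde\eta)}{\partial \tilde\eta_r} = n_r - \sum_{j=1}^m \sum_{i=1}^{n_j} p_r(X_{ij}, \tilde\eta), \qquad r = 1, \ldots, m. \tag{1.42} $$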

For each $r$ the sum runs over all the samples, and the sample labels enter only through the sample sizes $n_r$. Thus (1.42) does not use information about which sample a data point belongs to.
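The following sketch (hypothetical names and made-up data, as in the illustration of (1.41)) computes the quasi-score (1.42) directly. The labels are used only to form the counts $n_r$; the subtracted sum runs over every pooled point regardless of its label. A central finite-difference derivative of (1.41) serves as a check, and the components of the score sum to zero, reflecting the invariance of (1.41) under adding a constant to all the $\tilde\eta_j$.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def quasi_log_likelihood(log_h, labels, eta_tilde):
    """(1.41) with pooled points: log_h[k][j] = log h_j(point k), labels[k] = its sample."""
    total = 0.0
    for row, j in zip(log_h, labels):
        terms = [lh + et for lh, et in zip(row, eta_tilde)]
        total += terms[j] - log_sum_exp(terms)
    return total


def quasi_score(log_h, labels, eta_tilde):
    """Quasi-score (1.42): n_r minus the sum of p_r over ALL pooled points."""
    m = len(eta_tilde)
    score = [0.0] * m
    for j in labels:              # labels enter only through the counts n_r
        score[j] += 1.0
    for row in log_h:             # no label is used in this sum
        terms = [lh + et for lh, et in zip(row, eta_tilde)]
        denom = log_sum_exp(terms)
        for r in range(m):
            score[r] -= math.exp(terms[r] - denom)
    return score


log_h = [[-0.1, -0.9], [-0.4, -0.6], [-1.2, -0.3], [-2.0, -0.2], [-0.7, -0.5]]
labels = [0, 0, 0, 1, 1]
eta_tilde = [0.0, 0.5]
score = quasi_score(log_h, labels, eta_tilde)

# Central finite differences of (1.41) should reproduce (1.42).
step = 1e-6
numeric = []
for r in range(len(eta_tilde)):
    up = list(eta_tilde)
    up[r] += step
    down = list(eta_tilde)
    down[r] -= step
    numeric.append((quasi_log_likelihood(log_h, labels, up)
                    - quasi_log_likelihood(log_h, labels, down)) / (2 * step))
print(score, numeric)
```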

We shall not try to explain the philosophy of this method further here, but merely note that (1.41) is arithmetically equivalent to the log likelihood for a multivariate logistic regression (also called a multinomial response model). Statistically, it is not a logistic regression, because the regression is reversed. This is clear from (1.42), which has the usual `observed minus expected' form of an exponential family. This makes the sample membership (summarized by the counts $n_r$) the `response' and the $X_{ij}$ the `predictor' when we think of this as a logistic regression. That makes the `response' a fixed known quantity and the `predictor' the random quantity. So let us just say that (1.42) defines a system of estimating equations used to estimate the vector $\eta$. It is clear from (1.40) that the value of the quasi-likelihood or quasi-score is unchanged by adding a constant to all of the $\tilde\eta_j$. Hence $\tilde\eta$ is estimated only up to an additive constant, as is $\eta$, and the $c_j = e^{-\eta_j}$ are estimated only up to a multiplicative constant, but that is all we want.
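One way to solve the estimating equations (1.42) is sketched below. This is a hypothetical illustration, not the code described in Section 1.13: instead of Newton's method we iterate the rearrangement $e^{\tilde\eta_r} = n_r \big/ \sum_{j,i} h_r(X_{ij}) / \sum_k h_k(X_{ij}) e^{\tilde\eta_k}$, whose fixed points are exactly the solutions of (1.42), and we remove the free additive constant by pinning $\tilde\eta_1 = 0$. For a check against known constants, independent Gaussian draws stand in for MCMC output; the estimating equations are the same whether the samples are independent or come from Markov chains, and only the Monte Carlo standard errors differ.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_reverse_logistic(log_h, labels, m, max_iter=1000, tol=1e-10):
    """Solve the quasi-score equations (1.42) with eta_tilde[0] pinned at 0.

    Iterates exp(eta_tilde_r) = n_r / sum_x [h_r(x) / D(x)], where
    D(x) = sum_k h_k(x) exp(eta_tilde_k), until the update stops changing.
    Returns (eta_tilde, eta) with eta_j = eta_tilde_j - log n_j.
    """
    n = [labels.count(j) for j in range(m)]
    eta_tilde = [0.0] * m
    for _ in range(max_iter):
        # log D(x) at every pooled point, using the current eta_tilde
        log_d = [log_sum_exp([lh + et for lh, et in zip(row, eta_tilde)])
                 for row in log_h]
        new = [math.log(n[r])
               - log_sum_exp([row[r] - d for row, d in zip(log_h, log_d)])
               for r in range(m)]
        new = [v - new[0] for v in new]   # fix the free additive constant
        change = max(abs(a - b) for a, b in zip(new, eta_tilde))
        eta_tilde = new
        if change < tol:
            break
    eta = [et - math.log(nj) for et, nj in zip(eta_tilde, n)]
    return eta_tilde, eta


# Known answer: h_1(x) = exp(-x^2/2), c_1 = sqrt(2 pi);
#               h_2(x) = exp(-x^2/8), c_2 = 2 sqrt(2 pi), so log(c_2/c_1) = log 2.
rng = random.Random(42)
xs = ([rng.gauss(0.0, 1.0) for _ in range(2000)]
      + [rng.gauss(0.0, 2.0) for _ in range(1500)])
labels = [0] * 2000 + [1] * 1500
log_h = [[-x * x / 2.0, -x * x / 8.0] for x in xs]

eta_tilde, eta = fit_reverse_logistic(log_h, labels, m=2)
log_ratio = eta[0] - eta[1]   # estimates log(c_2 / c_1)
print(log_ratio, math.log(2.0))
```

Because the draws are random, `log_ratio` only approximates $\log 2$; its Monte Carlo error shrinks as the sample sizes grow.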

We know from the properties of exponential families that the estimator of $\eta$ will be unique up to an additive constant if $\eta$ is identifiable up to an additive constant. Geyer (1991) shows that this happens if the densities $h_j$ cannot be divided into two nonempty subsets such that, for each data point $X_{ik}$, $h_j(X_{ik}) = 0$ for all $j$ in one of the subsets. The quasi-score (1.42) is a sum of averages of the bounded functions $p_r(\cdot, \tilde\eta)$, which take values in $[0, 1]$. Hence it satisfies a Lyapunov condition, and a sufficient condition for a CLT (Theorem 2 in Geyer, 1991) is that the Markov chain samplers be geometrically ergodic. From the CLT we can obtain Monte Carlo standard errors for $\hat\eta$, more precisely for contrasts such as $\hat\eta_j - \hat\eta_k$, which estimate the log ratios $\log(c_k / c_j)$. Geyer (1991) gives the details, which we pass over here. The computer code described in Section 1.13 does this calculation.
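The identifiability condition can be checked mechanically: join $j$ and $k$ whenever some data point has both $h_j > 0$ and $h_k > 0$; the densities can be split as described exactly when the resulting graph is disconnected. The sketch below (hypothetical function name; its input is a matrix of positivity indicators over the pooled data points) does this with a small union-find. It is our reading of the condition as stated, not code from Geyer (1991).

```python
def is_identifiable(positive):
    """positive[k][j] is True when h_j is positive at the k-th pooled data point.

    Returns False exactly when the m densities split into two nonempty groups
    such that every data point has h_j = 0 for all j in one of the groups.
    """
    m = len(positive[0])
    parent = list(range(m))

    def find(a):
        # Follow parent links to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for row in positive:
        support = [j for j, pos in enumerate(row) if pos]
        for j in support[1:]:
            root_a, root_b = find(support[0]), find(j)
            if root_a != root_b:
                parent[root_a] = root_b
    return len({find(j) for j in range(m)}) == 1


connected = [[True, True, False], [False, True, True]]
split = [[True, True, False], [True, False, False], [False, False, True]]
print(is_identifiable(connected), is_identifiable(split))
```

In the `split` example no data point has positive density under the third model together with either of the first two, so the third normalizing constant cannot be tied to the others.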

A possible alternative to the method of reverse logistic regression for calculating log likelihood ratios for widely separated points is the method of umbrella sampling (Torrie and Valleau, 1977; Geyer and Thompson, 1995), but we shall not use it in this chapter.



Charles Geyer
Fri Jul 5 15:26:21 CDT 1996