The likelihood ratio test is harder to calculate than the Wald and Rao tests and cannot be done using samples from only one model. It can, however, easily be done using the method of reverse logistic regression (Section 1.12) to estimate the normalizing constants.
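Having estimated the normalizing constants, the likelihood ratio test statistic is
$$2 \left[ \log \frac{h_m(x)}{h_1(x)} - \log \frac{c_m}{c_1} \right]$$
if we choose the sequence of distributions so that $h_1$ is the unnormalized density for the parameter value $\tilde{\theta}$, which is the MLE in the null hypothesis, and $h_m$ is the unnormalized density for the parameter value $\hat{\theta}$, which is the MLE in the alternative hypothesis. Here $x$ denotes the observed data, and $c_1$ and $c_m$ denote the normalizing constants of $h_1$ and $h_m$.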
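As a minimal sketch of the arithmetic, assume the log unnormalized densities at the observed data and the estimated log normalizing constants are already available. The function name, argument names, and the numbers in the usage line are hypothetical and chosen only for illustration.

```python
def lrt_statistic(log_h_null, log_h_alt, log_c_null, log_c_alt):
    """Twice the log likelihood ratio of the alternative MLE to the null MLE.

    log_h_* : log unnormalized density at the observed data
    log_c_* : estimated log normalizing constant of that density
    """
    log_density_ratio = log_h_alt - log_h_null
    log_constant_ratio = log_c_alt - log_c_null
    return 2.0 * (log_density_ratio - log_constant_ratio)


# Made-up inputs, for illustration only (not values from the analysis).
stat = lrt_statistic(log_h_null=-3.0, log_h_alt=-1.0,
                     log_c_null=0.0, log_c_alt=0.5)
```

Only the difference of the two log normalizing constants enters the statistic, so the constants need to be known only up to a common factor.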
Although the method is defined when there are just two samples, one for the null and one for the alternative hypothesis, it does not work unless the samples overlap. There are no guidelines for how much overlap is needed. One must try and see. We start with seven samples for distributions with parameter values evenly spaced along a line between the null and alternative
Then we try four evenly spaced distributions using every other line (the same samples). The results are
The samples all have size 1000 and spacing 200. Larger samples would give more accuracy, and it is clear that the more the samples overlap, the more accurate the estimate. It was not possible to do reverse logistic regression with only three samples: the optimization code crashed when the Hessian became singular to the precision of computer arithmetic. Had we obtained estimates, the MCSE would have been huge. The MCSE here is understated, since it only estimates the error of the log likelihood ratio assuming the two parameter values being compared are known. If we took proper account of the effect of errors in the parameter estimates, the error would be larger. MCSE formulas for the Wald and Rao test statistics have not been worked out.
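The role of overlap can be explored on a toy problem. The sketch below is not the optimization code used for the results above. It assumes three one-dimensional unnormalized densities of Gaussian shape, all with the same normalizing constant, so the true log ratios are zero. Setting the gradient of the reverse logistic regression objective to zero gives the equations $c_j = \sum_x h_j(x) / \sum_k n_k h_k(x) / c_k$, with the outer sum over all pooled sample points, and the sketch solves these by a simple fixed-point iteration rather than by Newton's method. All names, sample sizes, and parameter values are hypothetical.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_normalizing_constants(log_h, sizes, tol=1e-8, max_iter=1000):
    """Estimate log c_j - log c_1 from the pooled samples.

    log_h[j][i] is log h_j at pooled point i; sizes[j] is the number of
    pooled points drawn from distribution j.
    """
    m, n_points = len(log_h), len(log_h[0])
    log_n = [math.log(n) for n in sizes]
    log_c = [0.0] * m
    for _ in range(max_iter):
        # log of the mixture denominator sum_k n_k h_k(x) / c_k at each point
        log_mix = [
            logsumexp([log_n[k] + log_h[k][i] - log_c[k] for k in range(m)])
            for i in range(n_points)
        ]
        new = [
            logsumexp([log_h[j][i] - log_mix[i] for i in range(n_points)])
            for j in range(m)
        ]
        new = [v - new[0] for v in new]  # only ratios are identified
        change = max(abs(a - b) for a, b in zip(new, log_c))
        log_c = new
        if change < tol:
            break
    return log_c


# Toy setup (an assumption): h_j(x) = exp(-(x - mu_j)^2 / 2), equal constants.
random.seed(1)
means = [0.0, 1.0, 2.0]
sizes = [500, 500, 500]
pooled = [random.gauss(mu, 1.0) for mu, n in zip(means, sizes) for _ in range(n)]
log_h = [[-0.5 * (x - mu) ** 2 for x in pooled] for mu in means]
log_c = log_normalizing_constants(log_h, sizes)
```

Moving the entries of `means` farther apart reduces the overlap between samples. One would expect the estimates in `log_c` to drift away from zero and the iteration to converge more slowly, which is the toy analogue of the near-singular Hessian described above.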
All three test statistics are to be compared to the chi-squared distribution on one degree of freedom, and all give P-values that are essentially zero. The test statistics agree as well as one might expect given their size. As mentioned in connexion with confidence intervals for the MLE, it is not clear that the usual asymptotics are valid, that is, that there exist infinite-volume limiting distributions for the saturation model and that the usual asymptotics obtain in the limit. One would expect from the elliptical shape of scatter plots of the canonical statistic that these asymptotics give reasonable answers. If one were very worried about the validity of the asymptotics, one could do a parametric bootstrap, as in Geyer and Møller (1994). One might find, as they did, that the bootstrap gave the same answer as the asymptotics and hence was unnecessary except for whatever reassurance it provided.
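The comparison between the asymptotic and bootstrap P-values can be sketched on a toy model. The sketch assumes a normal sample with known unit variance and a null hypothesis of zero mean, for which the likelihood ratio statistic is exactly chi-squared on one degree of freedom. It is not the saturation model, where simulating under the null MLE would itself require MCMC. All function names and settings are hypothetical.

```python
import math
import random


def chi2_1_pvalue(stat):
    """Upper tail of chi-squared on one degree of freedom: P(Z**2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))


def bootstrap_pvalue(observed, simulate_stat, n_boot, rng):
    """Parametric bootstrap P-value with the usual add-one correction."""
    exceed = sum(simulate_stat(rng) >= observed for _ in range(n_boot))
    return (1 + exceed) / (n_boot + 1)


def toy_null_stat(rng, n=50):
    """LRT statistic for zero mean from one N(0, 1) sample drawn under the null."""
    mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    return n * mean * mean


rng = random.Random(0)
observed = 3.841458820694124  # 0.95 quantile of chi-squared on one degree of freedom
p_asymptotic = chi2_1_pvalue(observed)
p_bootstrap = bootstrap_pvalue(observed, toy_null_stat, n_boot=2000, rng=rng)
```

In this toy case the two P-values should agree up to Monte Carlo error, because the asymptotic distribution is exact. A bootstrap P-value can never be smaller than 1 / (n_boot + 1), so for a statistic as large as those reported here it can only confirm that the P-value is below that bound.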