We now turn to the question of choosing the optimal spacing as described
by Geyer (1992, Section 3.6). The optimal spacing depends on the use
made of the samples, in particular on the ratio of the cost of
using a sample in subsequent calculations to the cost of
generating it. The computer on which
all of the computations for this chapter were done (an HP 715/100)
took 2.4 seconds to calculate the MCLMLE and 0.7 seconds to
calculate the MCSE, for a total of 3.1 seconds spent using the samples.
It took 255.7 seconds to run the MCMC sampler for
iterations
(
samples spaced 100 iterations apart). Thus the cost ratio
is R = 3.1 / 255.7 = 0.012.
The cost ratio is very small even though the sampler is written in C and is fairly efficient, while the likelihood maximization is written in S and is very slow. Cost ratios this low or lower are typical of most applications.
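The arithmetic behind the cost ratio can be checked with a few lines of Python. The timings are the ones quoted above; the variable names are illustrative only.

```python
# Timings in seconds, as quoted in the text.
time_mclmle = 2.4     # calculating the MCLMLE from the samples
time_mcse = 0.7       # calculating the MCSE from the samples
time_sampler = 255.7  # running the MCMC sampler

# Cost of using the samples, and its ratio to the cost of generating them.
time_using = time_mclmle + time_mcse
cost_ratio = time_using / time_sampler

print(f"time spent using the samples: {time_using:.1f} s")
print(f"cost ratio R: {cost_ratio:.3f}")
```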
Let
denote an estimate of the variance of the mean of the time
series (1.43) subsampled at spacing m, estimated by
summing autocovariances at lags that are multiples of m. Then
is the estimate of the
MC error variance, and the asymptotic relative efficiency of spacing m is
proportional to
. This goes up linearly in m for
large m, so large spacing is definitely bad (Geyer, 1992).
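As a rough illustration of this kind of calculation, the sketch below makes several assumptions that should not be read as the chapter's own formulas. It measures the spacing m in multiples of the base spacing, takes the subsampled variance to be the lag-0 autocovariance plus twice the sum of the autocovariances at lags m, 2m, 3m, ..., truncated at the last available lag, and takes the cost of reaching a fixed accuracy to be proportional to that variance times (m + R), since each retained sample costs m units to generate and R units to use. The function names and the geometric autocovariances in the example are hypothetical.

```python
def autocovariances(x, max_lag):
    """Empirical autocovariances gamma[0..max_lag] of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def subsampled_variance(gamma, m):
    """gamma[0] plus twice the sum of gamma at lags m, 2m, 3m, ...

    Assumption: the sum is simply truncated at the last available lag.
    """
    return gamma[0] + 2.0 * sum(gamma[m::m])


def relative_cost(gamma, m, cost_ratio):
    """Assumed cost (up to a constant) of a fixed accuracy at spacing m."""
    return subsampled_variance(gamma, m) * (m + cost_ratio)


# Hypothetical autocovariances gamma_k = 0.5 ** k, for illustration only.
gamma = [0.5 ** k for k in range(201)]
for m in (1, 2, 3, 5, 10):
    print(m, subsampled_variance(gamma, m), relative_cost(gamma, m, 0.012))
```

In practice `gamma` would come from `autocovariances` applied to the observed time series, and a more careful truncation rule than a fixed maximum lag would usually be preferable. Under these assumptions the variance factor levels off at `gamma[0]` as m grows, so the cost is eventually dominated by the linear factor (m + R).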
Applied to our example, this method says that spacings of 1 or 2
times the spacing of 100 used for the samples at hand are about equally efficient,
and any larger spacing wastes time.
In order to have any idea what is a useful spacing, one must do a calculation
like this. Heuristic arguments don't help. Note that a spacing of 200 does
not allow every point to be changed between samples, because the points are
selected for attempted deletion only half the time, and it takes roughly
n log n steps to visit each of n points once when a random point is
visited each step. The naive notion that the spacing should be large
enough so that most points are changed between samples is wrong.
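The visiting-time argument is an instance of the coupon-collector problem: when one of n points is picked uniformly at random at each step, the expected number of steps until every point has been picked at least once is n times the n-th harmonic number, which is about n log n. The sketch below evaluates this for a hypothetical n = 100 (the number of points is an assumption made only for illustration) and doubles it to account for deletions being attempted on only half of the iterations.

```python
import math


def expected_steps_to_visit_all(n):
    """Expected number of uniform random picks until each of n points
    has been picked at least once (coupon collector): n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))


n_points = 100           # hypothetical number of points, illustration only
deletion_fraction = 0.5  # deletion is attempted on half of the iterations

visits_needed = expected_steps_to_visit_all(n_points)
# Rough expected number of sampler iterations, since only a fraction of
# iterations select a point for attempted deletion.
iterations_needed = visits_needed / deletion_fraction

print(f"n log n approximation: {n_points * math.log(n_points):.1f}")
print(f"expected visits needed: {visits_needed:.1f}")
print(f"expected iterations needed: {iterations_needed:.1f}")
```

For this hypothetical n the expected number of iterations is well above 200, which is consistent with the point that a spacing of 200 does not give every point a chance to change.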