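With the normalized densities $f_\theta(x) = h_\theta(x) / c(\theta)$ of a model given by (1.2), where $h_\theta$ is the unnormalized density and $c(\theta) = \int h_\theta(x)\,dx$ is the normalizing function, the log likelihood for an observation $x$ is
$$
l(\theta) = \log h_\theta(x) - \log c(\theta).
$$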
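It turns out to be more convenient to use the log likelihood ratio against a fixed point $\psi$ in the parameter space:
$$
l(\theta) - l(\psi) = \log \frac{h_\theta(x)}{h_\psi(x)} - \log \frac{c(\theta)}{c(\psi)}. \tag{1.30}
$$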
The first term, involving the unnormalized densities, is known in closed form; the second, involving the normalizing function, is not.
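But if $h_\theta(x) = 0$ whenever $h_\psi(x) = 0$, then
$$
\frac{c(\theta)}{c(\psi)} = \int \frac{h_\theta(x)}{c(\psi)}\,dx = \int \frac{h_\theta(x)}{h_\psi(x)}\, f_\psi(x)\,dx = E_\psi\left\{ \frac{h_\theta(X)}{h_\psi(X)} \right\}.
$$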
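The identity $c(\theta)/c(\psi) = E_\psi\{h_\theta(X)/h_\psi(X)\}$ can be checked numerically. The sketch below assumes, purely for illustration, a toy model with $h_\theta(x) = e^{\theta x - x^2/2}$ on the real line. For this model $f_\theta$ is the $\mathcal{N}(\theta, 1)$ density and $c(\theta) = \sqrt{2\pi}\, e^{\theta^2/2}$, so the exact ratio is available for comparison. Independent normal draws stand in for MCMC output, and the function name `ratio_of_normalizers` is hypothetical.

```python
import numpy as np

def log_h(theta, x):
    # Toy unnormalized log density (assumed): log h_theta(x) = theta*x - x^2/2
    return theta * x - 0.5 * x**2

def ratio_of_normalizers(theta, psi, draws):
    # Monte Carlo estimate of c(theta)/c(psi) = E_psi{ h_theta(X) / h_psi(X) }
    return np.mean(np.exp(log_h(theta, draws) - log_h(psi, draws)))

rng = np.random.default_rng(0)
psi, theta = 0.0, 0.5
draws = rng.normal(loc=psi, scale=1.0, size=200_000)  # iid stand-in for draws from f_psi

estimate = ratio_of_normalizers(theta, psi, draws)
exact = np.exp((theta**2 - psi**2) / 2)  # c(theta)/c(psi) for this toy model
print(estimate, exact)
```

Only the unnormalized $h$ is ever evaluated. The unknown normalizing function enters solely through the expectation under $f_\psi$.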
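This permits calculation of the log likelihood by MCMC. Let $X_1$, $X_2$, $\ldots$, $X_n$ be simulations from $f_\psi$, for example the output of a Markov chain whose equilibrium distribution has density $f_\psi$. Then the log likelihood ratio (1.30) is approximated by
$$
l_n(\theta) = \log \frac{h_\theta(x)}{h_\psi(x)} - \log \left( \frac{1}{n} \sum_{i=1}^n \frac{h_\theta(X_i)}{h_\psi(X_i)} \right). \tag{1.31}
$$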
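A minimal sketch of computing $l_n(\theta) = \log[h_\theta(x)/h_\psi(x)] - \log\{n^{-1}\sum_i h_\theta(X_i)/h_\psi(X_i)\}$ from an actual Markov chain follows. It assumes the same toy model $h_\theta(x) = e^{\theta x - x^2/2}$ as an illustration and uses a random-walk Metropolis sampler with an assumed step size of 2.4. The names `metropolis` and `mc_loglik` are hypothetical. The log of the sample mean is computed with the log-sum-exp trick, which is algebraically identical but avoids overflow.

```python
import numpy as np

def log_h(theta, x):
    # Toy unnormalized log density (assumed): log h_theta(x) = theta*x - x^2/2
    return theta * x - 0.5 * x**2

def metropolis(psi, n, step=2.4, seed=1):
    # Random-walk Metropolis chain whose equilibrium density is f_psi
    rng = np.random.default_rng(seed)
    increments = step * rng.normal(size=n)
    log_u = np.log(rng.uniform(size=n))
    chain = np.empty(n)
    x = psi  # start at the mode of f_psi, so no burn-in is discarded here
    for i in range(n):
        proposal = x + increments[i]
        # Accept with probability min(1, h_psi(proposal) / h_psi(x))
        if log_u[i] < log_h(psi, proposal) - log_h(psi, x):
            x = proposal
        chain[i] = x
    return chain

def mc_loglik(theta, psi, x_obs, chain):
    # l_n(theta) of (1.31); log of the sample mean via log-sum-exp
    log_r = log_h(theta, chain) - log_h(psi, chain)
    m = log_r.max()
    log_mean = m + np.log(np.mean(np.exp(log_r - m)))
    return log_h(theta, x_obs) - log_h(psi, x_obs) - log_mean

psi, x_obs = 0.0, 0.5
chain = metropolis(psi, n=100_000)
for theta in (0.25, 0.5, 0.75):
    exact = (theta - psi) * x_obs - (theta**2 - psi**2) / 2  # exact l(theta) - l(psi)
    print(theta, mc_loglik(theta, psi, x_obs, chain), exact)
```

The same chain is reused for every $\theta$, which is the point of the method: one simulation at $\psi$ approximates the whole log likelihood ratio curve near $\psi$.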
Maximizing (1.31) gives a Monte Carlo approximation $\hat\theta_n$ to the MLE $\hat\theta$, which maximizes (1.30).
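As an illustration, $\hat\theta_n$ can be found by maximizing $l_n$ over a grid. The sketch assumes the toy model $h_\theta(x) = e^{\theta x - x^2/2}$, for which the exact MLE is $\hat\theta = x$. The grid range and resolution are arbitrary choices, iid draws stand in for MCMC output, and any numerical optimizer could replace the grid search.

```python
import numpy as np

def log_h(theta, x):
    # Toy unnormalized log density (assumed): log h_theta(x) = theta*x - x^2/2
    return theta * x - 0.5 * x**2

def mc_loglik(theta, psi, x_obs, draws):
    # Monte Carlo log likelihood ratio l_n(theta) of (1.31), via log-sum-exp
    log_r = log_h(theta, draws) - log_h(psi, draws)
    m = log_r.max()
    return log_h(theta, x_obs) - log_h(psi, x_obs) - (m + np.log(np.mean(np.exp(log_r - m))))

rng = np.random.default_rng(2)
psi, x_obs = 0.0, 0.5
draws = rng.normal(psi, 1.0, size=50_000)  # stand-in for MCMC draws from f_psi

grid = np.linspace(-0.5, 1.5, 201)  # hypothetical search range, spacing 0.01
values = np.array([mc_loglik(t, psi, x_obs, draws) for t in grid])
theta_hat_n = grid[np.argmax(values)]
print(theta_hat_n)  # close to the exact MLE x_obs in this toy model
```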
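The gradient of (1.31) is
$$
\nabla l_n(\theta) = \frac{\nabla h_\theta(x)}{h_\theta(x)} - \frac{\sum_{i=1}^n \nabla h_\theta(X_i) / h_\psi(X_i)}{\sum_{i=1}^n h_\theta(X_i) / h_\psi(X_i)}, \tag{1.32}
$$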
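which can be recognized as a case of importance sampling. Define the normalized importance weights
$$
w_{n,\theta}(X_i) = \frac{h_\theta(X_i) / h_\psi(X_i)}{\sum_{j=1}^n h_\theta(X_j) / h_\psi(X_j)} \tag{1.33}
$$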
and, for any function $g$,
$$
E_{n,\theta}\{g(X)\} = \sum_{i=1}^n g(X_i)\, w_{n,\theta}(X_i). \tag{1.34}
$$
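The weights $w_{n,\theta}(X_i) \propto h_\theta(X_i)/h_\psi(X_i)$ and the weighted average $E_{n,\theta}\{g(X)\} = \sum_i g(X_i)\, w_{n,\theta}(X_i)$ can be sketched as follows. The sketch again assumes the toy model $h_\theta(x) = e^{\theta x - x^2/2}$, so that $E_\theta\{X\} = \theta$ serves as a check. The weights are formed in log space (a softmax), which gives the same normalized values without overflow. The names `importance_weights` and `mc_expect` are hypothetical.

```python
import numpy as np

def log_h(theta, x):
    # Toy unnormalized log density (assumed): log h_theta(x) = theta*x - x^2/2
    return theta * x - 0.5 * x**2

def importance_weights(theta, psi, draws):
    # Normalized importance weights (1.33), computed stably in log space
    log_r = log_h(theta, draws) - log_h(psi, draws)
    w = np.exp(log_r - log_r.max())
    return w / w.sum()

def mc_expect(g, theta, psi, draws):
    # E_{n,theta}{g(X)} of (1.34): weighted average of g over the psi-draws
    return np.sum(g(draws) * importance_weights(theta, psi, draws))

rng = np.random.default_rng(3)
psi, theta = 0.0, 0.5
draws = rng.normal(psi, 1.0, size=100_000)  # stand-in for MCMC draws from f_psi

w = importance_weights(theta, psi, draws)
mean_est = mc_expect(lambda x: x, theta, psi, draws)  # approximates E_theta{X} = theta
print(w.sum(), mean_est)
```

At $\theta = \psi$ all weights equal $1/n$, and $E_{n,\psi}$ reduces to the ordinary MCMC sample average.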
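Then (1.34) is the Monte Carlo approximation of $E_\theta\{g(X)\}$ given by the importance sampling formula using the normalized importance weights (1.33). Using this notation, and noting that $\nabla h_\theta(X_i) / h_\psi(X_i) = [h_\theta(X_i) / h_\psi(X_i)]\, \nabla \log h_\theta(X_i)$, we get
$$
\nabla l_n(\theta) = \nabla \log h_\theta(x) - E_{n,\theta}\{\nabla \log h_\theta(X)\}.
$$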
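The score formula $\nabla l_n(\theta) = \nabla \log h_\theta(x) - E_{n,\theta}\{\nabla \log h_\theta(X)\}$ can be checked against a central finite difference of $l_n$ computed from the same draws. The sketch assumes the toy model $h_\theta(x) = e^{\theta x - x^2/2}$, for which $\nabla \log h_\theta(x) = x$. All function names are hypothetical. Because both sides use the same fixed draws, they should agree up to finite-difference error, not merely up to Monte Carlo error.

```python
import numpy as np

def log_h(theta, x):
    # Toy unnormalized log density (assumed): log h_theta(x) = theta*x - x^2/2
    return theta * x - 0.5 * x**2

def grad_log_h(theta, x):
    # Derivative of log h_theta(x) with respect to theta for the toy model
    return x

def weights(theta, psi, draws):
    # Normalized importance weights (1.33), computed in log space
    log_r = log_h(theta, draws) - log_h(psi, draws)
    w = np.exp(log_r - log_r.max())
    return w / w.sum()

def mc_loglik(theta, psi, x_obs, draws):
    # l_n(theta) of (1.31), via log-sum-exp
    log_r = log_h(theta, draws) - log_h(psi, draws)
    m = log_r.max()
    return log_h(theta, x_obs) - log_h(psi, x_obs) - (m + np.log(np.mean(np.exp(log_r - m))))

def mc_score(theta, psi, x_obs, draws):
    # grad l_n(theta) = grad log h_theta(x) - E_{n,theta}{grad log h_theta(X)}
    w = weights(theta, psi, draws)
    return grad_log_h(theta, x_obs) - np.sum(w * grad_log_h(theta, draws))

rng = np.random.default_rng(4)
psi, x_obs, theta = 0.0, 0.5, 0.3
draws = rng.normal(psi, 1.0, size=10_000)  # stand-in for MCMC draws from f_psi

eps = 1e-5
fd = (mc_loglik(theta + eps, psi, x_obs, draws)
      - mc_loglik(theta - eps, psi, x_obs, draws)) / (2 * eps)
print(mc_score(theta, psi, x_obs, draws), fd)
```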
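Similarly,
$$
\nabla^2 l_n(\theta) = \nabla^2 \log h_\theta(x) - E_{n,\theta}\{\nabla^2 \log h_\theta(X)\} - \operatorname{var}_{n,\theta}\{\nabla \log h_\theta(X)\},
$$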
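where, for a vector-valued function $g$,
$$
\operatorname{var}_{n,\theta}\{g(X)\} = E_{n,\theta}\left\{ \bigl[g(X) - E_{n,\theta}\{g(X)\}\bigr] \bigl[g(X) - E_{n,\theta}\{g(X)\}\bigr]^T \right\}.
$$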
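To see all three terms of the Hessian formula at work, including the matrix $\operatorname{var}_{n,\theta}$ built from weighted outer products, the sketch below uses a hypothetical two-parameter model that is not an exponential family. It assumes $\log h_\theta(x) = \theta_1 x - e^{\theta_2} x^2/2$, so $\nabla^2 \log h_\theta$ is nonzero. The result is compared with a finite-difference Hessian of $l_n$ on the same draws. With $\psi = (0, 0)$ the density $f_\psi$ is $\mathcal{N}(0, 1)$, and iid draws stand in for MCMC output. All function names are hypothetical.

```python
import numpy as np

def log_h(theta, x):
    # Toy non-exponential-family model (assumed): log h = t1*x - exp(t2)*x^2/2
    return theta[0] * x - np.exp(theta[1]) * x**2 / 2

def grad_log_h(theta, x):
    # Gradient with respect to theta, one row per x: shape (len(x), 2)
    x = np.atleast_1d(x)
    return np.column_stack([x, -np.exp(theta[1]) * x**2 / 2])

def hess_log_h(theta, x):
    # Hessian with respect to theta, shape (len(x), 2, 2); only entry (1, 1) is nonzero
    x = np.atleast_1d(x)
    H = np.zeros((x.size, 2, 2))
    H[:, 1, 1] = -np.exp(theta[1]) * x**2 / 2
    return H

def weights(theta, psi, draws):
    log_r = log_h(theta, draws) - log_h(psi, draws)
    w = np.exp(log_r - log_r.max())
    return w / w.sum()

def mc_loglik(theta, psi, x_obs, draws):
    log_r = log_h(theta, draws) - log_h(psi, draws)
    m = log_r.max()
    return log_h(theta, x_obs) - log_h(psi, x_obs) - (m + np.log(np.mean(np.exp(log_r - m))))

def mc_hessian(theta, psi, x_obs, draws):
    # hess log h(x) - E_n{hess log h(X)} - var_n{grad log h(X)}
    w = weights(theta, psi, draws)
    G = grad_log_h(theta, draws)
    centered = G - w @ G                          # g(X_i) - E_n{g(X)}
    var_n = (centered * w[:, None]).T @ centered  # sum_i w_i c_i c_i^T
    e_hess = np.einsum("i,ijk->jk", w, hess_log_h(theta, draws))
    return hess_log_h(theta, x_obs)[0] - e_hess - var_n

rng = np.random.default_rng(5)
psi = np.array([0.0, 0.0])   # f_psi is N(0, 1) in this parameterization
theta = np.array([0.2, 0.1])
x_obs = 0.5
draws = rng.normal(0.0, 1.0, size=10_000)

# Central finite-difference Hessian of l_n for comparison
eps = 1e-3
E = np.eye(2) * eps
fd = np.zeros((2, 2))
for a in range(2):
    for b in range(2):
        fd[a, b] = (mc_loglik(theta + E[a] + E[b], psi, x_obs, draws)
                    - mc_loglik(theta + E[a] - E[b], psi, x_obs, draws)
                    - mc_loglik(theta - E[a] + E[b], psi, x_obs, draws)
                    + mc_loglik(theta - E[a] - E[b], psi, x_obs, draws)) / (4 * eps**2)
print(mc_hessian(theta, psi, x_obs, draws))
print(fd)
```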
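This all simplifies considerably in the exponential family case, where we have
$$
h_\theta(x) = e^{\langle y(x), \theta \rangle}
$$
for a canonical statistic $y(x)$ and canonical parameter $\theta$, and the normalized importance weights are
$$
w_{n,\theta}(X_i) = \frac{e^{\langle Y_i, \theta - \psi \rangle}}{\sum_{j=1}^n e^{\langle Y_j, \theta - \psi \rangle}}, \qquad Y_i = y(X_i).
$$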
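In the exponential family case the weights depend on the draws only through their canonical statistics $Y_i = y(X_i)$, so only the $Y_i$ need to be stored. The sketch below assumes a small hypothetical family: $x \in \{-1, +1\}^6$ on a path, with $y(x) = (\sum_i x_i, \sum_i x_i x_{i+1})$. The sample space has only 64 points, so exact expectations can be enumerated for comparison. Exact draws from $f_\psi$ stand in for MCMC output, and `expfam_weights` is a hypothetical name.

```python
import itertools
import numpy as np

# Toy exponential family (assumed): x in {-1, +1}^6 on a path, canonical
# statistic y(x) = (sum_i x_i, sum_i x_i x_{i+1}), h_theta(x) = exp(<y(x), theta>)
states = np.array(list(itertools.product([-1, 1], repeat=6)))
Y_all = np.column_stack([states.sum(axis=1),
                         (states[:, :-1] * states[:, 1:]).sum(axis=1)])

def exact_mean(theta):
    # Exact E_theta{y(X)} by enumerating all 64 states
    p = np.exp(Y_all @ theta)
    p /= p.sum()
    return p @ Y_all

def expfam_weights(theta, psi, Y):
    # w_{n,theta}(X_i) proportional to exp(<Y_i, theta - psi>): a softmax
    a = Y @ (theta - psi)
    w = np.exp(a - a.max())
    return w / w.sum()

rng = np.random.default_rng(6)
psi = np.array([0.0, 0.2])
p_psi = np.exp(Y_all @ psi)
p_psi /= p_psi.sum()
idx = rng.choice(len(states), size=50_000, p=p_psi)  # exact draws stand in for MCMC
Y = Y_all[idx]  # keep only the canonical statistics of the draws

theta = np.array([0.1, 0.3])
w = expfam_weights(theta, psi, Y)
print(w @ Y, exact_mean(theta))
```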
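Because $\nabla \log h_\theta(x) = y(x)$ and $\nabla^2 \log h_\theta(x) = 0$, these weights give estimates of the score
$$
\nabla l_n(\theta) = y(x) - E_{n,\theta}\{y(X)\}
$$
and of the Fisher information
$$
-\nabla^2 l_n(\theta) = \operatorname{var}_{n,\theta}\{y(X)\}.
$$
These are just what one expects, since the exact score $y(x) - E_\theta\{y(X)\}$ and Fisher information $\operatorname{var}_\theta\{y(X)\}$ are obtained by replacing the Monte Carlo expectations with exact expectations.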
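One natural use of the Monte Carlo score $y(x) - E_{n,\theta}\{y(X)\}$ and Fisher information $\operatorname{var}_{n,\theta}\{y(X)\}$ is Newton–Raphson maximization of $l_n$, updating $\theta \leftarrow \theta + [\operatorname{var}_{n,\theta}\{y(X)\}]^{-1}\,[y(x) - E_{n,\theta}\{y(X)\}]$. This is offered as an illustration, not as the method of the text. The sketch assumes a hypothetical two-dimensional family with $y(x) = x$ and a standard normal base measure, so $f_\theta = \mathcal{N}(\theta, I)$ and the exact MLE is $\hat\theta = y(x)$. Iid draws stand in for MCMC output, and all function names are hypothetical.

```python
import numpy as np

def expfam_weights(theta, psi, Y):
    # Normalized weights proportional to exp(<Y_i, theta - psi>)
    a = Y @ (theta - psi)
    w = np.exp(a - a.max())
    return w / w.sum()

def score_and_info(theta, psi, y_obs, Y):
    # Monte Carlo score y - E_n{Y} and Fisher information var_n{Y}
    w = expfam_weights(theta, psi, Y)
    mean = w @ Y
    centered = Y - mean
    info = (centered * w[:, None]).T @ centered
    return y_obs - mean, info

def mcmc_mle_newton(psi, y_obs, Y, iters=20, tol=1e-10):
    # Newton-Raphson on l_n: theta <- theta + info^{-1} score
    theta = psi.copy()
    for _ in range(iters):
        score, info = score_and_info(theta, psi, y_obs, Y)
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

rng = np.random.default_rng(7)
psi = np.zeros(2)
Y = rng.normal(psi, 1.0, size=(100_000, 2))  # y(X_i) = X_i; iid stand-in for MCMC
y_obs = np.array([0.3, -0.2])

theta_hat_n = mcmc_mle_newton(psi, y_obs, Y)
print(theta_hat_n)  # close to the exact MLE y_obs in this toy family
```

Since $-\nabla^2 l_n$ is a weighted covariance matrix, $l_n$ is concave here and Newton's method converges quickly. The estimate is only trustworthy when $\hat\theta_n$ lies close enough to $\psi$ for the importance weights to remain well behaved.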