This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).
The zero-truncated Poisson distribution is a Poisson distribution conditioned on being nonzero. It has (exponential family canonical) parameter \(\theta\) and data \(x\). These satisfy \(- \infty < \theta < \infty\) and \(x \in \{ 1, 2, 3, \ldots \}\).
Define \begin{align*} m & = e^\theta \\ \mu & = \frac{m}{1 - e^{- m}} \end{align*} (\(\mu\) is the mean of the zero-truncated Poisson distribution and \(m\) is the mean of the corresponding untruncated Poisson distribution). Then the log likelihood and derivatives are given by \begin{align*} l(\theta) & = x \theta - m - \log(1 - e^{- m}) \\ l'(\theta) & = x - \mu \\ l''(\theta) & = - \mu (1 + m - \mu) \\ & = - \mu (1 - \mu e^{- m}) \end{align*}where, as always in statistics and as in R, \(\log\) is the natural logarithm function.
The two formulas for the second derivative have different parameter values for which they suffer catastrophic cancellation. More on this below.
This is a super hard problem to calculate accurately.
First we need to know the behavior of the function near \(- \infty\). Naively taking termwise limits gives \((- \infty) - 0 - (- \infty)\), which is undefined. So we apply L’Hospital’s rule.
\begin{align*} \lim_{\theta \to - \infty} e^{l(\theta)} & = \lim_{\theta \to - \infty} \frac{e^{x \theta} e^{- m}}{1 - e^{- m}} \\ & = \left( \lim_{\theta \to - \infty} e^{- m} \right) \left( \lim_{\theta \to - \infty} \frac{e^{x \theta}}{1 - e^{- m}} \right) \\ & = \lim_{\theta \to - \infty} \frac{e^{x \theta}}{1 - e^{- m}} \end{align*}because \(m \to 0\) and \(e^{- m} \to 1\) as \(\theta \to - \infty\).
Now we apply L’Hospital’s rule to the limit that remains. \[ \lim_{\theta \to - \infty} \frac{e^{x \theta}}{1 - e^{- m}} = \lim_{\theta \to - \infty} \frac{x e^{x \theta}}{e^{- m} e^\theta} = \begin{cases} 1, & x = 1 \\ 0, & x > 1 \end{cases} \] Hence (remembering that the limit above is for \(e^{l(\theta)}\)) \[ \lim_{\theta \to - \infty} l(\theta) = \begin{cases} 0, & x = 1 \\ - \infty, & x > 1 \end{cases} \] The problem is to get correct answers, avoiding overflow or underflow, except when the answers are really too large or too small for the computer to represent.
In case \(x = 1\), we need to calculate the log of this quantity accurately, avoiding underflow when possible. \[ \log\left( \frac{e^{x \theta}}{1 - e^{- m}} \right) = \log\left( \frac{m}{1 - e^{- m}} \right) = \log(\mu) \] which we know is near zero when \(m\) is near zero. The problem is that, even if we calculate this as
log(- m / expm1(- m))
we know that - expm1(- m)
is very close to \(m\) for very small \(m\) so we do get very close to \(\log(1) = 0\), but in order to avoid catastrophic cancellation, getting exactly zero when we should be getting something nonzero, we need to replace the log
with log1p
. But we don’t have the expression in the form to do that.
Replace the exponential in the denominator by its Maclaurin series \[ e^{- m} = 1 - m + \frac{m^2}{2} - \frac{m^3}{3!} + \frac{m^4}{4!} - \cdots + \frac{(- m)^k}{k!} + \cdots \] so \[ \mu = \frac{m}{1 - \left(1 - m + \frac{m^2}{2} - \frac{m^3}{3!} + \frac{m^4}{4!} - \cdots + \frac{(- m)^k}{k!} + \cdots \right)} = \frac{1}{1 - \frac{m}{2} + \frac{m^2}{3!} - \frac{m^3}{4!} + \cdots + \frac{(- m)^{k - 1}}{k!} + \cdots } \]
So this gives us a formula for stable computation for large negative \(\theta\) \begin{align*} l(\theta) & = x \theta - m - \log(1 - e^{- m}) \\ & = (x - 1) \theta - m + \log(\mu) \\ & = (x - 1) \theta - m - \log\left( 1 - \frac{m}{2} + \frac{m^2}{3!} - \frac{m^3}{4!} + \cdots + \frac{(- m)^{k - 1}}{k!} + \cdots \right) \end{align*}Thus we get \(1 - \mu \approx - m / 2\) for \(\theta \approx - \infty\). So the first derivative should not underflow until \(m\) does, which is at about \(\theta = - 750\).
For \(x > 1\), we will have \(l'(\theta) \approx x - 1\) for \(\theta \approx - \infty\) and this is stably calculated by subtraction.
but in the preceding section we learned how to calculate \(1 - \mu\) stably. Thus we get \(l''(\theta) \approx - \mu m / 2\) for \(\theta \approx - \infty\).
Since this formula does not contain \(x\), it works for all values of \(x\).
and there is nothing that can be done about any of this. The quantities to be calculated are larger than any numbers the computer has. That’s what the values Inf
and -Inf
were invented for.
However, there are subtractions in the formulas above. We need to be sure that they do not give Inf - Inf
= NaN
. Any formulas that do do that also suffer from catastrophic cancellation for very large \(\theta\).
The formula \[
l(\theta) = x \theta - m - \log(1 - e^{- m})
\] will have no problems (we should use the log1p
function to calculate the log) until \(x \theta\) overflows, in which case \(m = e^\theta\) has already overflowed, and the computer calculates Inf - Inf = NaN
, which is wrong. We know the limit is -Inf
. This is because \(x \theta\) goes to infinity much more slowly than \(e^\theta\).
So we will have to test for x * theta == Inf
and return -Inf
in this case.
There is no problem with the first derivative formula \[
l'(\theta) = x - \mu
\] When \(\theta\) is so large that \(\mu\) overflows, this formula will give the correct result -Inf
.
There is a problem with the second derivative formula \[ l''(\theta) = - \mu (1 + m - \mu) \] that is good for negative \(\theta\) or even moderate sized positive \(\theta\). For large \(\theta\) we have \(m \approx \mu\) and suffer catastrophic cancellation.
So for large positive \(\theta\) we need to use our other second derivative formula. \[
l''(\theta) = - \mu (1 - \mu e^{- m})
\] Since \(\mu e^{- m} \approx 0\) when \(\theta \approx \infty\), this gives the correct answer, approximately \(- \mu\) when this does not overflow and -Inf
when it does overflow.
There are three cases. For large negative \(\theta\), use \[ l(\theta) = (x - 1) \theta - m - \log\left( 1 - \frac{m}{2} + \frac{m^2}{3!} - \frac{m^3}{4!} + \cdots + \frac{(- m)^{k - 1}}{k!} + \cdots \right) \] (since the infinite series is convergent for all \(\theta\), this works for all \(\theta\) but is only necessary for large negative \(\theta\)).
Otherwise, use \[
l(\theta) = x \theta - m - \log(1 - e^{- m})
\] except when \(x \theta\) overflows (is Inf
), in which case the correct result is -Inf
.
The mean \(\mu\) is not part of the result but is used in intermediate calculations.
When \(\theta\) is large negative, use \[ \mu = \frac{1}{1 - \frac{m}{2} + \frac{m^2}{3!} - \frac{m^3}{4!} + \cdots + \frac{(- m)^{k - 1}}{k!} + \cdots } \] Otherwise, the formula \[ \mu = \frac{m}{1 - e^{- m}} \] works.
When \(x = 1\) and \(\theta\) is large negative, use \[ l'(\theta) = m \cdot \frac{- \frac{1}{2} + \frac{m}{3!} - \frac{m^2}{4!} + \cdots + \frac{(- 1)^{k - 1} m^{k - 2}}{k!} + \cdots } {1 - \frac{m}{2} + \frac{m^2}{3!} - \frac{m^3}{4!} + \cdots + \frac{(- m)^{k - 1}}{k!} + \cdots } \] In all other cases \[ l'(\theta) = x - \mu \] gives good answers.
When \(\theta\) is large negative, use \[
l''(\theta)
=
- \mu (1 + m - \mu)
\] When \(\theta\) is large positive, use \[
l''(\theta)
=
- \mu (1 - \mu e^{- m})
\] except when \(\mu\) overflows (is Inf
), in which case this formula gives NaN
but the correct answer is -Inf
.
For moderate sized \(\theta\), positive or negative, either formula works.