Radically Elementary Probability and Statistics

University of Minnesota, Twin Cities School of Statistics Charlie Geyer's Home Page

Radically Elementary Probability Theory is the title of a book by Edward Nelson (Princeton University Press, 1987, amazon.com web page for his book).

This web page is about University of Minnesota School of Statistics Technical Report No. 657, which is my attempt to fill in some details that Nelson leaves out of his (very short) book. This is a work in progress. I intend to write several more chapters, but I needed to turn what exists so far into something one of my students could cite.

Both of these books, Nelson's and mine, use nonstandard analysis (Wikipedia entry and Mathworld entry), but only a very simple version of it that IMHO is much easier to understand than conventional measure theory. As I say in my preface:

… almost everything one needs to know about nonstandard analysis are the arithmetic rules for infinitesimal, appreciable, and unlimited numbers (a number is unlimited if its reciprocal is infinitesimal, and a number is appreciable if it is neither infinitesimal or unlimited) given in the tables in our Section 3.3, the principle of external induction — an axiom of the nonstandard analysis used in Nelson (1987) and this book (Axiom IV in Section 2.1) — and the principle of overspill (our Section 3.4).

But the aim of our books is very different from that of most work in nonstandard analysis, which merely aims to provide alternative proofs (using infinitesimals) of conventional theorems that also have conventional proofs. We change the subject of study. We do this by limiting our attention to probability models that have

(i) finite sample space and
(ii) no nonempty events of probability zero.

In (i), Robinson-style nonstandard analysis would say hyperfinite, but in Nelson-style nonstandard analysis (Wikipedia entry for internal set theory) we just say finite, the point being that infinitesimals and unlimited numbers are real numbers just like any other real numbers.

The point of (i) is that measure theory becomes completely unnecessary. Every probability and expectation, conditional or unconditional, is just a finite sum given by a simple explicit formula. The point of (ii) is that conditional probability and expectation are always well defined: there is no need to consider conditioning on events of probability zero, because the probability of a nonempty event may be infinitesimal but is never exactly zero.
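To make points (i) and (ii) concrete, here is a minimal Python sketch of a finite probability model. It is not taken from either book: the two-dice sample space and the helper names `prob`, `expect`, and `cond_expect` are assumptions chosen purely for illustration. Every quantity is a finite sum, and conditioning never divides by zero because every nonempty event has positive probability.

```python
from fractions import Fraction

# Hypothetical finite model: two rolls of a fair die.
# Every outcome has positive probability, so (ii) holds.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
pr = {w: Fraction(1, 36) for w in omega}

def prob(event):
    """Pr(A): a finite sum over the outcomes in A."""
    return sum(pr[w] for w in omega if event(w))

def expect(x):
    """E(X): a finite sum of X(w) * pr(w) over the sample space."""
    return sum(x(w) * pr[w] for w in omega)

def cond_expect(x, event):
    """E(X | A) = E(X 1_A) / Pr(A), defined for every nonempty A."""
    return sum(x(w) * pr[w] for w in omega if event(w)) / prob(event)

def total(w):
    return w[0] + w[1]

def first_is_six(w):
    return w[0] == 6

print(expect(total))                     # 7
print(prob(first_is_six))                # 1/6
print(cond_expect(total, first_is_six))  # 19/2
```

Exact rational arithmetic via `fractions.Fraction` is used so that the sums are literally the explicit formulas, with no rounding.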

Another quote from my preface:

One might think that this radical simplification is too radical (throwing the baby out with the bathwater), but Nelson (1987) and this book provide some evidence that this is not so. Even though our theory has no continuous random variables or even discrete random variables with infinite sample space, hence no normal, exponential, Poisson, and so forth random variables, we shall see that finite approximations satisfactorily take their place.
Consider a Binomial(n, p) random variable X such that neither p nor 1 − p is infinitesimal and n is unlimited. Then (the Nelson-style analog of) the central limit theorem says that (X − n p) ⁄ √(n p (1 − p)) has a distribution that is nearly normal in the sense that the distribution function of this random variable differs from the distribution function of the [standard] normal distribution in conventional probability only by an infinitesimal amount at any point (our Theorem 8.2).
Consider a Binomial(n, p) random variable X such that p is infinitesimal but n p is appreciable. Then X has a distribution that is nearly Poisson in the sense that the distribution function of this random variable differs from the distribution function of the Poisson(n p) distribution in conventional probability only by an infinitesimal amount at any point (unfortunately this result is in a yet to be written chapter of this book, but is easily proved).
Consider a Geometric(p) random variable X such that p is infinitesimal and choose a (necessarily infinitesimal) number ε such that Y = ε X has appreciable expectation μ. Then Y has a distribution that is nearly exponential in the sense that the distribution function of this random variable differs from the distribution function of the Exponential(1 ⁄ μ) distribution in conventional probability only by an infinitesimal amount at any point.
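The three approximations quoted above can be checked numerically. The sketch below is not from either book. It assumes that large but ordinary values (n = 10000, p = 0.0001) can stand in for unlimited and infinitesimal numbers, so "differs by an infinitesimal amount" becomes "differs by a small amount". It also assumes the Geometric(p) variable counts trials up to and including the first success. All function names are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k, n, p):
    # Work on the log scale so large n does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, mu):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def binomial_vs_normal(n, p):
    """Largest CDF gap between (X - n p) / sqrt(n p (1 - p)) and the standard
    normal, checked just below and at every jump point of the binomial."""
    sd = math.sqrt(n * p * (1.0 - p))
    gap, below = 0.0, 0.0              # below = Pr(X < k)
    for k in range(n + 1):
        at = below + binom_pmf(k, n, p)    # Pr(X <= k)
        phi = normal_cdf((k - n * p) / sd)
        gap = max(gap, abs(below - phi), abs(at - phi))
        below = at
    return gap

def binomial_vs_poisson(n, p):
    """Largest CDF gap between Binomial(n, p) and Poisson(n p) on 0, ..., n."""
    mu = n * p
    gap, f_bin, f_poi = 0.0, 0.0, 0.0
    for k in range(n + 1):
        f_bin += binom_pmf(k, n, p)
        f_poi += poisson_pmf(k, mu)
        gap = max(gap, abs(f_bin - f_poi))
    return gap

def geometric_vs_exponential(p, mu, k_max):
    """Largest CDF gap between Y = eps * X, X ~ Geometric(p), eps = mu * p,
    and the exponential distribution with rate 1 / mu, checked just below
    and at the grid points eps, 2 eps, ..., k_max eps."""
    gap = 0.0
    log_q = math.log1p(-p)
    for k in range(1, k_max + 1):
        expo = 1.0 - math.exp(-k * p)             # y / mu = k eps / mu = k p
        below = 1.0 - math.exp((k - 1) * log_q)   # Pr(Y < k eps)
        at = 1.0 - math.exp(k * log_q)            # Pr(Y <= k eps)
        gap = max(gap, abs(below - expo), abs(at - expo))
    return gap

print(binomial_vs_normal(10_000, 0.3))
print(binomial_vs_poisson(10_000, 3.0 / 10_000))
print(geometric_vs_exponential(1e-4, 2.0, 200_000))
```

All three printed gaps are small and shrink further as n grows or p shrinks, which is the finite shadow of the "nearly normal", "nearly Poisson", and "nearly exponential" statements.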

So we actually study nonstandard probability models — like the Binomial(n, p) random variable with unlimited sample size n mentioned above — for their own sake. We do not turn them into continuous random variables.

How interesting this approach is remains to be seen. Since its level of abstraction is much lower than that of conventional measure theory, it has the potential to allow much more rigor in lower-level courses, where much handwaving and much "beyond the scope of this course" now occur.

Consider a Poisson process. The Nelson-style analog is just a sequence of independent and identically distributed Bernoulli(p) random variables with infinitesimal p spaced ε apart on the real line so that λ = p ⁄ ε is appreciable. It is trivial to show, using nothing more than conditional probability of the Pr(A ∩ B) ⁄ Pr(B) type, that the interarrival times in the Bernoulli process are geometric and, as in the quotation above, nearly exponential. It is also easy to show (by just the conventional proof of the Poisson approximation to the binomial) that counts of events in intervals of appreciable length are nearly Poisson. No lack of rigor and no handwaving. You don't need measure theory unless you insist on taking the nonstandard Bernoulli process to the conventional Poisson process limit.
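The Bernoulli process described above is easy to simulate. The following sketch is an illustration only, not code from either book. It assumes a small ordinary spacing ε = 0.001 as a stand-in for an infinitesimal, a rate λ = 2, and a fixed random seed; the function and variable names are hypothetical. It checks that interarrival times have mean close to 1 ⁄ λ and that counts in unit-length intervals have mean and variance both close to λ, as a Poisson count would.

```python
import random
import statistics

def simulate_bernoulli_process(p, n_slots, seed=0):
    """Indices of the slots (out of n_slots i.i.d. Bernoulli(p) trials)
    in which an event occurs."""
    rng = random.Random(seed)
    return [i for i in range(n_slots) if rng.random() < p]

slots_per_unit = 1000          # 1 / eps, so eps = 0.001 (stand-in for infinitesimal)
eps = 1.0 / slots_per_unit
lam = 2.0                      # appreciable rate, lam = p / eps
p = lam * eps
n_units = 1000                 # observe the process on a time interval of length 1000

events = simulate_bernoulli_process(p, n_units * slots_per_unit)

# Interarrival times: eps times a geometric number of slots.
gaps = [(b - a) * eps for a, b in zip(events, events[1:])]

# Counts of events in consecutive intervals of length one.
counts = [0] * n_units
for i in events:
    counts[i // slots_per_unit] += 1

print("mean interarrival time:", statistics.mean(gaps), "vs 1/lam =", 1.0 / lam)
print("mean count per unit:   ", statistics.mean(counts), "vs lam =", lam)
print("variance of counts:    ", statistics.variance(counts), "vs lam =", lam)
```

Working with integer slot indices and multiplying by ε only at the end avoids floating-point trouble when assigning events to intervals.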

For more than that, you'll have to read the books.