General Instructions
To do each example, just click the Submit button.
You do not have to type in any R instructions or specify a dataset. That's already done for you.
Theory
(Cumulative) Distribution Functions
The (cumulative) distribution function of a (real-valued) random variable X is the function F defined by
F(x) = pr(X ≤ x), − ∞ < x < + ∞.
 Because F(x) is a probability, it is necessarily between zero and one.
 Because the event X ≤ x increases as x increases, F is a nondecreasing function.

Because the event X ≤ x decreases to the empty
set as x goes to minus infinity,
lim_{x → − ∞} F(x) = 0.

Because the event X ≤ x increases to the whole
real line as x goes to plus infinity,
lim_{x → + ∞} F(x) = 1.
 If the support of X is not the whole real line, then all of the increase of F takes place on the support, that is, if a ≤ X ≤ b with probability one, then F(x) = 0 for x < a and F(b) = 1.
 Other properties of the distribution function depend on whether X is discrete or continuous.

If X is a continuous random variable, then
 F is a continuous function and is strictly increasing on the support of X.

If X is a discrete random variable, then
 F is a discontinuous function.
 The discontinuities (jumps) of F occur at the atoms of X (the points having nonzero probability).
 The height of the jump gives the probability of the atom, that is,
pr(X = x) = F(x) − F(x − ε) whenever ε is small enough so that there are no other jumps between x − ε and x.
 F is constant (its graph is horizontal) between jumps.
Empirical (Cumulative) Distribution Functions
The empirical distribution function F_{n} is just the distribution function of the empirical distribution, which puts probability 1 / n at each data point of a sample of size n.
If x_{(i)} are the order statistics and all of the order statistics are distinct, then the empirical distribution function jumps from (i − 1) / n to i / n at the point x_{(i)} and is constant except for the jumps at the order statistics.
If exactly k order statistics x_{(i)}, …, x_{(i + k − 1)} are tied at some value, then the empirical distribution function jumps from (i − 1) / n to (i + k − 1) / n at that point.
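The jump rule can be checked on a tiny sample. The data below are made up for illustration only (they are not from the course materials) and are chosen so that two order statistics are tied.

```r
# Hypothetical sample of size n = 4 with a tie: x_(2) = x_(3) = 2
x <- c(3, 2, 1, 2)
n <- length(x)
fn <- ecdf(x)            # the result of ecdf is itself an R function

# Value just to the left of the tied point and at the tied point
left <- fn(2 - 1e-8)     # (i - 1) / n with i = 2
at <- fn(2)              # (i + k - 1) / n with i = 2 and k = 2
print(c(left = left, at = at))

# knots() returns the distinct points where the function jumps
print(knots(fn))
```

Here the jump at the tied value has height k / n = 2 / 4, twice the height of the jumps at the untied points.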
Distribution Function Examples
The R function ecdf (online help) produces empirical (cumulative) distribution functions. The R functions of the form p followed by a distribution name (pnorm, pbinom, etc.) produce theoretical distribution functions.
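The example that accompanied this section is not reproduced here. The following is a minimal sketch of what such an example might look like, assuming a standard normal sample; the variable names and the sample size are illustrative choices.

```r
n <- 100                 # sample size (try other values)
x <- rnorm(n)            # simulated standard normal data

fn <- ecdf(x)            # empirical distribution function

# Step function: empirical; dashed curve: theoretical
plot(fn, do.points = FALSE, verticals = TRUE, main = "")
curve(pnorm, add = TRUE, lty = 2)
```

No random seed is set, so each run draws a new sample and a new empirical distribution function.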
Comments
If you increase the sample size n, the empirical distribution function will get closer to the theoretical distribution function.
If you change the theoretical distribution function from standard normal to something else, the empirical and theoretical distribution functions will still be close to each other, just different. For example, try standard exponential (rexp replaces rnorm and pexp replaces pnorm).
The Uniform Law of Large Numbers (Glivenko-Cantelli Theorem)
As everywhere else in statistics, the law of large numbers holds. In fact, for fixed x this is just the usual law of large numbers because the empirical distribution function F_{n}(x) is a sample proportion (the proportion of X_{i} that are less than or equal to x) that estimates the true population proportion F(x). Thus the statement that
F_{n}(x) → F(x), as n → ∞
is just the ordinary law of large numbers (the convergence here is either in probability or almost sure).
But much more is true. In fact, the convergence is actually uniform
sup_{− ∞ < x < + ∞} | F_{n}(x) − F(x) | → 0, as n → ∞
(a fact known as the Glivenko-Cantelli theorem in advanced probability theory).
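A rough numerical illustration of the uniform convergence, assuming standard normal data. The helper name sup.dist is made up for this sketch. For a continuous theoretical distribution function it is enough to look at each jump of the empirical distribution function and just to the left of it.

```r
# Largest vertical distance between the empirical distribution function
# of x and a continuous theoretical distribution function cdf
sup.dist <- function(x, cdf = pnorm) {
    n <- length(x)
    p <- cdf(sort(x))            # theoretical values at the order statistics
    i <- seq_len(n)
    # i / n is the value at the jump, (i - 1) / n the value just before it
    max(i / n - p, p - (i - 1) / n)
}

set.seed(42)
sizes <- c(10, 100, 1000, 10000)
dists <- sapply(sizes, function(n) sup.dist(rnorm(n)))
print(data.frame(n = sizes, sup.dist = dists))
```

The distances should tend to shrink as n grows, although any single run is random.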
The Asymptotic Distribution (Brownian Bridge)
As everywhere else in statistics, there is also asymptotic normality. In fact, as noted above, for fixed x this is just the usual central limit theorem because F_{n}(x) is a sample proportion. Thus the statement that
√n ( F_{n}(x) − F(x) ) → Normal(0, p (1 − p))
where p = F(x), is just the ordinary central limit theorem (the convergence here is convergence in distribution).
But much more is true. In fact, the convergence is actually uniform in a sense that we can't even start to explain at this level, because the objects of interest are not scalar-valued random variables, nor even vector-valued random variables, but function-valued random variables F_{n}. This can be thought of as an infinite-dimensional random vector because it has an infinite number of coordinates F_{n}(x), one for each of the (infinitely many) values of x.
But we won't bother with those technicalities. Suffice it to say that
√n ( F_{n}(x) − F(x) )
converges to a Gaussian stochastic process called the Brownian bridge in the special case that the true population distribution is Uniform(0, 1). The "Gaussian" here refers to the normal distribution; more about this in class.
This result would have no place in a course on nonparametrics if it were peculiar to the uniform distribution. But the nonuniform case is not much different. It will be described presently.
We can see what the Brownian bridge looks like by just taking a very large sample size (large enough for the asymptotics to work).
If you repeat the plot over and over, you will see many different realizations of this random function.
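The plot described above is not reproduced here. A minimal sketch, assuming a Uniform(0, 1) sample of size 10000 (the sample size is an arbitrary choice), follows. It evaluates √n ( F_{n}(x) − x ) at the jumps only and joins the points with line segments, which is a close approximation when n is large.

```r
n <- 1e4                         # large sample size, so the asymptotics apply
u <- sort(runif(n))              # order statistics of a Uniform(0, 1) sample

# sqrt(n) * (F_n(x) - F(x)) at the jumps; F(x) = x for Uniform(0, 1)
b <- sqrt(n) * (seq_len(n) / n - u)

# The process is pinned to zero at both ends of the unit interval
plot(c(0, u, 1), c(0, b, 0), type = "l",
     xlab = "x", ylab = expression(sqrt(n) * (F[n](x) - x)))
abline(h = 0, lty = 2)
```

No random seed is set, so rerunning the code shows a different realization each time.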
The nonuniform case has to do with a curious fact from theoretical statistics. If X is any continuous random variable and F is its distribution function, then F(X) is a Uniform(0, 1) random variable. This means
 Any continuous random variable X can be mapped to a Uniform(0, 1) random variable U (by the transformation F) and vice versa (by the transformation F^{−1}).
 More importantly for the subject of Kolmogorov-Smirnov tests, this means that the distribution of √n ( F_{n}(x) − F(x) ) is the same for all continuous population distributions except for a transformation of the x-axis. If we base our procedures only on the vertical distance between F_{n}(x) and F(x) and ignore horizontal distances (which are transformed), then our procedure will be truly nonparametric.
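The two transformations in the first item can be tried out numerically. This sketch assumes an exponential distribution with rate 2, an arbitrary choice made only for illustration.

```r
rate <- 2                        # arbitrary rate for the illustration
x <- rexp(1000, rate = rate)     # continuous random variable X

u <- pexp(x, rate = rate)        # F(X), which should look Uniform(0, 1)
hist(u, main = "", xlab = "F(X)")

# The other direction: F^{-1}(U) has distribution function F
x2 <- qexp(runif(1000), rate = rate)
qqplot(x, x2)
abline(0, 1, lty = 2)
```

The histogram should be roughly flat and the Q-Q plot should follow the dashed line, up to sampling variation.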
Suprema over the Brownian Bridge
The distributions of both one-sided and two-sided suprema over the Brownian bridge are known. Define
D^{+} = sup_{0 < t < 1} B(t)
where B(t) is the Brownian bridge. Since the Brownian bridge is a random function, D^{+} is a random variable. The distribution of this random variable is known. It has distribution function
F_{D^{+}}(x) = 1 − exp(− 2 x^{2}), x ≥ 0.
The Brownian bridge is symmetric with respect to being turned upside down (in distribution). Thus the statistic D^{−} defined by replacing sup with inf in the definition of D^{+} has the same distribution as D^{+}, apart from its sign.
Similarly, define the two-sided supremum
D = sup_{0 < t < 1} | B(t) |
where B(t) is the Brownian bridge. Since the Brownian bridge is a random function, D is a random variable. The distribution of this random variable is also known. It has distribution function
F_{D}(x) = 1 + 2 ∑_{k = 1}^{∞} (− 1)^{k} exp(− 2 k^{2} x^{2}), x > 0.
Although this involves an infinite series, the series is extremely rapidly converging. Usually a few terms suffice for very high accuracy.
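Both distribution functions are easy to code. The function names pks.one and pks.two and the default of 100 terms are choices made for this sketch, not part of R.

```r
# Distribution function of the one-sided supremum D^+
pks.one <- function(x) ifelse(x <= 0, 0, 1 - exp(-2 * x^2))

# Distribution function of the two-sided supremum D,
# with the infinite series truncated after `terms` terms
pks.two <- function(x, terms = 100) {
    k <- seq_len(terms)
    sapply(x, function(xi) {
        if (xi <= 0) return(0)
        1 + 2 * sum((-1)^k * exp(-2 * k^2 * xi^2))
    })
}

print(pks.one(1.224))
print(pks.two(1.358099))                 # very close to 0.95
print(pks.two(1.358099, terms = 2))      # two terms already agree closely
```

At the argument 1.358099 the second term of the series is already below one in a million, which illustrates how fast the series converges for arguments of this size.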
One-Sample Tests
The one-sample Kolmogorov-Smirnov test is based on the test statistic
D^{+}_{n} = sup_{− ∞ < x < + ∞} ( F_{n}(x) − F(x) )
for an upper-tailed test. Or on the test statistic D^{−}_{n} defined by replacing sup with inf in the formula above for a lower-tailed test. Or on the test statistic
D_{n} = sup_{− ∞ < x < + ∞} | F_{n}(x) − F(x) |
for a two-tailed test. Usually, we want a two-tailed test.
Because the distribution F hypothesized under the null hypothesis must be completely specified (no free parameters whatsoever), this test is fairly useless, and Hollander and Wolfe do not cover it. However, a very closely related test, the Lilliefors test, covered below, is useful.
For now we just do a toy example using the R function ks.test (online help).
As can be seen by trying out the example, the test is not very powerful even for large sample sizes if the distributions are not too different. Try different sample sizes and degrees of freedom for the t.
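The toy example itself is not reproduced here. A minimal sketch of what it might look like follows, assuming data simulated from a t distribution and tested against the standard normal; the sample size and degrees of freedom are arbitrary starting values.

```r
n <- 100        # sample size (try other values)
df <- 5         # degrees of freedom for the t distribution (try other values)

x <- rt(n, df)  # simulated data that are not exactly normal

# Null hypothesis: the data are standard normal (completely specified)
out <- ks.test(x, "pnorm")
print(out)
```

With these settings the P-value will often be large, because a t distribution with 5 degrees of freedom is not very different from the standard normal.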
The Corresponding Confidence Interval
As we said, one-sample Kolmogorov-Smirnov tests are fairly useless from an applied point of view (however theoretically important). But the dual confidence interval is of use. It gives a confidence band for the whole distribution function (Section 11.5 in Hollander and Wolfe).
The programmer who wrote the ks.test function for R didn't bother with the confidence interval. So we are on our own again. We (like Hollander and Wolfe) will only do the two-sided interval. The one-sided is similar. Just use the distribution of D^{+} instead of the distribution of D.
Example 11.6 in Hollander and Wolfe.
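Neither the data nor the code for this example are reproduced here. The following is a sketch of how such a band can be computed, using simulated stand-in data (an assumption, not the Hollander and Wolfe data) and the asymptotic distribution of the two-sided supremum. The helper name pkolm is made up for this sketch.

```r
conf.level <- 0.95
# Asymptotic distribution function of the two-sided supremum D
pkolm <- function(d, terms = 100) {
    k <- seq_len(terms)
    1 + 2 * sum((-1)^k * exp(-2 * k^2 * d^2))
}
# Critical value: the conf.level quantile of D
crit.val <- uniroot(function(d) pkolm(d) - conf.level,
                    lower = 0.5, upper = 3, tol = 1e-10)$root

x <- rnorm(20)  # simulated stand-in data, NOT the Hollander and Wolfe data
n <- length(x)
fn <- ecdf(x)
xs <- sort(x)
# Band: empirical distribution function plus or minus crit.val / sqrt(n),
# clipped to [0, 1] because it bounds a distribution function
upper <- pmin(fn(xs) + crit.val / sqrt(n), 1)
lower <- pmax(fn(xs) - crit.val / sqrt(n), 0)
plot(fn, do.points = FALSE, verticals = TRUE,
     ylab = expression(F[n](x)), main = "")
lines(xs, upper, type = "s", lty = 2)
lines(xs, lower, type = "s", lty = 2)
```

For simplicity the dashed lines are drawn only between the smallest and largest data points.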
Comments.
 The step function is the empirical distribution function.
 The dashed lines on either side mark a 95% confidence band for the true population distribution function. (The probability that the true population distribution function lies entirely within the confidence band gets closer and closer to 0.95 as the sample size goes to infinity.)
 The first half of the code (above the blank line) could be replaced by
crit.val <- 1.358099
if there was no interest in confidence levels other than 95%.
 The tricky
ylab = expression(F[n](x))
argument to the first plot function makes the y-axis label F_{n}(x) with n a subscript. Many more such effects are possible and are described by help(plotmath) (online version of this help).
The Corresponding Point Estimate
Procedures always come in threes: a hypothesis test, the dual confidence interval, and the corresponding point estimate. What is the point estimator here?
Two-Sample Tests
The difference of two independent Brownian bridges is a rescaled Brownian bridge (vertical axis expanded by √2). The obvious statistic for comparing two empirical distribution functions F_{m} and G_{n}, which is
F_{m}(x) − G_{n}(x)
has an asymptotic distribution that is a Brownian bridge with the vertical axis expanded by (1 / m + 1 / n)^{1 / 2} because F_{m} has variance proportional to 1 / m and G_{n} has variance proportional to 1 / n.
Thus
(1 / m + 1 / n)^{− 1 / 2} ( F_{m}(x) − G_{n}(x) )
has the standard Brownian bridge for its asymptotic distribution.
But we don't actually need to know this ourselves. It is buried in the code for ks.test.
Example 5.4 in Hollander and Wolfe.
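The data for this example are not reproduced here. As a sketch with simulated stand-in data (an assumption, not the Hollander and Wolfe data), the two-sided two-sample statistic can be computed by hand as the largest vertical distance between the two empirical distribution functions and compared with what ks.test reports.

```r
set.seed(42)
x <- rnorm(30)               # simulated stand-in sample of size m = 30
y <- rnorm(40, mean = 0.5)   # simulated stand-in sample of size n = 40

fm <- ecdf(x)
gn <- ecdf(y)

# Both functions are constant between data points, so the largest
# vertical distance occurs at one of the pooled data points
pooled <- c(x, y)
d.hand <- max(abs(fm(pooled) - gn(pooled)))

out <- ks.test(x, y)
print(out)
print(c(by.hand = d.hand, ks.test = unname(out$statistic)))
```

The two numbers printed on the last line should agree.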
Comment
It won't bother those with no previous exposure to the R ks.test function (online help) but it came as a shock to me that the meaning of alternative = "less" changed since the last time I taught the course. It now means
The possible values "two.sided", "less" and "greater" of alternative specify the null hypothesis that the true distribution function of x is equal to, not less than or not greater than the hypothesized distribution function (one-sample case) or the distribution function of y (two-sample case), respectively.
But if the distribution function of x is less than that of y, the median of x is greater than that of y. So the applied meaning of alternative is just the opposite of what it is for wilcox.test. If you want
wilcox.test(x, y, alternative = "less")
its competitor is
ks.test(x, y, alternative = "greater")
No real problem as long as you are aware of this issue. (A big problem if you forget!)
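The reversal can be seen on simulated data. In this sketch (the data and the size of the shift are assumptions chosen for illustration) x is shifted well to the left of y, so its distribution function lies above that of y.

```r
set.seed(42)
x <- rnorm(50, mean = -2)    # x tends to be smaller than y
y <- rnorm(50)

# Wilcoxon: "less" means x is shifted to the left of y
p.wilcox <- wilcox.test(x, y, alternative = "less")$p.value

# Kolmogorov-Smirnov: the matching alternative is "greater"
# (the distribution function of x lies above that of y)
p.ks.match <- ks.test(x, y, alternative = "greater")$p.value

# The same word "less" asks the opposite question of ks.test
p.ks.wrong <- ks.test(x, y, alternative = "less")$p.value

print(c(wilcox.less = p.wilcox, ks.greater = p.ks.match, ks.less = p.ks.wrong))
```

The first two P-values should be tiny and the third large, which is what forgetting the reversal would cost.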
The Lilliefors Test
The one-sample Kolmogorov-Smirnov test isn't very useful in practice because it requires a simple null hypothesis, that is, the distribution must be completely specified with all parameters known.
What you want to do is test with unknown parameters. You would like the null hypothesis to be all normal distributions (and the alternative all non-normal distributions) or something like that. What you want to do is something like this, a Kolmogorov-Smirnov test with estimated parameters.
The reason for the WARNING is that estimating the parameters changes the null distribution of the test statistic. The null distribution is generally not known when parameters are estimated and is not the same as when parameters are known.
Fortunately, when we have a computer, we can approximate the null distribution of the test statistic by simulation.
There is random error in this calculation from the simulation. However, because of the trick of adding 1 to the numerator and denominator in calculating the P-value it can be used "straight" without regard for the randomness. Under the null hypothesis the probability
Pr(P ≤ k / n_{sim})
is exactly k / n_{sim} when both the randomness in the data and the randomness in the simulation are taken into account.
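The code for this example is not reproduced here. The following sketch shows one way to carry out both calculations, using simulated stand-in data (an assumption, deliberately non-normal) and 999 simulations (an arbitrary choice). Its numbers will not match those in the summary, which come from the original data.

```r
set.seed(42)
x <- rexp(50)                # simulated stand-in data, deliberately non-normal
n <- length(x)
nsim <- 999                  # number of simulated data sets

# Kolmogorov-Smirnov statistic with mean and sd estimated from the data
ks.stat <- function(z) unname(ks.test(z, "pnorm", mean(z), sd(z))$statistic)

# Bogus P-value: treats the estimated parameters as if they were known
p.bogus <- ks.test(x, "pnorm", mean(x), sd(x))$p.value

# Simulated null distribution: by location-scale invariance it is enough
# to simulate standard normal data and re-estimate the parameters each time
d.obs <- ks.stat(x)
d.sim <- replicate(nsim, ks.stat(rnorm(n)))

# Add 1 to numerator and denominator (the observed data count as one draw)
p.sim <- (sum(d.sim >= d.obs) + 1) / (nsim + 1)

print(c(bogus = p.bogus, simulation = p.sim))
```

The simulation P-value can never be smaller than 1 / (nsim + 1), so more simulations are needed to report very small P-values.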
Summary
 Bogus P-value: 0.1578
 Simulation P-value: 0.004 ± 0.001
Comment
The name Lilliefors test only applies to this procedure of using the Kolmogorov-Smirnov test statistic with estimated null distribution when the null distribution is assumed to be normal. In this case, the test is exact because the test statistic and the normal family of distributions are invariant under location-scale transformations.
If the same procedure were used with another family of distributions that was not a location-scale family, then the test would not be exact. It would be a special case of the parametric bootstrap, which we will eventually cover.