Section 9.6 in DeGroot and Schervish describes the Kolmogorov-Smirnov test, both one-sample (the subject of this section) and two-sample (for which see below).
Here are the calculations for Example 9.6.1 in DeGroot and Schervish done using R.
The R statements in our example also do a much more sensitive test of normality, a so-called quantile-quantile plot, always called a Q-Q plot for short. It plots sorted data values (quantiles of the empirical distribution) against the corresponding quantiles of a theoretical distribution (here the normal distribution). If the theoretical distribution is a good fit to the data, the points will all be near a straight line: the line with intercept zero and slope one if the distribution is completely specified, or any line if we only specify a location-scale family (like the family of all normal distributions).
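The data from Example 9.6.1 are not reproduced here, so the following is only a sketch with simulated stand-in data, showing the shape of the one-sample test and the accompanying Q-Q plot.

```r
# Sketch only: simulated data stand in for the data of Example 9.6.1.
set.seed(42)
x <- rnorm(25, mean = 10, sd = 2)

# one-sample Kolmogorov-Smirnov test of a completely specified null
# hypothesis (all parameters known)
ks.test(x, "pnorm", mean = 10, sd = 2)

# Q-Q plot of the data against the normal distribution,
# with a reference line through the quartiles
qqnorm(x)
qqline(x)
```

Because the null distribution here is completely specified, points near the line with intercept zero and slope one (after standardizing) indicate a good fit.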
Rather than try to read the data off the picture for DeGroot and Schervish's two-sample example, we will use some other data.
Here the Q-Q plot plots sorted x values against sorted y values (quantiles of the empirical distribution for x versus quantiles of the empirical distribution for y). If the distributions are the same, the points should lie near the straight line with intercept zero and slope one.
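A sketch of the two-sample Q-Q plot and two-sample test, again with simulated stand-in data rather than the data referred to above:

```r
# Sketch only: simulated data stand in for the two samples.
set.seed(42)
x <- rnorm(50)
y <- rnorm(40)

# two-sample Q-Q plot: sorted x values against sorted y values
qqplot(x, y)
# reference line with intercept zero and slope one
abline(0, 1)

# two-sample Kolmogorov-Smirnov test
ks.test(x, y)
```

Note that `qqplot` handles unequal sample sizes by interpolating quantiles of the larger sample.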
The one-sample Kolmogorov-Smirnov test isn't very useful in practice because it requires a simple null hypothesis; that is, the distribution must be completely specified with all parameters known.
What you want is a test with unknown parameters. You would like the null hypothesis to be the family of all normal distributions (and the alternative all non-normal distributions), or something like that. What you want to do is something like this, a Kolmogorov-Smirnov test with estimated parameters.
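The naive version of that idea, sketched here with simulated stand-in data, is to plug the estimated parameters into the one-sample test as if they were known. This is the version the WARNING below is about.

```r
# Sketch only: the naive test with estimated parameters.
set.seed(42)
x <- rnorm(25)

# plug in the sample mean and standard deviation as if they were
# the known parameters of the null distribution -- see the WARNING
ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
```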
The reason for the WARNING is that estimating the parameters changes the null distribution of the test statistic. The null distribution is generally not known when parameters are estimated and is not the same as when parameters are known. The plug-in principle doesn't work here.
Fortunately, when we have a computer, we can approximate the null distribution of the test statistic by simulation.
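One way to do that, sketched here with simulated stand-in data, is a parametric bootstrap: simulate data sets from the fitted normal distribution, re-estimate the parameters for each simulated data set (just as was done for the real data), and recompute the test statistic each time.

```r
# Sketch only: simulated data stand in for the real data.
set.seed(42)
x <- rnorm(25)
n <- length(x)

# observed test statistic with estimated parameters
dhat <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic

# simulate the null distribution of the test statistic,
# re-estimating the parameters in each simulated data set
nsim <- 999
dsim <- double(nsim)
for (i in 1:nsim) {
    xstar <- rnorm(n, mean = mean(x), sd = sd(x))
    dsim[i] <- ks.test(xstar, "pnorm",
        mean = mean(xstar), sd = sd(xstar))$statistic
}

# fraction of simulated statistics at least as large as the observed one
pnaive <- mean(dsim >= dhat)
pnaive
```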
For comparison:
However, because of the trick of adding 1 to the numerator and denominator in calculating the P-value, it can be used straight without regard for the randomness. Under the null hypothesis the probability Pr(P ≤ k / nsim) is exactly k / nsim when both the randomness in the data and the randomness in the simulation are taken into account.
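The add-one trick amounts to counting the observed statistic as one more simulation under the null hypothesis. A self-contained sketch (again with simulated stand-in data):

```r
# Sketch only: simulated data stand in for the real data.
set.seed(17)
x <- rnorm(25)
n <- length(x)
dhat <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic

nsim <- 999
dsim <- replicate(nsim, {
    xstar <- rnorm(n, mean = mean(x), sd = sd(x))
    ks.test(xstar, "pnorm", mean = mean(xstar), sd = sd(xstar))$statistic
})

# add 1 to numerator and denominator: the observed statistic counts
# as one of the simulations, so the P-value can never be zero
pval <- (1 + sum(dsim >= dhat)) / (1 + nsim)
pval
```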