University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb Computing Examples
The data for this problem are at the URL

http://rweb.stat.umn.edu/WSdata/Ch08data/passtime.txt

With that URL given to Rweb, one variable, passtime, is loaded.
Data taken from Wild and Seber, Chance Encounters: A First Course in Data Analysis and Inference (Wiley, 2000), page 328.
These data are measurements of the passage time in microseconds for a beam of light to pass from one mirror to another 3721 meters away and return. The experiment was performed by the American physicist Simon Newcomb in 1882. The vector

speed <- 2 * 3721 / passtime * 1e6

gives the speed-of-light values corresponding to the passage time measurements (the data to be analyzed are speed, not passtime).
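As a sanity check on the conversion, a plausible passage time should give roughly the speed of light (about 3 × 10^8 meters per second). The sketch below uses made-up passage times standing in for the real passtime vector loaded from the URL above.

```r
# Placeholder passage times in microseconds (NOT Newcomb's actual data,
# just values of the right magnitude for a 7442-meter round trip).
passtime <- c(24.828, 24.826, 24.833)

# Round-trip distance 2 * 3721 meters; passtime is in microseconds,
# so multiply by 1e6 to get meters per second.
speed <- 2 * 3721 / passtime * 1e6
speed   # each value should be close to 3e8 m/s
```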
The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/w.txt

With that URL given to Rweb, one variable, x, is loaded.
The one-sample Kolmogorov-Smirnov test can be modified to allow estimated parameters. To test whether a data set is normal, just do

ks.test(x, pnorm, mean = mean(x), sd = sd(x))

But (a very big but), when the mean and sd arguments are estimates, as we have here, rather than known constants specified without reference to the data, then the test statistic computed by the R function

foo <- function(x) ks.test(x, pnorm, mean = mean(x), sd = sd(x))$statistic

no longer has the same distribution as when the parameters are known (the theoretical distribution of the Kolmogorov-Smirnov test statistic).
The distribution of this test statistic with parameters estimated has no nice theory, is not distribution-free, and must be approximated by simulation. When the distribution assumed by the null hypothesis is the normal distribution, this test is known as the Lilliefors test. Since a change of location or scale does not change the distribution of the test statistic, this is not, strictly speaking, a bootstrap: the Lilliefors test is exact and distribution-free. However, it proceeds just like a parametric bootstrap. Generate (parametric) bootstrap data using the R statement

x.star <- rnorm(n)

where n is the length of the data x.
Do a parametric bootstrap using at least 10,000 (1e4) bootstrap samples, calculating the test statistic for each sample with the function foo defined above. Show the value of the test statistic for the observed data by a line on the histogram. Also look at the P-value reported by the ks.test function, just to see how wrong it is.
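The whole simulation can be sketched as follows. The real x would be loaded from the URL above; here a placeholder normal sample stands in for it, and the choice of mean and sd for x.star is arbitrary because of the location-scale invariance noted above.

```r
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)   # placeholder for the real data x
n <- length(x)

# Kolmogorov-Smirnov statistic with estimated parameters (Lilliefors)
foo <- function(x) ks.test(x, pnorm, mean = mean(x), sd = sd(x))$statistic

nboot <- 1e4
t.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- rnorm(n)       # any mean and sd give the same distribution
    t.star[i] <- foo(x.star)
}

t.hat <- foo(x)                     # statistic for the observed data
p.value <- mean(t.star >= t.hat)    # simulation P-value

hist(t.star)
abline(v = t.hat, lty = 2)          # observed statistic on the histogram

ks.test(x, pnorm, mean = mean(x), sd = sd(x))$p.value  # the wrong P-value
```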
The data for this problem are the cholost data set in the bootstrap library of R used for all the examples on the more on regression page, in particular, for the smoothing spline example.
When the smoothing spline is calculated as in that example, the predicted value at the point x = 43 is calculated by the code

as.numeric(predict(out, newdata = data.frame(x = 43)))

(the as.numeric is required to keep the bootstrap below from crashing).
Note: this originally said 42 instead of 43 (my mistake); you can use 42 for your answer if you've already done that.
The problem is to bootstrap the predicted value at x = 43 described above and produce a bootstrap confidence interval for the predicted value at x = 43.
Hint: The on-line help for the bcanon function describes how to use bcanon to bootstrap regression (or any complex data structure) using the k.star trick. Write your theta function to take one argument k.star

foo <- function(k.star) {
    some code that calculates the estimate as a function of k.star
}

and give the sequence 1:n as the first argument to bcanon

bcanon(1:n, theta = foo, and any other arguments you need)
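The k.star trick can be illustrated in base R alone. In the sketch below, a toy linear fit and made-up data stand in for the smoothing spline on the cholost data, and sample() stands in for what bcanon does internally: theta takes a vector of row indices, refits on those rows, and returns the predicted value at x = 43.

```r
set.seed(17)
x <- runif(50, 20, 60)          # placeholder predictor (not cholost)
y <- 1 + 0.5 * x + rnorm(50)    # placeholder response
n <- length(x)

# theta function for the k.star trick: its one argument is a vector of
# case indices, and it refits the model to those cases
foo <- function(k.star) {
    x.star <- x[k.star]
    y.star <- y[k.star]
    out <- lm(y.star ~ x.star)  # toy stand-in for the smoothing spline
    as.numeric(predict(out, newdata = data.frame(x.star = 43)))
}

theta.hat <- foo(1:n)           # estimate from the original data

# hand-rolled nonparametric bootstrap of the predicted value
t.star <- replicate(999, foo(sample(n, replace = TRUE)))
sd(t.star)                      # bootstrap standard error

# With the bootstrap library loaded, bcanon is called the same way:
# bcanon(1:n, nboot = 999, theta = foo)
```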
The data for this problem are the lutenhorm data set in the bootstrap library of R that was used as the example of the subsampling bootstrap for time series, in both the simple standard error calculation and the more complicated confidence interval calculation. What you are to do is change the latter (the more complicated confidence interval calculation) to use a sensible estimator of the AR(1) parameter ρ.
The R code

library(ts)
ar.mle(x, order.max = 1)$ar

is the estimator of the AR(1) parameter for the time series x that we want to use for this problem. This function calculates the maximum likelihood estimator (on-line help), but you don't need to know what it is to use it. Note that you do need the library statement to access this function.
Change the example to use this estimator, and produce the point estimate, histogram from which critical values are derived, and confidence interval using these critical values as in the example (but with the new estimator). (You do not need to produce the last confidence interval based on standard error and an assumption of normality and unbiasedness of the estimator).
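A sketch of plugging this estimator into a generic subsampling confidence interval calculation follows. A simulated AR(1) series stands in for the lutenhorm data, the block length b and the assumed sqrt(n) convergence rate are choices made here for illustration (not taken from the example), aic = FALSE is added so ar.mle always fits an order-1 model on short blocks, and the library(ts) statement is left as a comment because in recent R versions ar.mle lives in the standard stats package.

```r
# library(ts)   # needed on old R/Rweb; ar.mle is in stats in recent R
set.seed(7)
x <- arima.sim(model = list(ar = 0.5), n = 200)  # placeholder series
n <- length(x)
b <- 50                          # subsample (block) length, a choice

# AR(1) maximum likelihood estimator; aic = FALSE forces order 1 so
# $ar is always a length-one number, even on short blocks
rho.hat <- function(x) ar.mle(x, order.max = 1, aic = FALSE)$ar

theta.hat <- rho.hat(x)          # point estimate from the whole series

# estimator applied to every contiguous block of length b
t.star <- sapply(1:(n - b + 1), function(i) rho.hat(x[i:(i + b - 1)]))

# histogram from which critical values are derived
z.star <- sqrt(b) * (t.star - theta.hat)
hist(z.star)

# confidence interval from subsampling critical values (sqrt(n) rate assumed)
crit <- quantile(z.star, c(0.975, 0.025))
ci <- theta.hat - crit / sqrt(n)
ci
```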