Rules
See the Section about Rules for Quizzes and Homeworks on the General Info page.
Your work handed into Canvas should be an Rmarkdown file with text and code chunks that can be run to produce what you did. We do not take your word for what the output is. We may run it ourselves. But we also want the output.
You may ask questions if the wording of a question is confusing. But the instructor will not be giving hints.
Quizzes must be uploaded by the end of class (1:10). Canvas should actually allow a few minutes after that, but those not uploaded by 1:10 will be marked late. Here is the link for uploading this quiz: https://canvas.umn.edu/courses/330843/assignments/2795250.
Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading this homework: https://canvas.umn.edu/courses/330843/assignments/2795258.
Quiz 1
Problem 1
Write an R function that, given a numeric vector x, returns a numeric vector whose components are the first 10 strictly positive components of x, or all of the strictly positive components of x if there are fewer than 10. If x has no positive components, then your function should return a numeric vector of length zero.
For this problem you do not have to worry about GIEMO (garbage in, error messages out). That is the next problem. If your function does what it is supposed to when the input is correct, that gets full credit.
Not only write a function, but also show it working on numeric vectors having
- zero strictly positive components,
- between one and nine strictly positive components,
- exactly ten strictly positive components, and
- more than ten strictly positive components.
Problem 2
Rewrite your function for the preceding problem so it does GIEMO (garbage in, error messages out).
It should give an error if its argument has NA or NaN components or is not of type "numeric".
Show that your new function still works on the data described in the preceding problem.
Note: In R the logical not operator is ! (exclamation point, also called bang). So to reverse a test, precede it with !. The expression ! (x < 0) does the same thing as x >= 0. This is illustrated in several places in the notes, but we did not mention it in class.
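As a small illustration of the note above, the following sketch applies ! to a comparison and to a type test. The vector x here is made-up toy data, not data from any problem.

```r
# Toy data (an assumption for illustration only)
x <- c(-2, 0, 3.5)

x < 0                          # TRUE FALSE FALSE
! (x < 0)                      # FALSE TRUE TRUE
identical(! (x < 0), x >= 0)   # TRUE: the two tests agree componentwise

# ! also reverses a test that returns a single logical value
! is.numeric("a")              # TRUE, because "a" is not numeric
```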
Problem 3
Modify the calculations of Section 7.5.4 of the first course notes Basics so that the statistical model is the Cauchy location family, that is, the Cauchy family of distributions (documented in the help for R function dcauchy) with unknown location parameter and known scale parameter, which we take to be the default value (1).
As in that section, use a function factory to make your log likelihood function.
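For readers who want a reminder of the pattern, here is a generic sketch of a function factory. It is not the code from the course notes and it is not a log likelihood; the names make_ssd and ssd are hypothetical, chosen only for this illustration.

```r
# Hypothetical factory: given data x, returns a function of mu that computes
# the sum of squared deviations of x from mu.
make_ssd <- function(x) {
    force(x)  # evaluate x now, so the returned function keeps this data
    function(mu) sum((x - mu)^2)
}

ssd <- make_ssd(c(1, 2, 6))  # toy data, an assumption for illustration
ssd(3)                       # (1-3)^2 + (2-3)^2 + (6-3)^2 = 14
```

The returned function takes only the parameter as its argument. The data are remembered in the environment where the function was created.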
Not only write a function, but also show it working for finding the MLE (maximum likelihood estimate) for the data obtained by the R command
x <- scan(url("https://www.stat.umn.edu/geyer/3701/data/2022/q1p3.txt"))
Note: If this does not work on your computer, see a note about downloading files. Of course, you have to modify the command used in that note to be the command used here. The point is to give R function scan a local file to input when the computer forbids downloads from the internet.
Note: For the interval in which to search for the MLE, use the sample median plus or minus 1. Asymptotic theory says the standard error of the median considered as an estimator of the location parameter is (π ⁄ 2) ⁄ √n, where n is the sample size. This is much smaller than 1 here, so this interval should include the MLE.
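The standard error quoted in the preceding note follows from the usual large-sample result for the sample median. A short sketch, assuming the Cauchy density with scale 1 and writing $f$ for that density and $\mu$ for the location parameter (symbols introduced here, not in the note itself):

$$
f(x) = \frac{1}{\pi \left[ 1 + (x - \mu)^2 \right]},
\qquad
f(\mu) = \frac{1}{\pi},
$$

$$
\text{se}(\text{median}) \approx \frac{1}{2 f(\mu) \sqrt{n}} = \frac{\pi}{2 \sqrt{n}} = \frac{\pi / 2}{\sqrt{n}}.
$$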
Note: R function median calculates the median.
Note: For this problem ignore GIEMO (garbage in, error messages out). You do not have to detect erroneous arguments to your function.
Homework 1
Homework problems start with problem number 4 because, if you don't like your solutions to the problems on the quiz, you are allowed to redo them (better) for homework. If you don't submit anything for problems 1–3, then we assume you liked the answers you already submitted.
Problem 4
This problem is about probability models on finite sample spaces. These can be specified by two vectors
- x, which gives the possible values of a random variable, and
- p, which gives the corresponding probabilities.
The vector p must have components that are nonnegative and sum to one.
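If $x_i$ are the components of x and $p_i$ are the components of p and $n$ is the length of both x and p, then the mean of the random variable is
$$\mu = \sum_{i=1}^{n} x_i p_i,$$
the variance is
$$\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 p_i,$$
and the standard deviation is $\sigma = \sqrt{\sigma^2}$. Write an R function that, given x and p, calculates these three quantities.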
Your function should return a list with three components named mean, var, and sd, which are the three things you calculated.
For this problem you need not worry about GIEMO (that's the next problem). If your function works correctly, then it will be considered correct.
Data for this problem are
d <- read.table(url("https://www.stat.umn.edu/geyer/3701/data/q1p4.txt"), header = TRUE)
This produces a data frame, which we have not covered yet but which you can think of as a list (which it is, just a list with extra requirements). That is, d$x is what we are calling x above and d$p is what we are calling p above.
Write the function and show it working on these data.
Problem 5
This problem is to problem 4 as problem 2 is to problem 1. Add GIEMO to your solution to the preceding problem. Catch any problems with either argument. Show your function working.
Hint: When you are checking that p sums to one, don't compare doubles for exact equality. Section 10.6 of the first course notes Basics explains why.
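To see why the hint matters, here is a small sketch using two numbers that are equal mathematically but not in computer arithmetic. It shows two common ways to compare with a tolerance. This is one possible approach, not necessarily the one recommended in the course notes, and the tolerance 1e-8 is an arbitrary choice made for this illustration.

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                     # FALSE: the two doubles differ in the last bits
isTRUE(all.equal(a, b))    # TRUE: all.equal compares up to a small tolerance
abs(a - b) < 1e-8          # TRUE: explicit tolerance (1e-8 is an assumption)
```

Note that all.equal returns a character string describing the difference, rather than FALSE, when its arguments differ, which is why it is wrapped in isTRUE.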
Problem 6
This problem is about testing. In particular, seeing that the error checks catch all the errors.
For both the functions you wrote in problems 2 and 5, make up bad data for which they fail. Make up data that makes them fail each different check you put in the functions.
Note: In order to have errors not stop Rmarkdown, you need to look at the new Section about Errors in the Rmarkdown Demo document.
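As a general R illustration (the Rmarkdown Demo document may describe a different mechanism, such as the knitr chunk option error=TRUE), the function try runs an expression and, if it fails, returns an object of class "try-error" instead of stopping. The function half below is hypothetical and exists only for this sketch.

```r
# Hypothetical function with a single error check, for illustration only
half <- function(x) {
    if (! is.numeric(x)) stop("x must be numeric")
    x / 2
}

half(3)                                   # 1.5
result <- try(half("a"), silent = TRUE)   # the error is caught, not fatal
inherits(result, "try-error")             # TRUE
```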
Problem 7
This problem is to add GIEMO (garbage in, error messages out) to Problem 3.
For the Cauchy location model the parameter can be any real number (so do not check that it is positive) and the data can be any real numbers (so do not check that they are positive).
Problem 8
This problem is like Problem 4 except that we want to add the median of the distribution to our output.
Defining the median of the distribution is tricky.
First we have to sort the x vector (because the median needs the data in sorted order). But we have to keep track of which components of p go with which components of x, so we cannot just sort x.
R has a function order to do this.
i <- order(x)
x <- x[i]
p <- p[i]
does the same thing to x as
x <- sort(x)
but keeps the corresponding components of x and p in the same place.
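A small sketch of the reordering described above, using made-up toy vectors (not the data for any problem):

```r
# Toy data (an assumption for illustration only)
x <- c(3, 1, 2)
p <- c(0.5, 0.125, 0.375)

i <- order(x)   # 2 3 1: positions of x from smallest to largest
x <- x[i]       # 1 2 3
p <- p[i]       # 0.125 0.375 0.5: each probability stays with its x value
```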
Then the other tricky part is that we need the cumulative distribution function of this distribution. For each component of x we need the probability that the random variable is less than or equal to that value. R function cumsum does that (after both x and p have been modified as described above)
foo <- cumsum(p)
Then foo[k] is the value of the cumulative distribution function at x[k].
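A small sketch of what cumsum produces, using made-up toy vectors that are already in sorted order (not the data for any problem):

```r
# Toy data, already sorted by x (an assumption for illustration only)
x <- c(1, 2, 3)
p <- c(0.125, 0.375, 0.5)

foo <- cumsum(p)   # 0.125 0.5 1
foo[2]             # 0.5, the probability that the random variable is <= x[2]
```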
With all of that done, the median of the distribution of the random variable is the smallest x[k] such that foo[k] is greater than or equal to one half.
So redo your solution to Problem 4 adding a component median to the output.
Show your function working on the data for Problem 4.
As in Problem 4, you may ignore GIEMO.