Rules
See the Section about Rules for Quizzes and Homeworks on the General Info page.
Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.
Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then use Save As and choose the format to be Text (.txt) or something like that. Then upload the saved plain text file.
If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that https://ay16.moodle.umn.edu/mod/forum/view.php?id=1221096.
On future assignments you can use knitr or rmarkdown after we have talked about them. But avoid them on this assignment.
Quizzes must be uploaded by the end of class (1:10). Moodle should actually allow a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1221097.
Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1221099.
Quiz 1
Problem 1
Write an R function that, given a numeric vector x, calculates its mean, population variance, and population standard deviation. That is, if $x_i$ are the components of x and $n$ is the length of x, then the mean is
$$\mu = \frac{1}{n} \sum_{i = 1}^n x_i,$$
the population variance is given by
$$\sigma^2 = \frac{1}{n} \sum_{i = 1}^n (x_i - \mu)^2,$$
and the population standard deviation is $\sigma$ (the square root of the variance).
Do not use the R functions mean, var, or sd. You may use the R function sum or any other R function in the R core (what is available without using the R function library to attach a package).
Your function should return a list with three components named mean, var, and sd, which are the three things you calculated.
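As a reminder of how a function returns a list with named components, here is a minimal sketch. The function range_summary, its component names lo and hi, and the data are made up for illustration and are not part of the assignment.

```r
# Hypothetical helper (not the assignment function):
# returns two summaries of x as a list with named components.
range_summary <- function(x) {
    lo <- min(x)
    hi <- max(x)
    list(lo = lo, hi = hi)
}

out <- range_summary(c(3, 1, 4, 1, 5))
names(out)   # "lo" "hi"
out$lo       # 1
out$hi       # 5
```

Components of the result are extracted by name with the $ operator, which is how a user of your function would get at the three things it calculates.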
For this problem you do not have to worry about GIEMO (garbage in, error messages out). That is the next problem. If your function does what it is supposed to when the input is correct, that gets full credit.
Not only write a function, but also show it working on the data obtained by the R command
x <- scan(url("http://www.stat.umn.edu/geyer/s17/3701/data/q1p1.txt"))
Problem 2
Rewrite your function for the preceding problem so it does GIEMO (garbage in, error messages out).
It should give an error if its argument has length zero, has NA or NaN or Inf or -Inf components, or is not of type "numeric".
Hint: in Section 8 of the first course notes Basics we used the function is.finite. Look it up and see if you want to use that.
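To see what is.finite does with the kinds of bad values listed above, here is a small sketch on a made-up vector z (not the assignment data). How you use it in your function is up to you.

```r
# Made-up vector containing one ordinary number and each kind of bad value
z <- c(1.5, NA, NaN, Inf, -Inf)

is.finite(z)        # TRUE FALSE FALSE FALSE FALSE
all(is.finite(z))   # FALSE, because at least one component is not finite

# is.numeric checks the type of the whole vector, not the components
is.numeric(z)       # TRUE
is.numeric("1.5")   # FALSE, a character string is not numeric
```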
Show that your new function still works on the data described in the preceding problem.
Problem 3
Modify the calculations of Sections 7.5.2, 7.5.3, and 7.5.4 of the first course notes Basics so that they are done by one R function. Your R function will have one argument, which is the data (x in the example in the notes), and will produce one scalar value, which is the MLE (maximum likelihood estimate) (oout$maximum in the example in the notes).
For this problem you can use the easier method of Section 7.5.3 of the first course notes because inside your function x is not a global variable (hence not evil). It is a local variable in your function.
For this problem you do not have to worry about GIEMO (garbage in, error messages out). If your function does what it is supposed to when the input is correct, that gets full credit.
Not only write a function, but also show it working on the data obtained by the R command
x <- scan(url("http://www.stat.umn.edu/geyer/s17/3701/data/q1p3.txt"))
Homework 1
Homework problems start with problem number 4 because, if you don't like your solutions to the problems on the quiz, you are allowed to redo them (better) for homework. If you don't submit anything for problems 1–3, then we assume you liked the answers you already submitted.
Problem 4
This is a modification of problem 1. Now we will allow unequal probabilities for the data values. So now we have two vectors of the same length, call them x and p, and the latter is a probability vector, meaning its components are nonnegative and sum to one. If $x_i$ are the components of x and $p_i$ are the components of p and $n$ is the length of both x and p, then the equations in problem 1 are modified to
$$\mu = \sum_{i = 1}^n p_i x_i$$
for the mean and
$$\sigma^2 = \sum_{i = 1}^n p_i (x_i - \mu)^2$$
for the population variance. As before and as always, the standard deviation is the square root of the variance.
Again for this problem you need not worry about GIEMO (that's the next problem).
Data for this problem are
d <- read.table(url("http://www.stat.umn.edu/geyer/s17/3701/data/q1p4.txt"), header = TRUE)
This produces a data frame, which we have not covered yet but which you can think of as a list (which it is, just a list with extra requirements). That is, d$x is what we are calling x above and d$p is what we are calling p above.
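If it helps to see a data frame behaving like a list, here is a minimal sketch using a made-up data frame d_toy (not the assignment data) with the same two column names.

```r
# Made-up data frame for illustration only
d_toy <- data.frame(x = c(1, 2, 3), p = c(0.2, 0.3, 0.5))

is.list(d_toy)   # TRUE, a data frame is a list
names(d_toy)     # "x" "p"
d_toy$x          # 1 2 3
d_toy[["p"]]     # 0.2 0.3 0.5, same as d_toy$p
```

Each component extracted this way is an ordinary numeric vector, so it can be passed to a function that expects a vector.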
Otherwise, this problem is just like problem 1: write the function and show it working on these data.
Problem 5
This problem is to problem 4 as problem 2 is to problem 1. Add GIEMO to your solution to the preceding problem. Catch any problems with either argument. Show your function working.
Hint: when you are checking that p sums to one, don't compare doubles for exact equality. Section 10.6 of the first course notes Basics explains.
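The following sketch shows why exact comparison is risky and one possible alternative, the R function all.equal, which compares up to a small tolerance. The vector p_toy is made up for illustration, and all.equal is only one option. The course notes may recommend something else.

```r
# Exact comparison of doubles can fail because of rounding error
(0.1 + 0.2) == 0.3                  # FALSE

# all.equal compares up to a tolerance; wrap it in isTRUE to get
# a single TRUE or FALSE suitable for use in an if statement
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# Made-up probability vector
p_toy <- c(0.1, 0.2, 0.3, 0.4)
isTRUE(all.equal(sum(p_toy), 1))    # TRUE
```

The isTRUE wrapper matters because all.equal returns a character string describing the difference, rather than FALSE, when its arguments are not nearly equal.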
Problem 6
This problem is about testing. In particular, seeing that the error checks catch all the errors.
For both the functions you wrote in problems 2 and 5, make up bad data for which they fail. Make up data that triggers each different check you put in the functions.
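One way to show that a check fires without stopping the rest of your script is to catch the error with the R function tryCatch. This is a sketch under assumptions: safe_sqrt and error_message are hypothetical names invented here, and your own functions and checks will differ.

```r
# Hypothetical function with a single check, for illustration only
safe_sqrt <- function(a) {
    if (!is.numeric(a)) stop("a must be numeric")
    sqrt(a)
}

# Hypothetical helper: evaluates expr and returns the error message
# if it fails, or NULL if it runs without error
error_message <- function(expr) {
    tryCatch({
        expr
        NULL
    }, error = function(e) conditionMessage(e))
}

error_message(safe_sqrt("four"))   # "a must be numeric"
error_message(safe_sqrt(4))        # NULL
```

Running one such line per check, each with data designed to trip exactly that check, shows that every error message can actually be produced.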
Problem 7
This problem is like problem 3 except that now we want to allow different probability distributions. We will still ignore GIEMO, because that may be too hard at this point in the course.
Write an R function that has three arguments
- the data vector x, just like before,
- a function that itself has two arguments,
  - the (univariate) parameter theta and
  - the data vector x,
  and that returns the log of the PDF or PMF according to whether x is continuous or discrete, respectively,
- and an interval over which to search, given by a vector of length two called interval.
(The reason we have the user supply the interval is that there is no way we can tell what the range of values of the parameter is.)
One example of such a function is
function(theta, x) dgamma(x, shape = theta, log = TRUE)
that we used in our function, but another would be
function(theta, x) dcauchy(x, location = theta, log = TRUE)
and yet another would be
function(theta, x) dbinom(x, 20, prob = 1 / (1 + exp(- theta)), log = TRUE)
The idea is that the user provides a function for whatever distribution the user wants.
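To illustrate how a function can be passed as an argument and then called, here is a minimal sketch. The helper loglik_at, the exponential-distribution function logf_exp, and the data x_toy are all made up for illustration. The sketch only evaluates a log likelihood at one parameter value and does not do the maximization the problem asks for.

```r
# Hypothetical helper: logf is a user-supplied function of (theta, x)
# that returns log density values, one per component of x
loglik_at <- function(theta, x, logf) sum(logf(theta, x))

# A user-supplied function for a distribution not used in the assignment
logf_exp <- function(theta, x) dexp(x, rate = theta, log = TRUE)

x_toy <- c(0.5, 1.2, 2.0)       # made-up data
loglik_at(1, x_toy, logf_exp)   # -3.7, since the log density is -x when rate = 1
```

Inside loglik_at the argument logf is called like any other function, so the same code works for whatever distribution the user supplies.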
Modify your answer to problem 3 so it works as described here.
If you use the gamma PDF function above as your function argument, then the data for problem 3 are appropriate.
If you use the Cauchy PDF function above as your function argument, then the following data are appropriate
x <- scan(url("http://www.stat.umn.edu/geyer/s17/3701/data/q1p7c.txt"))
If you use the binomial PMF function above as your function argument, then the following data are appropriate
x <- scan(url("http://www.stat.umn.edu/geyer/s17/3701/data/q1p7b.txt"))
The parameter spaces are 0 to ∞ for gamma, − ∞ to ∞ for Cauchy and binomial. But I assure you that the true unknown parameter values are less than 10 in absolute value and, for the gamma, greater than 0.1. You are relying on the user of your function to get this right, but while testing your function you have to play the role of the user.
Show that your function works with all three of the user-supplied functions given above.