Statistics 3701 (Geyer, Fall 2022) Homework 2

Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Canvas should be an Rmarkdown file with text and code chunks that can be run to produce what you did. We do not take your word for what the output is. We may run it ourselves. But we also want the output.

You may ask questions if the wording of a question is confusing. But the instructor will not be giving hints.

Quizzes must be uploaded by the end of class (1:10). Canvas should actually allow a few minutes after that, but those not uploaded by 1:10 will be marked late. Here is the link for uploading this quiz: https://canvas.umn.edu/courses/330843/assignments/2807883.

Homeworks must be uploaded before midnight on the day they are due. Here is the link for uploading this homework: https://canvas.umn.edu/courses/330843/assignments/2807885.

Quiz 2

Problem 1

Write an R function that, given a numeric matrix A,

checks that there are no NA or NaN components,
checks that both dimensions are at least 1 (R matrices can have zero for dimensions), and
returns the submatrix of A whose columns are the columns of A that contain a component equal to the largest value of any component of A.

Not only write a function, but also show it working on the data obtained by the R command


load(url("https://www.stat.umn.edu/geyer/3701/data/2022/q2p1.rda"))
ls()

(This loads two R objects: matrices a and b. Apply your function to both.)

Note: The problem states that your function should return a matrix, not a vector. Beware of the footgun behavior of the square bracket function.
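The footgun mentioned in the note can be seen on a small made-up matrix (the object m below is hypothetical, not one of the matrices loaded above). This is only a sketch of the behavior, not a solution to the problem: by default the square bracket function drops dimensions of extent one, and the argument drop = FALSE turns that off.

```r
# m is a made-up 2 x 3 matrix, not the quiz data
m <- matrix(1:6, nrow = 2)

# selecting a single column silently drops the dim attribute
v <- m[, 2]
is.matrix(v)

# drop = FALSE keeps the result a matrix (here 2 x 1)
s <- m[, 2, drop = FALSE]
is.matrix(s)
dim(s)
```

The same argument works when the column index is a logical vector rather than a number.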

Problem 2

Write an R function %foo% that interleaves two character vectors of the same length (length zero is allowed). In more detail, its output vector takes the first element from the first vector (the one on the left-hand side of the operator), then the first element from the other vector, then the second element of the first vector, then the second element of the second vector, and so forth.

Test your function on some nonzero length vectors.
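For readers who have not defined a binary operator before, here is a minimal sketch of the syntax. The operator %paste% is hypothetical and does something different from what %foo% is supposed to do; it only shows that a function whose name begins and ends with a percent sign can be called with its two arguments on either side.

```r
# hypothetical operator, NOT the %foo% asked for in this problem
`%paste%` <- function(x, y) paste(x, y)

# the left-hand operand is x, the right-hand operand is y
res <- c("a", "b") %paste% c("x", "y")
res

# length zero operands are handled by paste itself
empty <- character(0) %paste% character(0)
length(empty)
```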

Problem 3

Write a function that takes a numeric matrix and symmetrizes its columns: for any vector x its symmetrization is c(x, -x).

If any components of the matrix are NA, NaN, Inf, or -Inf, your function should produce an error message.
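As a small illustration of the two ingredients above (on made-up vectors, not the matrix to be loaded below), the following sketch shows the symmetrization of one vector and one way to detect the four kinds of disallowed values. Using is.finite for the check is our assumption about a convenient approach; the problem does not prescribe it.

```r
# symmetrization of a single made-up vector
x <- c(1, -2, 3)
sym <- c(x, -x)
sym

# is.finite is FALSE for NA, NaN, Inf, and -Inf
bad <- c(1, NA, NaN, Inf, -Inf)
ok <- is.finite(bad)
ok

# so all(is.finite(...)) is TRUE only when every component is an ordinary number
all(is.finite(x))
all(is.finite(bad))
```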

Not only write a function, but also show it working on the data obtained by the R command


load(url("https://www.stat.umn.edu/geyer/3701/data/q2p3.rda"))
ls()

(This loads one R object: a matrix a.)

Homework 2

Homework problems start with problem number 4 because, if you don't like your solutions to the problems on the quiz, you are allowed to redo them (better) for homework. If you don't submit anything for problems 1–3, then we assume you liked the answers you already submitted.

Problem 4

This is a modification of problem 3. Do it without loops. (If you already did it without loops, then there is nothing left to do; your solution to problem 3 also counts as a solution to this problem.)

Hint: The R function apply, when given a function that maps vectors to vectors, returns a matrix. Section 6.8 of the course notes about Matrices, Arrays, and Data Frames illustrates this.
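The behavior described in the hint can be checked on a made-up matrix (m below is hypothetical). In this sketch the function applied is sort, which maps a vector to a vector of the same length. Note that each result of the function becomes a column of the output, whichever margin is used.

```r
# made-up 2 x 3 matrix
m <- matrix(c(3, 1, 2, 6, 5, 4), nrow = 2)

# sort each column: results are columns, so the output is again 2 x 3
by_col <- apply(m, 2, sort)
dim(by_col)

# sort each row: each sorted row becomes a COLUMN, so the output is 3 x 2
by_row <- apply(m, 1, sort)
dim(by_row)
by_row
```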

Problem 5

This problem is to write a function just like the function apply in the R base package, which is described in Section 6.8 of the course notes about Matrices, Arrays, and Data Frames, except that the function to be written for this problem (for concreteness, call it myapply) is a lot simpler.

Like R function apply its signature is

function(X, MARGIN, FUN, ...)

but unlike apply its arguments are a lot simpler.

Argument X is a matrix of any type a matrix can have (numeric, character, logical, complex). This is unlike the corresponding argument of apply, which can be an array of any dimension.
Argument MARGIN is either (the number) 1 or (the number) 2. This is unlike the corresponding argument of apply, which can be either a numeric vector or a character vector (the latter a possibility I was unaware of before writing this question).
Argument FUN is an R function that maps vectors to vectors (possibly of length 1, possibly of longer length, as explained in the section of the course notes cited above). We will consider it an error if the function FUN returns results of different lengths in any invocation of myapply. The requirements for the corresponding argument of apply are much looser, as help("apply") explains.
For this problem, you can assume (rather than check) that FUN always returns a vector of the same length in any invocation of myapply.
Argument ... is passed to FUN, that is, any arguments to myapply that do not match X, MARGIN, or FUN are passed to FUN whenever it is called by myapply. (This is just like how apply works.)
That is, if foo is the thingummy we are trying to apply FUN to, either a row or column of X depending on whether MARGIN is 1 or 2, we always invoke it as
```
     FUN(foo, ...)
```
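How the ... argument gets forwarded can be seen in a tiny sketch. The helper call_on is hypothetical (it is not myapply and does no looping over rows or columns); it only shows that extra arguments given to the outer function arrive at FUN unchanged.

```r
# hypothetical helper, not part of the assignment
call_on <- function(foo, FUN, ...) FUN(foo, ...)

# with no extra arguments, mean sees the NA and returns NA
r1 <- call_on(c(1, NA, 3), mean)
r1

# na.rm = TRUE does not match foo or FUN, so it is passed along to mean
r2 <- call_on(c(1, NA, 3), mean, na.rm = TRUE)
r2
```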

In order to make this problem non-trivial, you are not allowed to use the R function apply or any other R function with apply as part of its name (lapply, for example).

According to the rules (in the rules section above), it is perfectly legal to look at the source code for apply, what

apply

shows. But that code is very confusing because what apply does is much more complicated than what a solution to this problem has to do. You can even copy code from apply so long as you say you are doing that (put comments in your code to say which lines you have copied). In order that you don't just copy all the lines of apply, we make another rule for this problem: your function should have no more than 30 R commands, not counting any code that catches errors or the function signature (the part with the R function named function).

Hint: Looking at the source for apply, its first line is


FUN <- match.fun(FUN)

You should copy that. Then FUN can either be a function or the name of a function, and match.fun takes care of that. If you do this, you do not have to do any error checks for the argument FUN. The function match.fun will catch all errors. You also do not need a comment about using this.
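What match.fun does can be tried out directly at the command line. This sketch uses the base function sum and a deliberately nonexistent name (no_such_function_xyz is made up) to show the three cases: a function, the name of a function, and something that is neither.

```r
# a function is returned as is
f1 <- match.fun(sum)

# a character string is looked up as the name of a function
f2 <- match.fun("sum")
identical(f1, f2)
f2(1:4)

# anything else is an error, which we catch here just to look at it
res <- tryCatch(match.fun("no_such_function_xyz"), error = function(e) e)
inherits(res, "error")
```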

Another Hint: Since you cannot use apply or any other function with apply as part of its name (lapply, for example), you will have to use a loop.

Yet Another Hint: Since you do not know what length vector FUN returns until after the first time you call it, you have to wait until you have called it to find out what the dimensions of the result of myapply are.

Try your function myapply on all of the examples using apply in Section 6.8 of the course notes about Matrices, Arrays, and Data Frames, making sure that you get the same results.

Your function does not have to produce the same row and column names on its result as apply does. It is also OK if your function sometimes produces a matrix when apply produces a vector or vice versa. It is enough that the numbers are the same.

Problem 6

This problem is about arrays and the R function apply applied to arrays. This problem does not require you to write a function.

The R command


load(url("https://www.stat.umn.edu/geyer/3701/data/q2p6.rda"))

loads one R object: a three-dimensional array called pat.

Apply the R function median to the three two-dimensional margins (indexed by pairs of indices) and the three one-dimensional margins (indexed by single indices) of this array.
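To make the terminology concrete, here is a sketch on a made-up array (arr below is a hypothetical stand-in, not pat) showing one one-dimensional margin and one two-dimensional margin. The MARGIN argument of apply names the indices that are kept; the function is applied over all the others.

```r
# made-up 2 x 3 x 4 array, not the array pat
arr <- array(1:24, dim = c(2, 3, 4))

# one-dimensional margin: keep index 3, take the median over indices 1 and 2
one <- apply(arr, 3, median)
length(one)

# two-dimensional margin: keep indices 1 and 2, take the median over index 3
two <- apply(arr, c(1, 2), median)
dim(two)
```

The result for a one-dimensional margin is a vector whose length is the extent of the kept index, and the result for a two-dimensional margin is a matrix whose dimensions are the extents of the two kept indices.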