The Problem
This assignment is our first try making an R package.
The problem comes from Professor Qiu. He and a student wrote some Fortran code for a paper. He does not know how to make R packages. I have volunteered you as helpers.
He sent me the following e-mail:
Attached please find four files for your students to make a R package:
- two-stage.R --- R code which calls the compiled Fortran subroutine.
- two-stage.f --- The original Fortran subroutine
- two-stage.so --- The compiled Fortran subroutine
- rats.txt --- A dataset for demonstration
In file two-stage.R, I used data file "rats.txt" for demonstration. Maybe in the R package, we should change it to any specified data file. Please let me know if you have any questions. Thanks a lot.
(Actually he used the name two-stage.r, which I took the liberty of renaming to two-stage.R following the R convention.)
First Part of the Assignment
The first part of the assignment is the following.
- Download the three files that I have linked in the e-mail above.
- Build the shared library that I did not provide. How to do that is explained in Writing R Extensions.
- Run two-stage.R as an R BATCH job. Check the output to see that there were no errors.
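The build and batch steps above can also be scripted from within R. What follows is only a minimal sketch, not the procedure from Writing R Extensions itself. It assumes a Unix-like system, that two-stage.f and two-stage.R are in the working directory, and that R CMD SHLIB and R CMD BATCH behave as documented. The helper names build_and_run and find_errors are hypothetical, and scanning for lines that begin with "Error" is a rough check, so read the output file yourself as well.

```r
# Hypothetical helper: return the lines of a batch transcript that begin
# with "Error" (a rough check only; warnings are not detected).
find_errors <- function(lines) {
    grep("^Error", lines, value = TRUE)
}

# Hypothetical helper: build the shared library, run the script as a batch
# job, and return any error lines found in the transcript.
build_and_run <- function(fortran = "two-stage.f", script = "two-stage.R",
                          rout = "two-stage.Rout") {
    r <- file.path(R.home("bin"), "R")
    # R CMD SHLIB compiles the Fortran source and links a shared library
    status <- system2(r, c("CMD", "SHLIB", shQuote(fortran)))
    if (status != 0) stop("R CMD SHLIB failed for ", fortran)
    # R CMD BATCH runs the script non-interactively and writes the
    # transcript to the output file named as the last argument
    system2(r, c("CMD", "BATCH", "--vanilla", shQuote(script), shQuote(rout)))
    find_errors(readLines(rout))
}
```

Calling build_and_run() with no arguments should return a zero-length character vector if the batch job ran cleanly.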
So far, so good (assuming you had no errors).
Second Part of the Assignment
Now comes the hard part. What does this code do?
Comments in the file two-stage.R reference
a paper available for download if you are
coming from within the umn.edu domain.
That paper may clear up some questions about this code
(or may not for all I know).
Also the comments in two-stage.R and two-stage.f
should answer some questions.
Perhaps we will have to formulate some questions for Prof. Qiu. Don't all bug him separately. I'll collate your questions and hand them to him in a bunch.
Start writing a design document
for this package.
This should be your own work. You are allowed to talk to each other,
but should do your own write-up.
This design document should say what the R package will do: what the input to the (yet to be written) R function (or functions) in the package will be and what they will calculate. You don't have to copy stuff from the paper; citing the paper is o. k. (more than o. k.; it is the Right Thing).
Especially important is
what conditions should the inputs to the R function(s) satisfy?
Do they need to be "numeric" or "character"
or other R type?
Do they need to be nonnegative, integer, or other numerical type?
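As a rough illustration of how such conditions can be stated precisely, the sketch below writes a few of them as R predicates. The function names are hypothetical and are not taken from two-stage.R. Which conditions actually apply to this package is for the design document to decide.

```r
# TRUE if x is a numeric vector with no missing or infinite values
is_finite_numeric <- function(x) is.numeric(x) && all(is.finite(x))

# TRUE if x is a single nonnegative whole number (stored as integer or double)
is_nonneg_whole <- function(x) {
    is_finite_numeric(x) && length(x) == 1 && x >= 0 && x == round(x)
}

# TRUE if x is a single non-missing character string, such as a file name
is_string <- function(x) is.character(x) && length(x) == 1 && !is.na(x)

is_nonneg_whole(3)     # TRUE
is_nonneg_whole(2.5)   # FALSE: not a whole number
is_nonneg_whole("3")   # FALSE: character, not numeric
```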
It is Good Programming Practice to write a design document before
writing any code. It often saves huge amounts of time.
If you don't know where you're going, any road will take you there.
If you don't know what your program is supposed to do, it's a bit
difficult to make it do it.
The emphasis on input conditions is because Good Programs do not obey "Garbage in, Garbage out"; they obey "Garbage in, Error messages out".
Good Programs check all input for validity, issue error messages
for invalid input, and then do not fail for input they consider valid.
Most of R actually follows this philosophy. It is the major reason why R is an actually useful computing environment. Bugs that cause R to crash are exceedingly rare. Also rare are bugs — almost as bad as crashes — that give error messages from some low-level function called by your function, error messages completely incomprehensible to the user because the user has no idea even that you are calling that function much less why.
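A minimal sketch of the "Garbage in, Error messages out" style follows. The function two_stage_sketch and its arguments y and k are hypothetical placeholders, since the real interface is exactly what the design document must settle. The body returns a stand-in value at the point where the call to compiled code would go.

```r
# Hypothetical interface: y is the data, k a single positive whole number.
two_stage_sketch <- function(y, k = 2) {
    # Check every argument up front, so the user never sees an error
    # raised by a low-level function they did not know was being called.
    if (!is.numeric(y) || length(y) == 0)
        stop("'y' must be a nonempty numeric vector")
    if (!all(is.finite(y)))
        stop("'y' must not contain NA, NaN, or infinite values")
    if (!is.numeric(k) || length(k) != 1 || !is.finite(k) ||
        k < 1 || k != round(k))
        stop("'k' must be a single positive whole number")

    # The call to the compiled Fortran code would go here, after
    # as.double() and as.integer() conversions of the checked inputs.
    # A placeholder result is returned instead.
    list(n = length(y), k = as.integer(k))
}

# Invalid input produces a message written in terms the user understands.
msg <- tryCatch(two_stage_sketch("garbage"),
                error = function(e) conditionMessage(e))
print(msg)   # "'y' must be a nonempty numeric vector"
```

Because all checks happen before any compiled code is reached, every error message refers to an argument the user actually supplied.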
What to hand in
Your design document, complete or incomplete, and if incomplete a list of questions you need answered to complete it.