Date Assigned: Tuesdau, Movember 21, 2023
Date Due: Wednesday, December 13, 2023 at one minute before midnight
Do using Rmarkdown. Upload two files to Canvas before the time Canvas says the homework is due
the R markdown source (by convention this has extension .Rmd
) and
the R markdown output. Use output format HTML or output format PDF (your choice). This must be the result of running Rmarkdown on the source you submit.
Rmarkdown file
and the corresponding output
cover the basics of Rmarkdown. For the basics of markdown text formatting, see
For further info on Rmarkdown, the source for all of the course notes is always available for seeing how anything in any of the course notes was done.
Solve each problem. Explain your reasoning. No credit for answers with no explanation.
The data are the data in Table 10.9 in Agresti. These data can be read into R as follows
u <- "http://www.stat.umn.edu/geyer/5421/data/table-10.9.txt"
foo <- read.table(u, header = TRUE)
sapply(foo, class)
## survival gestation smoking age counts
## "character" "character" "character" "character" "integer"
If you had trouble doing the above, then download the file in a separate step and then read it, as discussed in an announcement on the course home page.
Which single hierarchical model has the lowest AIC?
Which group of hierarchical models has the lowest AIC and has Akaike weights adding up to 0.95?
Which models listed in part (b) are graphical?
Which single graphical model has the lowest AIC?
Which group of graphical models has the lowest AIC and has Akaike weights adding up to 0.95?
For the five graphical models with the lowest AIC, draw the graphs and interpret them by giving the implied conditional independence relationships.
In all parts it may be helpful to know that the result of R generic function summary
applied to the result of R function glmbb
is a list, the results
component of which is the printed data frame. For example
gout <- glmbb(...)
sout <- summary(gout)
then sout$results$criterion
is the vector of criteria (AIC, BIC, or AICc, as the case may be), sout$results$weight
is the vector of weights, and sout$results$formula
is the vector of formulas expressed as character strings. The R function as.formula
converts one character string to a formula. The R function isGraphical
in the R package glmbb
tells whether a formula corresponds to a graphical model.
For each of the top five models (according to AIC) in part (a) of problem 1 on this homework, what do the “observed equals expected” and “maximum entropy” principles say about the maximum likelihood estimates?
For the horseshoe crab data
library(CatDataAnalysis)
data(table_4.3)
names(table_4.3)
## [1] "color" "spine" "width" "satell" "weight" "y"
sapply(table_4.3, class)
## color spine width satell weight y
## "integer" "integer" "numeric" "integer" "integer" "integer"
table_4.3 <- transform(table_4.3, color = as.factor(color))
in problem 3.2 we did a Bayesian analysis of the model having formula
satell ~ 0 + color + weight
If instead we do a frequentist analysis
gout <- glm(satell ~ 0 + color + weight, family = poisson,
data = table_4.3)
summary(gout)
##
## Call:
## glm(formula = satell ~ 0 + color + weight, family = poisson,
## data = table_4.3)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## color2 -4.978e-02 2.331e-01 -0.214 0.8309
## color3 -2.549e-01 1.974e-01 -1.291 0.1967
## color4 -4.996e-01 1.959e-01 -2.551 0.0108 *
## color5 -5.018e-01 2.156e-01 -2.328 0.0199 *
## weight 5.462e-04 6.811e-05 8.019 1.07e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1050.8 on 173 degrees of freedom
## Residual deviance: 551.8 on 168 degrees of freedom
## AIC: 917.1
##
## Number of Fisher Scoring iterations: 6
What does the “observed equals expected” principle say about this model? What are the submodel canonical sufficient statistics? Calculate their expected values. Interpret these submodel canonical sufficient statistics. What is their meaning?