Chapter 29 Fixing Problems
29.1 Data sets
We will illustrate fixing problems with several data sets, most of which we have used before. Three old friends are the resin lifetime, cloud seeding, and oven data sets.
> library(cfcdae)
> data(ResinLifetimes)
> lifetimes.fit <- lm(Time~temp,data=ResinLifetimes)
> data(CloudSeeding)
> cloud.fit <- lm(rainfall~seeding,data=CloudSeeding)
> data(Thermocouples)
> Oven.fit <- lm(temp.diff~1,data=Thermocouples)
TissueTearing. Here weight is added to a facial tissue stretched across an open jar until the tissue tears. There are four treatments: tissues of brand A or brand B, used either dry or wet.
> data(TissueTearing)
> tearing.fit <- lm(tear.wt~treatment,data=TissueTearing)
TissueVaporization. For these data, a laser scalpel is drawn across a rat liver at a rate of .01 inches/second. We measure the width of the vaporized tissue that results as we use the laser at four different power settings.
> data(TissueVaporization)
> vaporization.fit <- lm(width~power,data=TissueVaporization)
PeaGermination. Each experimental unit was a group of 20 seeds. Each group was pre-soaked either 12 or 24 hours and then placed on either potting soil or a paper towel. The response is the number of germinated seeds after 5 days.
> data(PeaGermination)
> peas.fit <- lm(germinated~treatment,data=PeaGermination)
> data(Copepoda)
> Copepoda.fit <- lm(count~pH,data=Copepoda)
Note that the responses in the PeaGermination and Copepoda data sets are counts. Thus we might already be skeptical of models that assume normal distributions for the responses.
Finally, CloudBackup gives the time in seconds to save a 50MB file to a cloud backup service over the internet and then recover it. Each of three services is used ten times, with services assigned to the 30 runs at random. The 30 runs are done consecutively, and the data are listed in time order.
> data(CloudBackup)
> backup.fit <- lm(updowntime~service,data=CloudBackup)
A First Course in Design and Analysis of Experiments discusses non-normality, non-constant variance, and dependence one at a time, treating them as if they were unrelated problems. It also treats the various approaches to handling violations of assumptions completely separately. In practice, you will be assessing all of the assumptions for any given data set and potentially dealing with more than one problem at a time. In addition, you may well try more than one approach to dealing with violations of assumptions in a single data set.
In an attempt to maintain a parallel presentation here and in A First Course in Design and Analysis of Experiments, we present the examples in a technique-by-technique order. In practice, you are working with one data set and (probably) will be using more than one method on it.
29.2 Power family transformations
A power family transformation replaces the data
with the data raised to some power. The Box-Cox
method helps us choose a good power. We will
use the boxCox()
function from package car
for
this purpose. This function will produce
a plot that shows the likelihood as a function
of the transformation power. The power at the peak
is the estimated value, and the plot shows
a confidence
interval for the power as well.
Any power corresponding to a log-likelihood above the “95%” line is a reasonable power to consider.
There are several optional parameters. The most
useful are lambda
, which can be used to select a vector
of powers to evaluate, and plotit
, which if set to FALSE
will prevent the plot and return a list with the powers
and likelihoods.
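As a quick sketch (assuming, per the car documentation, that the returned list has components x for the powers and y for the log-likelihoods), you can extract the estimated power without plotting:
> bc <- car::boxCox(lifetimes.fit, plotit=FALSE)
> bc$x[which.max(bc$y)]  # power with the highest log-likelihood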
Which power should you choose? It’s a compromise among stabilizing the variability, interpretability, and sensibility. In the end, the transformation has to work: transform the data and see if the transformed data have stable variance. Interpretability is often a high bar: logs and reciprocals are sometimes interpretable, other powers less so. Sensibility means we round off to something reasonable; we don’t use the .19223 power, we’d use .2 or .25 or something else.
29.2.1 Example 6.6 Resin lifetime data
The resin lifetime data on the original scale showed considerable increasing variance (right-opening megaphone). Let’s try Box-Cox to choose a transformation.
> car::boxCox(lifetimes.fit)

> car::boxCox(lifetimes.fit,lambda=seq(-.5,.75,.05))

The peak is very near \(\lambda=.2\), and values between about -.1 and .5 are plausible. We chose power 0 (the logarithm) for simplicity and interpretability.
29.2.2 Example 6.7 Cloud seeding
The cloud seeding data have at least two problems: the residuals are not normally distributed and there is evidence of non-constant variance. There might be outliers, but the apparent outliers could be the result of the non-normal distribution.
> plot(cloud.fit,which=c(2,3))


We can also compare the two groups directly with a quantile-quantile plot of one group against the other, on log scales:
> attach(CloudSeeding)
> seeded <- rainfall[seeding=="seeded"]
> nonseeded <- rainfall[seeding=="nonseeded"]
> qqplot(seeded,nonseeded,log="xy")

> car::boxCox(cloud.fit)

Box-Cox points toward the log transformation, so we refit on the log scale:
> cloud.logfit <- lm(log(rainfall)~seeding,data=CloudSeeding)
> plot(cloud.logfit,which=1:3)



The normal plot shows a bit of a long left tail, and the scale-location plot is just the tiniest bit decreasing. Neither of these issues is sufficient to warrant some other transformation. The fact that 0 was toward the left-hand end of the acceptable range of transformations suggests that we probably went a bit beyond the optimal transformation, but logged data are a lot easier to understand than the .05 power.
There were outliers on the original scale, but transforming to logged data dealt with that issue.
> car::outlierTest(cloud.logfit)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
26 -2.658736 0.010565 0.5494
> anova(cloud.logfit)
Analysis of Variance Table
Response: log(rainfall)
Df Sum Sq Mean Sq F value Pr(>F)
seeding 1 17.007 17.0071 6.4738 0.01408 *
Residuals 50 131.353 2.6271
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> linear.contrast(cloud.logfit,seeding,c(-1,1))
estimates se t-value p-value lower-ci upper-ci
1 1.143781 0.4495342 2.544369 0.01408266 0.240865 2.046697
attr(,"class")
[1] "linear.contrast"
What did seeding do? Look at a contrast between the nonseeded and seeded treatments. The estimated difference on the log scale is 1.14 with a 95% confidence interval from .24 to 2.05. Back on the original scale, this translates into the median rainfall being a factor of \(e^{1.14} = 3.14\) higher in the seeded treatment, with a 95% confidence interval from \(e^{.241}=1.27\) to \(e^{2.047}= 7.74\). So the data are consistent with seeding increasing the rainfall by just over a quarter, but they are also consistent with seeding increasing the rainfall by a factor of almost 8.
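The back-transformation is just exponentiation of the estimate and the interval endpoints from the contrast output above:
> exp(c(estimate=1.143781, lower=0.240865, upper=2.046697))  # about 3.14, 1.27, and 7.74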
29.2.3 Example 6.8 Tearing Tissues
We are looking at how much mass we can place on taut facial tissue before it tears. Tissues can be either of two brands, and tissues can be wet or dry. The p-value for the F-test is quite small, but there is strong evidence of non-constant variance. What looks like long tails in the normal plot is likely the result of mixing normals with different variances; thus fixing non-constant variance will probably fix non-normality as well.
> anova(tearing.fit)
Analysis of Variance Table
Response: tear.wt
Df Sum Sq Mean Sq F value Pr(>F)
treatment 3 32400 10800.1 1518.7 < 2.2e-16 ***
Residuals 92 654 7.1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> plot(tearing.fit,which=1:3)



> range(TissueTearing$tear.wt)
[1] 45 108
> car::boxCox(tearing.fit)

Box-Cox suggests the power \(-.5\), with \(-1\) and 0 both being right at the edge of the interval.
We make the -.5 power transformation, but note that we multiply by -1. When you use a negative power, the big points become the small points and vice versa; multiplying by -1 means that the big points remain the big points. After this transformation, the residuals look much better and we can proceed with analysis of variance and so on. The overall F-test is highly significant and all the treatments are significantly different from each other.
> tearing.fitmp5 <- lm(-1/sqrt(tear.wt)~treatment,data=TissueTearing)
> plot(tearing.fitmp5,which=1:3)
> anova(tearing.fitmp5)
Analysis of Variance Table
Response: -1/sqrt(tear.wt)
Df Sum Sq Mean Sq F value Pr(>F)
treatment 3 0.024963 0.0083209 2312.8 < 2.2e-16 ***
Residuals 92 0.000331 0.0000036
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> pairwise(tearing.fitmp5,treatment,confidence=.999)
Pairwise comparisons ( hsd ) of treatment
estimate signif diff lower upper
* A_dry - A_wet 0.017871202 0.002140712 0.015730491 0.020011914
* A_dry - B_dry 0.012741368 0.002140712 0.010600657 0.014882080
* A_dry - B_wet 0.044277986 0.002140712 0.042137275 0.046418698
* A_wet - B_dry -0.005129834 0.002140712 -0.007270546 -0.002989123
* A_wet - B_wet 0.026406784 0.002140712 0.024266072 0.028547495
* B_dry - B_wet 0.031536618 0.002140712 0.029395906 0.033677329
Power transformations are not invariant to shifts: adding or subtracting a constant before transforming can change the power that Box-Cox suggests, sometimes dramatically.
> tearing.fit100 <- lm(tear.wt+100~treatment,data=TissueTearing)
> car::boxCox(tearing.fit100,lambda=seq(-5,0,.05))

> car::boxCox(lm(tear.wt-25~treatment,data=TissueTearing))

Note that some “extreme” transformations really make a lot of sense. If your response is miles per gallon and you take the reciprocal transformation (power -1, kind of extreme), then you’re really just doing an analysis on gallons per mile, which makes perfect sense.
Now go back; which transformation should we use? Let’s look at the residuals versus fitted values plots for the -.5 power, the logarithm, and the logarithm of the data minus 25.
> plot(tearing.fitmp5,which=1)
> plot(lm(log(tear.wt)~treatment,data=TissueTearing),which=1)
> plot(lm(log(tear.wt-25)~treatment,data=TissueTearing),which=1)

There is not a lot to choose between here. It looks like the log transformation doesn’t quite get rid of the non-constant variance (although it’s not all that bad). The other two seem OK; it’s just that neither of them is very interpretable.
29.3 Removing outliers
Outliers are data points that would be unusual or unlikely under the current model and the patterns established by the other data. One standard approach to dealing with outliers is to analyze the data with, and without, the outliers. If your inference doesn’t really change, then you are OK. If your inference does change, then you are in a situation where your analysis and conclusions depend strongly on just a few of the data points. That is not good.
29.3.1 Example 7.9 Width of Vaporized Tissue
A laser scalpel cuts by vaporizing tissue. This experiment looks at how wide the cut is based on the power setting of the laser. If you just look at the analysis of variance, it does not look like the power level matters much.
> anova(vaporization.fit)
Analysis of Variance Table
Response: width
Df Sum Sq Mean Sq F value Pr(>F)
power 3 1.3067 0.43558 3.1446 0.08666 .
Residuals 8 1.1081 0.13852
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> plot(vaporization.fit,which=1:3)



> with(TissueVaporization,plot(power.z,width))
> car::outlierTest(vaporization.fit)
rstudent unadjusted p-value Bonferroni p
10 -11.24597 9.8168e-06 0.0001178
That tenth value certainly fails to follow the pattern set by the rest of the data. Maybe it is an incorrect value. Perhaps it was mistyped and should have been 2.12 instead of 1.12. But maybe some liver tissues are just tough and resistant to vaporization. We need the original investigator to go back to lab notebooks and other original sources and determine whether we have the correct value.
In the meantime, we can try refitting with the outlier removed to see if it changes our conclusions.
> out10 <- (1:12) != 10
> vaporization.fit.out10 <- lm(width~power,
+ data=TissueVaporization,subset=out10)
> plot(vaporization.fit.out10,which=1:3)



There is a hint of decreasing variance now, but if you look at Box-Cox, it does not recommend making any transformation.
Now look at the test of the single mean null hypothesis:
> anova(vaporization.fit.out10)
Analysis of Variance Table
Response: width
Df Sum Sq Mean Sq F value Pr(>F)
power 3 2.05797 0.68599 82.626 7.907e-06 ***
Residuals 7 0.05812 0.00830
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After removing the outlier, the treatment differences are highly significant.
Compare the fitted values with and without the outlier:
> emmeans::emmeans(vaporization.fit,~power)
power emmean SE df lower.CL upper.CL
5 1.14 0.215 8 0.644 1.64
10 1.56 0.215 8 1.061 2.05
15 1.92 0.215 8 1.424 2.42
19 1.96 0.215 8 1.461 2.45
Confidence level used: 0.95
> emmeans::emmeans(vaporization.fit.out10,~power)
power emmean SE df lower.CL upper.CL
5 1.14 0.0526 7 1.02 1.26
10 1.56 0.0526 7 1.43 1.68
15 1.92 0.0526 7 1.80 2.04
19 2.38 0.0644 7 2.22 2.53
Confidence level used: 0.95
The first three means are the same, and the fourth mean changes a lot when the outlier is excluded. Furthermore, the standard errors shrink by roughly a factor of 4.
With the outlier removed, there is no evidence against our assumptions. With the outlier removed, laser power is highly significant, which is what we would have anticipated from the plot of the data.
We get vastly different results based on whether a single data point is included or excluded. It is difficult to make any conclusions until we know the true status of that point.
Proceeding for the moment under the assumption that point 10 is bad, we could also explore the linearity that appears to be present in the data plot.
> vaporization.fit.out10.poly <- lm(width~power.z+I(power.z^2)+I(power.z^3),
+ data=TissueVaporization,subset=out10)
> anova(vaporization.fit.out10.poly)
Analysis of Variance Table
Response: width
Df Sum Sq Mean Sq F value Pr(>F)
power.z 1 2.04093 2.04093 245.8243 1.039e-06 ***
I(power.z^2) 1 0.00784 0.00784 0.9438 0.3637
I(power.z^3) 1 0.00921 0.00921 1.1094 0.3272
Residuals 7 0.05812 0.00830
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> vaporization.fit.out10.linear <- lm(width~power.z,
+ data=TissueVaporization,subset=out10)
> coef(vaporization.fit.out10.linear)
(Intercept) power.z
0.70193050 0.08499035
29.3.2 Finding a typo
Suppose that when entering the resin lifetime data we made a typo in the temperatures (case 11 was entered as 175 instead of 194). What might we find? Well, the typo certainly stands out in the box plots.
> temp.oops <- ResinLifetimes$temp
> temp.oops[11] <- 175
> with(ResinLifetimes,boxplot(Time~temp.oops))

Figure 29.1: Box plot for resin lifetimes with typo
> separate.oops <- lm(Time~temp.oops,data=ResinLifetimes)
> plot(separate.oops,which=1)

Figure 29.2: Residual plot for resin lifetimes with typo
> car::boxCox(separate.oops)

Figure 29.3: Box-Cox plot for resin lifetimes with typo
> separate.oops.log <- lm(logTime~temp.oops,data=ResinLifetimes)
> plot(separate.oops.log,which=1)

Figure 29.4: Residual plot for logged resin lifetimes with typo
> car::outlierTest(separate.oops.log)
rstudent unadjusted p-value Bonferroni p
11 -5.447548 5.9509e-06 0.00022018
At this stage we would examine our data and discover the typo, fix it, and then go about our business.
29.4 Modified ANOVA
The Welch version of the two-sample t-test (and its associated confidence interval) is well known to work well when the two groups have unequal variances. There are a couple of methods that generalize the Welch test to more than two groups. The good news is that these tests have much more accurate p-values than ordinary ANOVA when the data have non-constant variance. The bad news is that the procedures do not extend easily to inference beyond the test of the single mean null hypothesis.
The brown.forsythe.test()
function in cfcdae
implements the Brown-Forsythe test. The oneway.test()
function in base R implements a different approximation
to the p-value.
29.4.1 Example 6.10 Resin Lifetimes
Let’s illustrate these alternative forms of ANOVA using the resin lifetime data. First, recall what we obtained using standard methods:
> anova(lifetimes.fit)
Analysis of Variance Table
Response: Time
Df Sum Sq Mean Sq F value Pr(>F)
temp 4 28027.9 7007.0 101.81 < 2.2e-16 ***
Residuals 32 2202.4 68.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> brown.forsythe.test(Time~temp,data=ResinLifetimes)
One-way analysis of means, Brown-Forsythe (unequal variances)
data: Time and temp
F = 111.74, num df = 4.000, denom df = 18.298, p-value = 1.36e-12
> oneway.test(Time~temp,data=ResinLifetimes)
One-way analysis of means (not assuming equal variances)
data: Time and temp
F = 68.943, num df = 4.000, denom df = 14.373, p-value = 3.273e-09
All the methods have very low p-values. One noticeable difference is that the alternative methods use much smaller denominator degrees of freedom.
29.5 Robust methods
There is a class of statistical techniques called robust methods that assume that the errors arise from a distribution that is symmetric but has longer tails than the normal distribution and is thus outlier-prone (robust methods also retain the independence and constant variability assumptions). Roughly speaking, robust methods automatically and internally determine which points appear to be outliers and downweight those points.
The MASS
package includes a function rlm()
that does robust linear models. In addition to estimating
the parameters in the model for the means,
it also produces a robust estimate of the scale
of the statistical errors. Inferential
difficulties arise because we don’t know an
equivalent degrees of freedom for that estimate
of error.
The summary()
function works fine for
an rlm()
fit. The built-in anova() runs, but it doesn’t estimate an error degrees of freedom, so it doesn’t produce useful output. Instead,
use Anova()
in the car
package. car::Anova()
uses the error
degrees of freedom from the standard linear model; this
is probably still an overestimate of error degrees of
freedom. The functions
in package emmeans
also work with rlm()
fits.
These functions generally default to infinite
degrees of freedom for confidence intervals, but
you can specify some other error degrees of freedom.
29.5.1 Example 6.11 Tissue Vaporization
Use MASS::rlm()
to fit the separate means model:
> vaporization.rfit <- MASS::rlm(width~power,data=TissueVaporization)
> plot(vaporization.rfit,which=1:3)



The anova() function is not helpful; instead, use car::Anova():
> anova(vaporization.rfit)
Analysis of Variance Table
Response: width
Df Sum Sq Mean Sq F value Pr(>F)
power 3 1.9657 0.65523
Residuals 1.5379
> car::Anova(vaporization.rfit)
Analysis of Deviance Table (Type II tests)
Response: width
Df F Pr(>F)
power 3 79.32 2.718e-06 ***
Residuals 8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova()
is using the same error degrees of
freedom as the full-data linear model. This is probably a
little too big. In any case, the p-value is quite
small and similar to what we obtained after deleting
the outlier.
> emmeans::emmeans(vaporization.rfit,~power)
power emmean SE df asymp.LCL asymp.UCL
5 1.14 0.0573 NA 1.03 1.25
10 1.55 0.0573 NA 1.44 1.66
15 1.92 0.0573 NA 1.81 2.03
19 2.34 0.0573 NA 2.22 2.45
Confidence level used: 0.95
> emmeans::emmeans(vaporization.rfit,~power,df=8)
power emmean SE df lower.CL upper.CL
5 1.14 0.0573 8 1.01 1.27
10 1.55 0.0573 8 1.42 1.68
15 1.92 0.0573 8 1.79 2.05
19 2.34 0.0573 8 2.20 2.47
Degrees-of-freedom method: user-specified
Confidence level used: 0.95
Now the intervals are wider.
We can retrieve the weights that rlm() used to downweight the outlier:
> vaporization.rfit$w
[1] 1.00000000 1.00000000 1.00000000 0.49883606 0.56931858 1.00000000 1.00000000 1.00000000
[9] 1.00000000 0.06564224 1.00000000 1.00000000
Somewhat surprisingly, cases 4 and 5 are also downweighted. This makes some sense, as the responses for the second power level are more spread out than the other non-outlier responses.
If we fit a linear model using the case weights that rlm() gives us, we will get the same thing as the rlm() fit.
Let’s tabulate the coefficients for the rlm()
fit,
the weighted lm()
fit, the lm()
fit with
the outlier removed, and the lm()
fit with the
full data set.
> mytable <- rbind(vaporization.rfit$coefficients,
+ lm(width~power,data=TissueVaporization,weights=vaporization.rfit$w)$coefficients,
+ vaporization.fit.out10$coefficients,
+ vaporization.fit$coefficients)
> rownames(mytable) <- c("rlm","wtd","removed","orig")
> mytable
(Intercept) power1 power2 power3
rlm 1.736293 -0.5962928 -0.18624002 0.1837072
wtd 1.736293 -0.5962928 -0.18624002 0.1837072
removed 1.747917 -0.6079167 -0.19125000 0.1720833
orig 1.643333 -0.5033333 -0.08666667 0.2766667
The first two are exactly the same; the coefficients with the outlier removed are very similar (with the differences being a small fraction of a standard error of the coefficients); and the linear model including all the data is very different.
29.6 Modeling the variance
The gls()
function in package nlme
can fit
models using different variance structures using
its weights
argument. The value assigned
to weights
is called a variance function, and there
are several options in nlme. Three that we will consider are:
weights=varIdent(form=~1|foo), where foo is a factor. This says to fit a different variance for each level of the factor foo.
weights=varPower(form=~fitted(.)). This says to make the standard deviation proportional to a power of the fitted values, with the power to be estimated.
weights=varPower(form=~fitted(.),fixed=delta). This says to make the standard deviation proportional to the fitted values raised to the power delta.
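As a hedged sketch of the fixed-power option (it requires nlme, which is loaded in the example below), fixing the power at .5 makes the standard deviation proportional to the square root of the fitted values:
> gls(Time~temp, data=ResinLifetimes,
+     weights=varPower(form=~fitted(.), fixed=0.5))  # power fixed rather than estimated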
29.6.1 Example 6.12 Resin Lifetimes
Let’s fit the two variance models to the resin lifetime data.
> library(nlme)
Attaching package: 'nlme'
The following object is masked from 'package:bcfcdae':
fixef
> lifetimes.varsep <- gls(Time~temp,data=ResinLifetimes,weights=varIdent(form=~1|temp))
> lifetimes.varpower <- gls(Time~temp,data=ResinLifetimes,weights=varPower(form=~fitted(.)))
The plot() method plots standardized residuals
against fitted values. For the separate variances model,
we see an even spread all the way across.
> plot(lifetimes.varsep)

> plot(lifetimes.varpower)

We can also make a normal plot of the standardized residuals (type="pearson" divides the residuals by their estimated standard deviations):
> qqnorm(residuals(lifetimes.varsep,type="pearson"))

> AIC(lifetimes.varsep);AIC(lifetimes.varpower)
[1] 237.2421
[1] 237.1706
There is really nothing to choose between these in terms of AIC. The separate variances model has a higher likelihood but uses more parameters, and they balance out very evenly.
Note: both lifetimes.varsep
and lifetimes.varpower
were fit using
a method denoted REML (you will
see that in the summary()
below). REML AIC values
can be used to compare two models with
the same mean structure and
different variance structures, so the comparison
above is OK. You cannot compare REML
AIC values with non-REML AIC values. Because
models fit by lm()
do not produce a REML likelihood, you cannot
compare lifetimes.varsep
directly to lifetimes.fit
.
Instead, you must either refit the constant variance
model using gls()
(which will default to REML)
instead of lm()
, or you must refit the gls()
models
using a method="ML"
argument.
> lifetimes.varsep.ML <- gls(Time~temp,data=ResinLifetimes,
+ weights=varIdent(form=~1|temp),
+ method="ML")
> AIC(lifetimes.varsep.ML)
[1] 249.94
> AIC(lifetimes.fit)
[1] 268.1971
Clearly, fitting separate variances is better than fitting a single variance.
The summary for the GLS fit now contains additional information about how the variance was modeled.
> summary(lifetimes.varsep)
Generalized least squares fit by REML
Model: Time ~ temp
Data: ResinLifetimes
AIC BIC logLik
237.2421 251.8995 -108.6211
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | temp
Parameter estimates:
175 194 213 231 250
1.0000000 0.7339797 0.4918990 0.1329593 0.2839912
Coefficients:
Value Std.Error t-value p-value
(Intercept) 36.41900 1.272706 28.615413 0.0000
temp1 50.00519 3.788275 13.199988 0.0000
temp2 7.14083 2.911776 2.452398 0.0198
temp3 -11.89934 2.168016 -5.488587 0.0000
temp4 -20.70035 1.370035 -15.109355 0.0000
Correlation:
(Intr) temp1 temp2 temp3
temp1 0.544
temp2 0.180 -0.445
temp3 -0.215 -0.445 -0.268
temp4 -0.880 -0.522 -0.189 0.171
Standardized residuals:
Min Q1 Med Q3 Max
-1.8045963 -0.6400941 -0.1057086 0.3666410 1.9257397
Residual standard error: 13.02881
Degrees of freedom: 37 total; 32 residual
gls() estimates an overall residual standard error (shown near the bottom) and also provides multipliers giving the standard deviations of the other groups relative to the first group.
You can test the treatment term with anova() or car::Anova(). anova() forms tests using the same denominator degrees of freedom as the lm() model; this is likely optimistic (that is, too large). car::Anova() recognizes that it doesn’t know the denominator degrees of freedom and uses infinite denominator degrees of freedom; this is also too large, but at least it tells us it is doing a large-sample approximation by reporting a chi-squared test rather than an F-test.
> anova(lifetimes.varsep)
Denom. DF: 32
numDF F-value p-value
(Intercept) 1 959.0707 <.0001
temp 4 78.5368 <.0001
> car::Anova(lifetimes.varsep)
Analysis of Deviance Table (Type II tests)
Response: Time
Df Chisq Pr(>Chisq)
temp 4 314.15 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To see the effect on degrees of freedom more clearly, let’s fit the separate variances model to just the lowest and highest temperatures:
> use <- with(ResinLifetimes,(temp==175 | temp==250))
> lifetimes.firstlast <- gls(Time~temp,data=ResinLifetimes,
+ subset=use,
+ weights= varIdent(form=~1|temp))
> summary(lifetimes.firstlast)
Generalized least squares fit by REML
Model: Time ~ temp
Data: ResinLifetimes
Subset: use
AIC BIC logLik
96.33582 98.27545 -44.16791
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | temp
Parameter estimates:
175 250
1.0000000 0.2839908
Coefficients:
Value Std.Error t-value p-value
(Intercept) 49.14843 2.423867 20.27687 0
temp1 37.27576 2.423867 15.37863 0
Correlation:
(Intr)
temp1 0.806
Standardized residuals:
Min Q1 Med Q3 Max
-1.3815526 -0.5011049 -0.1774938 0.3040356 1.7824815
Residual standard error: 13.02882
Degrees of freedom: 14 total; 12 residual
We get a t-statistic of 15.38 for the group differences,
and when you do anova()
it’s going to use 12 degrees
of freedom for error.
> with(ResinLifetimes,t.test(Time[1:8],Time[32:37]))
Welch Two Sample t-test
data: Time[1:8] and Time[32:37]
t = 15.379, df = 8.4496, p-value = 1.795e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
63.47533 85.62771
sample estimates:
mean of x mean of y
86.42419 11.87267
We get the same t-statistic, but the t-test is using 8.5 degrees of freedom. This is much more realistic, because the variance of the difference is dominated by the larger variance, and it is estimated with only 7 degrees of freedom. For this test, the difference in degrees of freedom doesn’t matter, because the p-values are tiny either way, but it could matter for some data sets.
The Welch t-test uses the Satterthwaite approximation to get degrees of freedom. Functions in emmeans
also use the Satterthwaite approximation
for more general contrasts. This gives us
more confidence that the reported p-values and confidence
levels will be realistic.
> library(emmeans)
> lifetimes.emmsep <- emmeans::emmeans(lifetimes.varsep,~temp)
> lifetimes.emmsep
temp emmean SE df lower.CL upper.CL
175 86.4 4.606 6.99 75.53 97.3
194 43.6 3.381 6.99 35.56 51.6
213 24.5 2.266 7.00 19.16 29.9
231 15.7 0.655 6.00 14.12 17.3
250 11.9 1.511 5.00 7.99 15.8
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
When we just estimate group means in the separate variances model, the degrees of freedom in each group are \(n_i-1\), as we would expect.
Things are much more interesting if we estimate treatment means using the power model of variances:
> emmeans::emmeans(lifetimes.varpower,~temp)
temp emmean SE df lower.CL upper.CL
175 86.4 5.14 9.2 74.84 98.0
194 43.6 3.03 24.5 37.32 49.8
213 24.5 1.94 29.3 20.55 28.5
231 15.7 1.47 16.5 12.60 18.8
250 11.9 1.28 10.9 9.05 14.7
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
The standard errors are a little different, particularly at temperature 231, but the really interesting thing is the equivalent degrees of freedom, which peak in the middle and are lower on either end. This reflects the fact that, when the variance is assumed proportional to some power of the fitted values, the variance is better estimated (with more equivalent degrees of freedom) in the middle of the range.
More general contrasts give us a variety of degrees of freedom:
> contrast(emmeans(lifetimes.varsep,~temp),method="pairwise",adjust="holm")
contrast estimate SE df t.ratio p.value
temp175 - temp194 42.86 5.71 12.84 7.502 <.0001
temp175 - temp213 61.90 5.13 10.19 12.059 <.0001
temp175 - temp231 70.71 4.65 7.27 15.197 <.0001
temp175 - temp250 74.55 4.85 8.44 15.379 <.0001
temp194 - temp213 19.04 4.07 12.22 4.678 0.0020
temp194 - temp231 27.84 3.44 7.52 8.084 0.0003
temp194 - temp250 31.69 3.70 9.53 8.557 0.0001
temp213 - temp231 8.80 2.36 8.15 3.731 0.0112
temp213 - temp250 12.65 2.72 11.44 4.644 0.0020
temp231 - temp250 3.85 1.65 6.85 2.336 0.0529
Degrees-of-freedom method: satterthwaite
P value adjustment: holm method for 10 tests
> contrast(emmeans(lifetimes.varsep,~temp),list(temp=c(0,0,1,-2,1)))
contrast estimate SE df t.ratio p.value
temp 4.96 3.02 15.7 1.640 0.1209
Degrees-of-freedom method: satterthwaite
No evidence of curvature among the last three levels of temperature.
29.6.2 Example 6.13 Tearing Tissues
Let’s go back to the tearing tissues data. There are four treatments:
> levels(TissueTearing$treatment)
[1] "A_dry" "A_wet" "B_dry" "B_wet"
The three columns of tissues.cfs are contrasts for brand A versus brand B, dry versus wet, and the brand by wet/dry interaction.
> tissues.cfs <- cbind(c(.5,.5,-.5,-.5),c(.5,-.5,.5,-.5),c(.5,-.5,-.5,.5))
> tissues.cfs
[,1] [,2] [,3]
[1,] 0.5 0.5 0.5
[2,] 0.5 -0.5 -0.5
[3,] -0.5 0.5 -0.5
[4,] -0.5 -0.5 0.5
> tissues.varsep <- gls(tear.wt~treatment,data=TissueTearing,weights=varIdent(form=~1|treatment))
> plot(tissues.varsep)

> AIC(tissues.varsep)
[1] 441.5931
> AIC(gls(tear.wt~treatment,data=TissueTearing)) # refit using REML
[1] 467.046
> summary(tissues.varsep)
Generalized least squares fit by REML
Model: tear.wt ~ treatment
Data: TissueTearing
AIC BIC logLik
441.5931 461.7674 -212.7965
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | treatment
Parameter estimates:
A_dry A_wet B_dry B_wet
1.0000000 0.4269964 0.6777834 0.3354123
Coefficients:
Value Std.Error t-value p-value
(Intercept) 74.18750 0.2721713 272.57649 0
treatment1 25.10417 0.6417957 39.11551 0
treatment2 -2.68750 0.3683360 -7.29633 0
treatment3 3.97917 0.4788220 8.31032 0
Correlation:
(Intr) trtmn1 trtmn2
treatment1 0.543
treatment2 -0.432 -0.531
treatment3 0.027 -0.561 -0.195
Standardized residuals:
Min Q1 Med Q3 Max
-2.2594011 -0.7938436 -0.0610649 0.8723702 2.1625633
Residual standard error: 4.026857
Degrees of freedom: 96 total; 92 residual
> tissues.emm <- emmeans(tissues.varsep,~treatment)
> contrast(tissues.emm,list(treatment=tissues.cfs))
contrast estimate SE df t.ratio p.value
treatment.1 22.42 0.544 56.3 41.181 <.0001
treatment.2 29.08 0.544 56.3 53.428 <.0001
treatment.3 -1.29 0.544 56.3 -2.373 0.0211
Degrees-of-freedom method: satterthwaite
Brand A has a response about 22.42 higher than brand B, and dry has a response about 29.08 higher than wet; there is strong evidence that both of these differences are real. On the other hand, there is only marginal evidence that the brand difference changes between wet and dry.
29.7 Modeling dependence
There are many models of dependence, but we will only look at the simplest model for time dependence. The autoregressive model of order 1 (AR1) says that data values k time steps apart have a correlation of \(\rho^k\). It is also possible to have blocks of data that have temporal dependence within a block but independence between blocks.
Please note, a model with AR1 dependence only makes sense if there is a time order (or, potentially, some other linear order that is indicative of correlation) to the data. Do not try to use AR1 models on data that lack a time order.
In R, you can fit autocorrelated data
using the gls
function from package nlme
.
You must set the optional correlation
argument
to indicate the type of correlation present.
We only use the simple AR1 type correlation.
You need some variable x
that indicates time.
Setting
correlation=corAR1(form=~x)
says to fit the autocorrelation across time x
.
You can also indicate groups, where the data
are independent between groups but have
AR1 correlation within groups using
correlation=corAR1(form=~x|group)
where group
is a factor that indicates the different groups.
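For instance, here is a sketch with a hypothetical data frame mydata containing a response y, a treatment factor trt, a time variable x, and a grouping factor day:
> fit.grouped <- gls(y~trt, data=mydata,
+                    correlation=corAR1(form=~x|day))  # AR1 within day, independent across days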
29.7.1 Example 6.14 Thermocouples
The thermocouples data consists of 64 consecutive measurements of a temperature difference. Plotting the data in time order we saw that high values tend to clump and low values tend to clump. That is a visual indication of autocorrelation, but you can also use Durbin-Watson or other tests to quantify it.
We just want to fit the mean, so ignoring correlation we can do:
> Oven.ols <- lm(temp.diff~1,data=Thermocouples)
> mytime <- 1:64
> Oven.ar1 <- gls(temp.diff~1,data=Thermocouples,cor=corAR1(form=~mytime))
That’s all there is to it.
AIC prefers the AR1 model, but the difference is not overwhelming.
> AIC(Oven.ar1)
[1] -278.9285
> AIC(gls(temp.diff~1,data=Thermocouples)) # for REML
[1] -277.3358
> summary(Oven.ols)
Call:
lm(formula = temp.diff ~ 1, data = Thermocouples)
Residuals:
Min 1Q Median 3Q Max
-0.09125 -0.01125 0.00375 0.00875 0.04875
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.141250 0.003138 1001 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0251 on 63 degrees of freedom
The standard error of the estimated mean is .00314.
Now look at the AR1 fit:
> summary(Oven.ar1)
Generalized least squares fit by REML
Model: temp.diff ~ 1
Data: Thermocouples
AIC BIC logLik
-278.9285 -272.4991 142.4643
Correlation Structure: AR(1)
Formula: ~mytime
Parameter estimate(s):
Phi
0.243579
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.141487 0.004027105 780.0855 0
Standardized residuals:
Min Q1 Med Q3 Max
-3.6228911 -0.4548726 0.1391309 0.3371320 1.9211413
Residual standard error: 0.02525238
Degrees of freedom: 64 total; 63 residual
Now we see that the standard error of the estimated mean is .00403, or about 28% larger. That means we need longer confidence intervals to get the correct coverage; the shorter confidence intervals coming from the fit assuming independence will not have the desired coverage.
When the data are dependent, each additional data point does not give a full unit’s worth of new information. The ratio of variances is
> (.003138/.004027)^2
[1] 0.6072152
Thus, accounting for correlation, it is as if we had only \(.61 \times 64 \approx 39\) data points.
29.7.2 Example 6.15 Cloud Backup Times
The cloud backup data show non-constant variance, and Box-Cox suggests a log transformation.
> plot(backup.fit,which=1)
> car::boxCox(backup.fit)

> backup.logfit <- lm(log(updowntime)~service,data=CloudBackup)
> plot(backup.logfit,which=1)

Plotting the residuals in time order shows runs of positive and negative residuals, and the Durbin-Watson test confirms the autocorrelation:
> plot(residuals(backup.logfit),type='b')
> abline(h=0)

> car::durbinWatsonTest(backup.logfit)
lag Autocorrelation D-W Statistic p-value
1 0.44276 1.06326 0.002
Alternative hypothesis: rho != 0
> mytime <- 1:30
> backup.logar1 <- gls(log(updowntime)~service,data=CloudBackup,
+ correlation=corAR1(form=~mytime))
> AIC(backup.logar1)
[1] -26.18125
> AIC(gls(log(updowntime)~service,data=CloudBackup))
[1] -19.74796
> summary(backup.logar1)
Generalized least squares fit by REML
Model: log(updowntime) ~ service
Data: CloudBackup
AIC BIC logLik
-26.18125 -19.70206 18.09062
Correlation Structure: AR(1)
Formula: ~mytime
Parameter estimate(s):
Phi
0.562057
Coefficients:
Value Std.Error t-value p-value
(Intercept) 5.579809 0.04224382 132.08581 0
service1 -0.483829 0.02136900 -22.64164 0
service2 -0.138539 0.02482215 -5.58126 0
Correlation:
(Intr) servc1
service1 -0.012
service2 0.023 -0.522
Standardized residuals:
Min Q1 Med Q3 Max
-2.0350429 -0.8235312 0.1645706 0.8880758 1.4037069
Residual standard error: 0.1276129
Degrees of freedom: 30 total; 27 residual
> emmeans(backup.logfit,~service)
service emmean SE df lower.CL upper.CL
1 5.07 0.0387 27 4.99 5.15
2 5.45 0.0387 27 5.37 5.53
3 6.22 0.0387 27 6.14 6.30
Results are given on the log (not the response) scale.
Confidence level used: 0.95
> emmeans(backup.logar1,~service)
service emmean SE df lower.CL upper.CL
1 5.10 0.0471 5.01 4.97 5.22
2 5.44 0.0495 5.61 5.32 5.56
3 6.20 0.0477 5.24 6.08 6.32
Degrees-of-freedom method: satterthwaite
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The estimated means are not exactly the same in the two models, and the standard errors for the estimates are greater (and have fewer degrees of freedom) in the AR1 model; the standard errors are also no longer all equal.
The pattern of the treatments affects the standard errors of estimated means and differences of means. Here the treatments were randomized, so the three different services are intermingled over time. The positive autocorrelation increases the standard error of means, with more increase if the observations are closer in time and less increase if the observations are farther apart in time. Here a typical ratio of standard errors is \(.0477/.0387=1.23\), or a 23% increase in the standard error. If we had not randomized and had instead run each service for ten consecutive time slots, then the ratio of AR1 model standard errors of means to independence model standard errors of means would be 1.72:
> sqrt(sum(.56^abs(outer(1:10,1:10,FUN='-')))/10)
[1] 1.722991
(No, I don’t expect you to understand where that came from.)
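For the curious, here is a commented version of that calculation; it uses the standard result that the variance of the mean of \(n\) consecutive AR1 observations is \(\sigma^2\sum_{i,j}\rho^{|i-j|}/n^2\), compared to \(\sigma^2/n\) under independence:
> rho <- .56; n <- 10
> R <- rho^abs(outer(1:n, 1:n, FUN="-"))  # correlation between time slots i and j
> sqrt(sum(R)/n)                          # ratio of the two standard errors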
Because the treatments are intermingled and the autocorrelation is positive, the standard errors for differences of means are smaller under the AR1 model than they are under the independence model:
> confint(pairs(emmeans(backup.logar1,~service)))
contrast estimate SE df lower.CL upper.CL
service1 - service2 -0.345 0.0403 21.1 -0.447 -0.244
service1 - service3 -1.106 0.0365 19.2 -1.199 -1.013
service2 - service3 -0.761 0.0426 22.0 -0.868 -0.654
Degrees-of-freedom method: satterthwaite
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
> confint(pairs(emmeans(backup.logfit,~service)))
contrast estimate SE df lower.CL upper.CL
service1 - service2 -0.383 0.0547 27 -0.518 -0.247
service1 - service3 -1.156 0.0547 27 -1.292 -1.021
service2 - service3 -0.773 0.0547 27 -0.909 -0.638
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
The differences are highly significant under both models.
The ratio of backup times for service 1 and service 2 is \(\exp(-.345) = .71\), with a confidence interval from \(\exp(-.447)\) to \(\exp(-.244)\), or (.64, .78). The difference between service 1 and service 3 is even greater.
Having estimates of the ratios of backup times is probably adequate, but suppose we wanted results on the original (time) scale. We could instead fit a model on the original scale with different variances by treatment group and AR1 correlation.
> backup.separ1 <- gls(updowntime~service,data=CloudBackup,
+ correlation=corAR1(form=~mytime),
+ weights=varIdent(form=~1|service))
> confint(pairs(emmeans(backup.separ1,~service)))
contrast estimate SE df lower.CL upper.CL
service1 - service2 -65.5 7.4 12.84 -85.1 -45.9
service1 - service3 -332.4 19.9 3.93 -403.7 -261.0
service2 - service3 -266.9 20.1 4.76 -333.4 -200.4
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
On the original scale, we estimate service 1 to be 65.5 seconds faster than service 2, with a confidence interval of 45.9 to 85.1 seconds.
29.8 Using other distributions: Generalized Linear Models
Generalized linear models allow us to fit mean structures to data from non-normal distributions such as binomial, Poisson, or gamma. The mean of data point \(i\) is \(\mu_i\). There is a function \(g()\) called the link function, and we assume that \(g(\mu_i)\) can be described by some kind of linear model: separate means, polynomial, and so on. Each distribution has a canonical link function; using the canonical link makes some things simpler, but it is not required.
Many distributions have a dispersion parameter \(\phi\); for example, the dispersion for the normal distribution is its variance, and for the gamma distribution it is the reciprocal of the shape parameter. For the binomial and Poisson, the variability is determined by the mean, and we assume \(\phi=1\).
The deviance for a model is twice the difference in log likelihood between a model of interest and the saturated model (the saturated model has a rich enough structure to compute a separate mean for each response). For models where \(\phi\) must be fit, the deviance is twice the difference in log likelihood between a model of interest and the saturated model multiplied by \(\phi\). We estimate \(\phi\) as the residual deviance divided by the residual degrees of freedom.
The analysis of deviance partitions the deviance in analogy with how ANOVA partitions sums of squares. Each time we make the mean structure richer, we use additional parameters and we reduce the deviance. In models where \(\phi=1\) (binomial and Poisson), terms can be tested by treating the incremental deviance for the term as being distributed under the null as chi-squared with degrees of freedom equal to the number of incremental parameters for the term. In fact, this is the likelihood ratio test. When \(\phi\) must be estimated, form a test statistic for a term by dividing its incremental deviance by its incremental degrees of freedom, and then dividing that ratio by our estimate of \(\phi\). Treat that ratio as having an F distribution under the null, with numerator and denominator degrees of freedom taken as the incremental degrees of freedom for the term and the residual degrees of freedom used to estimate \(\phi\).
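Here is a sketch of that recipe in R, with fit and aod as hypothetical placeholders for a fitted glm() with estimated dispersion and its anova() table:
> phi.hat <- deviance(fit)/df.residual(fit)           # estimate of phi
> F.stat <- (aod[2,"Deviance"]/aod[2,"Df"])/phi.hat   # deviance per df, scaled by phi
> pf(F.stat, aod[2,"Df"], df.residual(fit), lower.tail=FALSE)
In practice, anova() with the argument test="F" does essentially this computation for you.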
In R, the glm()
function fits generalized linear models.
The family
argument determines the type of
distribution to use. By default, R will use
the canonical link, but you can specify other link functions.
The summary()
function works as you would expect,
and anova()
applied to a GLM produces an
analysis of deviance.
29.8.1 Example 6.16 Pea Germination
There are eight batches of 20 seeds. Each batch is randomly assigned to one of four treatments, and the response is the number of seeds in a batch that germinate. This can reasonably be modeled as binomial with \(N=20\) and \(p\) depending on the treatments.
With a binomial family, you specify the response as a two-column matrix, with the first column being the number of successes and the second column the number of failures. Thus:
> peas.glm <- glm(cbind(germinated,nongerminated)~treatment,
+ data=PeaGermination,family=binomial)
> summary(peas.glm)
Call:
glm(formula = cbind(germinated, nongerminated) ~ treatment, family = binomial,
data = PeaGermination)
Deviance Residuals:
1 2 3 4 5 6 7 8
0.3292 -0.7039 -0.3292 0.4966 -0.3487 0.7329 0.3487 -0.4809
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3945 0.2077 1.900 0.0575 .
treatment1 0.3363 0.3164 1.063 0.2878
treatment2 0.4528 0.3204 1.413 0.1576
treatment3 -2.3405 0.3968 -5.899 3.66e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 58.5202 on 7 degrees of freedom
Residual deviance: 1.9703 on 4 degrees of freedom
AIC: 33.856
Number of Fisher Scoring iterations: 4
The linear predictor for the success probability is on the logit scale: \(\mathrm{logit}(p) = \log(p/(1-p))\). You can invert the logit via \(p = \exp(\mathrm{logit})/(1+\exp(\mathrm{logit}))\). An increase of \(\alpha\) in the linear predictor means that the odds \(p/(1-p)\) increase by a factor of \(\exp(\alpha)\). Thus the odds of success in treatment 1 are \(\exp(0.34-(-2.34)) = 14.6\) times as large as for treatment 3.
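As a quick check of these back-transformations using the coefficients printed above (plogis() is the inverse logit in base R):
> plogis(0.3945 + 0.3363)   # germination probability for treatment 1, about .675
> exp(0.3363 - (-2.3405))   # odds ratio, treatment 1 versus treatment 3, about 14.5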
The deviance residuals say something about how
far each observation is from the fitted value.
The sum of squares of the deviance residuals
will be the residual deviance, here, 1.97.
residuals(fit,type="deviance")
also produces
the deviance residuals.
We can test the treatment effect with anova():
> anova(peas.glm)
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(germinated, nongerminated)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 7 58.52
treatment 3 56.55 4 1.97
> pchisq(56.55,3,lower.tail = FALSE)
[1] 3.206016e-12
so the treatment effects are highly significant.
We can get estimated treatment means on the linear predictor scale in the usual way:
> emmeans(peas.glm,~treatment)
treatment emmean SE df asymp.LCL asymp.UCL
soil_12 0.731 0.338 Inf 0.0692 1.39
soil_24 0.847 0.345 Inf 0.1710 1.52
towel_12 -1.946 0.478 Inf -2.8830 -1.01
towel_24 1.946 0.478 Inf 1.0089 2.88
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
> pairs(emmeans(peas.glm,~treatment))
contrast estimate SE df z.ratio p.value
soil_12 - soil_24 -0.116 0.483 Inf -0.241 0.9951
soil_12 - towel_12 2.677 0.585 Inf 4.574 <.0001
soil_12 - towel_24 -1.215 0.585 Inf -2.076 0.1609
soil_24 - towel_12 2.793 0.590 Inf 4.738 <.0001
soil_24 - towel_24 -1.099 0.590 Inf -1.863 0.2440
towel_12 - towel_24 -3.892 0.676 Inf -5.756 <.0001
Results are given on the log odds ratio (not the response) scale.
P value adjustment: tukey method for comparing a family of 4 estimates
This shows that the towel-12 treatment is lower than the other three, which cannot really be distinguished. The “infinite” degrees of freedom means that we are using an asymptotic approximate distribution for critical values.
We can back transform them using regrid():
> regrid(emmeans(peas.glm,~treatment))
treatment prob SE df asymp.LCL asymp.UCL
soil_12 0.675 0.0741 Inf 0.5299 0.820
soil_24 0.700 0.0725 Inf 0.5580 0.842
towel_12 0.125 0.0523 Inf 0.0225 0.227
towel_24 0.875 0.0523 Inf 0.7725 0.977
Confidence level used: 0.95
By default, regrid()
back transforms based on
the link function.
29.8.2 Example 6.17 Cloud Seeding, continued
The rainfall amounts in the cloud seeding data were asymmetrically distributed with a long tail to the right. In a previous example we used a log transformation of the data; that both equalized the variances and made the data more normally distributed. Suppose instead that we want to do the inference on the rainfall scale rather than the log of rainfall scale. We will need a technique that handles both the non-normal distribution and the non-constant variance.
Gamma distributions are distributions for positive data that have long tails to the right. For gamma distributions with the same shape parameter, the variance is proportional to the square of the mean. If the variance divided by the square of the mean is (roughly) the same in the seeded and unseeded clouds, that suggests that they might both follow a gamma distribution with the same shape. That is approximately the case for these data:
> var(nonseeded)/mean(nonseeded)^2
[1] 2.861687
> var(seeded)/mean(seeded)^2
[1] 2.168022
> cloud.glmfit <- glm(rainfall~seeding,data=CloudSeeding,
+ family=Gamma(link="identity"))
> summary(cloud.glmfit)
Call:
glm(formula = rainfall ~ seeding, family = Gamma(link = "identity"),
data = CloudSeeding)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.86689 -1.42422 -0.73570 -0.02294 2.93868
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 303.29 73.34 4.135 0.000135 ***
seeding1 -138.70 73.34 -1.891 0.064406 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 2.514854)
Null deviance: 119.85 on 51 degrees of freedom
Residual deviance: 107.65 on 50 degrees of freedom
AIC: 682.33
Number of Fisher Scoring iterations: 3
We can get an estimate of the difference in means as \(2 \times -138.7 = -277.4\) with a standard error of \(2 \times 73.34 = 146.7\). An approximate 95% confidence interval for the difference runs from \(-277.4 - 2 \times 146.7 = -571\) up to \(-277.4 + 2 \times 146.7 = 16\). The reported p-value is .064; technically, this is for the “Wald test,” based on an estimate divided by its standard error.
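That arithmetic can be pulled directly from the fit (coefficient 2 is the seeding effect):
> est <- 2*coef(cloud.glmfit)[2]          # difference of means, about -277.4
> se <- 2*sqrt(vcov(cloud.glmfit)[2,2])   # its standard error, about 146.7
> est + c(-2,2)*se                        # approximate 95% confidence interval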
A generally more accurate p-value comes from the analysis of deviance:
> anova(cloud.glmfit)
Analysis of Deviance Table
Model: Gamma, link: identity
Response: rainfall
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 51 119.85
seeding 1 12.201 50 107.65
> pf(12.2/1/2.51,1,50,lower=FALSE)
[1] 0.03210672
The p-value of .03 should be a little more accurate than the .06 from the Wald test.
As a diagnostic, we should look at how well the data conform to the gamma distribution. The seeded and unseeded clouds have different means, so we should rescale them before combining them. (In the plot below, the shape .4 is roughly the reciprocal of the estimated dispersion 2.51, and the scale 2.5 makes the reference distribution’s mean 1, matching the rescaled data.)
> rescaled <- c(nonseeded/mean(nonseeded),seeded/mean(seeded))
> qqplot(rescaled,2.5*qgamma((1:52)/53,.4))
> abline(0,1)

This is not perfectly straight, but it is pretty good and gives us confidence in the results.
29.8.3 Example 6.18 Copepoda
Artificial wetlands are treated with snowmelt water with either neutral or reduced pH. After treatment, we observed Copepoda counts from each of the six wetlands. Given that these are counts, a Poisson model is a reasonable starting point.
> copepoda.poisson <- glm(count~pH,data=Copepoda,family=poisson())
By default, a Poisson model uses a log link function,
but you can change that in the poisson()
function
if you want to.
> summary(copepoda.poisson)
Call:
glm(formula = count ~ pH, family = poisson(), data = Copepoda)
Deviance Residuals:
1 2 3 4 5 6
4.698 -2.173 -2.952 -8.436 -1.605 8.067
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.09496 0.03212 158.638 < 2e-16 ***
pH1 -0.14148 0.03212 -4.405 1.06e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 193.93 on 5 degrees of freedom
Residual deviance: 174.32 on 4 degrees of freedom
AIC: 219.29
Number of Fisher Scoring iterations: 5
The deviance residuals ought to be of roughly the magnitude of standardized residuals; instead, they are very large. For a Poisson model, the residual deviance divided by residual degrees of freedom ought to be roughly 1; instead, it is \(174.32/4 = 43.58\). These data have much more variability than we would expect from a Poisson distribution.
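That overdispersion check can be computed directly from the fit:
> deviance(copepoda.poisson)/df.residual(copepoda.poisson)  # about 43.6; near 1 if Poisson fits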
One approach to handling this hyper-Poisson variation is to use a quasi-likelihood model, in this case, quasipoisson.
> copepoda.quasip <- glm(count~pH,data=Copepoda,family=quasipoisson())
> summary(copepoda.quasip)
Call:
glm(formula = count ~ pH, family = quasipoisson(), data = Copepoda)
Deviance Residuals:
1 2 3 4 5 6
4.698 -2.173 -2.952 -8.436 -1.605 8.067
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0950 0.2116 24.075 1.77e-05 ***
pH1 -0.1415 0.2116 -0.669 0.54
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 43.42048)
Null deviance: 193.93 on 5 degrees of freedom
Residual deviance: 174.32 on 4 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
Note that the standard errors are now much larger, and the pH effect is no longer significant.
We can also test via the deviance:
> anova(copepoda.quasip)
Analysis of Deviance Table
Model: quasipoisson, link: log
Response: count
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 5 193.93
pH 1 19.601 4 174.32
The F-test is \((19.601/1)/43.58 = .45\) with 1 and 4 degrees of freedom. Note that the square of the t-statistic above is essentially this F-test.
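That computation in R:
> pf((19.601/1)/43.58, 1, 4, lower.tail=FALSE)  # about .54, matching the t-test above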
An alternative to a quasi-Poisson model is to fit a negative binomial model. Negative binomial distributions have more variance than a Poisson with the same mean, with an additional parameter that controls the amount of extra variance. The glm.nb() function in the MASS package will fit a GLM with a negative binomial response.
> copepoda.negbin <- MASS::glm.nb(count~pH,data=Copepoda)
> summary(copepoda.negbin)
Call:
MASS::glm.nb(formula = count ~ pH, data = Copepoda, init.theta = 5.077672042,
link = log)
Deviance Residuals:
1 2 3 4 5 6
0.7246 -0.3621 -0.4971 -1.8268 -0.3054 1.3708
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.0950 0.1840 27.690 <2e-16 ***
pH1 -0.1415 0.1840 -0.769 0.442
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(5.0777) family taken to be 1)
Null deviance: 6.8023 on 5 degrees of freedom
Residual deviance: 6.2128 on 4 degrees of freedom
AIC: 73.788
Number of Fisher Scoring iterations: 1
Theta: 5.08
Std. Err.: 2.95
2 x log-likelihood: -67.788
The estimated coefficients are the same as in the Poisson and quasi-Poisson models, but the standard errors for the coefficients in the negative binomial model are much larger than in the Poisson model and similar to those in the quasi-Poisson model.
We can also do an analysis of deviance:
> anova(copepoda.negbin)
Warning in anova.negbin(copepoda.negbin): tests made without re-estimating 'theta'
Analysis of Deviance Table
Model: Negative Binomial(5.0777), link: log
Response: count
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 5 6.8023
pH 1 0.58948 4 6.2128 0.4426
Overall, the quasi-Poisson and negative binomial approaches are producing very similar inferences.
29.9 Nonparametric methods
Nonparametric methods do not assume parametric distributional shapes and typically derive their inferences from other potential randomization outcomes. Examples of these tests include randomization tests and rank tests.
29.9.1 Example 6.19 Cloud Seeding, continued
The permTS() function in the perm package does
a randomization analogue of the two-sample t-test.
It computes the mean in the first group minus the mean
in the second group and then sees where the observed
difference of means falls in the distribution of
differences of means across all potential randomizations.
> perm::permTS(rainfall~seeding,data=CloudSeeding)
Permutation Test using Asymptotic Approximation
data: rainfall by seeding
Z = -1.9421, p-value = 0.05213
alternative hypothesis: true mean seeding=nonseeded - mean seeding=seeded is not equal to 0
sample estimates:
mean seeding=nonseeded - mean seeding=seeded
-277.3962
Here we see a p-value of .052, which is similar
to the p-value of the t-test.
This p-value is for a two-sided alternative;
permTS()
has an optional argument that allows you
to specify a one-sided alternative. Also note,
permutation tests sometimes compute the complete
permutation distribution (particularly for small
data sets), but sometimes the permutation distribution
is sampled (particularly for large data sets). When
sampling is done, you may see a slightly different
p-value in repeated applications of the test.
We can also run the randomization test on the logged data:
> perm::permTS(log(rainfall)~seeding,data=CloudSeeding)
Permutation Test using Asymptotic Approximation
data: log(rainfall) by seeding
Z = -2.4179, p-value = 0.01561
alternative hypothesis: true mean seeding=nonseeded - mean seeding=seeded is not equal to 0
sample estimates:
mean seeding=nonseeded - mean seeding=seeded
-1.143781
This has a smaller p-value, which is again similar to what we obtained with the t-test.
The difference of means of logs is the same as the log of the ratio of geometric means. Thus working with logged data is like working with the original data but using a different summary statistic instead of the difference of group means.
For a rank-based test we can do:
> wilcox.test(rainfall~seeding,data=CloudSeeding)
Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot compute exact p-value
with ties
Wilcoxon rank sum test with continuity correction
data: rainfall by seeding
W = 203, p-value = 0.01383
alternative hypothesis: true location shift is not equal to 0
This p-value is close to what we get with a t-test on the logged data; indeed, since ranks are unchanged by taking logs, the rank test gives the same result on either scale.
When the assumptions of the parametric test (e.g., the t-test) are met, the randomization, rank, and parametric procedures will generally agree.