Chapter 14 Example 3.4: Fitting Models for a Single Experimental Factor

OK, it’s time to start fitting some models to data. At this stage of the game there are two model fitting functions you could use: lm() and aov(). I just use lm() (linear model). The basic arguments to lm() are a model formula and an optional data frame.

There are many kinds of models that you can fit, but to begin with, let’s divide them into two types. The first type just considers the data from each treatment as falling into one or more groups, with no assumed link between the groups. The second type of model requires a numerical aspect for each treatment (generically called a dose), and then assumes that the mean response in each treatment follows some functional form based on the dose.

14.1 Models Based on Groups

The two models of primary interest are the single mean model, where we just have the overall mean \(\mu\), and the separate means model, where each treatment has its own mean \(\mu + \alpha_i\) (or \(\mu_i\)).

In R, the form of the model is response variable, then a tilde (which we interpret as “is modeled by”) and then the explanatory variable(s). For the single mean model, the explanatory variable will be a “1”, indicating a constant or overall mean. This constant is added to all models as the first explanatory term unless you tell R to leave it out by having -1 or +0 in the model.

For the separate means model, the right hand side is a factor variable. In our case, we have just the single factor variable temp representing the different temperature treatments.

It is usually best to save the output of lm() into a variable. We then use that variable in various ways to extract information that we need. Descriptive names for the model fits are generally a good idea.

> # the myRL data frame was created in the last chapter
> separate.means <- lm(logTime ~ temp,data=myRL)
> single.mean <- lm(logTime~1,data=myRL)

The separate.means model represents the mean structure as an overall central value plus deviations from the central value. As an alternative, we could just ask lm() to fit a separate mean for each group. We can do that by asking lm() to leave the overall constant out of the model via -1.

> individual.means <- lm(logTime ~ temp-1,data=myRL)

Note that it is even possible, although rarely useful, to fit the data with a mean of 0; that is, we do not even fit an overall mean.

> zero.mean <- lm(logTime ~ 0,data=myRL)

14.2 Predicted or Fitted Values

A model describes a pattern of means for the data, and after fitting the model we can observe what those estimated means are. This is often called looking at the fitted or predicted values. In R, the predict() function allows us to access the fitted or predicted values.

If you simply say predict(modelfit), R will give you the predicted values for all of the data used to fit the model. Note that even though the individual means and separate means models are formulated differently, they still give us the same fitted values.
> predict(individual.means)
       1        2        3        4        5        6        7        8 
1.932500 1.932500 1.932500 1.932500 1.932500 1.932500 1.932500 1.932500 
       9       10       11       12       13       14       15       16 
1.628750 1.628750 1.628750 1.628750 1.628750 1.628750 1.628750 1.628750 
      17       18       19       20       21       22       23       24 
1.377500 1.377500 1.377500 1.377500 1.377500 1.377500 1.377500 1.377500 
      25       26       27       28       29       30       31       32 
1.194286 1.194286 1.194286 1.194286 1.194286 1.194286 1.194286 1.056667 
      33       34       35       36       37 
1.056667 1.056667 1.056667 1.056667 1.056667 
> one.each <- c(1,9,17,25,32) # one from each treatment group
> predict(separate.means)[one.each] # same as above
       1        9       17       25       32 
1.932500 1.628750 1.377500 1.194286 1.056667 
> predict(single.mean)[one.each] # kind of boring
       1        9       17       25       32 
1.465135 1.465135 1.465135 1.465135 1.465135 

These fitted or predicted values estimate a parameter, namely the model’s mean response in each treatment group. We can ask for confidence intervals for the fitted values. The default level is 95%, so we did not really need to specify it. Note that the confidence intervals are not all the same width; this is a result of the different \(n_i\) values.

> predict(separate.means,interval="confidence",level=.95)[one.each,]
        fit       lwr      upr
1  1.932500 1.8635073 2.001493
9  1.628750 1.5597573 1.697743
17 1.377500 1.3085073 1.446493
25 1.194286 1.1205294 1.268042
32 1.056667 0.9770008 1.136333
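
For the separate means model, the confidence interval for the mean of treatment group \(i\) has the form
\[
\hat{\mu}_i \pm t_{0.025,\,N-g}\sqrt{MS_E/n_i},
\]
where \(MS_E\) is the mean square for error and \(N-g\) is the error degrees of freedom (here \(37-5=32\)). Groups with smaller \(n_i\) get wider intervals, which is what we see above.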

The emmeans() function from the emmeans (estimated marginal means) package lets you estimate, and get confidence intervals for, the group means.

> library(emmeans);emmeans(separate.means,"temp")
 temp emmean     SE df lower.CL upper.CL
 175    1.93 0.0339 32    1.864     2.00
 194    1.63 0.0339 32    1.560     1.70
 213    1.38 0.0339 32    1.309     1.45
 231    1.19 0.0362 32    1.121     1.27
 250    1.06 0.0391 32    0.977     1.14

Confidence level used: 0.95 

We can also get prediction intervals for future data. Recall that prediction intervals take into account the variability of data around the mean as well as the uncertainty in the value of the mean (which is all that confidence intervals account for). Thus prediction intervals will be wider than confidence intervals.
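
For the separate means model, the 95% prediction interval for a future response in treatment group \(i\) is
\[
\hat{\mu}_i \pm t_{0.025,\,N-g}\sqrt{MS_E\left(1 + 1/n_i\right)},
\]
where the extra 1 under the square root accounts for the variability of the new observation around its mean.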

> predict(separate.means,interval="prediction",level=.95)[one.each,]
Warning in predict.lm(separate.means, interval = "prediction", level = 0.95): predictions on current data refer to _future_ responses
        fit       lwr      upr
1  1.932500 1.7255219 2.139478
9  1.628750 1.4217719 1.835728
17 1.377500 1.1705219 1.584478
25 1.194286 0.9856714 1.402900
32 1.056667 0.8458905 1.267443

We can also specify where we want to predict. We do that by supplying a data frame whose named elements give values for the variables in the model.

Here we only need temp, and we’ll predict at the first four levels. We could have added the fifth level.

> predframe <-data.frame(temp=factor(c(175,194,213,231)))
> predframe
  temp
1  175
2  194
3  213
4  231
> predict(separate.means,predframe)
       1        2        3        4 
1.932500 1.628750 1.377500 1.194286 

14.3 Polynomial Response Models

In this data set, the treatment corresponds to values of a more or less continuous quantity: temperature. Instead of thinking of the model as five separate means, we can think of the mean response as a function of the temperature.

The advantages of functional models are:

  • We can make predictions at values of the treatment where we did not take measurements.
  • We can often make good predictions without using all the between groups degrees of freedom (parsimony).

Polynomials are by far the most common functions to use, to the point that for some analysts functional models are practically synonymous with polynomial models, but there are many other choices that could be useful in certain circumstances (e.g., trigonometric functions, splines, wavelets, etc.).

Two points determine a line, three points a quadratic, four points a cubic, and five points a quartic. There are five different values for temperature, so we can fit up to a quartic polynomial.

We created the square, cube, etc. variables earlier, and now we fit models with various numbers of terms. Here we fit polynomial models of degree 0 (that is, a constant), degree 1 (linear), degree 2 (quadratic), degree 3, and degree 4.
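
In case you are starting fresh, here is a sketch of how those powered variables might be created; it assumes that temp.z is the numeric version of the temperature stored in myRL.

> # sketch: build powers of the numeric temperature variable
> # (assumes myRL$temp.z already holds the numeric temperatures)
> myRL$temp.z2 <- myRL$temp.z^2
> myRL$temp.z3 <- myRL$temp.z^3
> myRL$temp.z4 <- myRL$temp.z^4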

> p4 <- lm(logTime~temp.z+temp.z2+temp.z3+temp.z4,data=myRL)
> p3 <- lm(logTime~temp.z+temp.z2+temp.z3,data=myRL)
> p2 <- lm(logTime~temp.z+temp.z2,data=myRL)
> p1 <- lm(logTime~temp.z,data=myRL)
> p0 <- lm(logTime~1,data=myRL) # same as single.mean

There are a few things to note. First, the order 0 polynomial model is the same as the single mean model, because a constant mean is just another way of saying that every group has the same mean.

Second, note that we maintain hierarchy, meaning that if power p is in the model, we also include all of the powers below p. This is generally good practice and helps keep the model interpretable. The issue is which terms are really in the model: terms can appear or disappear under a simple change of units. For example, a pure quadratic model in degrees C picks up a linear term when re-expressed in degrees F. If you maintain hierarchy, you can make a linear change of units (degrees C to degrees F, or centimeters east of the front of the room to centimeters east of the back of the room) without having previously absent terms show up after the change of units.
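
To see that concretely, suppose the mean response is the pure quadratic \(\beta_0 + \beta_2 C^2\), with no linear term in Celsius. Substituting \(C = (F-32)/1.8\) gives
\[
\beta_0 + \beta_2\left(\frac{F-32}{1.8}\right)^2 = \left(\beta_0 + \frac{1024\,\beta_2}{3.24}\right) - \frac{64\,\beta_2}{3.24}\,F + \frac{\beta_2}{3.24}\,F^2,
\]
so a linear term in \(F\) appears even though the Celsius version had none. A full hierarchical quadratic, in contrast, has constant, linear, and quadratic terms in either set of units.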

The form I() on the right hand side of a model formula says to take whatever is between the parentheses, evaluate that as an ordinary arithmetic operation, and then use the result as a term in the model. Thus, for example, instead of p4 or p2 as above, we could use

> p4b <- lm(logTime~temp.z+I(temp.z^2)+I(temp.z^3)+I(temp.z^4),data=myRL)
> p2b <- lm(logTime~temp.z+I(temp.z^2),data=myRL)

Model p4b gives the same fit as model p4, and p2b gives the same fit as p2; only the labels on the terms differ. Here we verify that each pair of models has the same fitted values (returned by predict()).

> sum( (predict(p4)-predict(p4b))^2 ) # numerically 0
[1] 0
> sum( (predict(p2)-predict(p2b))^2 ) # numerically 0
[1] 0

Finally, we fit a couple of non-hierarchical models. We do this to show how apparently similar non-hierarchical models can actually be quite different.

> nonh2 <- lm(logTime~I(temp.z^2),data=myRL)  # Beware!
> nonh2b <- lm(logTime~I((temp.z-215)^2),data=myRL) # Beware!
> p2c <- lm(logTime~I(temp.z-215)+I((temp.z-215)^2),data=myRL)

nonh2 uses just the intercept and the square of temperature as predictors, while nonh2b first centers temperature (so that the centered values are more or less equally distributed around 0) before squaring. We can see that these are not the same fitted models, because they do not have the same fitted values.

> sum( (predict(nonh2)-predict(nonh2b))^2 ) # NOT numerically 0
[1] 3.321688

p2c is a quadratic model that uses the shifted version of temperature in both the first and second order terms. p2c has the same fitted values as p2, so the linear shift did not affect the fit of the hierarchical model.

> sum( (predict(p2) - predict(p2c))^2) # numerically 0
[1] 4.14152e-29

14.3.1 Orthogonal Polynomials

There are many ways to re-express polynomials. Consider the following.
\[
\begin{eqnarray}
P(z) & = & a + bz + cz^2 \\
Q(z) & = & 1 - z + z^2 \\
L(z) & = & 5 + z
\end{eqnarray}
\]
Then
\[
P(z) = cQ(z) + (b+c)L(z) + (a - 5b - 6c).
\]
In fact, any quadratic can be re-expressed as a constant plus multiples of Q and L. Moreover, any polynomial can be re-expressed as a constant plus multiples of other polynomials, as long as there is one polynomial of each order in the set of “other polynomials”.

This means that instead of using the monomials like \(z^2\) or \(z\) as our predictive terms, we could use more general polynomials like Q and L as our predictive terms, and we will get the same model fits. This is an interesting fact, but it is really only useful if we choose the “other polynomials” very carefully.
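
As a quick sketch of that fact with our data, we can refit the quadratic using \(Q(z)\) and \(L(z)\) as the predictors (the name p2QL is just made up here) and check that the fitted values match those of p2.

> # refit the quadratic using Q(z) = 1 - z + z^2 and L(z) = 5 + z as predictors
> p2QL <- lm(logTime~I(1-temp.z+temp.z^2)+I(5+temp.z),data=myRL)
> sum( (predict(p2) - predict(p2QL))^2 ) # should be numerically 0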

Orthogonal polynomials are one very carefully chosen set of “other polynomials”; they have the following advantages.

  1. Orthogonal polynomials are numerically stable. Fitting ordinary polynomials can be numerically unstable if you are fitting a high order polynomial or your \(z\) values are far from 0 (e.g., 99999991, 99999992, 99999993).
  2. Orthogonal polynomials are statistically stable in the sense that the fitted coefficients for the different orders are statistically independent from each other. Among other things, this means that the coefficient for the quadratic orthogonal polynomial will be the same whether or not you include the cubic orthogonal polynomial in your model.
  3. Orthogonal polynomials partition model sums of squares very cleanly. (Yes, it is more right triangles and Pythagorean theorem in N dimensional space.)

A linear orthogonal polynomial is a combination of the ordinary linear monomial \(z\) plus a constant. A quadratic orthogonal polynomial is a combination of the ordinary quadratic monomial \(z^2\) plus a constant and a multiple of the ordinary linear monomial \(z\), all possibly rescaled. The exact combination depends on the data set, so orthogonal polynomials depend on the data set.

The orthogonal polynomials depend on the data, but R has a nice way to generate them for you. Within a model statement, a term like poly(temp.z,degree=3) will create a model that includes linear, quadratic, and cubic orthogonal polynomials for the data set. Let’s fit some orthogonal polynomial models.

> op4 <- lm(logTime~poly(temp.z,degree=4),data=myRL)
> op3 <- lm(logTime~poly(temp.z,degree=3),data=myRL)
> op2 <- lm(logTime~poly(temp.z,degree=2),data=myRL)
> op1 <- lm(logTime~poly(temp.z,degree=1),data=myRL)
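
Here is a quick sketch of the machinery behind advantages 1 and 2 (output not shown). The columns that poly() generates are orthonormal, so their crossproduct should be essentially the identity matrix, and the lower order coefficients should not change when a higher order term is added.

> # the orthogonal polynomial columns are orthonormal
> round(crossprod(poly(myRL$temp.z,degree=4)),10) # essentially the identity matrix
> # lower order coefficients agree across nested models
> coef(op2)
> coef(op3) # first three coefficients should match coef(op2)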

14.4 Non-polynomial Functional Model

And just to show we can do it, here is a model using trigonometric functions. Note that we do not need to wrap the trig functions in I(). That is because the formula parser treats function calls like sin() and cos() as ordinary arithmetic, whereas it interprets ^ as something other than “raise to a power” unless we wrap it in I().

> trig4 <- lm(logTime~sin(2*pi*temp.z/60)+cos(2*pi*temp.z/60)+
+               sin(4*pi*temp.z/60)+cos(4*pi*temp.z/60),data=myRL)
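
For the record, here is what happens if you forget the I() around a power: in a formula, ^ means crossing terms rather than exponentiation, so temp.z^2 silently collapses to just temp.z and you get the linear model. A small sketch (the name oops is made up):

> oops <- lm(logTime~temp.z^2,data=myRL) # Beware: no I(), so this is really just logTime~temp.z
> sum( (predict(oops) - predict(p1))^2 ) # should be numerically 0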

14.5 All Roads Lead to Rome

We have mentioned the separate means model, the individual means model, the ordinary polynomial model, the orthogonal polynomial model, and a trigonometric model. For the resin lifetime data, there are five treatment groups. That means that the largest models you can fit are:

  • Five individual means.
  • A grand mean and five deviations from the grand mean.
  • Constant, linear, quadratic, cubic, and quartic terms.
  • Constant, first, second, third, and fourth order orthogonal polynomials.
  • Constant and four (linearly independent) trigonometric functions.

In all cases, there are five quantities to be estimated in these largest models (coefficients, means, etc.), and in all cases the fitted values from these largest models are the same.

> # all of these are numerically zero
> sum( (predict(separate.means) - predict(individual.means))^2)
[1] 1.770007e-29
> sum( (predict(separate.means) - predict(p4))^2)
[1] 1.177345e-24
> sum( (predict(separate.means) - predict(op4))^2)
[1] 6.162976e-30
> sum( (predict(separate.means) - predict(trig4))^2)
[1] 1.16357e-29

One model formulation is not mathematically better than another. In different circumstances you might find different formulations of the model to be preferable. In fact, you might use more than one model formulation in the same analysis.

14.6 Buyer Beware

The separate means, individual means, fourth order polynomial, fourth order orthogonal polynomial, and four-term trigonometric models all fit the five treatment means exactly. However, they say very different things about what the response might be away from the five places where we have observed responses.

The individual and separate means models (based on categorical factors) say nothing about the response away from the observed temperatures.
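
For instance (just a sketch), asking the separate means model to predict at a temperature that was not one of the treatment levels fails, because 200 is not a level of the factor temp.

> # predicting at an unobserved level of a factor fails;
> # try() keeps the error from stopping a script
> try( predict(separate.means,data.frame(temp=factor(200))) )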

The other three models can make predictions away from the observed temperatures, but they make very different predictions away from the observed temperatures. Here is what the quartic and trigonometric models look like from 150 to 375 C.
> pframe2 <- data.frame(temp.z=150:375,temp.z2=(150:375)^2,
+                       temp.z3=(150:375)^3,temp.z4=(150:375)^4)
> with(myRL,plot(temp.z,logTime,xlab='Temperature',ylab='log lifetime',
+    main="Quartic and Trigonometric Fits",xlim=c(150,375),ylim=c(0,3)))
> lines(pframe2$temp.z,predict(p4,pframe2))
> lines(pframe2$temp.z,predict(trig4,pframe2),lty=2)

The quartic does a reasonable job within the range of the data, but we really don’t know how well it is working at higher temperatures (where it flattens and then drops suddenly). The trigonometric fit is just hilarious away from our five observed values.

Extrapolating the mean response away from the observed data is possible but risky. Draw a picture and determine whether you think the fits you get are sensible or just silly.

Note: there are other data sets where the trigonometric fit works well and the polynomial is silly. You cannot make general statements about what will, or will not, work well.