R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>  
>   pdf(file = "gam-plots.pdf")
>  
>  # stealing stuff from Chapter 5 of
>  #     Simon Wood (2006)
>  #     Generalized Additive Models: An Introduction with R
>  #     Chapman & Hall/CRC Texts, Boca Raton, FL.
>  # uses the R package mgcv which is one of the "recommended packages that
>  # comes with every R installation (no need for install.packages or for
>  # using the "packages" menu in the GUI's -- it is already installed).
>  
>  # note: in this package have multiple ways to specify smooths in formulas
>  # s(x), s(x, y), te(x, y), ti(x, y), see help for s, te, ti, t2, smooth.terms
>  
>  # Cherry trees, see help(trees)
>  
>  # using the default, thin-plate splines
>  
>  library(mgcv)
Loading required package: nlme
This is mgcv 1.8-3. For overview type 'help("mgcv-package")'.
>  data(trees)
>  out1 <- gam(Volume ~ s(Height) + s(Girth),
+      family = Gamma(link = log), data = trees)
>  print(out1)

Family: Gamma 
Link function: log 

Formula:
Volume ~ s(Height) + s(Girth)

Estimated degrees of freedom:
1.00 2.42  total = 4.42 

GCV score: 0.008082356     
>  summary(out1)

Family: Gamma 
Link function: log 

Formula:
Volume ~ s(Height) + s(Girth)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.27570    0.01492   219.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df      F  p-value    
s(Height) 1.000  1.000  31.32 3.92e-06 ***
s(Girth)  2.422  3.044 219.28  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.973   Deviance explained = 97.8%
GCV = 0.0080824  Scale est. = 0.006899  n = 31
>  plot(out1, residuals = TRUE)
>  
>  # using cubic regression splines (number of knots fixed ???)
>  
>  out2 <- gam(Volume ~ s(Height, bs = "cr") + s(Girth, bs = "cr"),
+      family = Gamma(link = log), data = trees)
>  summary(out2)

Family: Gamma 
Link function: log 

Formula:
Volume ~ s(Height, bs = "cr") + s(Girth, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.27570    0.01492   219.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
            edf Ref.df      F  p-value    
s(Height) 1.000  1.000  31.32 3.91e-06 ***
s(Girth)  2.419  3.035 219.50  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.973   Deviance explained = 97.8%
GCV = 0.0080805  Scale est. = 0.006898  n = 31
>  plot(out2, residuals = TRUE)
> 
>  predict(out1) - predict(out2)
            1             2             3             4             5 
 1.581844e-04  5.209911e-05  2.180389e-06  4.110952e-05  3.667582e-05 
            6             7             8             9            10 
 3.044918e-05 -4.155989e-06  9.920721e-07 -1.355421e-05 -3.068825e-05 
           11            12            13            14            15 
-3.267077e-05 -2.612674e-05 -2.612674e-05  1.915406e-05  3.066144e-05 
           16            17            18            19            20 
-1.212661e-04 -1.151295e-04 -4.974142e-05 -1.663593e-05 -2.422028e-05 
           21            22            23            24            25 
-1.919146e-05 -7.464358e-07  3.340554e-05 -2.581612e-05  3.197808e-05 
           26            27            28            29            30 
 1.444099e-04  7.843352e-05 -1.187383e-04 -1.634183e-04 -1.634183e-04 
           31 
 2.888061e-04 
>  # not a lot of difference
> 
>  # using 2-D thin plate splines
> 
>  out3 <- try(gam(Volume ~ s(Height, Girth),
+      family = Gamma(link = log), data = trees))
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : 
  A term has fewer unique covariate combinations than specified maximum degrees of freedom
>  # humpf, guess we have to follow the book
>  out3 <- try(gam(Volume ~ s(Height, Girth, k = 25),
+      family = Gamma(link = log), data = trees))
>  # it works, but why do we need k = 25, what is it doing?
>  summary(out3)

Family: Gamma 
Link function: log 

Formula:
Volume ~ s(Height, Girth, k = 25)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.27583    0.01579   207.4   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                  edf Ref.df     F p-value    
s(Height,Girth) 4.791  6.487 161.5  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.966   Deviance explained = 97.7%
GCV = 0.0093576  Scale est. = 0.0077321  n = 31
>  plot(out3, residuals = TRUE)
> 
>  # is this any better ?
>  anova(out1, out3)
Analysis of Deviance Table

Model 1: Volume ~ s(Height) + s(Girth)
Model 2: Volume ~ s(Height, Girth, k = 25)
  Resid. Df Resid. Dev     Df   Deviance
1    26.578    0.18417                  
2    25.209    0.19183 1.3688 -0.0076599
>  # apparently this is nonsense ?????????
>  anova(out3)

Family: Gamma 
Link function: log 

Formula:
Volume ~ s(Height, Girth, k = 25)

Approximate significance of smooth terms:
                  edf Ref.df     F p-value
s(Height,Girth) 4.791  6.487 161.5  <2e-16
>  # no use
>  # read p. 195 of Wood book !!!!!!!!!!!!!!!!!!!!!!!
>  # would have to bootstrap to get P-value, but how ?????
>  # just getting a valid P-value is an open research question
> 
>  # still following book
>  out4 <- try(gam(Volume ~ te(Height, Girth, k = 5),
+      family = Gamma(link = log), data = trees))
>  summary(out4)

Family: Gamma 
Link function: log 

Formula:
Volume ~ te(Height, Girth, k = 5)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.27584    0.01522   215.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                 edf Ref.df     F p-value    
te(Height,Girth)   3      3 377.2  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.973   Deviance explained = 97.7%
GCV = 0.0081971  Scale est. = 0.0071821  n = 31
>  plot(out4, residuals = TRUE)
> 
>  # still following book, new data set, brain image
>  library(gamair)
>  data(brain)
>  brain <- brain[brain$medFPQ > 5e-3, ]
>  bout0 <- gam(medFPQ ~ s(X, Y, k = 100), data = brain)
>  summary(bout0)

Family: gaussian 
Link function: identity 

Formula:
medFPQ ~ s(X, Y, k = 100)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.25030    0.03029   41.27   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
         edf Ref.df     F p-value    
s(X,Y) 86.78  96.45 8.547  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.337   Deviance explained = 37.4%
GCV = 1.5207  Scale est. = 1.4353    n = 1564
>  gam.check(bout0)

Method: GCV   Optimizer: magic
Smoothing parameter selection converged after 7 iterations.
The RMS GCV score gradiant at convergence was 5.235846e-06 .
The Hessian was positive definite.
The estimated model rank was 100 (maximum possible: 100)
Model rank =  100 / 100 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

          k'   edf k-index p-value
s(X,Y) 99.00 86.78    0.82       0
>  # looks terrible so go with book.
>  bout1 <- gam(medFPQ ~ s(X, Y, k = 100), data = brain,
+      family = Gamma(link = log))
>  summary(bout1)

Family: Gamma 
Link function: log 

Formula:
medFPQ ~ s(X, Y, k = 100)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.12031    0.01922   6.258 5.07e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
         edf Ref.df     F p-value    
s(X,Y) 60.61  77.82 4.798  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.307   Deviance explained = 26.4%
GCV = 0.62169  Scale est. = 0.57802   n = 1564
>  gam.check(bout1)

Method: GCV   Optimizer: outer newton
full convergence after 3 iterations.
Gradient range [1.056991e-09,1.056991e-09]
(score 0.6216871 & scale 0.5780217).
Hessian positive definite, eigenvalue range [0.006576209,0.006576209].
Model rank =  100 / 100 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

           k'    edf k-index p-value
s(X,Y) 99.000 60.611   0.847       0
> 
>  vis.gam(bout1, plot.type = "contour")
> 
> 
> proc.time()
   user  system elapsed 
  4.943   0.082   5.330