\documentclass[11pt,twoside,notitlepage]{article} \usepackage{indentfirst} \usepackage{amsmath} \usepackage{amssymb} \usepackage{natbib} \usepackage{url} \usepackage{graphicx} \usepackage[utf8]{inputenc} % remove room for headers, add to textheight \addtolength{\textheight}{\headheight} \addtolength{\textheight}{\headsep} \setlength{\headheight}{0 pt} \setlength{\headsep}{0 pt} % adjust so right and left hand pages have different margins % not sure why these particular numbers were picked or if they % still make sense % \showthe\evensidemargin % \showthe\oddsidemargin % \showthe\textwidth % \evensidemargin 15 pt % \oddsidemargin 61.5 pt % \textwidth 392.5 pt \evensidemargin 28.75 pt \oddsidemargin 75.25 pt \textwidth 365 pt %%%%% NEW go with 1.25 inch margin on all sides \setlength{\textheight}{\paperheight} \addtolength{\textheight}{- 2 in} \setlength{\topmargin}{0.25 pt} \setlength{\headheight}{0 pt} \setlength{\headsep}{0 pt} \addtolength{\textheight}{- \topmargin} \addtolength{\textheight}{- \topmargin} \addtolength{\textheight}{- \footskip} \setlength{\oddsidemargin}{0.25 in} \setlength{\evensidemargin}{0.25 in} \addtolength{\textwidth}{- \oddsidemargin} \addtolength{\textwidth}{- \evensidemargin} \begin{document} \vspace*{0.9375in} \begin{center} {\bfseries More Supporting Data Analysis for \\ ``Unifying Life History Analysis for Inference \\ of Fitness and Population Growth''} \\ By \\ Ruth G. Shaw, Charles J. Geyer, Stuart Wagenius, \\ Helen H. Hangelbroek, and Julie R. Etterson \\ Technical Report No.~661 \\ School of Statistics \\ University of Minnesota \\ % April 20, 2005 \\ \today \end{center} \thispagestyle{empty} \cleardoublepage \setcounter{page}{1} \thispagestyle{empty} \begin{abstract} This technical report (TR) gives details of a data reanalysis backing up a paper having the same authors as this TR and having the title that is quoted in the title of this TR. This reanalysis was not in the first submission of the paper, which instead had analyses given in Chapters~3 and~4 of TR~658. This analysis is for the second submission (to the same journal, \emph{American Naturalist}) of that paper. Unlike the first analyses, these reanalyses directly estimate the fitness landscape rather than quantities related to it. The two analyses are also much more alike than the two analyses for the first submission. Both estimate exactly the same quantities, although one has to work harder to do so. In an unrelated issue, we also give an example of subsampling a component of fitness and its affect on parameter estimates. This issue was mentioned in the first draft of the paper, but this is the first worked example illustrating this method. \end{abstract} \thispagestyle{empty} \cleardoublepage \setcounter{page}{1} <>= options(keep.source = TRUE, width = 70) ps.options(pointsize = 15) @ \section{Creating this Document} This document is created from its source file \texttt{tr661.Rnw} using the R \texttt{Sweave} command and the \LaTeX\ document preparation system. First do \begin{verbatim} Sweave("tr661.Rnw") \end{verbatim} if you have downloaded the file, or do \begin{verbatim} Sweave(url("http://www.stat.umn.edu/geyer/aster/tr661/tr661.Rnw")) \end{verbatim} otherwise. This step takes an hour and a half on a fairly fast computer because of the Monte Carlo calculation in Section~\ref{sec:land-sim}, and this step needs to be redone until the statements \texttt{print(ok)} on pages \pageref{pg:ok2} and \pageref{pg:ok1} print \texttt{TRUE}. Then process the output, \texttt{tr661.tex} and several files with suffixes \texttt{pdf} and \texttt{eps} in the usual fashion (which depends on your system and installation). \section{Introduction} The analysis presented in this technical report is one more attempt to do full justice to the \emph{Chamaecrista} data described below. As the experiment was designed there were multiple components of fitness. For each plant that survived to that stage, fruits were counted (\texttt{fruit}) and then a random sample of fruits of size 3 was taken and the seeds in those fruits counted (\texttt{seed}). This experimental design does not fit aster models perfectly (not the fault of the experimenters because the experiment was done before aster models were described). It would have been better if seeds were counted for all fruits or for a fraction $p$ of fruits. Nevertheless, we do what we can. Using a Monte Carlo calculation we can still estimate the fitness surface that corresponds to any aster model we decide fits the data. We can use the parametric bootstrap to carry out statistical tests or confidence intervals, although these no longer have a simple relationship to the parameters of the fitted aster model (as they would if the experimental design had been more favorable to aster analysis). In Section~\ref{sec:both} we perform an aster analysis in which both components of fitness, \texttt{fruit} and \texttt{seed} are used, and fitness is deemed to be \verb@fruit * seed / 3@. The multiplication in this definition complicates estimation of expected fitness. The aster software can calculate the expectation of any linear combination of components of fitness, but it cannot calculate expectations of nonlinear functions of components of fitness. Fortunately, expectations that cannot be calculated exactly can be approximated by Monte Carlo. This takes time but is not otherwise problematic. In Section~\ref{sec:single} we perform an aster analysis in which \texttt{fruit} is deemed fitness. This illustrates the typical situation in which a linear combination of fitness components is deemed fitness and no Monte Carlo calculation is needed. \section{Analysis involving Both Components of Fitness} \label{sec:both} \subsection{Data} We reanalyze a subset of the data analyzed by \citet{es}. These data are in the \texttt{chamae} dataset in the \texttt{aster} contributed package to the R statistical computing environment \citep{rcore}. Individuals of \emph{Chamaecrista fasciculata} (common name, partridge pea) were obtained from three locations in the country and planted in three field sites. Of the complete data we only reanalyze here individuals planted in one field site (Minnesota). These data are already in ``long'' format, no need to use the \texttt{reshape} function on them to do aster analysis. We will, however, need the ``wide'' format for Lande-Arnold analysis \citep{la}. So we do that, before making any changes (we will add newly defined variables) to \texttt{chamae}. <>= library(aster) data(chamae) chamaew <- reshape(chamae, direction = "wide", timevar = "varb", v.names = "resp", varying = list(levels(chamae$varb))) names(chamaew) @ For each individual, many characteristics were measured, three of which we consider phenotypic characters (so our $z$ is three-dimensional), and others which combine to make up an estimate of fitness. The three phenotypic characters are reproductive stage (\verb@STG1N@), log leaf number (\verb@LOGLVS@), and log leaf thickness (\verb@LOGSLA@). ``At the natural end of the growing season, [they] recorded total pod number and seed counts from three representative pods; from these measures, [they] estimated [fitness]'' \citep[further explained in their note 12]{es}. % There are several complications in the estimation of fitness. % Some fruits (pods) had already dehisced by the time data were collected, % so seeds could not be counted. The number of dehisced fruits are not recorded % in the data we are working with (although they could be reconstructed from % the original data). Some individuals had fewer than three (non-dehisced) % fruits to count. % An aster model does not allow missing data as opposed to structural zeros, % which are data that are necessarily zero because other data are zero, for % example, that dead individuals have no fruits and individuals that have % zero fruits also have zero seeds. Structural zeros are allowed, but true % missing data, random variables that are part of the statistical model and % that can have multiple possible values given the observed data, are not. % We know how to handle missing data in theory \citep{g,sg}, but this would % require Monte Carlo likelihood approximation (MCLA), a complication we % do not wish to introduce here, for which no computer implementation % currently exists (for aster models, MCLA has been implemented in many % other contexts). Although aster model theory in the published version of \citet{gws} does allow conditionally multinomial response variables, versions of the \texttt{aster} package up through 0.7-2, the current version at the time this was written, do not. Multinomial response, if we could use it, would allow us to deal individuals having seeds counted from 0, 1, 2, or 3 fruits. % To avoid the missing data issue, we ignore dehisced fruits, treating % them for the purposes of this example has having no seeds. % In contrast, \citet{es} imputed fitness in certain cases. To avoid multinomial response, we remove individuals with seeds counted for only one or two fruits (there were only four such). \begin{figure} \begin{center} \setlength{\unitlength}{0.4 in} \begin{picture}(2.65,2.15)(-2.25,-2.10) \put(0,0){\makebox(0,0){$1$}} \put(-1,-1){\makebox(0,0){\ttfamily fecund}} \put(-2,-2){\makebox(0,0){\ttfamily seed}} \put(0,-2){\makebox(0,0){\ttfamily fruit}} \multiput(-0.25,-0.25)(-1,-1){2}{\vector(-1,-1){0.5}} \multiput(-0.75,-1.25)(1,-1){1}{\vector(1,-1){0.5}} \end{picture} \end{center} \caption{Graph for \emph{Chamaecrista} Aster Data. Arrows go from parent nodes to child nodes. Nodes are labeled by their associated variables. The only root node is associated with the constant variable $1$. \texttt{fecund} is Bernoulli (zero indicates no seeds, one indicates nonzero seeds). If \texttt{fecund} is zero, then so are the other variables. If \texttt{fecund} is nonzero, then \texttt{fruit} (fruit count) and \texttt{seed} (seed count) are conditionally independent, \texttt{fruit} has a two-truncated negative binomial distribution, and \texttt{seed} has a zero-truncated negative binomial distribution.} \label{fig:graph:chamae} \end{figure} Figure~\ref{fig:graph:chamae} shows the graph of the aster model we use for these data. Fruit count (\texttt{fruit}) and seed count (\texttt{seed}) are dependent only in that if one is zero, then so is the other (we only model fruit count for individuals who have seeds, because fruit count for other individuals is irrelevant). Given that neither is zero (when \verb@fecund == 1@), they are conditionally independent. Given that fruit count is nonzero, it is at least three (by our data modifications). The conditional distribution of \texttt{seed} given that it is nonzero is what is called zero-truncated negative binomial, which is negative binomial conditioned on being greater than zero. By analogy we call the conditional distribution of \texttt{fruit} given that it is nonzero, two-truncated negative binomial, which is negative binomial conditioned on being greater than two. \subsection{Aster Analysis} We need to choose the non-exponential-family parameters (sizes) for the negative binomial distributions, since the \verb@aster@ package only does maximum likelihood for exponential family parameters. We start with the following values, which were chosen with knowledge of the maximum likelihood estimates for these parameters, which we find in Section~\ref{sec:mle}. The values that are found then are written out to a file and loaded here if the file exists, so after several runs (of \texttt{Sweave}) we are reading in here the maximum likelihood values of these non-exponential-family parameters. <>= options(show.error.messages = FALSE, warn = -1) try(load("chamae-alpha.rda")) options(show.error.messages = TRUE, warn = 0) ok <- exists("alpha.fruit") && exists("alpha.seed") if (! ok) { alpha.fruit <- 3.0 alpha.seed <- 15.0 } print(alpha.fruit) print(alpha.seed) @ Then we set up the aster model framework. <>= vars <- c("fecund", "fruit", "seed") pred <- c(0,1,1) famlist <- list(fam.bernoulli(), fam.poisson(), fam.truncated.negative.binomial(size = alpha.seed, truncation = 0), fam.truncated.negative.binomial(size = alpha.fruit, truncation = 2)) fam <- c(1,4,3) @ We can now fit our first aster model. <>= out1 <- aster(resp ~ varb + BLK, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out1, show.graph = TRUE) @ The ``response'' \verb@resp@ is a numeric vector containing all the response variables (\verb@fecund@, \verb@fruit@, and \verb@seed@). The ``predictor'' \verb@varb@ is a factor with three levels distinguishing with \verb@resp@ which original response variable an element is. The predictor \verb@BLK@ has not been mentioned so far. It is block within the field where the plants were grown. % Delete as per Ruth criticism % One might think we should use \verb@varb * BLK@ but this uses up % too many parameters when we have not yet added the predictors of interest. % <>= % out1foo <- aster(resp ~ varb * BLK, pred, fam, varb, id, root, % data = chamae, famlist = famlist) % summary(out1foo) % anova(out1, out1foo) % @ % Despite the statistically significant improvement (based on the chi-square % approximation to the log likelihood ratio, % we do not want to % use up all our degrees of freedom before we put the predictors of % interest in the model. Now we add phenotypic variables. <>= out2 <- aster(resp ~ varb + BLK + LOGLVS + LOGSLA + STG1N, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out2) @ One might think we should use \verb@varb * (LOGLVS + LOGSLA + STG1N)@ but it turns out this is too many parameters and the Fisher information is ill conditioned, as shown by the need to use the \verb@info.tol@ argument. <>= out2foo <- aster(resp ~ BLK + varb * (LOGLVS + LOGSLA + STG1N), pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out2foo, info.tol = 1e-11) anova(out2, out2foo) @ Despite the statistically significant improvement (based on the chi-square approximation to the log likelihood ratio, which may not be valid with such an ill-conditioned Fisher information), we do not adopt this model (\verb@out2foo@) either. Although we cannot afford 9 parameters (3 levels of \verb@varb@ times 3 predictor variables) for the interaction, we can afford 6, only putting the phenotype variables in at level \verb@fruit@ and \verb@seed@. Because we are fitting an unconditional aster model, the effects of these terms are passed down to \verb@fecund@. See the example in \citet{gws} for discussion of this phenomenon. <>= foo <- as.numeric(as.character(chamae$varb) == "fruit") chamae$LOGLVSfr <- chamae$LOGLVS * foo chamae$LOGSLAfr <- chamae$LOGSLA * foo chamae$STG1Nfr <- chamae$STG1N * foo foo <- as.numeric(as.character(chamae$varb) == "seed") chamae$LOGLVSsd <- chamae$LOGLVS * foo chamae$LOGSLAsd <- chamae$LOGSLA * foo chamae$STG1Nsd <- chamae$STG1N * foo out6 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr + STG1Nfr + LOGLVSsd + LOGSLAsd + STG1Nsd, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out6) @ When we analyzed the Minnesota-Minnesota subset alone (the subset of these data consisting of only the Minnesota population) the there was no statistically significant effect of the phenotypic predictors on seed count. In these data that effect is significant. <>= out5 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr + STG1Nfr, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out5) anova(out5, out6) @ A similar test <>= out4 <- aster(resp ~ varb + BLK + LOGLVSsd + LOGSLAsd + STG1Nsd, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out4) anova(out4, out6) @ shows that the effect of these variables on fruit is significant. Now we consider quadratic terms. Since the variable \verb@STG1N@ has only a few values <>= sort(unique(chamae$STG1N)) tabulate(chamae$STG1N) @ there is little sense adding terms quadratic in this variable. The test <>= out7 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr + I(LOGLVSfr^2) + I(LOGSLAfr^2) + I(LOGLVSfr * LOGSLAfr) + STG1Nfr + LOGLVSsd + LOGSLAsd + STG1Nsd, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out7, info.tol = 1e-9) anova(out6, out7) @ shows that there appears to be a quadratic effect on fruit. The similar test <>= out8 <- aster(resp ~ varb + BLK + LOGLVSsd + LOGSLAsd + I(LOGLVSsd^2) + I(LOGSLAsd^2) + I(LOGLVSsd * LOGSLAsd) + STG1Nsd + LOGLVSfr + LOGSLAfr + STG1Nfr, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out8, info.tol = 1e-9) anova(out6, out8) @ shows that there appears to also be a quadratic effect on seed. And <>= out9 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr + I(LOGLVSfr^2) + I(LOGSLAfr^2) + I(LOGLVSfr * LOGSLAfr) + STG1Nfr + LOGLVSsd + LOGSLAsd + I(LOGLVSsd^2) + I(LOGSLAsd^2) + I(LOGLVSsd * LOGSLAsd) + STG1Nsd, pred, fam, varb, id, root, data = chamae, famlist = famlist) summary(out9, info.tol = 1e-9) anova(out6, out7, out9) anova(out6, out8, out9) @ Shows that the model that is quadratic in the effects on both fruit and seed is supported by the data. There is some question about these because the Fisher information is close to singular (as evidenced by our need to supply the \verb@info.tol@ argument to the \verb@summary@ command), but we will go with \verb@out9@ as out ``best fitting'' model. \subsection{Maximum Likelihood Estimation of Size} \label{sec:mle} The \verb@aster@ function does not calculate the correct likelihood when the size parameters are considered unknown, because it drops terms that do not involve the exponential family parameters. However, the full log likelihood is easily calculated in R. <>= x <- out9$x logl <- function(alpha.fruit, alpha.seed, theta, x) { x.fecund <- x[ , 1] theta.fecund <- theta[ , 1] p.fecund <- 1 / (1 + exp(- theta.fecund)) logl.fecund <- sum(dbinom(x.fecund, 1, p.fecund, log = TRUE)) x.fruit <- x[x.fecund == 1, 2] theta.fruit <- theta[x.fecund == 1, 2] p.fruit <- (- expm1(theta.fruit)) logl.fruit <- sum(dnbinom(x.fruit, size = alpha.fruit, prob = p.fruit, log = TRUE) - pnbinom(2, size = alpha.fruit, prob = p.fruit, lower.tail = FALSE, log = TRUE)) x.seed <- x[x.fecund == 1, 3] theta.seed <- theta[x.fecund == 1, 3] p.seed <- (- expm1(theta.seed)) logl.seed <- sum(dnbinom(x.seed, size = alpha.seed, prob = p.seed, log = TRUE) - pnbinom(0, size = alpha.seed, prob = p.seed, lower.tail = FALSE, log = TRUE)) logl.fecund + logl.fruit + logl.seed } @ We then calculate the profile likelihood for the two size parameters (\verb@alpha.fruit@ and \verb@alpha.seed@), maximizing over the other parameters. Evaluating the profile log likelihood on a grid of points. We do not do this if the results would be the same as we got last time and have stored in the variable \verb@logl.seq@. \label{pg:ok2} <>= ok <- exists("alpha.fruit.save") && (alpha.fruit.save == alpha.fruit) && exists("alpha.seed.save") && (alpha.seed.save == alpha.seed) && exists("coef.save") && isTRUE(all.equal(coef.save, coefficients(out9))) print(ok) alpha.fruit.seq <- seq(1.5, 3.5, 0.25) alpha.seed.seq <- seq(10, 30, 0.5) if (! ok) { logl.seq <- matrix(NA, nrow = length(alpha.fruit.seq), ncol = length(alpha.seed.seq)) for (i in 1:length(alpha.fruit.seq)) { for (j in 1:length(alpha.seed.seq)) { famlist.seq <- famlist famlist.seq[[3]] <- fam.truncated.negative.binomial(size = alpha.seed.seq[j], truncation = 0) famlist.seq[[4]] <- fam.truncated.negative.binomial(size = alpha.fruit.seq[i], truncation = 2) out9.seq <- aster(out9$formula, pred, fam, varb, id, root, data = chamae, famlist = famlist.seq, parm = out9$coefficients) theta.seq <- predict(out9.seq, model.type = "cond", parm.type = "canon") dim(theta.seq) <- dim(x) logl.seq[i, j] <- logl(alpha.fruit.seq[i], alpha.seed.seq[j], theta.seq, x) } } } ##### interpolate ##### alpha.fruit.interp <- seq(min(alpha.fruit.seq), max(alpha.fruit.seq), 0.01) alpha.seed.interp <- seq(min(alpha.seed.seq), max(alpha.seed.seq), 0.01) logl.foo <- matrix(NA, nrow = length(alpha.fruit.interp), ncol = length(alpha.seed.seq)) for (i in 1:length(alpha.seed.seq)) logl.foo[ , i] <- spline(alpha.fruit.seq, logl.seq[ , i], n = length(alpha.fruit.interp))$y logl.bar <- matrix(NA, nrow = length(alpha.fruit.interp), ncol = length(alpha.seed.interp)) for (i in 1:length(alpha.fruit.interp)) logl.bar[i, ] <- spline(alpha.seed.seq, logl.foo[i, ], n = length(alpha.seed.interp))$y imax.fruit <- row(logl.bar)[logl.bar == max(logl.bar)] imax.seed <- col(logl.bar)[logl.bar == max(logl.bar)] alpha.fruit.save <- alpha.fruit alpha.seed.save <- alpha.seed alpha.fruit <- alpha.fruit.interp[imax.fruit] alpha.seed <- alpha.seed.interp[imax.seed] coef.save <- coefficients(out9) ##### save ##### if (! ok) { save(alpha.fruit, alpha.seed, alpha.fruit.save, alpha.seed.save, coef.save, logl.seq, file = "chamae-alpha.rda", ascii = TRUE) } @ At the end of this chunk we save the maximum likelihood estimates in a file which is read in at the beginning of this document. We also save some extra information so there is no need to do this step every time if there is no change in the alphas. Figure~\ref{fig:contour:too} (page~\pageref{fig:contour:too}) shows the profile log likelihood for the size parameters. \begin{figure} \begin{center} <>= # image(alpha.fruit.interp, alpha.seed.interp, logl.bar - max(logl.bar), # xlab = "size parameter for fruit", ylab = "size parameter for seed") lev <- pretty(logl.bar - max(logl.bar), 10) lev <- lev[lev != 0] lev <- lev[lev > min(logl.bar) - max(logl.bar)] lev <- sort(c(-5, -2, lev)) contour(alpha.fruit.interp, alpha.seed.interp, logl.bar - max(logl.bar), xlab = "size parameter for fruit", ylab = "size parameter for seed", levels = lev) points(alpha.fruit, alpha.seed, pch = 19) @ \end{center} \caption{Profile log likelihood for size parameters for the negative binomial distributions of fruit and seed. Solid dot is maximum likelihood estimate.} \label{fig:contour:too} \end{figure} \subsection{The Fitness Landscape} \label{sec:land-sim} If we had ``aster-friendly'' data in which expected fitness was a mean value parameter of the aster model, we could immediately calculate the fitness landscape using the predict function (as in Chapter~3 of TR~658). Unfortunately, fitness, which in this example we take to be the product of \verb@fruit@ and \verb@seed@ divided by 3 (because seeds were counted for three fruits), has expectation that is not a mean value parameter (because the expectation of a product is not the product of the expectations). Nevertheless, we can calculate its expectation by simulation (Monte Carlo). We calculate for just one value of \verb@BLK@ and \verb@STG1N@. <>= theblk <- "1" thestg <- 1 @ Figure~\ref{fig:surf} (page~\pageref{fig:surf}) shows the scatter plots of the two phenotypic variables (\verb@LOGLVS@ and \verb@LOGSLA@, labeled \verb@LN@ and \verb@SLA@ because that is what they are called in the paper). It is made by the following code. <>= plot(chamaew$LOGLVS, chamaew$LOGSLA, xlab = "log(LN)", ylab = "log(SLA)") @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Scatterplot of phenotypic variables.} \label{fig:surf} \end{figure} The point of making the plot Figure~\ref{fig:surf} is that we want to add contour lines showing the estimated fitness landscape. To do that we first start with a grid of points across the figure. <>= ufoo <- par("usr") nx <- 101 ny <- 101 z <- matrix(NA, nx, ny) x <- seq(ufoo[1], ufoo[2], length = nx) y <- seq(ufoo[3], ufoo[4], length = ny) xx <- outer(x, y^0) yy <- outer(x^0, y) xx <- as.vector(xx) yy <- as.vector(yy) n <- length(xx) @ Then we create an appropriate \verb@newdata@ argument for the \verb@predict.aster@ function to ``predict'' at these points <>= newdata <- data.frame(BLK = factor(rep(theblk, n), levels = levels(chamae$BLK)), STG1N = rep(thestg, n), LOGLVS = xx, LOGSLA = yy, fecund = rep(1, n), fruit = rep(3, n), seed = rep(5, n)) renewdata <- reshape(newdata, varying = list(vars), direction = "long", timevar = "varb", times = as.factor(vars), v.names = "resp") renewdata <- data.frame(renewdata, root = 1) foo <- as.numeric(as.character(renewdata$varb) == "fruit") renewdata$LOGLVSfr <- renewdata$LOGLVS * foo renewdata$LOGSLAfr <- renewdata$LOGSLA * foo renewdata$STG1Nfr <- renewdata$STG1N * foo foo <- as.numeric(as.character(renewdata$varb) == "seed") renewdata$LOGLVSsd <- renewdata$LOGLVS * foo renewdata$LOGSLAsd <- renewdata$LOGSLA * foo renewdata$STG1Nsd <- renewdata$STG1N * foo @ Then we predict the conditional canonical parameter $\theta$ which is needed for simulation using the \verb@raster@ function. <>= theta <- predict(out9, newdata = renewdata, varvar = varb, idvar = id, root = root, model.type = "conditional", parm.type = "canonical") theta <- matrix(theta, nrow = nrow(newdata), ncol = ncol(out9$x)) @ Then we carry out a Monte Carlo approximation of the fitness landscape. Because this function may take a lot of time to run, we store the results in the current working directory, and simply load them if they exist. <>= root <- matrix(1, nrow(theta), ncol(theta)) nsim <- 5e5 options(show.error.messages = FALSE, warn = -1) try(load("zzz.rda")) options(show.error.messages = TRUE, warn = 0) ok <- exists("zfit") && exists("stime") && exists("nsim.save") && (nsim == nsim.save) && exists("theta.save") && isTRUE(all.equal(theta.save, theta)) if (! ok) { zfit <- double(n) stime <- system.time( for (isim in 1:nsim) { xnew <- raster(theta, pred, fam, root = root, famlist = famlist) zfit <- zfit + xnew[ , 2] * xnew[ , 3] / 3 } ) zfit <- zfit / nsim nsim.save <- nsim theta.save <- theta save(zfit, nsim.save, theta.save, stime, file = "zzz.rda") } @ The vector \verb@zfit@ is the Monte Carlo estimate; Figure~\ref{fig:surf2} (page~\pageref{fig:surf2}), which is made by the following code, shows it. <>= plot(chamaew$LOGLVS, chamaew$LOGSLA, xlab = "log(LN)", ylab = "log(SLA)", pch = ".") zfit <- matrix(zfit, nrow = length(x)) contour(x, y, zfit, add = TRUE) contour(x, y, zfit, levels = c(5, 10, 25), add = TRUE) @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Scatterplot of phenotypic variables with contours of fitness landscape estimated by Monte Carlo.} \label{fig:surf2} \end{figure} The time spent doing the Monte Carlo calculation of the likelihood surface was <