\chapter{Fitness Landscapes and Lande-Arnold Analysis using Aster Models} \label{ch:chamae2} <>= options(keep.source = TRUE, width = 70) ps.options(pointsize = 15) @ \section{Introduction} \citet{la} proposed using the vector $\beta$ of ordinary least squares (OLS) multiple regression coefficients (not including the intercept) from the regression of relative fitness on quantitative phenotypic characters as ``a general solution to the problem of measuring the forces of directional selection acting directly on the characters'' \citep[p.~1213]{la}. Lande and Arnold actually give two different arguments justifying their procedure. The first argument, parts of which go back to Pearson (1903, cited by Lande and Arnold), involves a quantitative genetic model for the vector $z$ of phenotypic characters. The second argument, which may have been independently invented by Lande and Arnold, but involves what statisticians call Stein's lemma \citep{stein}, does not involve a quantitative genetic model, and is the one we follow here. The assumptions made by \citet[p.~1213]{la}, somewhat generalized, are as follows. \begin{enumerate} \item[(i)] The vector of phenotypic characters $z$ is multivariate normal. \item[(ii)] The conditional expectation of relative fitness $w$ given $z$, denote it \begin{equation} \label{eq:u} u(z) = E(w \mid z), \end{equation} is a continuously differentiable function. \item[(iii)] The expectation \begin{equation} \label{eq:beta-def} \beta = E \{ \nabla u(z) \} \end{equation} exists. \item[(iv)] The limits \begin{align*} \lim_{z_i \to \infty} & u(z) f(z) \\ \lim_{z_i \to - \infty} & u(z) f(z) \end{align*} exist and are zero, letting each coordinate $z_i$ go to plus or minus infinity, holding the rest fixed (\citealp{la}, assume $u$ is bounded, which is a sufficient condition for the limits above to be zero, but not a necessary condition). \end{enumerate} Then it follows by integration by parts \citep[p.~1213]{la} and from the definition of relative fitness, which requires $E(w) = 1$, that \begin{equation} \label{eq:beta} \beta = E \{ \nabla u(z) \} = \var(z)^{-1} \cov(w, z) \end{equation} and for this reason Lande and Arnold call $\beta$ the \emph{directional selection gradient}. They also observe that the obvious empirical estimate of $\beta$, which is \begin{equation} \label{eq:beta-hat} \hat{\beta}_{\text{OLS}} = \widehat{\var}(z)^{-1} \widehat{\cov}(w, z), \end{equation} where the hats on the $\var$ and $\cov$ operators denote the empirical analogs, is the ordinary least squares (OLS) regression estimate of $w$ on $z$, done with intercept, which is not considered part of $\hat{\beta}$. They therefore seem to recommend \eqref{eq:beta-hat} as an estimator of \eqref{eq:beta}; this is not completely clear since they do not actually distinguish between \eqref{eq:beta} and \eqref{eq:beta-hat}, between the parameter $\beta$ and its estimate $\hat{\beta}_{\text{OLS}}$. They do, however, use \eqref{eq:beta-hat} as an estimator of \eqref{eq:beta} in their examples, and a large literature has followed them in this practice (Google Scholar, \url{http://scholar.google.com}, says ``Cited by 816'' as this is written). Assumption (i) above, which we should also note in passing is also required by the ``first argument'' in \citet{la}, implies that $\widehat{\var}(z)$ will be a good estimator of $\var(z)$, so long as the dimension of $z$ is not too large relative to the amount of data. Assumptions (ii) through (iv) above are very weak. 
They do not even imply, for example, that $\var(w \mid z)$ is finite. They do imply, by the integration by parts argument, that $\cov(w, z)$ exists, which we should also note in passing is also required by the ``first argument'' in \citet{la}, but this does not imply that $\widehat{\cov}(w, z)$ is a good estimator of it. It may be a very bad estimator, in which case \eqref{eq:beta-hat} will be a very bad estimator of \eqref{eq:beta}. In fact, this should be expected when the distribution of $w$ is heavy tailed (since relative fitness $w$ is nonnegative, this means heavy upper tail). To start a search for better estimators, we make the following observation (which follows from the iterated conditional expectation theorem) \begin{equation} \label{eq:iterated} \cov(w, z) = \cov\bigl( u(z), z \bigr), \end{equation} where $u(z)$ is defined by \eqref{eq:u}. This says that we can just as well use \begin{equation} \label{eq:beta-hat-new} \hat{\beta} = \widehat{\var}(z)^{-1} \widehat{\cov}\bigl( \hat{u}(z), z\bigr) \end{equation} as an estimator of \eqref{eq:beta}, where $\hat{u}(z)$ is any estimator of $u(z)$ that we can construct, such as, predicted mean values in any model we happen to have for the conditional distribution of $w$ given $z$. If one uses a conventional linear, normal-theory, OLS regression model, then substituting \eqref{eq:beta-hat-new} for \eqref{eq:beta-hat} will do nothing (regressing OLS predicted values on predictors gives the same regression coefficients as the original regression), but there may be very large differences between \eqref{eq:beta-hat-new} and \eqref{eq:beta-hat} if one uses any other kind of regression model, for example, a generalized linear model \citep{mn} or an aster model \citep{gws}. We make one more generalization, which is not new here, having been used in many examples in the literature \citep{ms}, but for which an adequate theoretical explanation has not yet been provided. Suppose we also have a vector of other variables $e$ that we also wish to add to the regression. These may not be normally distributed, they may be categorical, for example, and may also be nonrandom (fixed by experimental design). Thus they cannot be considered part of the $z$ in the Lande-Arnold argument (because $z$ must be multivariate normal). They may also not be phenotypic characters. They may be environmental (hence our use of $e$ to denote them), but they may also be just ``other.'' We apply the Lande-Arnold argument separately for each value of $e$. We assume \begin{enumerate} \item[(i)] The vector of phenotypic characters $z$ is conditionally multivariate normal given $e$. \item[(ii)] The conditional expectation of relative fitness $w$ given $z$ and $e$, denote it \begin{equation} \label{eq:u-e} u_e(z) = E(w \mid z, e) \end{equation} is a continuously differentiable function of $z$. \item[(iii)] The expectation \begin{equation} \label{eq:beta-def-e} \beta(e) = E \{ \nabla u_e(z) \mid e \} \end{equation} exists for each value of $e$. \item[(iv)] The limits \begin{align*} \lim_{z_i \to \infty} & u_e(z) f(z) \\ \lim_{z_i \to - \infty} & u_e(z) f(z) \end{align*} exist and are zero, letting each coordinate $z_i$ go to plus or minus infinity, holding the rest of the coordinates of $z$ fixed and all coordinates of $e$ fixed. 
\end{enumerate}
Then applying the Lande-Arnold argument for each value of $e$ gives
\begin{equation} \label{eq:beta-e}
   \beta(e)
   =
   E \{ \nabla u_e(z) \mid e \}
   =
   \var(z \mid e)^{-1} \cov(w, z \mid e).
\end{equation}
This trivial extension of Lande-Arnold theory may not at first sight seem
to do much.  Typically, there are far too many values of $e$ for naive
estimation of $\beta(e)$ by a separate OLS regression for each value of
$e$ to work.  Since we have already seen that OLS regression may be a very
bad idea for other reasons, this is no impediment.  We proceed as
everywhere else in statistics, trying to find simple models for the
gradient-vector-valued function (of ``other'' variables~$e$) that is
$\beta(e)$.

For example, the simplest model is that $\beta(e)$ is a constant function,
which seems to get us back to using \eqref{eq:beta-hat-new}, but it does
not.  Recall that the estimate $\hat{\beta}$ comes from OLS regression of
something (preferably predicted mean values from a good model of the
conditional distribution of $w$ given $z$ and $e$) on $z$ and $e$, with
the intercept dropped.  Thus we model $\beta(e)$ as a constant function of
$e$ if we do OLS regression of something on $z$ and $e$ with only the
intercept depending on $e$.  Note that this allows the intercept term to
depend on $e$ in an arbitrarily complicated way.  We just don't have
``interaction'' terms between $z$ and $e$ in the regression model.
If we were going to write the kind of models just discussed in
conventional terms, we would write
$$
   w = \alpha(e) + \beta^T z + \text{error}
$$
where $\alpha(e)$ is an arbitrary function (of course, we don't believe
the ``+ error'' part of this ``model,'' which is given only to help the
reader understand the structure of the assumed regression function).
Of course we don't have to stop with this simplest model for $\beta(e)$,
but it is interesting that even the simplest case of this theory
(with $e$) is nontrivial; a concrete sketch of this case is given below.

When one uses the OLS regression estimator \eqref{eq:beta-hat}, the
question arises of how to do statistical inference about $\beta$ when the
normality assumptions required for OLS $F$-tests are clearly violated.
There is a large literature on this subject
\citep[see][and references cited therein]{ms,st}.
Transforming $w$ to make it more normal, for example, doing OLS regression
of $\log w$ on $z$, is biologically wrong \citep{st}, destroying the logic
of the Lande-Arnold theory, which requires OLS regression of $w$
(untransformed) on $z$.  \citet{la} say that $z$ may be transformed to
make it more normal, hence more closely satisfy condition (i) above, but
this may or may not make the distribution of $w$ given (transformed) $z$
more nearly satisfy OLS regression assumptions.

Sometimes, as in our example (Section~\ref{sec:data} below), it is
impossible to transform $w$ to conditional normality given $z$.
This happens, in particular, when a sizable fraction of individuals have
(measured) fitness zero.  The normal distribution is continuous.  Any
transformation of $w$ has a large atom at whatever point zero is
transformed to, hence is far from being continuous.  The standard trick
$\tilde{w} = \log(w + 1) - 1$ may make the distribution of $\tilde{w}$
have a more normal-looking upper tail, but can do nothing about the fact
that $\tilde{w}$ is nonnegative with a large atom at zero.
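To make the simplest case concrete (the sketch promised above), the
following code, which is not evaluated here, is written in terms of a
hypothetical data frame \verb@dat@ with raw fitness \verb@W@, phenotypic
characters \verb@z1@ and \verb@z2@, and a categorical ``other'' variable
\verb@e@.  It shows one way to compute an estimator of the form
\eqref{eq:beta-hat-new} with $\beta(e)$ modeled as a constant function of
$e$.  The Poisson generalized linear model is only a placeholder for
whatever model of the conditional distribution of fitness one actually
believes (an aster model, in our applications).
<<sketch-beta-constant, eval=FALSE>>=
## hypothetical data frame dat with raw fitness W, phenotypic characters
## z1 and z2, and categorical "other" variable e (for example, block)
## step 1: fit some model for E(W | z, e); the Poisson GLM here is only
## a stand-in for whatever model is actually believed
ufit <- glm(W ~ e + z1 + z2, family = poisson, data = dat)
## fitted values put on the relative fitness scale, so E(w) is about one
dat$u.hat <- predict(ufit, type = "response") / mean(dat$W)
## step 2: OLS regression of the fitted values on z and e with only the
## intercept depending on e (no interaction of e with z), as described
## in the text
bfit <- lm(u.hat ~ e + z1 + z2, data = dat)
## step 3: the estimate of beta is the vector of coefficients of z
beta.hat <- coefficients(bfit)[c("z1", "z2")]
beta.hat
@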
Having already decided to abandon OLS regression and use estimators of the form \eqref{eq:beta-hat-new} or their analogs for estimating $\beta(e)$, which we have not provided equations for, we do not have the problem of depending on a procedure whose assumptions are badly violated. If we can find an adequate model for the conditional distribution of $w$ given $z$, we can simulate the sampling distribution of our estimator of $\beta$ or $\beta(e)$ and use the simulations to make confidence intervals or do tests of significance. Strictly speaking, our simulation distribution depends on estimated parameters, hence is a parametric bootstrap. Our simulation will be valid if and only if our model actually is adequate and its estimated parameters are close enough to the true unknown parameters so that the simulation distribution is close to the true unknown distribution of $w$ given~$z$. \citet{ms} considered nonparametric bootstrap and jackknife methods in this situation, but the conditions for those procedures may be violated in many applications, particularly when fitness zero has appreciable probability. \citet{ms} ``wish to emphasize that resampling techniques are not a panacea'' and cite literature cautioning about violations of required assumptions, in particular ``both techniques are based on asymptotic theory (sample size must be `large'), they assume residuals are identically and independently distributed $\ldots.$'' That last point is a killer in the current context. There is no generally accepted way to resample residuals for generalized linear models, much less for aster models. Resampling cases does not mimic the conditional distribution of response given predictors, which is what is crucial in this application. \section{Data} \label{sec:data} We reanalyze a subset of the data analyzed by \citet{es}. These data are in the \texttt{chamae2} dataset in the \texttt{aster} contributed package to the R statistical computing environment. This dataset is restricted to the Minnesota site of the original (larger) data. These data are already in ``long'' format, no need to use the \texttt{reshape} function on them to do aster analysis. We will, however, need the ``wide'' format for Lande-Arnold analysis. So we do that, before making any changes (we will add newly defined variables) to \texttt{chamae2}. <>= library(aster) data(chamae2) chamae2w <- reshape(chamae2, direction = "wide", timevar = "varb", v.names = "resp", varying = list(levels(chamae2$varb))) names(chamae2w) @ Individuals of \emph{Chamaecrista fasciculata} (common name, partridge pea) were obtained from three locations in the country and planted in three field sites. Of the complete data we only reanalyze here individuals from one site (Minnesota). For each individual, many characteristics were measured, three of which we consider phenotypic characters (so our $z$ is three-dimensional), and others which combine to make up an estimate of fitness. The three phenotypic characters are reproductive stage (\verb@STG1N@), log leaf number (\verb@LOGLVS@), and log leaf thickness (\verb@LOGSLA@). ``At the natural end of the growing season, [they] recorded total pod number and seed counts from three representative pods; from these measures, [they] estimated [fitness]'' \citep[further explained in their note 12]{es}. % There are several complications in the estimation of fitness. % Some fruits (pods) had already dehisced by the time data were collected, % so seeds could not be counted. 
% The number of dehisced fruits are not recorded
% in the data we are working with (although they could be reconstructed from
% the original data).  Some individuals had fewer than three (non-dehisced)
% fruits to count.  Furthermore, preliminary analysis (not shown in this
% chapter, see Chapter~\ref{ch:chamae}) seems to indicate that seed count
% is nearly constant, having little effect on fitness.  Hence to avoid all
% of these complications, in this chapter, we analyze fruit count only as
% a measure of fitness.
In this chapter we take fruit count as the measure of fitness because that
gives the best example of aster analysis.  (In Chapter~\ref{ch:chamae} we
combine fruit count and seed count, but the aster analysis is not as
simple as in this chapter.)

We model fruit count as having a zero-inflated negative binomial
distribution.  The zero inflation allows for an excess (or deficit) of
individuals having zero fruit (over and above the small number of zeros
that would occur if the distribution were pure negative binomial).
In an aster model this is done by having a Bernoulli node followed by a
zero-truncated negative binomial node (each individual having a simple
graph with two nodes).  This means the event that an individual has one or
more fruits is modeled as Bernoulli, and the distribution of the number of
fruit given that the number is at least one is modeled as zero-truncated
negative binomial.

\section{Aster Analysis}

Then we set up the aster model framework.
<>=
vars <- c("fecund", "fruit")
pred <- c(0, 1)
@
We need to choose the non-exponential-family parameters (sizes) for the
negative binomial distributions, since the \verb@aster@ package only does
maximum likelihood for exponential family parameters.  We start with the
following values, which were chosen with knowledge of the maximum
likelihood estimates for these parameters, which we find in
Section~\ref{app:mle}.
<>=
load("chamae2-alpha.rda")
print(alpha.fruit)
famlist <- list(fam.bernoulli(),
    fam.truncated.negative.binomial(size = alpha.fruit, truncation = 0))
fam <- c(1,2)
@
We can now fit our first aster model.
<>=
out1 <- aster(resp ~ varb + BLK, pred, fam, varb, id, root,
    data = chamae2, famlist = famlist)
summary(out1, show.graph = TRUE)
@
The ``response'' \verb@resp@ is a numeric vector containing all the
response variables (\verb@fecund@ and \verb@fruit@).  The ``predictor''
\verb@varb@ is a factor with two levels that indicates which of the
original response variables each element of \verb@resp@ is.  The predictor
\verb@BLK@ has not been mentioned so far.  It is the block within the
field where the plants were grown.

% Delete as per Ruth criticism.
% A more complicated model (that still has no phenotypic variables
% as predictors) is \verb@varb * BLK@.
% <>=
% out1foo <- aster(resp ~ varb * BLK, pred, fam, varb, id, root,
%     data = chamae2, famlist = famlist)
% summary(out1foo)
% anova(out1, out1foo)
% @
% As the analysis of deviance table shows, this model does not fit
% significantly better, so we do not use it.

Now we add phenotypic variables.
<>=
out2 <- aster(resp ~ varb + BLK + LOGLVS + LOGSLA + STG1N,
    pred, fam, varb, id, root,
    data = chamae2, famlist = famlist)
summary(out2, info.tol = 1e-9)
@
One might think we should use \verb@varb * (LOGLVS + LOGSLA + STG1N)@.
<>= out2foo <- aster(resp ~ BLK + varb * (LOGLVS + LOGSLA + STG1N), pred, fam, varb, id, root, data = chamae2, famlist = famlist) summary(out2foo, info.tol = 1e-11) feig <- eigen(out2foo$fisher, symmetric = TRUE, only.values = TRUE)$values range(feig) anova(out2, out2foo) @ It turns out the Fisher information is nearly singular, as shown by the need for the \verb@info.tol = 1e-11@ argument to the \verb@summary@ command and also by the eigenvalues of the Fisher information matrix, the condition number (ratio of largest and smallest eigenvalues) being $\Sexpr{round((max(feig) / min(feig)) / 10^(floor(log10(max(feig) / min(feig)))), 1)} \times 10^{\Sexpr{floor(log10(max(feig) / min(feig)))}}$. Thus we do not use this model. % Delete as per Ruth criticism. % Moreover, we do not wish to use up all our degrees of freedom (which % we clearly now have) before trying quadratic models. An alternative model with the same number of parameters as \verb@out2@ puts in the regression coefficients only at the ``fitness'' level (here \verb@fruit@). This is similar to the example in \citet{gws}. Because we are fitting an unconditional aster model, the effects of these terms are passed down to \verb@fecund@. <>= foo <- as.numeric(as.character(chamae2$varb) == "fruit") chamae2$LOGLVSfr <- chamae2$LOGLVS * foo chamae2$LOGSLAfr <- chamae2$LOGSLA * foo chamae2$STG1Nfr <- chamae2$STG1N * foo out6 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr + STG1Nfr, pred, fam, varb, id, root, data = chamae2, famlist = famlist) summary(out6, info.tol = 1e-9) @ It is not possible to compare \verb@out2@ and \verb@out6@ by standard methods (likelihood ratio test) because the models are not nested. They seem to fit equally well, and \verb@out6@ more directly models the relation of fitness (here defined as \verb@fruit@) to phenotypic variables. <>= boo2 <- summary(out2, info.tol = 1e-11)$coefficients boo6 <- summary(out6, info.tol = 1e-9)$coefficients poo2 <- boo2["STG1N", "Pr(>|z|)"] poo6 <- boo6["STG1Nfr", "Pr(>|z|)"] poo2.exp <- floor(log10(poo2)) poo2.frac <- poo2 / 10^poo2.exp poo6.exp <- floor(log10(poo6)) poo6.frac <- poo6 / 10^poo6.exp @ In both \verb@out2@ and \verb@out6@ the regression coefficient for the phenotypic variable \verb@STG1N@ is statistically significant ($P = \Sexpr{round(poo2.frac, 1)} \times 10^{\Sexpr{poo2.exp}}$ and $P = \Sexpr{round(poo6.frac, 1)} \times 10^{\Sexpr{poo6.exp}}$, respectively, although one should treat with extreme caution any $P$-values printed out when the \verb@info.tol@ argument is used in the \verb@summary@ call). We now want to model the canonical parameter corresponding to fitness as a quadratic function of phenotype variables. Since \verb@STG1N@ is quite discrete, taking only six theoretically possible values with the vast majority of individuals having only two of these, <>= sort(unique(chamae2w$STG1N)) tabulate(chamae2w$STG1N) @ it does not make sense to model fitness as a quadratic function of \verb@STG1N@. Thus we drop this phenotypic variable from the analysis. <>= out7 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr, pred, fam, varb, id, root, data = chamae2, famlist = famlist) summary(out7) @ Now we add quadratic terms, looking for a maximum of the fitness function. <>= out8 <- aster(resp ~ varb + BLK + LOGLVSfr + LOGSLAfr + I(LOGLVSfr^2) + I(LOGSLAfr^2) + I(2 * LOGLVSfr * LOGSLAfr), pred, fam, varb, id, root, data = chamae2, famlist = famlist) summary(out8, info.tol = 1e-9) anova(out7, out8) @ The quadratic term is highly significant. 
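As a cross-check on this likelihood ratio test, one could also compute a
Wald test of the joint null hypothesis that the three quadratic
coefficients are zero.  The following sketch, which is not evaluated here
and is not part of the original analysis, uses the \verb@fisher@ component
of the fitted aster object, whose inverse estimates the asymptotic
covariance matrix of the coefficient estimates.  Since this Fisher
information matrix is somewhat ill-conditioned (hence the \verb@info.tol@
arguments above), the result should be treated with the same caution as
the printed $P$-values.
<<sketch-wald-quad, eval=FALSE>>=
## joint Wald test for the three quadratic coefficients in out8;
## assumes out8$fisher is invertible at the default solve() tolerance
quad.names <- c("I(LOGLVSfr^2)", "I(LOGSLAfr^2)",
    "I(2 * LOGLVSfr * LOGSLAfr)")
idx <- match(quad.names, names(out8$coefficients))
b.quad <- out8$coefficients[idx]
## inverse Fisher information = estimated asymptotic covariance matrix
v.quad <- solve(out8$fisher)[idx, idx]
wald <- as.numeric(t(b.quad) %*% solve(v.quad, b.quad))
wald
pchisq(wald, df = length(idx), lower.tail = FALSE)
@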
We could try more experimentation with different models, but we deem this
one (\verb@out8@) satisfactory and base our analysis on it.
See Section~\ref{sec:fit} below for residual analysis.

\section{The Regression Function for Fitness}

In this section we examine the regression function $u_e(z)$ that
corresponds to the model.  For simplicity, we answer a slightly different
question: what is $E(W \mid z, e)$, changing from $w$ (relative fitness)
to $W$ (raw fitness)?  Since $E(w \mid z, e)$ and $E(W \mid z, e)$ are
proportional, they have maxima, minima, or saddle points (as the case may
be) in the same place.

By assumption (in this chapter) $W$ is the variable \verb@fruit@, a
canonical statistic of the aster model.  Thus its conditional expectation
is the corresponding mean value parameter ($\tau$ in the notation of
\citealp{gws}).  Since the map from canonical parameter ($\varphi$ in the
notation of \citealp{gws}) to mean value parameter is monotone increasing,
it maps maxima to maxima, minima to minima, and saddle points to saddle
points (see Section~\ref{sec:mono} for \label{pg:mono} details).  Thus we
can look at $\varphi$ as a function of the phenotypic predictor variables
(which is quadratic) to see which type of stationary point we have (if we
have any, which we do here because the matrix of the quadratic part turns
out to be nonsingular) and where the stationary point is.

The regression function for the canonical parameter, ignoring intercept
terms (which do not affect the location or type of stationary point and
which depend on the environmental or ``other'' predictors $e$), is
quadratic.  Letting $\varphi_1$ denote the canonical parameter for
\verb@LOGLVSfr@ and $\varphi_2$ denote the canonical parameter for
\verb@LOGSLAfr@, it is of the form
\begin{equation} \label{eq:regfun}
   \bolda^T \boldvarphi + \boldvarphi^T \boldA \boldvarphi
\end{equation}
where $\bolda$ is a vector and $\boldA$ a matrix defined as follows.
<>=
a1 <- out8$coefficients["LOGLVSfr"]
a2 <- out8$coefficients["LOGSLAfr"]
a <- c(a1, a2)
A11 <- out8$coefficients["I(LOGLVSfr^2)"]
A22 <- out8$coefficients["I(LOGSLAfr^2)"]
A12 <- out8$coefficients["I(2 * LOGLVSfr * LOGSLAfr)"]
A <- matrix(c(A11, A12, A12, A22), 2, 2)
@
Since the eigenvalues of $\boldA$ are all negative
<>=
eigen(A, symmetric = TRUE, only.values = TRUE)$values
@
the regression function has a (unique) maximum.  The derivative of
\eqref{eq:regfun} being
$$
   \bolda^T + 2 \boldvarphi^T \boldA
$$
the maximum is achieved at the point
$$
   - \tfrac{1}{2} \boldA^{-1} \bolda
$$
which is computed in R by
<>=
max8 <- (- solve(A, a) / 2)
print(max8)
@

Figure~\ref{fig:surf} (page~\pageref{fig:surf}) shows the scatterplot of
data values for \texttt{LOGLVS} and \texttt{LOGSLA} and the contours of
the estimated quadratic fitness function \eqref{eq:regfun}.
It is made by the following code.
<>=
plot(chamae2w$LOGLVS, chamae2w$LOGSLA, xlab = "LN", ylab = "SLA")
ufoo <- par("usr")
nx <- 101
ny <- 101
z <- matrix(NA, nx, ny)
x <- seq(ufoo[1], ufoo[2], length = nx)
y <- seq(ufoo[3], ufoo[4], length = ny)
points(max8[1], max8[2], pch = 19)
for (i in 1:nx) {
    for (j in 1:ny) {
        b <- c(x[i], y[j])
        z[i, j] <- sum(a * b) + as.numeric(t(b) %*% A %*% b)
    }
}
b <- as.numeric(max8)
# z <- z - sum(a * b) + as.numeric(t(b) %*% A %*% b)
contour(x, y, z, add = TRUE)
contour(x, y, z, levels = c(0.325), add = TRUE)
@
\begin{figure}
\begin{center}
<>=
<>
@
\end{center}
\caption{Scatterplot of \texttt{LOGLVS} versus \texttt{LOGSLA} with
contours of the estimated quadratic fitness function.
Variable names in axis labels are changed to \texttt{LN} and \texttt{SLA},
respectively, to agree with the paper, which used these names.
Solid dot is the point where the estimated fitness function achieves its
maximum.  Note that the ``z'' coordinate is only a monotone function of
fitness \eqref{eq:regfun}, not actual fitness
(compare Figure~\ref{fig:nerf}).}
\label{fig:surf}
\end{figure}

\section{Lande-Arnold Analysis}

In contrast to the aster analysis, the Lande-Arnold analysis is very
simple.
<>=
chamae2w$relfit <- chamae2w$fruit / mean(chamae2w$fruit)
lout1 <- lm(relfit ~ LOGLVS + LOGSLA + STG1N, data = chamae2w)
summary(lout1)
@
The information contained in the printout of \verb@summary(lout1)@, with
the exception of the \verb@Estimate@ column, is unreliable because the OLS
model assumptions are not satisfied, as acknowledged by \citet{es} and
\citet{etterson}.  Therefore measures of statistical significance
including standard errors (\verb@Std. Error@ column), $t$-statistics
(\verb@t value@ column), and $P$-values (\verb@Pr(>|t|)@ column) are
erroneous.  Also the \verb@(Intercept)@ regression coefficient is of no
interest (not part of $\beta$ or $\gamma$).

We can also estimate $\beta(e)$ as a constant function, where $e$ is
\verb@BLK@, our comments about that applying to OLS regression estimates
as well as to our (as yet to be determined) ``better'' estimates.
<>=
lout2 <- lm(relfit ~ BLK + LOGLVS + LOGSLA + STG1N, data = chamae2w)
summary(lout2)
@
Note that if we paid attention to the putative $P$-values from the OLS
regression (which we should not because the assumptions for OLS do not
hold), we would make the wrong decision about which phenotypic variable to
drop from the analysis (dropping \verb@LOGSLA@ and keeping \verb@STG1N@).

Note also that there is no reason why predictors that are important for
the regression function itself should be important for its average
gradient (even assuming that the assumptions for Stein's lemma hold).
Thus there is no reason why our aster analysis should agree with the
Lande-Arnold analysis.  When the analyses disagree, our approach, which
directly estimates the regression function $u(z)$ or $u_e(z)$ itself, has
more direct evolutionary implications.

\citet{la} also gave a justification for quadratic regression with an
appeal to integration by parts (Stein's lemma).  Let us try that, using
the same predictors as for our aster model \verb@out8@ so that we get a
direct comparison.
<>=
lout8 <- lm(relfit ~ BLK + LOGLVS + LOGSLA + I(LOGLVS^2) + I(LOGSLA^2) +
    I(2 * LOGLVS * LOGSLA), data = chamae2w)
summary(lout8)
@
Note that the Lande-Arnold analysis seems to have gotten the wrong sign
for the coefficient of \verb@I(LOGLVS^2)@, where ``wrong'' means
disagreeing with the aster analysis.  Again, this is no surprise.  The OLS
model is not actually trying to fit the regression surface, so it is no
surprise that it tells us little about the regression surface, and what it
does say is misleading.

Since we cannot tell whether a matrix is positive definite or indefinite
by looking at its coefficients, we must form the matrix of coefficients of
the quadratic terms and look at its eigenvalues to see what we really have.
<>=
a1 <- lout8$coefficients["LOGLVS"]
a2 <- lout8$coefficients["LOGSLA"]
olsa <- c(a1, a2)
A11 <- lout8$coefficients["I(LOGLVS^2)"]
A22 <- lout8$coefficients["I(LOGSLA^2)"]
A12 <- lout8$coefficients["I(2 * LOGLVS * LOGSLA)"]
olsA <- matrix(c(A11, A12, A12, A22), 2, 2)
@
The eigenvalues of this matrix are
<>=
eigen(olsA, symmetric = TRUE, only.values = TRUE)$values
@
Since one eigenvalue is positive and the other negative, the OLS
regression suggests dispersive selection in one direction and stabilizing
selection in another.  Of course, since the assumptions for OLS regression
are not met, there is no reason to trust this ``suggestion''.

Figure~\ref{fig:gurf} (page~\pageref{fig:gurf}) is just like
Figure~\ref{fig:surf} except it shows the quadratic fitness function
estimated by OLS (the Lande-Arnold analysis) rather than by aster.
\begin{figure}
\begin{center}
<>=
plot(chamae2w$LOGLVS, chamae2w$LOGSLA, xlab = "LN", ylab = "SLA")
ufoo <- par("usr")
nx <- 101
ny <- 101
z <- matrix(NA, nx, ny)
x <- seq(ufoo[1], ufoo[2], length = nx)
y <- seq(ufoo[3], ufoo[4], length = ny)
for (i in 1:nx) {
    for (j in 1:ny) {
        b <- c(x[i], y[j])
        z[i, j] <- sum(olsa * b) + as.numeric(t(b) %*% olsA %*% b)
    }
}
contour(x, y, z, add = TRUE)
# contour(x, y, z, levels = c(0.315, 0.3175), add = TRUE)
@
\end{center}
\caption{Scatterplot of \texttt{LOGLVS} versus \texttt{LOGSLA} with
contours of the quadratic fitness function estimated by OLS.
As in Figure~\ref{fig:surf}, variable names in axis labels are changed to
\texttt{LN} and \texttt{SLA}, respectively, to agree with the paper, which
used these names.  This surface does not have a maximum, so, unlike
Figure~\ref{fig:surf}, no maximum is shown.}
\label{fig:gurf}
\end{figure}

\section{More on Quadratic Lande-Arnold Analysis}

With $u(z)$ defined by \eqref{eq:u}, \citet{la} defined
\begin{subequations}
\begin{align}
   \beta & = E \{ \nabla u(z) \}
   \label{eq:beta-def-too}
   \\
   \gamma & = E \{ \nabla^2 u(z) \}
   \label{eq:gamma-def}
\end{align}
\end{subequations}
calling \eqref{eq:beta-def-too} the \emph{directional selection gradient}
and \eqref{eq:gamma-def} the \emph{stabilizing selection gradient}.
Equation \eqref{eq:beta-def-too} just repeats \eqref{eq:beta-def} above.
Note that $\beta$ is a vector and $\gamma$ is a matrix.

Because $\beta$ and $\gamma$ are averages over the assumed multivariate
normal distribution of $z$, it may not make sense to draw plots like our
Figure~\ref{fig:gurf}.  In fact, such a figure is the fitness landscape
$u(z)$ only if $\nabla^2 u(z)$ is constant (not, in fact, a function of
$z$) and the fitness landscape is actually quadratic.

\citet{la} show that, if $z$ is jointly mean-zero multivariate normal,
then $\beta$ and $\gamma$ can be estimated by the linear and quadratic
regression coefficients in the OLS regression of relative fitness on $z$.
This depends crucially on $z$ being jointly multivariate normal.  In our
data, $z$ is the two-dimensional vector with components \texttt{LOGLVS}
and \texttt{LOGSLA}.  For our data (and for most data sets), this
multivariate normality assumption is not correct, although perhaps not too
badly violated for these two variables.  The scatterplot of these
variables is the pattern of dots in Figure~\ref{fig:surf}
or~\ref{fig:gurf}, and although not elliptical, it is not too far from
that.

One technical issue that we ignored is that \citet{la} require that
predictor variables be centered, so that (again assuming joint
multivariate normality) the estimates for $\beta$ will be the same whether
or not the quadratic term is added.
Thus we try this. <>= chamae2w$cLOGLVS <- chamae2w$LOGLVS - mean(chamae2w$LOGLVS) chamae2w$cLOGSLA <- chamae2w$LOGSLA - mean(chamae2w$LOGSLA) lout1too <- lm(relfit ~ BLK + cLOGLVS + cLOGSLA, data = chamae2w) summary(lout1too) lout1took <- lm(relfit ~ BLK + LOGLVS + LOGSLA, data = chamae2w) summary(lout1took) as.numeric(coefficients(lout1too)[4:5]) as.numeric(coefficients(lout1took)[4:5]) @ Note that centering does not change the estimates of $\beta$. Now for the quadratic regression. <>= lout8too <- lm(relfit ~ BLK + cLOGLVS + cLOGSLA + I(cLOGLVS^2) + I(cLOGSLA^2) + I(2 * cLOGLVS * cLOGSLA), data = chamae2w) summary(lout8too) as.numeric(coefficients(lout8too)[6:8]) as.numeric(coefficients(lout8)[6:8]) @ Note that centering does not change the estimates of $\gamma$. However, the estimates of $\beta$ do change depending on whether the quadratic terms are in or out. <>= as.numeric(coefficients(lout1too)[4:5]) as.numeric(coefficients(lout8too)[4:5]) @ They do not change much, but they do change. We may regard this as a failure of the multivariate normality assumption about the phenotype vector, since \citet{la} state that the coefficients would not change if $z$ were multivariate normal. \section{Goodness of Fit} \label{sec:fit} In this section we examine goodness of fit to the assumed conditional distributions for \verb@fruit@ given \verb@fecund == 1@ by looking at a residual plot. Residual analysis of generalized linear models (GLM) is tricky. (Our aster model becomes a GLM when we consider only the conditional distribution associated with one arrow.) Many different residuals have been proposed \cite{ds}. We start with the simplest, so called Pearson residuals. <>= xi.hat <- predict(out8, model.type = "cond", parm.type = "mean") xi.hat <- matrix(xi.hat, nrow = nrow(out8$x), ncol = ncol(out8$x)) theta.hat <- predict(out8, model.type = "cond", parm.type = "canon") theta.hat <- matrix(theta.hat, nrow = nrow(out8$x), ncol = ncol(out8$x)) @ <>= woof <- chamae2w$fruit[chamae2w$fecund == 1] range(woof) nwoof <- length(woof) woof.theta <- theta.hat[chamae2w$fecund == 1, 2] woof.xi <- xi.hat[chamae2w$fecund == 1, 2] wgrad <- double(nwoof) winfo <- double(nwoof) for (i in 1:nwoof) { wgrad[i] <- famfun(famlist[[2]], deriv = 1, woof.theta[i]) winfo[i] <- famfun(famlist[[2]], deriv = 2, woof.theta[i]) } all.equal(woof.xi, wgrad) pearson <- (woof - woof.xi) / sqrt(winfo) @ Figure~\ref{fig:fig-pearson-fruit} (page~\pageref{fig:fig-pearson-fruit}) shows the scatter plot of the Pearson residuals for fruit count plotted against the expected fruit count given that fruit count is nonzero (for each individual) for individuals with nonzero fitness only. \begin{figure} \begin{center} <>= plot(woof.xi, pearson, xlab = "fitted values", ylab = "Pearson residuals") @ \end{center} \caption{Pearson residuals for fruit count given nonzero fitness plotted against fitted values.} \label{fig:fig-pearson-fruit} \end{figure} \section{Maximum Likelihood Estimation of Size} \label{app:mle} The \verb@aster@ function does not calculate the correct likelihood when the size parameters are considered unknown, because it drops terms that do not involve the exponential family parameters. However, the full log likelihood is easily calculated in R. 
<>= x <- out8$x logl <- function(alpha.fruit, theta, x) { x.fecund <- x[ , 1] theta.fecund <- theta[ , 1] p.fecund <- 1 / (1 + exp(- theta.fecund)) logl.fecund <- sum(dbinom(x.fecund, 1, p.fecund, log = TRUE)) x.fruit <- x[x.fecund == 1, 2] theta.fruit <- theta[x.fecund == 1, 2] p.fruit <- (- expm1(theta.fruit)) logl.fruit <- sum(dnbinom(x.fruit, size = alpha.fruit, prob = p.fruit, log = TRUE) - pnbinom(0, size = alpha.fruit, prob = p.fruit, lower.tail = FALSE, log = TRUE)) logl.fecund + logl.fruit } @ We then calculate the profile likelihood for the size parameter \verb@alpha.fruit@ maximizing over the other parameters, evaluating the profile log likelihood on a grid of points. <>= alpha.fruit.seq <- seq(1.5, 4.5, 0.25) logl.seq <- double(length(alpha.fruit.seq)) for (i in 1:length(alpha.fruit.seq)) { famlist.seq <- famlist famlist.seq[[2]] <- fam.truncated.negative.binomial(size = alpha.fruit.seq[i], truncation = 0) out8.seq <- aster(out8$formula, pred, fam, varb, id, root, data = chamae2, famlist = famlist.seq, parm = out8$coefficients) theta.seq <- predict(out8.seq, model.type = "cond", parm.type = "canon") dim(theta.seq) <- dim(x) logl.seq[i] <- logl(alpha.fruit.seq[i], theta.seq, x) } ##### interpolate ##### alpha.foo <- seq(min(alpha.fruit.seq), max(alpha.fruit.seq), 0.01) logl.foo <- spline(alpha.fruit.seq, logl.seq, n = length(alpha.foo))$y imax <- seq(along = alpha.foo)[logl.foo == max(logl.foo)] alpha.fruit <- alpha.foo[imax] save(alpha.fruit, file = "chamae2-alpha.rda", ascii = TRUE) @ At the end of this chunk we save the maximum likelihood estimates in a file which is read in at the beginning of this chapter. Figure~\ref{fig:contour} (page~\pageref{fig:contour}) shows the profile log likelihood for the size parameters. \begin{figure} \begin{center} <>= plot(alpha.fruit.seq, logl.seq - max(logl.foo), ylab = "log likelihood", xlab = expression(alpha)) lines(alpha.foo, logl.foo - max(logl.foo)) points(alpha.foo[imax], 0, pch = 19) @ \end{center} \caption{Profile log likelihood for size parameter for the (zero-truncated) negative binomial distribution of fruit. Hollow dots are points at which the log likelihood was evaluated exactly. Curve is the interpolating spline. Solid dot is maximum likelihood estimate.} \label{fig:contour} \end{figure} \section{OLS Diagnostic Plots} \label{sec:ols-diag} Although unnecessary because we know the assumptions justifying OLS are badly violated, here are some diagnostic plots for the OLS regression. Figure~\ref{fig:foo1} (page~\pageref{fig:foo1}) shows the plot of residuals versus fitted values made by the R statement <>= plot(lout8, which = 1, add.smooth = FALSE, id.n = 0, sub.caption = "", caption = "") @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Residuals versus Fitted plot for OLS fit with blocks.} \label{fig:foo1} \end{figure} Figure~\ref{fig:foo2} (page~\pageref{fig:foo2}) shows the Normal Q-Q (quantile-quantile) plot made by the R statement <>= plot(lout8, which = 2, id.n = 0, sub.caption = "") @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Normal Q-Q plot for OLS fit with blocks.} \label{fig:foo2} \end{figure} % Both look terrible. Clearly the errors are highly non-normal % (a fact we did not need plots to know). \section{Monotone Transformation of Fitness} \label{sec:mono} This section makes explicit the monotonicity argument used on p.~\pageref{pg:mono} that allows us to look at the quadratic surface shown in Figure~\ref{fig:surf} rather than the fitness surface (landscape) itself. 
This argument holds whenever an unconditional aster model is used, one
component of the canonical statistic is taken to be observed fitness, and
the corresponding canonical parameter is the only parameter taken to be a
function of the phenotypic predictor variables $z$.  The argument is very
simple but a bit confusing, since it applies to an aster model that is not
the actual model used.

Consider any unconditional aster model.  Then
$$
   \varphi = a + M \beta
$$
gives the relation between the unconditional canonical parameter vector
$\varphi$ and the vector of regression coefficients $\beta$, and
$$
   Y = M^T X
$$
gives the relation between the original data vector $X$ and the canonical
statistic vector $Y$ (which is the minimal sufficient statistic).
These are (8) and (9) in \citet{gws}.

We are supposing that one component of $Y$, say $Y_k$, is observed
fitness.  Let
$$
   w = E_\beta(Y_k)
$$
be expected fitness.  Then the relationship between expected fitness and
the corresponding canonical parameter $\beta_k$ is given by (22) in
\citet{gws}
$$
   \frac{\partial w}{\partial \beta_k}
   =
   \frac{\partial E_\beta(Y_k)}{\partial \beta_k} > 0
$$
which means that, other components of $\beta$ being held fixed, $w$ is a
strictly increasing function of $\beta_k$.

Now suppose that we make $\beta_k$ and no other component of $\beta$ a
function of phenotypic predictor variables $z$.  Then by the argument
given above
$$
   w(z) = E_{\beta(z)}(Y_k)
$$
is a strictly increasing function of $\beta_k(z)$, other predictor
variables being held fixed.  Hence $w(z)$ has a maximum where
$\beta_k(z)$ has a maximum, and so forth.

That is the end of the argument, but there is still the tricky part to
deal with.  When we actually fit an aster model, we cannot have regression
coefficients that are arbitrary functions of predictors.  We have to write
them as linear functions of other coefficients.  So in our example
(p.~\pageref{pg:mono}) we actually had
$$
   \beta_k(z)
   =
   \beta_{k 1} z_1 + \beta_{k 2} z_2
   + \beta_{k 1 1} z_1^2 + 2 \beta_{k 1 2} z_1 z_2 + \beta_{k 2 2} z_2^2
$$
So what is one regression coefficient $\beta_k$ in our argument becomes
several regression coefficients in the actual model.  So let us repeat the
argument, keeping track of what fitness is in the actual example.  There
fitness is fruit count.  Therefore our argument requires that phenotypic
predictors only directly affect fruit count (only involve fruit count
variables), and this is what we have done by creating the predictor
variables \verb@LOGLVSfr@ and \verb@LOGSLAfr@.

\section{Plotting the Fitness Landscape} \label{sec:fit-surf}

If one does not want to use the argument of the preceding section or if
one wants to see the actual (modeled) fitness surface rather than a plot
like Figure~\ref{fig:surf}, which is only a monotone transformation of the
fitness surface, this section shows how to do that.

Figure~\ref{fig:nerf} (page~\pageref{fig:nerf}) shows the scatterplot of
data values for \texttt{LOGLVS} and \texttt{LOGSLA} and the contours of
the estimated fitness function that is the transform of \eqref{eq:regfun}
to the mean value parameter scale.  It is made by the following code.
<>=
plot(chamae2w$LOGLVS, chamae2w$LOGSLA, xlab = "LN", ylab = "SLA")
points(max8[1], max8[2], pch = 19)
ufoo <- par("usr")
nx <- 101
ny <- 101
x <- seq(ufoo[1], ufoo[2], length = nx)
y <- seq(ufoo[3], ufoo[4], length = ny)
xx <- outer(x, y^0)
yy <- outer(x^0, y)
xx <- as.vector(xx)
yy <- as.vector(yy)
n <- length(xx)
newdata1 <- data.frame(
    BLK = factor(rep("1", n), levels = levels(out8$data$BLK)),
    varb = factor(rep("fecund", n), levels = levels(out8$data$varb)),
    LOGLVSfr = rep(0, n),
    LOGSLAfr = rep(0, n),
    resp = rep(1, n),
    id = 1:n)
newdata2 <- data.frame(
    BLK = factor(rep("1", n), levels = levels(out8$data$BLK)),
    varb = factor(rep("fruit", n), levels = levels(out8$data$varb)),
    LOGLVSfr = xx,
    LOGSLAfr = yy,
    resp = rep(1, n),
    id = 1:n)
newdata <- rbind(newdata1, newdata2)
pout <- predict(out8, newdata = newdata, varvar = varb, idvar = id,
    root = rep(1, 2 * n))
zz <- pout[newdata$varb == "fruit"]
zz <- matrix(zz, nx, ny)
contour(x, y, zz, add = TRUE)
contour(x, y, zz, levels = c(10, 25), add = TRUE)
@
\begin{figure}
\begin{center}
<>=
<>
@
\end{center}
\caption{Scatterplot of \texttt{LOGLVS} versus \texttt{LOGSLA} with
contours of the estimated fitness landscape.  Variable names in axis
labels are changed to \texttt{LN} and \texttt{SLA}, respectively, to agree
with the paper, which used these names.  Solid dot is the point where the
estimated fitness function achieves its maximum.
Compare Figure~\ref{fig:surf}.}
\label{fig:nerf}
\end{figure}

\section{Diagnostic Plots for Paper}

Here we just put Figure~\ref{fig:fig-pearson-fruit} and
Figure~\ref{fig:foo1} in one plot.
\begin{figure}
\begin{center}
<>=
par(mfrow = c(2, 1), mar = c(1, 3, 1, 1) + 0.1)
plot(woof.xi, pearson, xlab = "", ylab = "", axes = FALSE)
box()
axis(side = 2)
abline(h = 0)
usr <- par("usr")
xrelpos <- 0.9
xpos <- (1 - xrelpos) * usr[1] + xrelpos * usr[2]
text(xpos, 6, "A", cex = 2)
library(MASS)
plot(fitted(lout8), stdres(lout8), xlab = "", ylab = "", axes = FALSE)
box()
axis(side = 2)
abline(h = 0)
usr <- par("usr")
xpos <- (1 - xrelpos) * usr[1] + xrelpos * usr[2]
text(xpos, 3, "B", cex = 2)
@
\end{center}
\caption{Diagnostic Plots.  A: Pearson residuals for fruit count given
nonzero fitness plotted against fitted values.  B: standardized OLS
residuals for fruit count plotted against fitted values.}
\label{fig:diag}
\end{figure}
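\section{Parametric Bootstrap}

The Introduction proposed basing confidence statements for quantities
derived from the aster analysis on a parametric bootstrap from a fitted
model.  The following sketch, which is not evaluated here, indicates how
that might go for the location \verb@max8@ of the maximum of the estimated
quadratic fitness function.  It assumes the \verb@raster@ function of the
\verb@aster@ package for simulating response data from the fitted model,
that \verb@out8$root@ contains the root data, and that the rows of
\verb@chamae2@ are ordered so that an individuals-by-nodes response matrix
can be unrolled column by column into \verb@resp@; these assumptions
should be checked, and the number of bootstrap iterations increased, before
any serious use.
<<sketch-boot-max, eval=FALSE>>=
## parametric bootstrap sketch for the location of the fitness maximum;
## assumptions: raster() simulates aster data given conditional canonical
## parameters, out8$root holds the root data, and chamae2 rows are ordered
## individuals fastest within nodes, nodes ordered as columns of out8$x
set.seed(42)     # arbitrary seed
nboot <- 100     # deliberately small, illustration only
theta.hat <- predict(out8, model.type = "cond", parm.type = "canon")
theta.hat <- matrix(theta.hat, nrow = nrow(out8$x), ncol = ncol(out8$x))
max.boot <- matrix(NA, nboot, 2)
for (iboot in 1:nboot) {
    xstar <- raster(theta.hat, pred, fam, out8$root, famlist = famlist)
    star <- chamae2
    star$resp <- as.vector(xstar)
    outstar <- aster(out8$formula, pred, fam, varb, id, root, data = star,
        famlist = famlist, parm = out8$coefficients)
    astar <- outstar$coefficients[c("LOGLVSfr", "LOGSLAfr")]
    Astar <- matrix(outstar$coefficients[c("I(LOGLVSfr^2)",
        "I(2 * LOGLVSfr * LOGSLAfr)", "I(2 * LOGLVSfr * LOGSLAfr)",
        "I(LOGSLAfr^2)")], 2, 2)
    max.boot[iboot, ] <- (- solve(Astar, astar) / 2)
}
## bootstrap percentile intervals for the coordinates of the maximum
apply(max.boot, 2, quantile, probs = c(0.025, 0.975))
@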