\documentclass{article}

\usepackage{amsmath}
\usepackage{amscd}
\usepackage[tableposition=top]{caption}
\usepackage{ifthen}
\usepackage{url}
\usepackage[utf8]{inputenc}
\usepackage[colorlinks=true]{hyperref}

\let\code=\texttt

%\VignetteEngine{knitr::knitr}

\begin{document}

\title{A Knitr Demo}
\author{Charles J. Geyer}
\maketitle

\section{Licence}

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
International License
\url{http://creativecommons.org/licenses/by-sa/4.0/}.

\section{R}

The version of R used to make this document is \Sexpr{getRversion()}.

The version of the \texttt{knitr} package used to make this document is
\Sexpr{packageVersion("knitr")}.

The version of the \texttt{xtable} package used to make this document is
\Sexpr{packageVersion("xtable")}.

\section{Introduction}

This is a demo for using the R package \verb@knitr@.  To get started, make a
regular \LaTeX\ file (like this one) but give it the suffix \verb@.Rnw@
instead of \verb@.tex@, and then turn it into a PDF file (\verb@bar.pdf@)
with the (unix) command
\begin{verbatim}
R CMD Sweave --pdf bar.Rnw
\end{verbatim}
(as always with \LaTeX, one may have to rerun this command several times
until it stops complaining about needing to be rerun).
For this to work, R needs to have the package \texttt{knitr} installed.

This works because of the comment
\begin{verbatim}
%\VignetteEngine{knitr::knitr}
\end{verbatim}
(otherwise \texttt{R CMD Sweave} would expect \texttt{Sweave} rather than
\texttt{knitr}).

Now we include R in our document.  Here's a simple example
<<>>=
2 + 2
@
This is not \LaTeX.  It is a ``code chunk'' processed by \verb@knitr@.
When \verb@knitr@ hits such a thing, it processes it, runs R to get the
results, and stuffs (by default) the output in the \LaTeX\ file it is
creating.  The \LaTeX\ between code chunks is copied verbatim (except for
\verb@\Sexpr@, about which see below).  Hence to create an Rnw document you
just write plain old \LaTeX\ interspersed with ``code chunks'' which are
plain old R.

\section{Plots}

\subsection{Make Up Data}

Plots get a little more complicated.  First we make something to plot
(simulate regression data).
<<>>=
n <- 50
x <- seq(1, n)
a.true <- 3
b.true <- 1.5
y.true <- a.true + b.true * x
s.true <- 17.3
y <- y.true + s.true * rnorm(n)
out1 <- lm(y ~ x)
summary(out1)
@

\subsection{Figure with Code to Make It Shown}
\label{sec:two-chunks}

Figure~\ref{fig:one} (p.~\pageref{fig:one}) is produced by the following code
<<fig1,eval=FALSE>>=
plot(x, y)
abline(out1)
@
\begin{figure}
\begin{center}
<<fig1too,echo=FALSE>>=
<<fig1>>
@
\end{center}
\caption{Scatter Plot with Regression Line}
\label{fig:one}
\end{figure}
Note that \verb@x@, \verb@y@, and \verb@out1@ are remembered from the
preceding code chunk.  We don't have to regenerate them.  All code chunks
are part of one R ``session''.

This was a little tricky.  We did this with two code chunks, one visible
and one invisible.  See the source file \texttt{bar.Rnw} to see how it is
done.  The first code chunk just shows the code (so we can see it above).
It has the optional argument \verb@eval=FALSE@, which makes it not actually
do anything.  Then the second code chunk (which we do not see) actually
makes the plot.  It has the optional argument \verb@echo=FALSE@, which says
it should not be shown (because it would be shown in the figure, which is
not what we want).

The second code chunk does not repeat the code of the first chunk.
Instead it ``quotes'' the first chunk; it just reuses its code.
This follows the DRY/SPOT rule (\emph{don't repeat yourself} or
\emph{single point of truth}), so we have only one bit of code for
generating the plot.  What the reader sees is guaranteed to be the code
that made the plot.  If we had used cut-and-paste, just repeating the
code, the duplicated code might get out of sync after edits.
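In fact, the skeleton of the two-chunk pattern can be shown right here.
Because \verb@knitr@'s parser is line oriented and pays no attention to
\LaTeX\ environments, chunk delimiters cannot simply be quoted in a
\verb@verbatim@ environment; instead we have a hidden code chunk print the
skeleton (the chunk names are the ones used for Figure~\ref{fig:one}):
<<fig1skeleton,echo=FALSE,comment=NA>>=
cat('<<fig1,eval=FALSE>>=',
    'plot(x, y)',
    'abline(out1)',
    '@',
    '',
    '<<fig1too,echo=FALSE>>=',
    '<<fig1>>',
    '@',
    sep = '\n')
@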
So making a figure is a bit more complicated in some ways but much simpler
in others.  Note the following virtues.
\begin{itemize}
\item The figure is guaranteed to be the one described by the text
(at least by the R in the text).
\item No messing around with sizing or rotations.  It just works!
\end{itemize}

\subsection{Figure with Code to Make It Not Shown}

For this example we do a cubic regression on the same data.
<<>>=
out3 <- lm(y ~ x + I(x^2) + I(x^3))
summary(out3)
@
\label{pg:cubic}
\begin{figure}
\begin{center}
<<echo=FALSE>>=
plot(x, y)
curve(predict(out3, newdata=data.frame(x=x)), add = TRUE)
@
\end{center}
\caption{Scatter Plot with Cubic Regression Curve}
\label{fig:two}
\end{figure}
If you don't care to show the R code that makes a figure, it is simpler
still.  Figure~\ref{fig:two} (p.~\pageref{fig:two}) shows the plot of the
(cubic) regression function for these data.  It is made by one code chunk
with the optional argument \verb@echo=FALSE@, so it makes a figure and does
not show the code.

Also note that every time we rerun \verb@knitr@, Figures~\ref{fig:one}
and~\ref{fig:two} change because the simulated data are random.
Everything just works.  This should tell you the main virtue of
\texttt{knitr}.  It's always correct.  There is never a problem with stale
cut-and-paste.

\section{R in Text}

This section illustrates the command \verb@\Sexpr@, which allows running
text to be computed by R.  We show some numbers calculated by R
interspersed with text.  The quadratic and cubic regression coefficients
in the preceding regression (page~\pageref{pg:cubic}) were
$\beta_2 = \Sexpr{formatC(out3$coef[3], format = "f", digits = 4)}$
and
$\beta_3 = \Sexpr{formatC(out3$coef[4], format = "f", digits = 4)}$.
Magic!

In order for your document to be truly reproducible, you must never
cut-and-paste anything R computes.  Always have R recompute it every time
the document is processed, either in a code chunk or with \verb@\Sexpr@.

\section{Tables}

The same goes for tables.  The \verb@xtable@ command is used to make
tables.  Here is a ``table'' of sorts in some R printout.
<<>>=
out2 <- lm(y ~ x + I(x^2))
anova(out1, out2, out3)
@
We want to turn that into a \LaTeX\ table.  First we have to figure out
what the output of the R function \texttt{anova} is and capture it so we
can use it.
<<>>=
foo <- anova(out1, out2, out3)
class(foo)
@
So now we are ready to turn the data frame \verb@foo@ into
Table~\ref{tab:one} using the R functions \texttt{xtable} and
\texttt{print.xtable} (that is, the \verb@xtable@ method of the generic
function \verb@print@), which are found in the R package \texttt{xtable},
which is a CRAN package.  The code chunk that makes Table~\ref{tab:one}
has the optional arguments \verb@echo=FALSE@ and \verb@results="asis"@, so
we do not see it, and the result is not put back in a code chunk but is
just treated as ordinary \LaTeX.
<<echo=FALSE,results="asis">>=
library(xtable)
print(xtable(foo, caption = "ANOVA Table", label = "tab:one",
    digits = c(0, 0, 2, 0, 2, 3, 3)),
    table.placement = "tbp", caption.placement = "top")
@
The reason why we had to use both \texttt{xtable} and \texttt{print.xtable}
is that we needed to use arguments of both.  We used the \texttt{caption},
\texttt{label}, and \texttt{digits} arguments of the former and the
\texttt{table.placement} and \texttt{caption.placement} arguments of the
latter.
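To make this division of labor explicit, the same table could have been
made in two steps, assigning the intermediate result to a name of our
choosing (here \code{foo.tab}, which exists only for this illustration).
This chunk has the optional argument \verb@eval=FALSE@, so it is shown but
not run (Table~\ref{tab:one} has already been made).
<<xtable-split,eval=FALSE>>=
foo.tab <- xtable(foo, caption = "ANOVA Table", label = "tab:one",
    digits = c(0, 0, 2, 0, 2, 3, 3))
print(foo.tab, table.placement = "tbp", caption.placement = "top")
@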
\section{Reusing Code Chunks}

Code chunks can quote other code chunks.  Doing this is an example of
following the DRY/SPOT rule (Wikipedia articles
\href{https://en.wikipedia.org/wiki/Don\%27t_repeat_yourself}
{Don't Repeat Yourself} and
\href{https://en.wikipedia.org/wiki/Single_source_of_truth}
{Single Point of Truth}).  It has already been illustrated above in
Section~\ref{sec:two-chunks}, where the figure was made with two different
code chunks, but it can also be used with any code chunks.

Make some data
<<try1>>=
x <- rnorm(100)
@
and then do something with it
<<try2>>=
summary(x)
@
and then make some other data
<<try3>>=
x <- rnorm(50)
@
and then do the same thing again following the DRY/SPOT rule
<<try4>>=
<<try2>>
@

\section{Caching Code Chunks}

When code chunks take a long time to run, the option \code{cache=TRUE} can
be added to them.  Then they are run only once, the results are saved (in
a directory, also called a folder, on your computer), and every other time
the code chunk is processed the cached results are used (so no computer
time is taken redoing the long calculation).  If the code in the cached
code chunk is changed, then it will be rerun.

But caching is dangerous because \code{knitr} cannot know that the code
needs to be rerun when something else has changed.  So here are some
warnings.

Never cache a code chunk that has important global side effects.  In
particular, calls to the R function \code{library} should never be in
cached code chunks.  They need to be executed every time \code{knitr}
runs, so they should go in a different code chunk from any code chunks you
want to cache.

You can use the \code{dependson} argument to tell a cached code chunk what
other code chunks it depends on.  Then the code chunk is rerun whenever
what is computed in those ``depends on'' code chunks changes.

The value of the \code{dependson} chunk option, like all knitr chunk
options, is an R object, so it has to be valid R syntax.  The chunk names
are character strings.  If there is more than one of them, they must be
collected into a vector using the R function \code{c}.  For example, in
\code{cache=TRUE} the value \code{TRUE} is the R logical constant
\code{TRUE}, so it is not quoted.  In \code{dependson="try1"} the value
\code{"try1"} is the name of a code chunk, so it is a character string.
In \code{dependson=c("try1", "try2", "try3")} the value
\code{c("try1", "try2", "try3")} is a vector of character strings, the
vector being made using the R function \code{c}.

For a complete example using \code{cache} and \code{dependson}, see the
notes on
\href{https://www.stat.umn.edu/geyer/3701/notes/mcmc-bayes.html}
{Markov chain Monte Carlo}.  Those notes are in Rmarkdown rather than
knitr, but Rmarkdown is built on top of knitr, and the chunk options are
the same for both.  The code chunks and their options can be seen in the
\href{https://www.stat.umn.edu/geyer/3701/notes/mcmc-bayes.Rmd}
{Rmarkdown source for those notes}.
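The syntax looks like the following, printed, as before, by a hidden code
chunk (the chunk name \code{try5} and its one-line body are made up just
for this illustration; it says the chunk is cached and must be rerun
whenever what chunks \code{try1} or \code{try2} compute changes):
<<cache-skeleton,echo=FALSE,comment=NA>>=
cat('<<try5,cache=TRUE,dependson=c("try1","try2")>>=',
    'summary(x)',
    '@',
    sep = '\n')
@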
\section{Summary}

\verb@knitr@ is terrific, so important that we cannot get along without it
or its older competitor \texttt{Sweave} or its newer competitor
\texttt{Rmarkdown}.  Its virtues are
\begin{itemize}
\item The numbers and graphics you report are actually what they are
claimed to be.
\item Your analysis is reproducible.  Even years later, when you've
completely forgotten what you did, the whole write-up, every single number
or pixel in a plot, is reproducible.
\item Your analysis actually works---at least in this particular instance.
The code you show actually executes without error.
\item Suppose that toward the end of your work, with the write-up almost
done, you discover an error.  Months of rework to do?  No!  Just fix the
error and rerun \verb@knitr@ and \verb@pdflatex@.  One single problem like
this and you will have all the time invested in \verb@knitr@ repaid.
\item This methodology provides discipline.  There's nothing that will
make you clean up your code like the prospect of actually revealing it to
the world.
\end{itemize}
Whether we're talking about homework, a consulting report, a textbook, or
a research paper, if it involves computing and statistics, this is the way
to do it.

\end{document}