This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).
The version of R used to make this document is 4.1.0. The version of the
rmarkdown package used to make this document is 2.10. The version of the
knitr package used to make this document is 1.34.
This is a demo for using the R package rmarkdown. To get started, make a plain text file (like this one) with the suffix .Rmd, and then turn it into a PDF file using the R commands
library("rmarkdown")
render("baz.Rmd", output_format = "pdf_document")
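As a self-contained sketch of the same workflow, the code below writes a tiny hypothetical source file (demo.Rmd, in a temporary directory) and renders it. It assumes pandoc is installed, because render() needs it, and it uses html_document rather than pdf_document because PDF output additionally needs a LaTeX installation.

```r
# Hypothetical minimal .Rmd source written to a temporary directory
rmd <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "---",
  "",
  "```{r}",
  "2 + 2",
  "```"
), rmd)

html_file <- NA_character_
# render() needs pandoc, so only try it when pandoc is available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # render() returns the path of the output file it created
  html_file <- rmarkdown::render(rmd, output_format = "html_document",
                                 quiet = TRUE)
}
html_file
```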
If instead you wish to make an HTML document, change "pdf_document" to "html_document". If you wish to have some other output format, how to do that is explained in the rmarkdown documentation.
Now include R in our document. Here’s a simple example:
2 + 2
## [1] 4
This is a “code chunk” processed by rmarkdown. When rmarkdown hits such a thing, it processes it, runs R to get the results, and stuffs the results (by default) into the file it is creating. The text between code chunks is markdown, a “lightweight markup language” that has become widely used in several variants (it is used by GitHub, for example). The web site for the R variant is http://rmarkdown.rstudio.com/.
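The processing step can be watched directly with knitr, the package rmarkdown uses to run code chunks. In this sketch, knitr::knit() with its text argument takes a hypothetical piece of .Rmd source as a character vector and returns markdown in which the chunk's result has been inserted. The converted markdown is what pandoc then turns into PDF or HTML.

```r
library(knitr)

# Hypothetical .Rmd source: one line of markdown and one code chunk
src <- c(
  "Here's a simple example",
  "",
  "```{r}",
  "2 + 2",
  "```"
)

# knit() runs the chunk and returns markdown with the results stuffed in
md <- knit(text = src, quiet = TRUE)
cat(md)
```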
Plots get a little more complicated. First we make something to plot (simulate regression data).
n <- 50
x <- seq(1, n)
a.true <- 3
b.true <- 1.5
y.true <- a.true + b.true * x
s.true <- 17.3
y <- y.true + s.true * rnorm(n)
out1 <- lm(y ~ x)
summary(out1)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.927 -11.381   1.089  12.212  27.260 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0232     4.5289   0.668    0.508    
## x             1.5256     0.1546   9.870 3.88e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.77 on 48 degrees of freedom
## Multiple R-squared:  0.6699,  Adjusted R-squared:  0.663 
## F-statistic: 97.42 on 1 and 48 DF,  p-value: 3.881e-13
plot(x, y)
abline(out1)
Sometimes we want to show the code, discuss it, and then show the figure. Or for some other reason we don’t want the code immediately followed by the figure. This shows how to do that.
The following figure is produced by this code:
plot(x, y)
abline(out1)
(This code doesn’t actually do anything, because we used the optional argument eval=FALSE on this code chunk.) We could omit this showing of the code if we wanted to.
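The two chunk options at work here are eval=FALSE (show the code but do not run it) and echo=FALSE (run the code but do not show it). A small sketch, again knitting hypothetical .Rmd source with knitr::knit(), makes the difference visible in the generated markdown.

```r
library(knitr)

# Hypothetical .Rmd source with one chunk of each kind
src <- c(
  "```{r, eval=FALSE}",
  "1 + 1",
  "```",
  "",
  "```{r, echo=FALSE}",
  "2 + 3",
  "```"
)
md <- knit(text = src, quiet = TRUE)
cat(md)
# The first chunk's code appears with no result;
# the second chunk's result appears with no code.
```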
For this example we do a cubic regression on the same data.
out3 <- lm(y ~ x + I(x^2) + I(x^3))
summary(out3)
## 
## Call:
## lm(formula = y ~ x + I(x^2) + I(x^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.173 -10.923   1.242  12.740  29.307 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.3598598  9.7631435   1.061    0.294
## x            0.1131178  1.6415075   0.069    0.945
## I(x^2)       0.0614168  0.0743937   0.826    0.413
## I(x^3)      -0.0007396  0.0009594  -0.771    0.445
## 
## Residual standard error: 15.98 on 46 degrees of freedom
## Multiple R-squared:  0.6752,  Adjusted R-squared:  0.654 
## F-statistic: 31.88 on 3 and 46 DF,  p-value: 2.664e-11
Then we plot this figure with a hidden code chunk (so the R commands to make it do not appear in the document).
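The hidden chunk's code is not shown, so the following is only a guess at what such a chunk might contain: the scatter plot with the linear fit, plus the fitted cubic drawn as a dashed red curve with lines(). Inside an .Rmd file the figure is captured automatically; the explicit pdf() and dev.off() calls are there only so the sketch also runs as a plain R script, and the seed is arbitrary.

```r
# Simulated data in the style of the document (seed chosen arbitrarily)
set.seed(1)
n <- 50
x <- seq(1, n)
y <- 3 + 1.5 * x + 17.3 * rnorm(n)
out1 <- lm(y ~ x)
out3 <- lm(y ~ x + I(x^2) + I(x^3))

# Write the figure to a temporary PDF file
fig <- tempfile(fileext = ".pdf")
pdf(fig)
plot(x, y)
abline(out1)                                   # straight-line fit
lines(x, fitted(out3), col = "red", lty = 2)   # cubic fit; x is already sorted
dev.off()
```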
Also note that every time we rerun rmarkdown, these two figures change, because the simulated data are random. Everything just works. This should tell you the main virtue of rmarkdown: it’s always correct. There is never a problem with stale cut-and-paste.
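If you would rather have the same numbers and figures on every run (while still never cut-and-pasting anything), a common approach is to call set.seed() before simulating. In this sketch, simulate_fit is a hypothetical helper that reruns the document's simulation and returns the fitted coefficients; the same seed gives identical results, a different seed gives different ones.

```r
# Hypothetical helper: rerun the document's simulation with a given seed
simulate_fit <- function(seed) {
  set.seed(seed)
  n <- 50
  x <- seq(1, n)
  y <- 3 + 1.5 * x + 17.3 * rnorm(n)
  coef(lm(y ~ x))
}

fit.a <- simulate_fit(42)
fit.b <- simulate_fit(42)
fit.c <- simulate_fit(43)
identical(fit.a, fit.b)  # TRUE
```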
This section illustrates how rmarkdown can be used to have running text computed by R.
We show some numbers calculated by R interspersed with text. The quadratic and cubic regression coefficients in the preceding regression were \(\beta_2 = 0.0614\) and \(\beta_3 = -0.0007\). Magic! See the source file baz.Rmd for how the magic works.
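The magic is inline R code: in an .Rmd file, a backtick followed by r, a space, an R expression, and a closing backtick is replaced by the value of that expression when the document is knitted. The sketch below knits hypothetical source in which a hidden chunk fits a cubic to simulated data and a sentence reports the cubic coefficient formatted to four decimal places with sprintf(). The actual expression used in baz.Rmd may differ.

```r
library(knitr)

# Hypothetical .Rmd source: a hidden chunk that fits the model,
# then a sentence whose number is computed by inline R code
src <- c(
  "```{r, echo=FALSE}",
  "set.seed(42)",
  "x <- 1:50",
  "y <- 3 + 1.5 * x + 17.3 * rnorm(50)",
  "fit <- lm(y ~ x + I(x^2) + I(x^3))",
  "```",
  "",
  "The cubic coefficient is `r sprintf('%.4f', coef(fit)[4])`."
)
md <- knit(text = src, quiet = TRUE)
cat(md)
```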
In order for your document to be truly reproducible, you must never cut-and-paste anything R computes. Always have R recompute it every time the document is processed, either in a code chunk or with the technique illustrated in this section.
The same goes for tables. Here is a “table” of sorts in some R printout.
out2 <- lm(y ~ x + I(x^2))
anova(out1, out2, out3)
## Analysis of Variance Table
## 
## Model 1: y ~ x
## Model 2: y ~ x + I(x^2)
## Model 3: y ~ x + I(x^2) + I(x^3)
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1     48 11941                           
## 2     47 11900  1    40.581 0.1589 0.6920
## 3     46 11748  1   151.760 0.5942 0.4447
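The F column can be checked by hand. For each model after the first, F is the drop in residual sum of squares per degree of freedom, divided by the residual mean square of the largest model (here the cubic). With the numbers printed above, 40.581 / (11748 / 46) ≈ 0.1589 and 151.760 / (11748 / 46) ≈ 0.5942. The sketch repeats this arithmetic on freshly simulated data (arbitrary seed) and compares it with anova().

```r
# Simulated data in the style of the document (seed chosen arbitrarily)
set.seed(1)
n <- 50
x <- seq(1, n)
y <- 3 + 1.5 * x + 17.3 * rnorm(n)
out1 <- lm(y ~ x)
out2 <- lm(y ~ x + I(x^2))
out3 <- lm(y ~ x + I(x^2) + I(x^3))
fits <- list(out1, out2, out3)

rss <- sapply(fits, deviance)     # residual sums of squares (RSS column)
rdf <- sapply(fits, df.residual)  # residual degrees of freedom (Res.Df column)

# Drop in RSS per df, scaled by the largest model's residual mean square
sigma2.hat <- rss[3] / rdf[3]
F.manual <- (-diff(rss) / -diff(rdf)) / sigma2.hat
p.manual <- pf(F.manual, -diff(rdf), rdf[3], lower.tail = FALSE)

F.manual
anova(out1, out2, out3)$F[2:3]
```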
We want to turn that into a table in the output format we are creating. First we have to figure out what the R function anova returns and capture it so we can use it.
foo <- anova(out1, out2, out3)
class(foo)
## [1] "anova"      "data.frame"
So now we are ready to turn the data frame foo into a table, and the simplest way to do that seems to be the kable function (from the knitr package) in our R chunk.
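| Res.Df |   RSS | Df | Sum of Sq |      F | Pr(>F) |
|-------:|------:|---:|----------:|-------:|-------:|
|     48 | 11941 |    |           |        |        |
|     47 | 11900 |  1 |    40.581 | 0.1589 | 0.6920 |
|     46 | 11748 |  1 |   151.760 | 0.5942 | 0.4447 |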
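Because foo is a data frame, knitr::kable() can format it directly. This sketch refits the three models on simulated data (arbitrary seed) and produces a markdown (pipe) table rounded to four digits; in an .Rmd chunk the same kable() call would place the table in the output document. The digits setting is an illustrative choice, not necessarily what baz.Rmd uses.

```r
library(knitr)

# Simulated data in the style of the document (seed chosen arbitrarily)
set.seed(1)
n <- 50
x <- seq(1, n)
y <- 3 + 1.5 * x + 17.3 * rnorm(n)
out1 <- lm(y ~ x)
out2 <- lm(y ~ x + I(x^2))
out3 <- lm(y ~ x + I(x^2) + I(x^3))

foo <- anova(out1, out2, out3)
# Markdown table of the ANOVA results, numbers rounded to 4 digits
tab <- kable(foo, format = "markdown", digits = 4)
print(tab)
```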
Reusing code chunks has already been illustrated above, in the section about plotting figures, where the same code appeared in two different code chunks, but it can be done with any code chunk.
Make some data
x <- rnorm(100)
and then do something with it
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.34631 -0.59170 -0.02185  0.05025  0.72997  2.38010 
and then make some other data
x <- rnorm(50)
and then do the same thing again, following the DRY/SPOT rule (don’t repeat yourself; single point of truth)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.1235 -0.7779 -0.2420 -0.1642  0.3816  3.0519 
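One way to rerun a chunk without retyping it is knitr's ref.label chunk option, which gives an otherwise empty chunk the code of a labelled one. This sketch knits hypothetical .Rmd source that writes summary(x) once, in a chunk labelled summ, and reruns it on new data; whether baz.Rmd uses ref.label or another reuse mechanism is not shown here.

```r
library(knitr)

# Hypothetical .Rmd source: the "summ" chunk is written once and reused
src <- c(
  "```{r data1}",
  "x <- rnorm(100)",
  "```",
  "",
  "```{r summ}",
  "summary(x)",
  "```",
  "",
  "```{r data2}",
  "x <- rnorm(50)",
  "```",
  "",
  "```{r again, ref.label='summ'}",
  "```"
)
md <- knit(text = src, quiet = TRUE)
cat(md)
```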
Rmarkdown is terrific, so important that we cannot get along without it or its older competitors.
Its virtues are:
The numbers and graphics you report are actually what they are claimed to be.
Your analysis is reproducible. Even years later, when you’ve completely forgotten what you did, the whole write-up, every single number or pixel in a plot is reproducible.
Your analysis actually works, at least in this particular instance. The code you show actually executes without error.
Toward the end of your work, with the write-up almost done, you discover an error. Months of rework to do? No! Just fix the error and rerun Rmarkdown. One single problem like this and you will have all the time invested in Rmarkdown repaid.
This methodology provides discipline. There’s nothing that will make you clean up your code like the prospect of actually revealing it to the world.
Whether we’re talking about homework, a consulting report, a textbook, or a research paper, if it involves computing and statistics, this is the way to do it.