---
title: "Stat 8054 Lecture Notes: R as a Functional Programming Language"
author: "Charles J. Geyer"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  html_document:
    number_sections: true
    md_extensions: -tex_math_single_backslash
    css: "../bar.css"
  pdf_document:
    number_sections: true
    md_extensions: -tex_math_single_backslash
---

# License

This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
(http://creativecommons.org/licenses/by-sa/4.0/).

# R

 * The version of R used to make this document is `r getRversion()`.

 * The version of the `rmarkdown` package used to make this document is
   `r packageVersion("rmarkdown")`.

 * The version of the `magrittr` package used to make this document is
   `r packageVersion("magrittr")`.

 * The version of the `CatDataAnalysis` package used to make this document is
   `r packageVersion("CatDataAnalysis")`.

 * The version of the `glmbb` package used to make this document is
   `r packageVersion("glmbb")`.

# Reading

 * A blog post at R Bloggers: [Functional programming in R](https://www.r-bloggers.com/functional-programming-in-r/).

 * Three chapters in *Advanced R:*

    - [Functional programming](http://adv-r.had.co.nz/Functional-programming.html)

    - [Functionals](http://adv-r.had.co.nz/Functionals.html)

    - [Function Operators](http://adv-r.had.co.nz/Function-operators.html)

 * A question in the R FAQ: [What are the differences between R and S?](https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-are-the-differences-between-R-and-S_003f).

 * Some sections of my 3701 handout on the basics of R: 

    - [4 Functions](http://www.stat.umn.edu/geyer/3701/notes/basic.html#functions)

    - [6 More on Functions](http://www.stat.umn.edu/geyer/3701/notes/basic.html#more-on-functions)

    - [7 Still More on Functions](http://www.stat.umn.edu/geyer/3701/notes/basic.html#still-more-on-functions)

    - [7.5 A Long Example (Maximum Likelihood Estimation)](http://www.stat.umn.edu/geyer/3701/notes/basic.html#a-long-example-maximum-likelihood-estimation),
      especially subsections
      [7.5.4](http://www.stat.umn.edu/geyer/3701/notes/basic.html#having-your-cake-and-eating-it-too-closures)
      and [7.5.5](http://www.stat.umn.edu/geyer/3701/notes/basic.html#the-same-except-even-more-mysterious)
      and [7.5.6](http://www.stat.umn.edu/geyer/3701/notes/basic.html#where-is-x)

 * Some sections of the book [*The R Language Definition*](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html), which is one of the R manuals that
   can be [found at CRAN](https://cloud.r-project.org/manuals.html) and also
   in your own R installation (R function `help.start()` provides access to
   local versions of them):

    - [Section 2 Objects](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Objects),
      which is very long, covering all the kinds of objects that R has, and is
      just background, not assigned reading

    - [Section 4.3.3 Argument evaluation](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Argument-evaluation)

# Some Keywords

## Functional Programming

Functional programming has been defined many different ways
([Wikipedia page](https://en.wikipedia.org/wiki/Functional_programming)),
but at the very least definitions require that functions
are first-class objects.
You can do with a function whatever you can do with any other object.
R has this.
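
Here is a minimal sketch of what "first-class" means in practice: functions
stored in a list, passed as an argument, and returned as a value.  The names
`ops`, `apply_twice`, and `adder` are made up for this illustration.

```{r "first-class-sketch"}
# functions can be stored in a list like any other object
ops <- list(square = function(u) u^2, negate = function(u) -u)
ops$square(3)

# functions can be passed as arguments to other functions
apply_twice <- function(fun, u) fun(fun(u))
apply_twice(ops$square, 3)

# functions can be returned as values from other functions
adder <- function(n) function(u) u + n
adder(10)(5)
```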

## Closures

> This \[closures\] is the best idea in the history
> of programming languages.
>
> — Douglas Crockford

Crockford was talking about JavaScript, but R has the same idea.  Both
inherited it from Scheme (a flavor of LISP).  In R it is
the function named `function`, which creates functions.  Technically,
they are called "closures"
```{r "closure"}
typeof(function(x) x)
```
because they capture variables in the environment where they are defined
that are used in the function definition.  If that environment is the
R global environment, then this works as users (some of them anyway) expect
```{r "closure-demo", error=TRUE}
x <- 2.5
fred <- function(y) x + y
environment(fred)
fred(pi)
x <- 0
fred(pi)
rm(x)
fred(pi)
```
If that environment is another environment, then this works as users
(most of them anyway) do not expect
```{r "closure-demo-factory", error=TRUE}
fred_factory <- function(x) function(y) x + y
fred <- fred_factory(2.5)
environment(fred)
fred(pi)
x <- 0
fred(pi)
ls(envir = environment(fred))
environment(fred)$x
environment(fred)$x <- 0
fred(pi)
rm(x, envir = environment(fred))
ls(envir = environment(fred))
fred(pi)
rm(x)
fred(pi)
```

Nevertheless, closures (or functions as most R users call them) are super
important.  They enable many good programming practices.

## Pure Functional Programming

A functional programming language is "pure" if all functions act like math
functions:
for the same inputs (arguments) they always produce the same output (value).
They cannot do assignment (except to local variables inside the function
that are not visible to callers of the function), input, or output.
They cannot use random numbers.

Some would say that a functional programming language
is "pure" only if the whole
language behaves that way.  Pure functions are the only feature of the
language.

R is not a pure functional programming language.  In R functions do
everything.  Even assignment is really a function
```{r "assignment-demo"}
x <- 2
x
assign("x", 3)
x

`<-`
`<-`(x, 4)
x
```
The last bit you do not ever want to use in real life.  No reader of your
code would understand it.  But it does show that everything that happens
in R happens via a function call.

So the assignment function named "<-" is not pure.  Its whole reason for
existence is to have a side effect: assignment.  Similarly all input, output,
and graphics functions are not pure.  Similarly, all random number functions
are not pure.  Similarly, R functions `date` and `Sys.getenv` and `file.create`
and `list.files` and many other functions that allow R to act as a scripting
language are not pure.

Nevertheless, most R functions used for computation are pure.
The only thing most R functions do that is observable from the outside
is return a value.  They do not refer to global variables.  If they
are using variables defined in their environment, then they are
not strictly pure, but so long as users do not modify
that environment — and if it is not the R global environment, then most
users will not even know that environment exists — such a function is
practically pure.
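
The distinction can be sketched as follows.  The function names here
(`pure_scale`, `noisy_scale`, `bump`) and the variable `counter` are
hypothetical, invented only for this illustration.

```{r "purity-sketch"}
# pure: the value depends only on the arguments
pure_scale <- function(u, k) k * u
identical(pure_scale(1:3, 2), pure_scale(1:3, 2))

# impure: uses the random number generator,
# so the same input need not give the same output
noisy_scale <- function(u) u * runif(1)
noisy_scale(1) == noisy_scale(1)

# impure: the superassignment operator changes a variable outside the function
counter <- 0
bump <- function() {
    counter <<- counter + 1
    counter
}
bump()
bump()
```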

## Anonymous Functions

A function does not need a name.  Like every function, the R function named
"function" returns a value.  In this case the value happens to be a function.
Other functions can also return functions as values.  Usually, these functions
are created by a call to the R function named "function" inside the function
returning a function as a value (as we have already seen in some of the
examples above).

Like every other R object, the value returned by the R function
named "function" does not get associated with a name (symbol) except
by assignment.
```{r "anonymous-demo"}
(function() 2)()
```
We don't want to do trickery like the above in real life.  No readers of
our code would understand.  But anonymous functions are quite useful as
arguments to other functions, as examples below will show.
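
A short sketch of the useful case: an anonymous function supplied directly
as an argument, here to R functions `sapply` and `integrate`.

```{r "anonymous-arg-sketch"}
# an anonymous function passed directly to sapply
sapply(1:5, function(u) u^2)

# an anonymous function passed to integrate (integral of u^2 over [0, 1])
integrate(function(u) u^2, lower = 0, upper = 1)$value
```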

## Immutability

Until recently, you never heard programmers talking about immutability.
Most programming languages don't have it.  Some languages like JavaScript
have recently gotten it as a feature but not as the default.  In a few
languages like Haskell and Clojure, it is the default.

In R immutability has always been the default.  Most R objects cannot
be changed once created.  Only environments and reference classes, which
most R users do not understand and do not explicitly use, are mutable.
All assignments mutate an environment.  R function `rm` also mutates
environments.  So users mutate environments all the time,
but don't think about environments while doing so.

The less said
about S4 classes in general and reference classes in particular, the better
([they are in the process of being replaced, anyway](https://www.stat.umn.edu/geyer/8054/index.html#s7)).

So [R environments](#environments) are mutable, but most R objects are not.
```{r "mutability-demo"}
x <- y <- 1:5
x[3] <- NA
x
y
```
After the first assignment, "x" and "y" are *names* for the *same object*.

The second assignment does not mutate an object; it creates a *new* object
and assigns it to the name "x".

In Java or C++ the analogous code would mutate the object named both "x"
and "y".  Both "x" and "y" would name the same object before and after
it was mutated.  Their objects are mutable.
R objects (except environments and reference classes) are immutable.

This is something most programmers who have been introduced to programming
via Java or C++ think is bad about R.  They think it is inefficient.

But it is actually something that is very good about R.
It is what makes R a language that naive users can use.
Even expert Java and C++ programmers make lots of mistakes that
naive R programmers cannot make because of the nature of R.

## Recursion

In R, like in most programming languages (even Java and C++), functions can
call themselves.  This is called recursion.  It is a valuable programming
technique that can simplify many problems.  It is not widely used in R,
but R does have it.  R functions can call themselves.

This is not even a special feature of R.  Function bodies can have any
valid R code that can appear anywhere in R.  Function calls are part of
such.  Hence functions can call other functions.  If the function being
called from inside another function happens to be the same function,
then that is what other computer languages fuss about and call "recursion".
In R, recursive function calls are no different from any other function calls.
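
For instance, here is the factorial function written recursively (the name
`fact` is made up; R already has a function `factorial`, which we use as
a check).

```{r "recursion-sketch"}
# the function body calls the function itself
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
fact(5)
fact(5) == factorial(5)
```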

## Lazy Evaluation (Promises)

Like some other functional programming languages (Haskell), R has lazy
evaluation.  Function arguments are not evaluated until needed.
If not needed, they are not evaluated at all.
[Section 4.3.3 of the *R Language Definition*](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Argument-evaluation) explains.
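
A minimal sketch of laziness, using two hypothetical functions: an argument
that is never needed is never evaluated, so an argument that would signal
an error if evaluated does no harm.

```{r "lazy-sketch"}
# the second argument is never used, so it is never evaluated
ignore_second <- function(a, b) a
ignore_second(1, stop("this error never happens"))

# the argument is evaluated only when (and if) the body needs it
maybe_use <- function(a, b) if (a > 0) b else "b was not evaluated"
maybe_use(-1, stop("this error never happens either"))
```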

Mostly R users and programmers do not need to pay attention to lazy evaluation.
But occasionally they do.
[Section 4.3.3 of the *R Language Definition*](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Argument-evaluation) gives an example
where it is necessary to force evaluation to get what one wants.

There is an R function named "force" that has no effect except forcing
evaluation.  It is useful for making the programmer's intent clear.
I would change the example in the section cited in the preceding paragraph
so that the statement
```
label
```
is changed to
```
force(label)
```
Both have the same effect, but the latter makes the intention clearer.
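
Here is a sketch in the same spirit as that example (it is my own variation
with made-up names `show_lazy` and `show_forced`, not a copy of the example
in the manual).  The default value of `label` depends on `u`, so what it
turns out to be depends on *when* it is evaluated.

```{r "force-sketch"}
# without forcing, the default for label is evaluated after u has been changed
show_lazy <- function(u, label = deparse(u)) {
    u <- u + 1
    label
}
show_lazy(1)

# forcing the promise first captures the value of u as the caller supplied it
show_forced <- function(u, label = deparse(u)) {
    force(label)
    u <- u + 1
    label
}
show_forced(1)
```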

Understanding lazy evaluation is the key to understanding how default
arguments to functions work in tricky situations.  That is illustrated
by the example from the *R Language Definition* cited above.  It is also
the point of the discussion of default arguments of R function `svd`
in [Section 7.2 of my 3701 handout on the basics of R](http://www.stat.umn.edu/geyer/3701/notes/basic.html#default-values-for-arguments).

## Higher-Order Functions

Any function that takes a function as an argument or returns a function
as a value is called a *higher-order function*.  This terminology is not
widely used in R.  Most R functions are not higher-order functions.
Most higher-order functions in R are not called higher-order functions by users.

### R Functions that Work on Mathematical Functions

 * R functions `D` and `deriv` differentiate R expressions and return
   R expressions.  Since R expressions are not R functions, these are not
   higher-order functions, strictly speaking, except that `deriv` can
   be made to return a function rather than an expression by use of an
   optional argument.

 * R functions `grad` and `hessian` and `jacobian` in R package `numDeriv`
   differentiate R functions (approximately) and return the values of the
   derivatives evaluated at specified points.  These are vector or matrix
   valued derivatives of vector-to-scalar or vector-to-vector functions.

 * R function `integrate` does numerical integration of a mathematical
   function provided to it as an R function.

 * R functions `nlm` and `optim` and `optimize` and many other R functions that
   do optimization, optimize a mathematical function provided to them as
   R functions.

 * R function `uniroot` finds a root of a univariate mathematical function
   provided to it as an R function.

 * R function `boot` in R recommended package `boot` (installed by default
   in every installation of R) approximately simulates 
   the sampling distribution of any
   estimator provided to it as an R function
   applied to any data using Efron's nonparametric bootstrap.

 * R function `metrop` in [CRAN package mcmc](https://cloud.r-project.org/package=mcmc)
   simulates the distribution of any continuous random vector whose
   log unnormalized probability density function is provided to it
   as an R function.

 * R function `glm` that fits generalized linear models takes an argument
   named `family` that is a function that produces a family object,
   but this argument can also be a family object or the name of a family
   function, so this is not exactly a higher-order function but more
   R-ish.  That is, `glm(y ~ x, family = binomial)` works and here argument
   `binomial` is a function, so this is a higher-order function call.
   But `glm(y ~ x, family = "binomial")` and `glm(y ~ x, family = binomial())`
   also work, and those are not higher-order function calls because argument
   `family` is not, strictly speaking, a function.

So this is a very widely used R design pattern.  To tell an R function about
a mathematical function, provide it an R function that evaluates
that mathematical function (and that function should be pure).
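
A small sketch of the pattern, using only functions in base R.  The
mathematical function $u \mapsto (u - 2)^2 - 1$ and its R name `parabola`
are made up for illustration.  One pure R function serves as the argument
to three different higher-order functions.

```{r "math-function-sketch"}
# a pure R function that evaluates the mathematical function
parabola <- function(u) (u - 2)^2 - 1

# numerical integration over [0, 1] (the exact answer is 4 / 3)
integrate(parabola, lower = 0, upper = 1)$value

# minimization over [0, 5] (the exact minimizer is 2)
optimize(parabola, interval = c(0, 5))$minimum

# root finding in [2, 5] (the exact root is 3)
uniroot(parabola, interval = c(2, 5))$root
```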

### R Functions of the Apply Family

R function `apply` and friends (`eapply`, `lapply`, `mapply`, `rapply`,
`sapply`, `tapply`, and `vapply` in R package `base` and `mclapply`
and `clusterApply`
in R package `parallel`) apply an arbitrary function (provided as an
R function) to each element of some compound object (vector, list,
data frame, table, etc.)
or on each node of a cluster in parallel programming.

These also are not usually called higher-order functions by R users
(although, strictly speaking, they are higher-order functions).
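
A quick tour of some of them on toy data (the matrix `mat` is made up
for the purpose).

```{r "apply-family-sketch"}
mat <- matrix(1:6, nrow = 2)

# apply a function to each row (MARGIN = 1) or each column (MARGIN = 2)
apply(mat, 1, sum)
apply(mat, 2, max)

# lapply always returns a list; sapply simplifies when it can
lapply(1:3, function(u) u^2)
sapply(1:3, function(u) u^2)

# vapply is like sapply but checks the type and length of each result
vapply(1:3, function(u) u^2, numeric(1))

# mapply applies a function to corresponding elements of several vectors
mapply(function(u, v) u + v, 1:3, 4:6)

# tapply applies a function to groups defined by a factor
tapply(c(1, 2, 3, 4), c("a", "b", "a", "b"), sum)
```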

### R Functions of the Higher-Order Family

In languages much newer than R a bunch of terminology for higher-order
functions has grown up, and R has copied it.  See `help("funprog")` for
the list.  The most widely used are `Map`, `Reduce`, and `Filter`.
Some of these are built on functions of the apply family.
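
A brief sketch of the three most widely used ones.

```{r "funprog-sketch"}
# Filter keeps the elements for which the predicate is TRUE
Filter(function(u) u %% 2 == 0, 1:10)

# Map applies a function to corresponding elements and returns a list
Map(function(u, v) u * v, 1:3, 4:6)

# Reduce combines the elements successively using a binary function
Reduce(`+`, 1:5)
Reduce(`+`, 1:5, accumulate = TRUE)
```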

### Some Odds and Ends

Some other R functions take functions as arguments
like `sweep`, `aggregate`, `outer`, and `kronecker`.
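
A sketch of two of these (the matrix `tab` is made up; the data frame
`mtcars` comes with R).  Note that the function argument can be given
as a character string or as a function.

```{r "odds-and-ends-sketch"}
tab <- matrix(1:6, nrow = 2)

# subtract the column means from each column
centered <- sweep(tab, 2, colMeans(tab), FUN = "-")
centered
colMeans(centered)

# mean of mpg for each value of cyl
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```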

### Functionals

This is Hadley Wickham's name for functions that take a function argument
and perhaps some other arguments and return a vector.  It includes
many of the higher-order functions discussed above.

### Function Operators

This is Hadley Wickham's name for functions that take a function argument
and return a function value.  That AFAICS doesn't include any widely used
R higher-order functions.

Oops!  R function `Vectorize` is an example of a function operator.
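
A sketch of `Vectorize` at work.  The function `scalar_only` is a made-up
example of a function that only makes sense for arguments of length one.

```{r "vectorize-sketch"}
# the condition in if must have length one
scalar_only <- function(u) if (u > 0) sqrt(u) else 0

# Vectorize takes a function and returns a new function
vectorized <- Vectorize(scalar_only)
is.function(vectorized)
vectorized(c(-1, 4, 9))
```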

### Function Factories

This is Hadley Wickham's name for functions whose only job is to create
other functions, like R function `fred_factory` in an example above.

In [Section 7.5.4 of my 3701 handout on the basics of R](http://www.stat.umn.edu/geyer/3701/notes/basic.html#having-your-cake-and-eating-it-too-closures) the
R function `make.logl` defined in its example is a function factory
(which makes log likelihood functions for gamma shape families),
although I didn't call it that when I wrote those notes.
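
Here is a rough sketch of the idea.  It is not the `make.logl` from those
notes; the name `make_logl_sketch`, the data, and the search interval are
all invented for this illustration.  The factory takes the data and returns
the log likelihood for the gamma shape parameter as a function of the
parameter alone.

```{r "factory-sketch"}
# a factory: takes data, returns the log likelihood as a function of shape alone
make_logl_sketch <- function(data) {
    force(data)
    function(alpha) sum(dgamma(data, shape = alpha, log = TRUE))
}

# made-up data, just for illustration
logl <- make_logl_sketch(c(0.5, 1.2, 2.3, 0.8, 1.7))

# the data are remembered in the environment of the closure
ls(envir = environment(logl))

# so the optimizer only needs to be told about the function
optimize(logl, interval = c(0.01, 10), maximum = TRUE)$maximum
```

The point of the design is that the optimizer never sees the data.  It only
sees a function of one argument.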

# Functional Programming versus Imperative Programming

R has the stuff of imperative programming languages, such as loops and
assignment.  So it isn't *just* a functional programming language.

When users trained in Java or C++ and unfamiliar with the way of R
write R, their code looks just like C or Java or C++.  But expert R programmers
familiar with the [way](https://en.wikipedia.org/wiki/Tao) of R
use functions instead of loops.  They even use much less assignment.

I used to think I used far fewer loops than most R programmers, and I was
right.  But since writing the 3701 notes cited above, I use even fewer loops
and even more functions.  I am trying to exemplify the way of R.
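
A small sketch of the contrast (variable names made up): the same
computation, the mean of each column of the `mtcars` data frame, done
first with a loop and assignment and then with one function call.

```{r "loop-vs-function-sketch"}
# imperative style: preallocate, loop, assign
col_means_loop <- numeric(ncol(mtcars))
for (j in seq_along(mtcars)) col_means_loop[j] <- mean(mtcars[[j]])
names(col_means_loop) <- names(mtcars)

# functional style: one function call, no loop, no index variable
col_means_fun <- vapply(mtcars, mean, numeric(1))

all.equal(col_means_loop, col_means_fun)
```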

# More on Functions, Closures, Promises, Names, Objects, and Environments

## Objects

> To understand computations in R, two slogans are helpful:
>
> * Everything that exists is an object.
> * Everything that happens is a function call.
>
> — John Chambers, quoted in Section 6.3 of *Advanced R* by Hadley Wickham

Unlike in many so-called object-oriented programming languages where not
*everything* is an object, in R *everything* is an object, even expressions
of the language itself.

[Section 2 Objects of *The R Language Definition*](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Objects) covers all the kinds of R objects,
but that is more than we want to know about here.
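
A tiny sketch of the claim that even expressions of the language are
objects (the expression `u + v` is arbitrary).

```{r "objects-sketch"}
# an unevaluated expression is an R object
expr <- quote(u + v)
class(expr)

# it can be taken apart like a list
as.list(expr)

# and evaluated later, with values supplied for its variables
eval(expr, list(u = 1, v = 2))

# functions are objects too
class(sum)
```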

## Names

In R, objects can have *names*, also called *symbols*,
```{r "names"}
an <- as.name("arrg")
is.name(an)
mode(an)
typeof(an)
storage.mode(an)
class(an)
```

These correspond to what many users call "variables".  Where many users
say the R statement
```{r "x"}
x <- 2
```
creates a variable `x` which is assigned the value 2, the pedantically correct
thing to say is that the R *name* or *symbol* `x` has been associated with the
R object which is the numeric vector having one component 2.

A quotation from the R help for R function `make.names`

> A syntactically valid name consists of letters, numbers and the
> dot or underline characters and starts with a letter or the dot
> not followed by a number.  Names such as ‘".2way"’ are not valid,
> and neither are the reserved words.
>
> The definition of a _letter_ depends on the current locale, but
> only ASCII digits are considered to be digits.

To understand this we have to know what "reserved words" are, and the
R help page titled "Reserved" shows them (do `?Reserved` or `help("Reserved")`
to see that page).

For example, `function` is a reserved word, so it cannot be a valid R symbol.
So when we say "the R function named function" that is pedantically incorrect
(more on this below).

Anything can be a symbol if it is put in backquotes.  So
```{r "function-function"}
get("function")
`function`
```
But just saying `function` without the backquotes is an incomplete R statement
(R requires a function definition to follow the reserved word `function`).

Similarly for other bits of R syntax
```{r "function-too"}
`+`
`[<-`
```

Sometimes names can be provided as character strings as arguments to other
functions (like we saw in `get("function")` above) but this is just what
certain particular functions require.  Such a function undertakes to turn
character strings into symbols or their corresponding objects.
There is no automatic conversion of character strings to symbols.
```{r "outer"}
outer(1:3, 4:5)
outer(1:3, 4:5, "+")
```

## The Other Kind of Names

Names, also called symbols, are not to be confused with names, *not* also
called symbols.  Any R object can have an attribute named "names" which
is helpful in indexing
```{r "names-other"}
x <- seq(along = letters)
names(x) <- letters
x["c"]
y <- as.list(x)
y[["c"]]
y$c
```
There is a weak connection between names of the first kind and names of
the second kind in that in order for `y$c` to work it is necessary that
`c` be a valid symbol name (`y$function` is invalid, even if `"function"`
is the name of some component of `y`).
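
A sketch with a made-up list `z`.  The double-bracket form works for any
component name, and, since anything can be a symbol if it is put in
backquotes, so does the dollar sign form with backquotes.

```{r "names-backquote-sketch"}
z <- list(1, 2)
names(z) <- c("function", "ok")
z$ok
z[["function"]]
z$`function`
```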

## Environments

Environment is a type of R object
```{r "environ-show"}
typeof(globalenv())
```

A quotation from [Section 2.1.10 Environments of *The R Language Definition*](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Environments)

> Environments can be thought of as consisting of two things. A *frame*,
> consisting of a set of symbol-value pairs, and an *enclosure*,
> a pointer to an enclosing environment. When R looks up the value for a symbol
> the frame is examined and if a matching symbol is found its value will
> be returned. If not, the enclosing environment is then accessed and the
> process repeated. Environments form a tree structure in which the enclosures
> play the role of parents. The tree of environments is rooted in an empty
> environment, available through `emptyenv()`, which has no parent.

So environments are where objects can be looked up by name, and the lookup
is not just in the specified environment, but in the parent, parent of parent,
parent of parent of parent, and so forth, of the specified environment.
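
The lookup rule can be sketched with two small environments made for the
purpose (the names `outer_env` and `inner_env` are invented).

```{r "env-lookup-sketch"}
outer_env <- new.env(parent = emptyenv())
inner_env <- new.env(parent = outer_env)
assign("a", "defined in outer", envir = outer_env)
assign("b", "defined in inner", envir = inner_env)

# found in the frame of inner_env
get("b", envir = inner_env)

# not in the frame of inner_env, so found in its parent
get("a", envir = inner_env)

# restricting the lookup to the frame of inner_env itself
exists("a", envir = inner_env, inherits = FALSE)

identical(parent.env(inner_env), outer_env)
```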

R environments are very different from all other kinds of R objects.
When an environment is changed (by assignment or by R function `rm`)
the object is modified.  Unlike all other kinds of R objects, environments
are *mutable* (being mutable is their whole [*raison d'être*](https://www.merriam-webster.com/dictionary/raison%20d%27%C3%AAtre)).
Compare the example below with the one in
[the section on immutability](#immutability) above.
```{r "environment-mutability"}
foo <- bar <- new.env(parent = emptyenv())
foo$x <- 1:5
bar$x
```

A very special environment is the R global environment, which is returned
by R function `globalenv` or by the R global variable (name, symbol)
`.GlobalEnv`.  This is where assignments done at top level
(on the R command line) are done
```{r "global-example"}
x <- "This is the assignment we just made"
x
globalenv()$x
.GlobalEnv$x
```

The parent, parent of parent, and so forth, of the R global environment
depends on what packages have been loaded
```{r "libraries-example"}
search()
library("MASS")
search()
```

## Function Execution Environment

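When a function is called (one also says *invoked*),
local variables go in the function *evaluation environment*
(also called the *execution environment*),
whose parent is the environment of the function
(the environment in which the function was defined).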
By local variables we mean those that are defined in the function body
and created by execution of statements in the function body.
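
A sketch that lets us look at these environments from the outside.  The
function `show_envs` is hypothetical; it just returns its own evaluation
environment, the parent of that environment, and the names of its local
variables.

```{r "eval-env-sketch"}
show_envs <- function() {
    local_var <- 42
    locals <- ls()    # names defined so far in the evaluation environment
    list(evaluation = environment(),
        parent = parent.env(environment()),
        locals = locals)
}
res1 <- show_envs()
res2 <- show_envs()
res1$locals

# the parent is the environment of the function
identical(res1$parent, environment(show_envs))

# each call gets a fresh evaluation environment
identical(res1$evaluation, res2$evaluation)
```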

## Promises

A promise is another kind of R object, but not one that can be accessed
by users.  Promises can be *used* by users and in fact are used whenever one
invokes a function that has one or more arguments.

A quotation from [Section 4.3.3 Argument Evaluation of *The R Language Definition*](https://cloud.r-project.org/doc/manuals/r-release/R-lang.html#Argument-evaluation)

> One of the most important things to know about the evaluation of arguments
> to a function is that supplied arguments and default arguments are treated
> differently. The supplied arguments to a function are evaluated in the
> evaluation frame of the calling function. The default arguments to a
> function are evaluated in the evaluation frame of the function.

It is this latter fact that makes some tricky definitions of default arguments
work, like those of R function `svd` as
discussed in [Section 7.2 of my 3701 handout on the basics of R](http://www.stat.umn.edu/geyer/3701/notes/basic.html#default-values-for-arguments)
which was already referred to above
in [the previous section on promises](#lazy-evaluation-promises).
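
A sketch of the difference (this is not the `svd` example; the function
`half_default` is made up).  The same expression `a / 2` means one thing
as a default argument and another as a supplied argument.

```{r "default-arg-sketch"}
# the default for b refers to the variable a inside the function
half_default <- function(a, b = a / 2) {
    a <- a * 10    # changes a before the default for b is evaluated
    b
}

# default argument: evaluated inside the function, after a was changed
half_default(2)

# supplied argument: evaluated in the caller, where a is a different variable
a <- 100
half_default(2, b = a / 2)
```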

## Pipes

To repeat part of the quotation from John Chambers above,
"everything that happens is a function call", but that depends on what
"everything" means.  There is also parsing.  The R parser (the part of
R that figures out what to do from the code you write) does some of the
work.  It is true that
```{r parse.plus}
2 + 2
`+`(2, 2)
```
are equivalent R statements.  They are equivalent because the R parser
turns the first into the second.  The code `2 + 2` means call R function
`` `+` `` with arguments 2 and 2.

Similarly, `+ 2` (that is unary plus) means call R function `` `+` `` with
one argument.
```{r parse.plus.unary}
+ 2
`+`(2)
```

But R function `` `+` `` is not R function `sum`.
```{r plus.ternary.question, error=TRUE}
`+`(1, 2, 3)
sum(1, 2, 3)
```
The only purpose of R function `` `+` `` is to be the function that backs
up the plus operation in code.

Similar syntax transformations are involved in turning all R code into
R function calls.

All of this is preliminary to discussing a new bit of R syntax transformation
that was introduced in R 4.1.0.  This is the pipe operation, which is denoted
`|>`.  This is part of base R that replaces the pipe operation from R
package `magrittr`, which appeared on CRAN in 2014 and since then has become
very popular.  As I write this, https://cran.r-project.org/package=magrittr
lists 2525 R packages that need R package `magrittr`:

 * 61 reverse depends (52 CRAN, 9 Bioconductor)
 * 2147 reverse imports (1941 CRAN, 206 Bioconductor)
 * 317 reverse suggests (281 CRAN, 36 Bioconductor)

The new base R pipe operation replaces the main functionality of R package
`magrittr` (but not all of the functionality).  But it does it in a completely
different way.  The `magrittr` pipe operator is an R function `%>%` that
uses nonstandard evaluation to do what it does (pipe the output of the
function call on the left-hand side into the function call on
the right-hand side).  The new R pipe operation is not implemented as an R
function but rather as a syntax transformation, as the help page for this
new operation (`help("|>")`) shows in some of its examples
```{r pipe.syntax.transformation}
# the pipe operator is implemented as a syntax transformation:
quote(mtcars |> subset(cyl == 4) |> nrow())
quote(x |> (function(x) x + 1)())
```

Or more simply
```{r pipe.syntax.transformation.too}
quote(x |> f(y))
```
The R code `x |> f(y)` is literally turned into the code `f(x, y)` before
anything else happens.  The right-hand side of a pipe expression must
be a function call.  It is an error for it to be a function *definition*
or a function *name*.
```{r pipe.syntax.transformation.error, error=TRUE}
mtcars |> class
```
The way to do this is to turn the right-hand side into a function *call*
rather than a function *name*.
```{r pipe.syntax.transformation.no.error}
mtcars |> class()
```

Note that this applies to anonymous functions too.  You cannot put an
anonymous function in a pipeline, only the *evaluation* of an anonymous
function
```{r anonymous.error,error=TRUE}
"Foo!" |> function(x) x
```
```{r anonymous.no.error}
"Foo!" |> (function(x) x)()
```

New in R-4.2.0 is the "placeholder" option in the right-hand side (RHS) of
a pipe expression.  The placeholder character is the underscore character,
and this can be used exactly once in the RHS as the value of a named argument
(the argument must be named) to indicate
that the input is piped into this argument
rather than the first argument.  This is a `magrittr` feature that the core
R pipe operator did not have before R-4.2.0, and its absence was one of the
most frequent reasons for needing an anonymous function.  Use of the placeholder
is illustrated in the examples below.
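
Here is a minimal sketch of the placeholder (it needs R 4.2.0 or later;
the particular regression is just something to pipe into).

```{r "placeholder-sketch"}
# without the placeholder, the input is piped into the first argument
c(1, 4, 9) |> sqrt()

# with the placeholder, the input is piped into the named argument data
mtcars |> lm(mpg ~ wt, data = _) |> coef()

# the parser turns this into an ordinary function call
quote(mtcars |> lm(mpg ~ wt, data = _))
```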

So why is this popular?  R function `%>%` in R package `magrittr` is already
very popular.  R operation `|>` in base R is much cleaner, much easier to
debug, and much more efficient.  So it should eventually replace most usages
of `%>%`.  It allows complicated nested function calls to be read left to right
rather than inside out.  Here is the first example from the `magrittr`
package vignette.  Here it is in `magrittr`
```{r pipe.syntax.transformation.example.magrittr, error=TRUE}
library(magrittr)
car_data <-
    mtcars %>%
    subset(hp > 100) %>%
    aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
    transform(kpl = mpg %>% multiply_by(0.4251)) %>%
    print
```
And here it is rewritten to use `|>` instead of `%>%`.
```{r pipe.syntax.transformation.example, error=TRUE}
car_data <-
    mtcars |>
    subset(hp > 100) |>
    aggregate(. ~ cyl, data = _,
        FUN = function(x) x |> mean() |> round(2)) |>
    transform(kpl = mpg |> (function(x) x * 0.4251)()) |>
    print()
```
and this is (supposedly) easier to read (see below) than without pipes
```{r pipe.syntax.transformation.example.nopipes, error=TRUE}
print(car_data <-
    transform(aggregate(. ~ cyl, 
            data = subset(mtcars, hp > 100), 
            FUN = function(x) round(mean(x), 2)), 
        kpl = mpg * 0.4251))
```

Our rewritten example is a bit clumsy because the `magrittr` example
uses pipes pointlessly in some cases just to be using them.  If we get
rid of some of the pipes, we can simplify our example.
```{r pipe.syntax.transformation.example.too, error=TRUE}
car_data <-
    mtcars |>
    subset(hp > 100) |>
    aggregate(. ~ cyl, data = _,
        FUN = function(x) mean(x) |> round(2)) |>
    transform(kpl = mpg * 0.4251) |>
    print()
```

But we still have an anonymous function definition that `magrittr` does
not need (it is hidden inside R function `%>%`).  Along with the pipe
operation R 4.1.0 introduced a new syntax for anonymous function definitions
that replaces `function (` with `\(`.  Using this, our example can be
shortened to
```{r pipe.syntax.transformation.example.too.too, error=TRUE}
car_data <-
    mtcars |>
    subset(hp > 100) |>
    aggregate(. ~ cyl, data = _,
        FUN = \(x) x |> mean() |> round(2)) |>
    transform(kpl = mpg * 0.4251) |>
    print()
```

Whether you like this stuff or not is a matter of taste.
Pipes are function composition
```{r composition}
quote(x |> f() |> g() |> h())
```
And function composition is the most important method of structuring programs,
far more important than object orientation and other such fads.  In fact,
[favor composition over
inheritance](https://en.wikipedia.org/wiki/Composition_over_inheritance) has
become a slogan of object-oriented design.  It says: don't use the
object-oriented feature (*inheritance*), instead use composition (which is
not particularly object-oriented), because *inheritance*, despite being a key
feature of object orientation, has many undesirable properties.
(The OOP slogan refers to *object* composition rather than *function*
composition, so perhaps is not exactly on point.  But in OOP languages,
one may need object composition to do function composition.)

The oldest AFAIK method of structuring programs (at least higher-level
language programs as opposed to assembly language programs, which no one
writes today) is *assignment*.  Consequently, it is the most familiar method,
especially to R users.  We can structure this example as
```{r pipe.syntax.transformation.example.with.assignment, error=TRUE}
foo <- subset(mtcars, hp > 100)
bar <- aggregate(. ~ cyl, foo, FUN = function(x) round(mean(x), 2))
car_data <- transform(bar, kpl = mpg * 0.4251)
car_data
```
And we note

 * the silliness of having one column rounded differently than the rest and

 * the silliness of storing rounded data, where it may lead to
   inaccuracy if ever used.

It should be
```{r pipe.syntax.transformation.example.with.assignment.fixed}
foo <- subset(mtcars, hp > 100)
bar <- aggregate(. ~ cyl, foo, mean)
car_data <- transform(bar, kpl = mpg * 0.4251)
round(car_data, 2)
```

So now that we have used assignment to clean up this `magrittr` example
we can turn it back into pipes and it stays cleaner.
```{r pipe.syntax.transformation.example.cleanest}
car_data <- mtcars |> subset(hp > 100) |> aggregate(. ~ cyl, data = _, mean) |>
    transform(kpl = mpg * 0.4251)
car_data |> round(2)
```

Like it or not, use it or not, you will be seeing this in other people's code.

A pure functional language does not have assignment.  Its only method
of program structuring is definition of functions and composition of functions.
Pipelines are just composition in disguise.  So pipelining is more "functional"
than anything else you can do in R (except actual composition: function calls
as arguments to function calls).
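
To make "composition in disguise" concrete, here is a sketch of an explicit
composition function.  Base R does not provide one under this name; the
function `compose` and the example `root_mean_square` are made up for
illustration.

```{r "compose-sketch"}
# compose(f1, f2, f3) returns the function u -> f3(f2(f1(u)))
compose <- function(...) {
    funs <- list(...)
    function(u) Reduce(function(acc, fun) fun(acc), funs, u)
}
root_mean_square <- compose(function(u) u^2, mean, sqrt)
root_mean_square(c(3, 4))

# the same computation as a pipeline
c(3, 4) |> (\(u) u^2)() |> mean() |> sqrt()
```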

As we can see from the example (not my example, `magrittr`'s example)
pipelines are not really easier to read than code with assignments,
especially for naive users.  They are more functional.  They do avoid
the clutter of a lot of unmotivated temporary variables.
But whether you use them or not is up to you.  It's just a matter of taste.

I myself have not gotten on the pipeline bandwagon in R (I have been using
UNIX pipes for 35 years).  This is partly because I do not like to require
very recent versions of R for my CRAN packages (some users are slow to
update).  But also I do not like the ugly syntax required to use anonymous
functions in pipes (but a lot of this ugliness is no longer required because
of the new placeholder syntax).
I expect that I will eventually start using pipes
and they will be base R pipes rather than `magrittr` pipes (and this
despite the [super cool name of that
package](https://en.wikipedia.org/wiki/The_Treachery_of_Images)).

It might be an interesting exercise to point out where in my notes
pipelines might make code a lot neater.

One more example, this time from my categorical data analysis notes but
redone to use pipes.
```{r options.width,echo=FALSE}
options(width = 132)
```
```{r alligator}
library(CatDataAnalysis)
library(glmbb)
data(table_8.1)
gout <- table_8.1 |>
    transform(lake = factor(lake,
        labels = c("Hancock", "Oklawaha", "Trafford", "George"))) |>
    transform(gender = factor(gender, labels = c("Male", "Female"))) |>
    transform(size = factor(size, labels = c("<=2.3", ">2.3"))) |>
    transform(food = factor(food,
        labels = c("Fish", "Invertebrate", "Reptile", "Bird", "Other"))) |>
    glmbb(big = count ~ lake * gender * size * food,
        little = ~ lake * gender * size + food,
        family = poisson, data = _)
summary(gout)
```

## Named Arguments, Default Values for Arguments, Missing Arguments, Dot-Dot-Dot Arguments

These are covered in [Section 7 of the 3701 lecture notes](http://www.stat.umn.edu/geyer/3701/notes/basic.html#still-more-on-functions) cited above.