R Functions for Probability Distributions
Most distributions that R handles have four functions. There is a root name; for example, the root name for the normal distribution is norm. This root is prefixed by one of the letters
- p for "probability", the distribution function (DF)
- q for "quantile", the inverse DF for a continuous random variable or the quantile function (see Deck 4 of the course slides and, in particular, slide 6 for the definition of quantile function) for a discrete random variable
- d for "density", the probability mass function (PMF) for a discrete random variable or the probability density function (PDF) for a continuous random variable
- r for "random", which simulates random variates having the specified distribution
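A minimal sketch of the four functions for the normal distribution, using R's default standard normal (mean 0, standard deviation 1). The arguments 1.96 and 0, the sample size 5, and the seed 42 are arbitrary illustrative choices, not values from the text. Note how q undoes p.

```r
# The four functions that share the root name "norm".
# With no mean or sd given, R uses the standard normal (mean 0, sd 1).
p <- pnorm(1.96)   # DF: P(X <= 1.96), about 0.975
x <- qnorm(p)      # inverse DF: maps that probability back to 1.96
f <- dnorm(0)      # PDF at 0, which equals 1 / sqrt(2 * pi)
set.seed(42)       # arbitrary seed so the simulation is reproducible
z <- rnorm(5)      # five simulated standard normal variates
print(c(p = p, x = x, f = f))
print(z)
```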
For the normal distribution, these functions are pnorm, qnorm, dnorm, and rnorm.
For the binomial distribution, these functions are pbinom, qbinom, dbinom, and rbinom.
And so forth.
For a continuous distribution (like the normal), the most useful functions for doing problems involving probability calculations are the p and q functions (DF and inverse DF), because the density (PDF) calculated by the d function can only be used to calculate probabilities via integrals. (R can do such integrals numerically with integrate, but the p function has already done the integral for you.)
For a discrete distribution (like the binomial), the d function calculates the density (PMF), which in this case is a probability and hence is useful in calculating probabilities.
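To make the contrast concrete, here is a sketch comparing the two cases. The distributions Bin(10, 0.5) and N(0, 0.1²) and the interval from -1 to 1 are arbitrary illustrative choices; the numerical integral uses base R's integrate.

```r
# Discrete case: dbinom returns an actual probability.
pmf3 <- dbinom(3, size = 10, prob = 0.5)         # P(X = 3) = 120 / 1024
cum3 <- sum(dbinom(0:3, size = 10, prob = 0.5))  # adding PMF values ...
df3  <- pbinom(3, size = 10, prob = 0.5)         # ... gives P(X <= 3)

# Continuous case: dnorm returns a density, which is not a probability
# and can even exceed 1 (sd = 0.1 is an arbitrary choice).
dens <- dnorm(0, mean = 0, sd = 0.1)

# Turning a PDF into a probability takes an integral ...
area <- integrate(dnorm, lower = -1, upper = 1)$value
# ... which pnorm has already done for us.
exact <- pnorm(1) - pnorm(-1)

print(c(pmf3 = pmf3, cum3 = cum3, df3 = df3,
        dens = dens, area = area, exact = exact))
```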
R has functions to handle many probability distributions. The table below gives the names of the functions for each distribution. The on-line documentation for each function (for example, type ?pnorm at the R prompt) is the authoritative reference for how the functions are used. But don't read the on-line documentation yet. First, try the examples in the sections following the table.
Distribution | p (DF) | q (quantile) | d (density) | r (random)
---|---|---|---|---
Beta | pbeta | qbeta | dbeta | rbeta
Binomial | pbinom | qbinom | dbinom | rbinom
Cauchy | pcauchy | qcauchy | dcauchy | rcauchy
Chi-Square | pchisq | qchisq | dchisq | rchisq
Discrete Uniform | | | | sample
Exponential | pexp | qexp | dexp | rexp
F | pf | qf | df | rf
Gamma | pgamma | qgamma | dgamma | rgamma
Geometric | pgeom | qgeom | dgeom | rgeom
Hypergeometric | phyper | qhyper | dhyper | rhyper
Logistic | plogis | qlogis | dlogis | rlogis
Log Normal | plnorm | qlnorm | dlnorm | rlnorm
Negative Binomial | pnbinom | qnbinom | dnbinom | rnbinom
Normal | pnorm | qnorm | dnorm | rnorm
Poisson | ppois | qpois | dpois | rpois
Student t | pt | qt | dt | rt
Studentized Range | ptukey | qtukey | |
Uniform | punif | qunif | dunif | runif
Weibull | pweibull | qweibull | dweibull | rweibull
Wilcoxon Rank Sum Statistic | pwilcox | qwilcox | dwilcox | rwilcox
Wilcoxon Signed Rank Statistic | psignrank | qsignrank | dsignrank | rsignrank
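The Discrete Uniform row is the exception to the four-function pattern: base R has no p, q, or d function for it, and simulation is done with sample. A minimal sketch, assuming a fair six-sided die (uniform on 1, ..., 6) as the example and an arbitrary seed:

```r
set.seed(1)  # arbitrary seed for reproducibility
# Ten rolls of a fair die: each of 1, ..., 6 is equally likely on every draw.
# replace = TRUE is needed so that values can repeat.
rolls <- sample(1:6, size = 10, replace = TRUE)
print(rolls)
```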
That's a lot of distributions. Fortunately, they all work the same way. If you learn one, you've learned them all.
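Here is a sketch of what "they all work the same way" means in practice: the exponential and Poisson functions follow exactly the pattern of the normal ones, and only the parameter names change (rate for the exponential, lambda for the Poisson). The parameter values 2 and 3 are arbitrary choices; the comments give the closed-form values for comparison.

```r
# Exponential with rate 2
e_df  <- pexp(1, rate = 2)      # P(X <= 1) = 1 - exp(-2)
e_med <- qexp(0.5, rate = 2)    # median = log(2) / 2
# Poisson with mean 3
p_pmf <- dpois(2, lambda = 3)   # P(X = 2) = exp(-3) * 3^2 / 2
p_df  <- ppois(2, lambda = 3)   # P(X <= 2) = exp(-3) * (1 + 3 + 9 / 2)
print(c(e_df, e_med, p_pmf, p_df))
```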
Of course, the discrete distributions are discrete and the continuous distributions are continuous, so there's some difference just from that aspect alone, but as far as the computer is concerned, they're all the same. We'll do a continuous example first.
The Normal Distribution
Direct Look-Up
pnorm is the R function that calculates the DF
F(x) = P(X ≤ x)
where X is normal. Optional arguments described in the on-line documentation specify the parameters of the particular normal distribution.
Both of the R commands pnorm(27.4, mean = 50, sd = 20) and pnorm(27.4, 50, 20) do exactly the same thing: the first names the mean and standard deviation arguments, the second passes them by position.
They look up P(X < 27.4) when X is normal with mean 50 and standard deviation 20.
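To see what the mean and sd arguments do, this sketch checks that looking up 27.4 in the N(50, 20²) distribution gives the same answer as standardizing first, z = (x - mean) / sd, and then using the default standard normal. The standardization step is the usual textbook identity, not something the text above states.

```r
a  <- pnorm(27.4, mean = 50, sd = 20)  # named arguments
b  <- pnorm(27.4, 50, 20)              # same call, arguments by position
z  <- (27.4 - 50) / 20                 # standardized value, -1.13
c0 <- pnorm(z)                         # standard normal DF at z
print(c(a, b, c0))
```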
Example
Question: Suppose widgits produced at Acme Widgit Works have weights that are normally distributed with mean 17.46 grams and variance 375.67 grams squared. What is the probability that a randomly chosen widgit weighs more than 19 grams?
Question Rephrased: What is P(X > 19) when X has the N(17.46, 375.67) distribution?
Caution: R wants the standard deviation (SD) as the parameter, not the variance. We'll need to take a square root!
Answer: 1 - pnorm(19, mean = 17.46, sd = sqrt(375.67))
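The sketch below computes the widgit answer and, for contrast, the wrong answer you get by passing the variance where R expects the standard deviation. The variable wrong exists only to show that mistake.

```r
mu <- 17.46
v  <- 375.67                                    # variance, in grams squared
right <- 1 - pnorm(19, mean = mu, sd = sqrt(v)) # correct: sd is the square root
wrong <- 1 - pnorm(19, mean = mu, sd = v)       # mistake: variance passed as sd
print(c(right = right, wrong = wrong))
```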
Optional Argument lower.tail = FALSE
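The p functions for R distributions, like pnorm, have an optional argument lower.tail = FALSE that gives better accuracy for calculating upper tail values than subtraction, avoiding catastrophic cancellation when the result is small.
Compare the two ways of computing an upper tail probability, 1 - pnorm(x) and pnorm(x, lower.tail = FALSE), when x is far out in the upper tail: the subtraction loses all accuracy once pnorm(x) rounds to 1.
For the normal distribution, one can use symmetry to get the same results, because P(X > x) = P(X < -x) for the standard normal.
But for non-symmetric distributions, one must use lower.tail = FALSE to compute accurate answers far out in the upper tail.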
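A sketch of that comparison. The points x = 10 for the standard normal and 50 for the standard exponential are arbitrary choices of "far out in the upper tail".

```r
# Normal upper tail P(X > 10) for the standard normal, three ways
x <- 10
by_subtraction <- 1 - pnorm(x)                 # pnorm(10) rounds to exactly 1, so this is 0
by_lower_tail  <- pnorm(x, lower.tail = FALSE) # upper tail computed directly
by_symmetry    <- pnorm(-x)                    # symmetry: P(X > x) = P(X < -x)
print(c(by_subtraction, by_lower_tail, by_symmetry))

# The exponential is not symmetric, so only lower.tail = FALSE helps
tail_sub    <- 1 - pexp(50)                    # cancels to 0 as well
tail_direct <- pexp(50, lower.tail = FALSE)    # equals exp(-50), about 1.9e-22
print(c(tail_sub, tail_direct))
```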
Inverse Look-Up
qnorm is the R function that calculates the inverse DF F⁻¹ of the normal distribution.
The DF and the inverse DF are related by
p = F(x)
x = F⁻¹(p)
So given a number p between zero and one, qnorm looks up the p-th quantile of the normal distribution. As with pnorm, optional arguments specify the mean and standard deviation of the distribution.
Example
Question: Suppose IQ scores are normally distributed with mean 100 and standard deviation 15. What is the 95th percentile of the distribution of IQ scores?
Question Rephrased: What is F⁻¹(0.95) when X has the N(100, 15²) distribution?
Answer: qnorm(0.95, mean = 100, sd = 15)
Optional Argument lower.tail = FALSE
The q functions for R distributions, like qnorm, also have an optional argument lower.tail = FALSE that gives better accuracy by avoiding catastrophic cancellation.
This argument for the q functions either works the same way or the opposite way from the p functions, depending on how you think about it. When lower.tail = FALSE
- for p functions, the result, which is a probability, is one minus what it would be otherwise;
- for q functions, the argument, which is a probability, is one minus what it would be otherwise.
Compare qnorm(1 - p) with qnorm(p, lower.tail = FALSE) when p is a small upper tail probability: both ask for the point with upper tail probability p, but only the second avoids the rounding error in computing 1 - p.
Again, one can also do this using the symmetry of the normal distribution: for the standard normal, -qnorm(p) gives the same point. But tricks like this are not available for non-symmetric distributions. For them one must use lower.tail = FALSE.
Density
dnorm is the R function that calculates the PDF f of the normal distribution. As with pnorm and qnorm, optional arguments specify the mean and standard deviation of the distribution.
There's not much need for this function in doing calculations, because you need to do integrals to use any PDF, and pnorm has already done those integrals for you. In fact, there's not much use for the d function for any continuous distribution (discrete distributions are entirely another matter: for them the d functions are very useful; see the section about dbinom).
For an example of the use of dnorm, see the following section.
Random Variates
rnorm is the R function that simulates random variates having a specified normal distribution. As with pnorm, qnorm, and dnorm, optional arguments specify the mean and standard deviation of the distribution.
We won't be using the r functions (such as rnorm) much, so here we only outline an example without full explanation: it generates 1000 independent and identically distributed (IID) normal random numbers with rnorm, plots their histogram, and graphs the PDF of the same normal distribution, computed with dnorm, over the histogram.
The Binomial Distribution
Direct Look-Up, Points
dbinom is the R function that calculates the PMF of the binomial distribution. Optional arguments described in the on-line documentation specify the parameters of the particular binomial distribution.
Both of the R commands dbinom(27, size = 100, prob = 0.25) and dbinom(27, 100, 0.25) do exactly the same thing. They look up P(X = 27) when X has the Bin(100, 0.25) distribution.
Example
Question: Suppose widgits produced at Acme Widgit Works have probability 0.005 of being defective. Suppose widgits are shipped in cartons containing 25 widgits. What is the probability that a randomly chosen carton contains exactly one defective widgit?
Question Rephrased: What is P(X = 1) when X has the Bin(25, 0.005) distribution?
Answer: dbinom(1, 25, 0.005)
Direct Look-Up, Intervals
pbinom is the R function that calculates the DF of the binomial distribution. Optional arguments described in the on-line documentation specify the parameters of the particular binomial distribution.
Both of the R commands pbinom(27, size = 100, prob = 0.25) and pbinom(27, 100, 0.25) do exactly the same thing. They look up P(X ≤ 27) when X has the Bin(100, 0.25) distribution. (Note the less than or equal to sign. It's important when working with a discrete distribution!)
Example
Question: Suppose widgits produced at Acme Widgit Works have probability 0.005 of being defective. Suppose widgits are shipped in cartons containing 25 widgits. What is the probability that a randomly chosen carton contains no more than one defective widgit?
Question Rephrased: What is P(X ≤ 1) when X has the Bin(25, 0.005) distribution?
Answer: pbinom(1, 25, 0.005)
Optional Argument lower.tail = FALSE
See above, under p functions for continuous distributions and under q functions for continuous distributions. As there, use lower.tail = FALSE to compute accurate answers far out in the upper tail.
Inverse Look-Up
qbinom is the R function that calculates the quantile function of the binomial distribution. How does it do that when the DF is a step function and hence not invertible? The on-line documentation for the binomial probability functions explains.
The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.
When the p-th quantile is nonunique, there is a whole interval of values each of which is a p-th quantile. The documentation says that qbinom (and other q functions, for that matter) returns the smallest of these values. That is one sensible definition of the quantile function.
Example
Question: What are the 10th, 20th, and so forth quantiles of the Bin(10, 1/3) distribution?
Answer: qbinom(seq(0.1, 0.9, 0.1), 10, 1/3)
Note the nonuniqueness.
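As a sketch of the discrete quantile definition (the smallest x with F(x) ≥ p), the code below computes the 10th, 20th, ..., 90th percentiles of the Bin(10, 1/3) distribution with qbinom and then uses pbinom to confirm that each returned x satisfies F(x) ≥ p while the value one step below does not.

```r
probs <- seq(0.1, 0.9, 0.1)
quant <- qbinom(probs, size = 10, prob = 1/3)
print(quant)   # 1 2 3 3 3 4 4 5 5
# F(x) >= p at each returned x ...
ok_at <- pbinom(quant, size = 10, prob = 1/3) >= probs
# ... but F(x - 1) < p, so x is the smallest value with F(x) >= p
ok_below <- pbinom(quant - 1, size = 10, prob = 1/3) < probs
print(all(ok_at & ok_below))
```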