Statistics 5101 (Geyer, Spring 2022) Examples: Probability Distributions in R

R Functions for Probability Distributions

Most distributions that R handles have four functions. Each distribution has a root name; for example, the root name for the normal distribution is norm. This root is prefixed by one of the letters

p for probability, the distribution function (DF)
q for quantile, the inverse DF for a continuous random variable or the quantile function for a discrete random variable (see Deck 4 of the course slides, in particular slide 6, for the definition of quantile function)
d for density, the probability mass function (PMF) for a discrete random variable or the probability density function (PDF) for a continuous random variable
r for random, simulated random variates having the specified distribution

For the normal distribution, these functions are pnorm, qnorm, dnorm, and rnorm. For the binomial distribution, these functions are pbinom, qbinom, dbinom, and rbinom. And so forth.
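
As a rough illustration of the naming scheme, here is one call to each of the four functions for the standard normal distribution. The argument values (1.96, 0.975, 0, and 5) and the variable names are illustrative choices, not taken from the course materials.

```r
# One call per prefix, all for the standard normal (mean 0, sd 1).
p <- pnorm(1.96)    # DF: P(X <= 1.96)
q <- qnorm(0.975)   # quantile: the x with P(X <= x) = 0.975
d <- dnorm(0)       # PDF evaluated at 0, equal to 1 / sqrt(2 * pi)
r <- rnorm(5)       # five simulated standard normal variates
print(c(p = p, q = q, d = d))
print(r)
```

The same pattern carries over to any other root name in the table below, with the distribution's own parameters in place of the defaults.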

For a continuous distribution (like the normal), the most useful functions for doing problems involving probability calculations are the p and q functions (DF and inverse DF), because the density (PDF) calculated by the d function can only be used to calculate probabilities via integrals and R doesn't do integrals.

For a discrete distribution (like the binomial), the d function calculates the density (PMF), which in this case is a probability

f(x) = P(X = x)

and hence is useful in calculating probabilities.

R has functions to handle many probability distributions. The table below gives the names of the functions for each distribution and a link to the on-line documentation that is the authoritative reference for how the functions are used. But don't read the on-line documentation yet. First, try the examples in the sections following the table.

Distribution	Functions
Beta	`pbeta`	`qbeta`	`dbeta`	`rbeta`
Binomial (including Bernoulli)	`pbinom`	`qbinom`	`dbinom`	`rbinom`
Birthday	`pbirthday`	`qbirthday`
Cauchy	`pcauchy`	`qcauchy`	`dcauchy`	`rcauchy`
Chi-Square	`pchisq`	`qchisq`	`dchisq`	`rchisq`
Discrete Uniform	`sample`
Exponential	`pexp`	`qexp`	`dexp`	`rexp`
F	`pf`	`qf`	`df`	`rf`
Gamma	`pgamma`	`qgamma`	`dgamma`	`rgamma`
Geometric	`pgeom`	`qgeom`	`dgeom`	`rgeom`
Hypergeometric	`phyper`	`qhyper`	`dhyper`	`rhyper`
Logistic	`plogis`	`qlogis`	`dlogis`	`rlogis`
Log Normal	`plnorm`	`qlnorm`	`dlnorm`	`rlnorm`
Multinomial			`dmultinom`	`rmultinom`
Negative Binomial	`pnbinom`	`qnbinom`	`dnbinom`	`rnbinom`
Normal	`pnorm`	`qnorm`	`dnorm`	`rnorm`
Poisson	`ppois`	`qpois`	`dpois`	`rpois`
Kolmogorov-Smirnov Test Statistic	`psmirnov`	`qsmirnov`		`rsmirnov`
Student t	`pt`	`qt`	`dt`	`rt`
Studentized Range	`ptukey`	`qtukey`
Continuous Uniform	`punif`	`qunif`	`dunif`	`runif`
Weibull	`pweibull`	`qweibull`	`dweibull`	`rweibull`
Wilcoxon Rank Sum Statistic	`pwilcox`	`qwilcox`	`dwilcox`	`rwilcox`
Wilcoxon Signed Rank Statistic	`psignrank`	`qsignrank`	`dsignrank`	`rsignrank`
Wishart				`rWishart`

That's a lot of distributions. Fortunately, they all work the same way. If you learn one, you've learned them all.

Of course, the discrete distributions are discrete and the continuous distributions are continuous, so there's some difference just from that aspect alone, but as far as the computer is concerned, they're all the same.

That's everything in core R, but there are lots of CRAN packages for other distributions. One particularly notable one is the R package mvtnorm, which does multivariate normal and multivariate t distributions.

We'll do a continuous example first.

The Normal Distribution

Direct Look-Up

pnorm is the R function that calculates the DF.

F(x) = P(X ≤ x)

where X is normal. Optional arguments described in the on-line documentation specify the parameters of the particular normal distribution.

Both of the R commands in the box below do exactly the same thing.

They look up P(X < 27.4) when X is normal with mean 50 and standard deviation 20.
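
A minimal sketch of two such equivalent commands, assuming the only difference is whether the optional arguments are matched by name or by position (the variable names are illustrative):

```r
# P(X <= 27.4) for X normal with mean 50 and standard deviation 20.
p_named <- pnorm(27.4, mean = 50, sd = 20)  # arguments matched by name
p_positional <- pnorm(27.4, 50, 20)         # same arguments matched by position
print(p_named)
print(p_positional)
```

Naming the arguments is more typing but is harder to get wrong, since the order of mean and sd no longer matters.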

Example

Question: Suppose widgits produced at Acme Widgit Works have weights that are normally distributed with mean 17.46 grams and variance 375.67 grams squared. What is the probability that a randomly chosen widgit weighs more than 19 grams?

Question Rephrased: What is P(X > 19) when X has the N(17.46, 375.67) distribution?

Caution: R wants the standard deviation (SD) as the parameter, not the variance. We'll need to take a square root!

Answer:
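
One way to do this calculation is sketched below. It assumes the upper tail is requested directly with lower.tail = FALSE; subtracting the lower tail from one gives the same answer here. The names mu, sigma, and p_upper are illustrative.

```r
mu <- 17.46
sigma <- sqrt(375.67)   # R wants the SD, so take the square root of the variance
# P(X > 19), computed directly as an upper tail probability
p_upper <- pnorm(19, mean = mu, sd = sigma, lower.tail = FALSE)
print(p_upper)
```

The result is about 0.468, which is plausible: 19 grams is only a small fraction of a standard deviation above the mean, so the probability is a little under one half.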

Optional Argument `lower.tail = FALSE`

The p functions for R distributions, like pnorm, have an optional argument lower.tail = FALSE that gives better accuracy for calculating upper tail values than subtraction, avoiding catastrophic cancellation when the result is small.

Compare
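
A minimal sketch of such a comparison, assuming a standard normal distribution and the cutoff 10 (both are illustrative choices):

```r
naive <- 1 - pnorm(10)                     # subtraction: pnorm(10) rounds to 1
accurate <- pnorm(10, lower.tail = FALSE)  # upper tail computed directly
print(naive)
print(accurate)
```

The subtraction returns exactly zero because the lower tail probability is indistinguishable from one in double precision, whereas lower.tail = FALSE returns a tiny but positive number.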

For the normal distribution, one can use symmetry to get the same results.
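
As a sketch of the symmetry argument, again assuming a standard normal and the illustrative cutoff 10: the lower tail below -10 has the same probability as the upper tail above 10.

```r
by_symmetry <- pnorm(-10)                       # P(X <= -10) = P(X >= 10)
by_upper_tail <- pnorm(10, lower.tail = FALSE)  # P(X > 10)
print(c(by_symmetry, by_upper_tail))
```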

But for non-symmetric distributions, one must use lower.tail = FALSE to compute accurate answers far out in the upper tail.
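
To illustrate with a non-symmetric distribution, the sketch below uses the exponential distribution with rate 1, for which the upper tail probability is known in closed form, P(X > x) = exp(-x). The choice of distribution and of the cutoff 50 is an assumption made for illustration.

```r
# Exponential with rate 1: P(X > 50) = exp(-50) exactly.
naive <- 1 - pexp(50)                      # lower tail rounds to 1, so this is 0
accurate <- pexp(50, lower.tail = FALSE)   # upper tail computed directly
print(c(naive = naive, accurate = accurate, exact = exp(-50)))
```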

Inverse Look-Up

qnorm is the R function that calculates the inverse DF F^-1 of the normal distribution. The DF and the inverse DF are related by

p = F(x)
x = F^-1(p)

So given a number p between zero and one, qnorm looks up the p-th quantile of the normal distribution. As with pnorm, optional arguments specify the mean and standard deviation of the distribution.
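
The inverse relationship can be checked numerically. This is a small sketch for the standard normal with the illustrative value x = 1.5:

```r
x <- 1.5
p <- pnorm(x)        # p = F(x)
x_back <- qnorm(p)   # x = F^-1(p), should recover 1.5 up to rounding
print(c(x = x, p = p, x_back = x_back))
```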

Example

Question: Suppose IQ scores are normally distributed with mean 100 and standard deviation 15. What is the 95th percentile of the distribution of IQ scores?

Question Rephrased: What is F^-1(0.95) when X has the N(100, 15²) distribution?

Answer:
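
A sketch of the look-up, with the mean and standard deviation passed by name (the variable name iq_95 is illustrative):

```r
# 95th percentile of N(100, 15^2)
iq_95 <- qnorm(0.95, mean = 100, sd = 15)
print(iq_95)
```

The result is about 124.7, that is, roughly 1.645 standard deviations above the mean.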

Optional Argument `lower.tail = FALSE`

The q functions for R distributions, like qnorm, also have an optional argument lower.tail = FALSE that gives better accuracy by avoiding catastrophic cancellation.

This argument for the q functions either works the same way or the opposite way from the p functions, depending on how you think about it. When lower.tail = FALSE:

For p functions, the result, which is a probability, is one minus what it would be otherwise.
For q functions, the argument, which is a probability, is one minus what it would be otherwise.

Compare
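
A minimal sketch of such a comparison, assuming a standard normal and the illustrative upper tail probability 1e-20:

```r
naive <- qnorm(1 - 1e-20)                     # 1 - 1e-20 rounds to 1, so this is Inf
accurate <- qnorm(1e-20, lower.tail = FALSE)  # the x with P(X > x) = 1e-20
print(c(naive = naive, accurate = accurate))
```

The subtraction loses the tail probability entirely before qnorm ever sees it, so the first result is infinite, while the second is a finite number between 9 and 10.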

Again, one can also do this using the symmetry of the normal distribution
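
As a sketch, again with the illustrative tail probability 1e-20: the upper quantile is the negative of the corresponding lower quantile.

```r
by_symmetry <- -qnorm(1e-20)                       # negate the lower quantile
by_upper_tail <- qnorm(1e-20, lower.tail = FALSE)  # upper quantile directly
print(c(by_symmetry, by_upper_tail))
```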

But tricks like this are not available for non-symmetric distributions. For them one must use lower.tail = FALSE.

Density

dnorm is the R function that calculates the PDF f of the normal distribution. As with pnorm and qnorm, optional arguments specify the mean and standard deviation of the distribution.

There's not much need for this function in doing calculations, because you need to do integrals to use any PDF, and R doesn't do integrals. In fact, there's not much use for the d function for any continuous distribution. Discrete distributions are entirely another matter: for them the d functions are very useful (see the section about dbinom).

For an example of the use of dnorm, see the following section.

As another example of the use of dnorm, we show that location-scale families all have the same graph; only the numbers on the axes change.
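
A minimal sketch of this comparison, assuming the standard normal and a normal with mean 50 and standard deviation 20 (illustrative parameter values). The two curves are plotted side by side over matching ranges, and the location-scale relation between the two densities is also checked numerically.

```r
mu <- 50
sigma <- 20
z <- seq(-3, 3, length.out = 201)   # grid for the standard normal
x <- mu + sigma * z                 # corresponding grid for N(mu, sigma^2)
f_std <- dnorm(z)
f_gen <- dnorm(x, mean = mu, sd = sigma)

# Side-by-side plots: same curve, different numbers on the axes.
par(mfrow = c(1, 2))
plot(z, f_std, type = "l", xlab = "z", ylab = "density", main = "N(0, 1)")
plot(x, f_gen, type = "l", xlab = "x", ylab = "density", main = "N(50, 20^2)")

# Numerical check: the general density is the standard one divided by sigma.
print(all.equal(f_gen, f_std / sigma))
```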

Random Variates

rnorm is the R function that simulates random variates having a specified normal distribution. As with pnorm, qnorm, and dnorm, optional arguments specify the mean and standard deviation of the distribution.

We won't be using the r functions (such as rnorm) much. So here we will only give an example without full explanation.
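
A minimal sketch of such an example, assuming a normal distribution with mean 50 and standard deviation 20 (these parameter values are an illustrative assumption):

```r
x <- rnorm(1000, mean = 50, sd = 20)             # 1000 IID N(50, 20^2) variates
hist(x, probability = TRUE)                      # histogram on the density scale
curve(dnorm(x, mean = 50, sd = 20), add = TRUE)  # overlay the PDF of the same distribution
```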

This generates 1000 independent and identically distributed (IID) normal random numbers (first line), plots their histogram (second line), and graphs the PDF of the same normal distribution (third line).

The Binomial Distribution

Direct Look-Up, Points

dbinom is the R function that calculates the PMF of the binomial distribution. Optional arguments described in the on-line documentation specify the parameters of the particular binomial distribution.

Both of the R commands in the box below do exactly the same thing.

They look up P(X = 27) when X has the Bin(100, 0.25) distribution.
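
A minimal sketch of two such equivalent commands, assuming the difference is only whether size and prob are matched by name or by position (the variable names are illustrative):

```r
# P(X = 27) for X having the Bin(100, 0.25) distribution.
p_named <- dbinom(27, size = 100, prob = 0.25)  # arguments matched by name
p_positional <- dbinom(27, 100, 0.25)           # same arguments matched by position
print(p_named)
print(p_positional)
```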

Example

Question: Suppose widgits produced at Acme Widgit Works have probability 0.005 of being defective and defects are stochastically independent. Suppose widgits are shipped in cartons containing 25 widgits. What is the probability that a randomly chosen carton contains exactly one defective widgit?

Question Rephrased: What is P(X = 1) when X has the Bin(25, 0.005) distribution?

Answer:
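
A sketch of the look-up (the variable name p_one is illustrative):

```r
# P(X = 1) for X having the Bin(25, 0.005) distribution
p_one <- dbinom(1, size = 25, prob = 0.005)
print(p_one)
```

The result is about 0.111, which agrees with the PMF formula 25 × 0.005 × 0.995^24.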

Direct Look-Up, Intervals

pbinom is the R function that calculates the DF of the binomial distribution. Optional arguments described in the on-line documentation specify the parameters of the particular binomial distribution.

Both of the R commands in the box below do exactly the same thing.

They look up P(X ≤ 27) when X has the Bin(100, 0.25) distribution. (Note the less than or equal to sign. It's important when working with a discrete distribution!)
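
A minimal sketch of two such equivalent commands, with the arguments matched by name and by position. The last line is an added check, not part of the look-up: it confirms that the DF is the sum of the PMF over 0, 1, ..., 27.

```r
# P(X <= 27) for X having the Bin(100, 0.25) distribution.
p_named <- pbinom(27, size = 100, prob = 0.25)  # arguments matched by name
p_positional <- pbinom(27, 100, 0.25)           # same arguments matched by position
print(p_named)
print(p_positional)
print(all.equal(p_named, sum(dbinom(0:27, 100, 0.25))))
```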

Example

Question Rephrased: What is P(X ≤ 1) when X has the Bin(25, 0.005) distribution?

Answer:
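
A sketch of the look-up (the variable name p_at_most_one is illustrative):

```r
# P(X <= 1) for X having the Bin(25, 0.005) distribution
p_at_most_one <- pbinom(1, size = 25, prob = 0.005)
print(p_at_most_one)
```

The result is about 0.993, the sum of the probabilities of zero and of one.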

Optional Argument `lower.tail = FALSE`

See above, under p functions for continuous distributions and under q functions for continuous distributions.

Note that for discrete distributions 1 − F(x) = Pr(X > x) and that is what lower.tail = FALSE calculates.

Hence if one wants Pr(X ≥ x) = 1 − Pr(X < x) = 1 − Pr(X ≤ x − 1) when X is integer-valued, then one has to do the look-up at x − 1 rather than at x.

Example

Question Rephrased: What is Pr(X ≥ 2) when X has the Bin(25, 0.005) distribution?

Question Rephrased Again: What is Pr(X > 1) = 1 − Pr(X ≤ 1) when X has the Bin(25, 0.005) distribution?

Answer:
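
A sketch of the look-up. Since lower.tail = FALSE gives Pr(X > x), the first argument is 1, not 2. The variable name p_two_or_more is illustrative.

```r
# P(X >= 2) = P(X > 1) for X having the Bin(25, 0.005) distribution
p_two_or_more <- pbinom(1, size = 25, prob = 0.005, lower.tail = FALSE)
print(p_two_or_more)
```

The result is about 0.007, the same as adding up the PMF over 2, 3, ..., 25.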

Inverse Look-Up

qbinom is the R function that calculates the quantile function of the binomial distribution. How does it do that when the DF is a step function and hence not invertible? The on-line documentation for the binomial probability functions explains.

The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.

When the p-th quantile is nonunique, there is a whole interval of values each of which is a p-th quantile. The documentation says that qbinom (and other "q" functions, for that matter) returns the smallest of these values. That is one sensible definition of the quantile function.

Example

Question: What are the 10th, 20th, and so forth percentiles of the Bin(10, 1/3) distribution?

Answer:
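
A sketch of the look-up, assuming "and so forth" means the probabilities 0.1, 0.2, ..., 0.9. The last line is an added check that each returned value is the smallest x with F(x) ≥ p.

```r
p <- seq(0.1, 0.9, by = 0.1)
q <- qbinom(p, size = 10, prob = 1 / 3)
print(rbind(p = p, q = q))
# Check the definition: F(q) >= p but F(q - 1) < p
print(all(pbinom(q, 10, 1 / 3) >= p & pbinom(q - 1, 10, 1 / 3) < p))
```

The quantiles come out as 1, 2, 3, 3, 3, 4, 4, 5, 5, so several different values of p share the same quantile.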

Note the nonuniqueness.
