# Statistics 3011 (Geyer and Jones, Spring 2006) Examples: Binomial Distributions

## How to Shoot Yourself in the Foot: Discrete versus Continuous

We already know how to calculate probabilities for one distribution: the normal distribution. Now we're going to learn another.

To a certain extent, the saying "you've seen one distribution, you've seen 'em all" is true. The same issues arise. We have forward problems and backward problems. But there are two important differences between discrete and continuous distributions (more precisely, two aspects of the same issue).

• It makes sense to talk about the probability of a single outcome P(X = a). This is always zero for continuous distributions (homework problem 9.18 is about this phenomenon). But it is not zero for discrete distributions. Thus we have a new type of interesting problem (following section).
• For continuous distributions we always have P(X < a) = P(X ≤ a). We don't need to distinguish between "less than" and "less than or equal."

For discrete distributions we have

P(X < a) ≠ P(X ≤ a)

in fact

P(X < a) + P(X = a) = P(X ≤ a)

## Probabilities of Outcomes: `dbinom`

### Example 12.4

For our example we redo Example 12.4 in the textbook (Moore).

• The R function `dbinom` (on-line help) does direct look-up for single outcomes for binomial distributions. (See the section on `pbinom` below for probabilities of intervals.)
• The arguments are `x`, the vector of outcomes we want answers for; `size`, the number of trials (binomial sample size); and `prob`, the probability of success in each trial.
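As a minimal sketch of a single-outcome lookup, the call below uses the sample size 5 and success probability 0.25 that the table later in this section is based on. The outcome 2 is an illustrative choice, not necessarily the one asked for in the textbook example.

```r
# Probability of exactly 2 successes in 5 trials,
# each with success probability 0.25.
# The outcome x = 2 is an assumption made for illustration.
p_two <- dbinom(x = 2, size = 5, prob = 0.25)
print(p_two)
```

Naming the arguments (`x =`, `size =`, `prob =`) is optional; `dbinom(2, 5, 0.25)` does the same lookup.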

### Example 12.4 Continued

If we give `dbinom` a vector for its first argument `x`, then it will calculate the probability for each outcome.
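A sketch of such a vectorized call, assuming the sample size 5 and success probability 0.25 used for the table in this section (the variable names are made up for illustration):

```r
# All possible outcomes for 5 trials: 0, 1, ..., 5.
outcomes <- 0:5

# One probability per outcome.
probs <- dbinom(outcomes, size = 5, prob = 0.25)

# Round to four decimal places for display.
rounded <- round(probs, 4)
print(data.frame(x = outcomes, P = rounded))
```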

If we round these numbers and put them in a nice table we get

| x | P(x) |
|---|--------|
| 0 | 0.2373 |
| 1 | 0.3955 |
| 2 | 0.2637 |
| 3 | 0.0879 |
| 4 | 0.0146 |
| 5 | 0.0010 |

Note that this table specifies the whole probability distribution because the sample space is the numbers zero to five (in five trials you can have between zero and five successes) and so the table specifies the probability of all possible outcomes.

This table is good for any binomial problem involving sample size 5 and success probability 0.25. We would get a different table for each different sample size (more or fewer outcomes for one thing) and for each different success probability.
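To see how the table changes with the sample size and success probability, one could wrap the lookup in a small helper. The function `binom_table` below is hypothetical (it is not part of R), and the alternative values 3 and 0.5 are chosen only for illustration.

```r
# Hypothetical helper: the full probability table for a
# binomial distribution with the given size and prob.
binom_table <- function(size, prob) {
  x <- 0:size
  data.frame(x = x, P = dbinom(x, size = size, prob = prob))
}

tab_5 <- binom_table(5, 0.25)       # the table in this section
tab_3 <- binom_table(3, 0.25)       # fewer trials, so fewer rows
tab_5_half <- binom_table(5, 0.5)   # same rows, different probabilities

print(tab_3)
print(tab_5_half)

# Each table covers the whole sample space, so its
# probabilities add up to one (up to rounding error).
print(sum(tab_5$P))
```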

## Probabilities of Intervals: `pbinom`

The type of problem we did in the preceding section has no analog for the normal distribution, or at least no interesting analog: the probability of any single outcome is zero for any continuous distribution.

In this section, we take up the kind of forward lookup problem for the binomial distribution that is like normal distribution forward lookups. The function has a similar name, `pbinom`, and works like `pnorm`.

### Example 12.5

For our example we redo Example 12.5 in the textbook (Moore).

The arguments for `dbinom` and `pbinom` are the same.

The difference is that

• `dbinom` looks up the probability of a single outcome P(X = a)
• `pbinom` looks up the probability of a lower tail area P(X ≤ a)

Note especially that `pbinom(a, n, p)` looks up P(X ≤ a) rather than P(X < a).
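The distinction can be checked numerically. The sketch below reuses n = 5 and p = 0.25 from the `dbinom` section; the boundary a = 2 is an arbitrary illustrative choice.

```r
n <- 5      # number of trials
p <- 0.25   # success probability
a <- 2      # boundary point (illustrative)

p_equal   <- dbinom(a, n, p)       # P(X = a)
p_at_most <- pbinom(a, n, p)       # P(X <= a), boundary included
# X is integer-valued, so P(X < a) is the same as P(X <= a - 1).
p_less    <- pbinom(a - 1, n, p)

print(c(p_less, p_equal, p_at_most))

# P(X < a) + P(X = a) should equal P(X <= a).
print(all.equal(p_less + p_equal, p_at_most))
```

So to get a strict inequality out of `pbinom`, shift the boundary down by one.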

`pbinom` has another optional argument `lower.tail` that works like the same optional argument to `pnorm` (for examples, see the following section).

### Sample Size Matters

Psychologists studying how people think about statistical issues asked the following question to many people, both trained in statistics and untrained.

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. The exact percentage of boys, however, varies from day to day. Sometimes it may be higher than 50%, sometimes lower.

For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

Many people do not get the right answer, which is the smaller hospital, because statistical variability decreases with sample size (as described by the square root law).

Here we do an exact calculation of probabilities. What is the probability that more than 60% of babies born were boys? Calculate for each hospital (n = 45 for the one and n = 15 for the other).

First we must turn a question about percents into a question about numbers.

0.6 × 45 = 27
0.6 × 15 = 9
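A sketch of the calculation follows. The first call is the statement examined in the fiddly details later in this section; the second is assumed to be its analog for the smaller hospital. The variable names are made up for illustration.

```r
# "More than 60% boys" means more than 27 boys out of 45,
# or more than 9 boys out of 15.

# Larger hospital: P(X > 27) with n = 45, p = 0.5.
p_large <- pbinom(27, 45, 0.5, lower.tail = FALSE)

# Smaller hospital: P(X > 9) with n = 15, p = 0.5.
p_small <- pbinom(9, 15, 0.5, lower.tail = FALSE)

print(c(large = p_large, small = p_small))
```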

As we should expect from the square root law there is a much smaller probability that the proportion for the larger sample is more than 0.10 above the population mean 0.50.

So much for the lesson about general principles of statistics, now for some fiddly details.

The statement `pbinom(27, 45, 0.5, lower.tail = FALSE)` calculates an upper tail area because of the `lower.tail = FALSE`. But what exactly does it calculate?

Its on-line help says it calculates

1 − P(X ≤ 27) = P(X > 27)

One must be very careful here. Lower tail areas calculated by `pbinom` include the boundary point. Upper tail areas don't. This makes sense in some ways (`lower.tail = FALSE` says to calculate the complementary event) but not in others (the two tails calculated aren't symmetrically related). You just have to be very careful and know what it does.
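A quick check of this behavior, using the larger hospital's numbers (variable names are illustrative):

```r
lower <- pbinom(27, 45, 0.5)                      # P(X <= 27), includes 27
upper <- pbinom(27, 45, 0.5, lower.tail = FALSE)  # P(X > 27), excludes 27

# The two tails are complementary events, so they add up to one.
print(lower + upper)

# The boundary point itself has positive probability,
# and it is counted in the lower tail only.
print(dbinom(27, 45, 0.5))
```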

What if we changed the wording of the question to ask for the probability that 60% or more of the babies born were boys? Then the question asks for P(X ≥ 27) in the larger hospital and P(X ≥ 9) in the smaller hospital. How do we calculate that?

Because binomial random variables are integer-valued we have

P(X ≥ 27) = P(X > 26)
P(X ≥ 9) = P(X > 8)

Hence we calculate

```
Rweb:> pbinom(26, 45, 0.5, lower.tail = FALSE)
[1] 0.1163466
Rweb:> pbinom(8, 15, 0.5, lower.tail = FALSE)
[1] 0.3036194
```
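As a cross-check (not part of the original calculation), the same upper tail areas can be obtained by adding up single-outcome probabilities from `dbinom` over all outcomes at or above the boundary:

```r
# P(X >= 27) for n = 45: add up P(X = 27), ..., P(X = 45).
direct_large <- sum(dbinom(27:45, 45, 0.5))

# P(X >= 9) for n = 15: add up P(X = 9), ..., P(X = 15).
direct_small <- sum(dbinom(9:15, 15, 0.5))

print(c(direct_large, direct_small))

# These should agree with the pbinom upper tails starting one lower.
print(all.equal(direct_large, pbinom(26, 45, 0.5, lower.tail = FALSE)))
print(all.equal(direct_small, pbinom(8, 15, 0.5, lower.tail = FALSE)))
```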

## Inverse Lookup: `qbinom`

Our textbook has no examples of backward problems for the binomial distribution because the technology it discusses is so primitive (calculators, spreadsheets). A real computer has no problem. (Actually, if the book had tables for binomial distributions like it has tables for normal distributions, you could do backward lookup in those tables, just like in normal tables.)

### Hospitals Continued

With the same setup as in the hospitals and babies example above, let us ask a backward lookup question.

What is the number a of boy babies, such that the probability that fewer than a boy babies are born is 0.10? Calculate for each hospital (n = 45 for the one and n = 15 for the other).

The R function `qbinom` (on-line help) does backward look-up for binomial distributions.
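A sketch of the lookups for the two hospitals, assuming the lower tail probability 0.10 is passed as the first argument (`p`), followed by the same `size` and `prob` arguments used by `dbinom` and `pbinom`. The variable names are made up for illustration.

```r
# Larger hospital: n = 45, p = 0.5.
q_large <- qbinom(0.10, 45, 0.5)

# Smaller hospital: n = 15, p = 0.5.
q_small <- qbinom(0.10, 15, 0.5)

print(c(large = q_large, small = q_small))

# Forward lookups at the returned values, for comparison with 0.10.
print(pbinom(c(q_large - 1, q_large), 45, 0.5))
print(pbinom(c(q_small - 1, q_small), 15, 0.5))
```

Because the distribution is discrete, the forward lookups generally do not hit 0.10 exactly.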
What the `qbinom` lookups above actually do is find the a such that