We already know how to calculate probabilities for one distribution: the normal distribution. Now we're going to learn another.
To a certain extent, "you've seen one distribution, you've seen 'em all" is true. The same issues arise. We have forward problems and backward problems. But there are two important differences between discrete and continuous distributions (more precisely, two aspects of the same issue).
For discrete distributions we have P(X ≤ a) ≠ P(X < a) whenever P(X = a) > 0; in fact P(X ≤ a) = P(X < a) + P(X = a) by the addition rule. So we do need to constantly worry about this.
The "how to shoot yourself in the foot" advertised in the section title is simply to forget about this issue.
dbinom
For our example we redo Example 12.4 in the textbook (Moore).
The R function dbinom (on-line help) does direct look-up for single outcomes for binomial distributions. (See the section on pbinom below for probabilities of intervals.)
argument | meaning |
---|---|
x | vector of outcomes we want answers for |
size | number of trials (binomial sample size) |
prob | probability of success in each trial |
If we give dbinom a vector for its first argument x, then it will calculate the probability for each outcome.
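A minimal sketch of such a call, assuming the sample size 5 and success probability 0.25 that the table in this section is stated to cover. The variable names (n, p, outcomes, probs, rounded) are illustrative and not taken from the original notes.

```r
# Binomial distribution with 5 trials and success probability 0.25
n <- 5
p <- 0.25
outcomes <- 0:n                 # sample space: 0, 1, ..., 5 successes

# dbinom is vectorized over its first argument:
# it returns one probability P(X = x) per outcome x
probs <- dbinom(outcomes, size = n, prob = p)

# Round to four decimal places and label each value by its outcome
rounded <- round(probs, 4)
names(rounded) <- outcomes
print(rounded)

# The probabilities of all possible outcomes sum to one
print(sum(probs))
```

Naming the arguments size and prob is optional; dbinom(0:5, 5, 0.25) is the same call with positional arguments.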
If we round these numbers and put them in a nice table we get
x | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
P(x) | 0.2373 | 0.3955 | 0.2637 | 0.0879 | 0.0146 | 0.0010 |
Note that this table specifies the whole probability distribution because the sample space is the numbers zero to five (in five trials you can have between zero and five successes) and so the table specifies the probability of all possible outcomes.
This table is good for any binomial problem involving sample size 5
and success probability 0.25. We would get a different table
for each different sample size (more or fewer outcomes for one thing)
and for each different success probability.
pbinom
The type of problem we did in the preceding section has no analog for the normal distribution, or at least no interesting analog: the probability of any single outcome is zero for any continuous distribution.
In this section, we take up the kind of forward lookup problem for the binomial distribution that is like normal distribution forward lookups. The function has a similar name, pbinom, and works like pnorm.
For our example we redo Example 12.5 in the textbook (Moore).
The arguments for dbinom and pbinom are the same. The difference is that dbinom looks up the probability of a single outcome, P(X = a), whereas pbinom looks up the probability of a lower tail area, P(X ≤ a).
Note especially that pbinom(a, n, p) looks up P(X ≤ a) rather than P(X < a).
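The distinction can be checked numerically. The following is a sketch only: it reuses the sample size 5 and success probability 0.25 from the dbinom section, and the cutoff a = 1 is an arbitrary choice made for illustration, not the textbook example.

```r
n <- 5
p <- 0.25
a <- 1                          # illustrative cutoff, chosen arbitrarily

p_le <- pbinom(a, n, p)         # P(X <= a): lower tail including a
p_eq <- dbinom(a, n, p)         # P(X = a): the single outcome a
p_lt <- pbinom(a - 1, n, p)     # P(X < a) = P(X <= a - 1) for integer a

# The lower tail area is the sum of the single-outcome probabilities
print(isTRUE(all.equal(p_le, sum(dbinom(0:a, n, p)))))

# Addition rule: P(X <= a) = P(X < a) + P(X = a)
print(isTRUE(all.equal(p_le, p_lt + p_eq)))

print(c(le = p_le, lt = p_lt, eq = p_eq))
```

To get P(X < a) from pbinom, shift the first argument down by one, as in the line that computes p_lt.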
pbinom has another optional argument lower.tail that works like the same optional argument to pnorm (for examples, see the following section).
Psychologists studying how people think about statistical issues asked the following question to many people, both trained in statistics and untrained.
A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. The exact percentage of boys, however, varies from day to day. Sometimes it may be higher than 50%, sometimes lower.
For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?
Many people do not get the right answer, which is the smaller hospital: statistical variability decreases as sample size increases (as described by the square root law).
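Here we do an exact calculation of probabilities. What is the probability that more than 60% of babies born were boys? Calculate for each hospital (n = 45 for the one and n = 15 for the other).
First we must turn a question about percents into a question about numbers. Since 60% of 45 is 27 and 60% of 15 is 9, the question asks for P(X > 27) in the larger hospital and P(X > 9) in the smaller hospital.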
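One way to carry out this calculation is sketched below. The success probability 0.5 and the sample sizes come from the problem statement; the variable names are illustrative, and the upper tail is requested with lower.tail = FALSE.

```r
n_large <- 45
n_small <- 15

# Convert "more than 60%" into counts: 60% of 45 is 27, 60% of 15 is 9.
# Integer arithmetic (n * 60 / 100) avoids rounding surprises with 0.6 * n.
cut_large <- floor(n_large * 60 / 100)
cut_small <- floor(n_small * 60 / 100)

# Upper tail areas: P(X > 27) and P(X > 9) when boys and girls are equally likely
p_large <- pbinom(cut_large, n_large, 0.5, lower.tail = FALSE)
p_small <- pbinom(cut_small, n_small, 0.5, lower.tail = FALSE)

print(c(large = p_large, small = p_small))
```

The probability for the smaller hospital comes out more than twice that for the larger one.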
As we should expect from the square root law, there is a much smaller probability that the proportion for the larger sample is more than 0.10 above the population mean 0.50.
So much for the lesson about general principles of statistics; now for some fiddly details.
The statement pbinom(27, 45, 0.5, lower.tail = FALSE) calculates an upper tail area because of the lower.tail = FALSE. But what exactly does it calculate? Its on-line help says that with lower.tail = FALSE it calculates P(X > x), not P(X ≥ x).
One must be very careful here. Lower tail areas calculated by pbinom include the boundary point. Upper tail areas don't. This makes sense in some ways (lower.tail = FALSE says to calculate the complementary event) but not in others (the two tails calculated aren't symmetrically related). You just have to be very careful and know what it does.
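A short check of this behavior, sketched for the smaller hospital (n = 15) with boundary point 9. The variable names are illustrative.

```r
n <- 15
a <- 9

lower <- pbinom(a, n, 0.5)                       # P(X <= 9), includes 9
upper <- pbinom(a, n, 0.5, lower.tail = FALSE)   # P(X > 9), excludes 9

# The two tails are complementary events, so they sum to one
print(lower + upper)

# The upper tail matches the sum over outcomes 10, ..., 15 only,
# which confirms that the boundary point 9 is not included
upper_by_sum <- sum(dbinom((a + 1):n, n, 0.5))
print(isTRUE(all.equal(upper, upper_by_sum)))
```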
What if we changed the wording of the question to ask for the probability that 60% or more of the babies born were boys? Then the question asks for P(X ≥ 27) in the larger hospital and P(X ≥ 9) in the smaller hospital. How do we calculate that?
Because binomial random variables are integer-valued we have P(X ≥ 27) = P(X > 26) and P(X ≥ 9) = P(X > 8). Hence we calculate
Rweb:> pbinom(26, 45, 0.5, lower.tail = FALSE)
[1] 0.1163466
Rweb:> pbinom(8, 15, 0.5, lower.tail = FALSE)
[1] 0.3036194
qbinom
Our textbook has no examples of backward problems for the binomial distribution because the "technology" it discusses is so primitive (calculators, spreadsheets). A real computer has no problem. (Actually, if the book had tables for binomial distributions like it has tables for normal distributions, you could do backward lookup in those tables, just like in normal tables.)
With the same setup as in the hospitals and babies example above, let us ask a backward lookup question.
What is the number a of boy babies, such that the probability that fewer than a boy babies are born is 0.10? Calculate for each hospital (n = 45 for the one and n = 15 for the other).
The R function qbinom (on-line help) does backward look-up for binomial distributions.
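A sketch of the backward lookup for the two hospitals, assuming the natural call in which the target probability 0.10 is the first argument. The variable names are illustrative.

```r
# Backward lookup: which count a corresponds to a lower tail probability of 0.10?
a_large <- qbinom(0.10, size = 45, prob = 0.5)   # larger hospital
a_small <- qbinom(0.10, size = 15, prob = 0.5)   # smaller hospital

print(c(large = a_large, small = a_small))
```

This gives 18 for the larger hospital and 5 for the smaller one.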
Because of discreteness, the probability is not exactly what was asked for. There is no a such that P(X < a) = 0.1. What the qbinom lookups above actually do is find the smallest a such that P(X ≤ a) ≥ 0.1, which means that P(X < a) < 0.1 ≤ P(X ≤ a). This is the best that can be done with the question.
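The bracketing can be verified with pbinom. This is a sketch: it relies on qbinom and pbinom being vectorized over the size argument, and the column names in the data frame are illustrative.

```r
n <- c(45, 15)                      # larger and smaller hospital
a <- qbinom(0.10, n, 0.5)           # backward lookup for each sample size

below <- pbinom(a - 1, n, 0.5)      # P(X < a)  = P(X <= a - 1)
upto  <- pbinom(a, n, 0.5)          # P(X <= a)

# For each hospital the target 0.10 lies between the two probabilities
check <- data.frame(n = n, a = a, P_less_than_a = below, P_at_most_a = upto)
print(check)
print(all(below < 0.10 & 0.10 <= upto))
```

Neither column equals 0.10 exactly, which is the discreteness issue described above.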