For our example we will use the data from Example 1.2 in the textbook (Moore), which is available at a URL, so we can use the URL external data entry method.
The R function pie (see its on-line help) draws pie charts. The first argument is a vector of positive numbers that give the areas of the pie slices, and the second argument is a vector of labels for the pie slices (character strings). There are many other arguments, described in the help linked above, but you don't need to know about them for this class.
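Here is a minimal sketch of such a call. The Example 1.2 numbers are not reproduced on this page, so the vectors Material and Weight below are made-up placeholders (an assumption for illustration only). In Rweb they come from the data file instead.

```r
# Made-up placeholder data for illustration only; these are NOT the
# numbers from Example 1.2, which should be read from the data URL.
Material <- c("Paper", "Glass", "Metal", "Plastic", "Other")
Weight <- c(70, 12, 18, 30, 40)

# First argument: slice sizes; second argument: slice labels
pie(Weight, Material)

# Each slice's share of the whole pie
shares <- Weight / sum(Weight)
print(round(shares, 3))
```

The slice areas are proportional to the shares, which is why the numbers must be positive.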
Looking at the data file at the URL linked above, we see that the numbers are in the variable named Weight and the category labels are in the variable named Material. That is where we got those variable names.
Note that the variable names in the data are also printed at the beginning of the Rweb output

    Rweb:> names(X)
    [1] "Material" "Weight"

so they can be found there too. Also note that the names are case-sensitive: they must be used exactly as they appear in the data file.
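To see both points outside Rweb, the sketch below builds a small data frame called X by hand, standing in for the one Rweb creates (the numbers are made up, an assumption for illustration). It shows that names(X) lists the variables and that misspelling the case of a name silently gives NULL rather than the data.

```r
# Hypothetical stand-in for the data frame Rweb builds (made-up numbers)
X <- data.frame(Material = c("Paper", "Glass", "Metal"),
                Weight = c(70, 12, 18))

print(names(X))   # the variable names, as in the Rweb output
print(X$Weight)   # exact spelling: the data
print(X$weight)   # wrong case: no such variable, so NULL
```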
Interestingly, the help linked above also says the following:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: "Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements." This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.

(The cited work of Cleveland is listed in the help linked above.)
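For comparison, here is a sketch of the dot chart the quoted help recommends, drawn with the R function dotchart. The data are again made-up placeholders (an assumption, not the textbook's numbers). Sorting by weight is our own choice; it lets you read the ranking off the common horizontal scale.

```r
# Made-up placeholder data (not the textbook's numbers)
Material <- c("Paper", "Glass", "Metal", "Plastic", "Other")
Weight <- c(70, 12, 18, 30, 40)

# Sort so the dots run from smallest to largest weight
o <- order(Weight)
dotchart(Weight[o], labels = Material[o], xlab = "Weight")
```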
For our example we will use the same data as we used above.
The R function barplot (see its on-line help) draws bar plots.
As with pie, the first argument is a vector of positive numbers, here giving the heights of the bars. Unlike with pie, the other arguments we want to use come farther along in the argument list, so we must give their names.
The argument names.arg, which we abbreviate to names as explained on the R Intro page, gives the names of the bars (categories).
The argument col specifies colors for the bars. There are many possible specifications (any colors you want that a computer can represent). Here we use rainbow(length(Material)), which gives a vector of rainbow colors, one for each element of Material (and so one for each element of Weight).
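Putting the pieces together gives a sketch like the following. The data are made-up placeholders (an assumption for illustration). The call relies on R's partial matching of argument names, so names is taken as names.arg. barplot also invisibly returns the midpoints of the bars, which we keep only to check that one bar was drawn per category.

```r
# Made-up placeholder data (not the textbook's numbers)
Material <- c("Paper", "Glass", "Metal", "Plastic", "Other")
Weight <- c(70, 12, 18, 30, 40)

colors <- rainbow(length(Material))  # one rainbow color per bar
# 'names' is a partial-match abbreviation of the argument names.arg
mids <- barplot(Weight, names = Material, col = colors)
```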
Of course, you don't have to specify colors if you don't want them. In particular, if you are color blind (or anyone in your audience is), then these particular colors are no good. But R has many other options, for example
col = rainbow(length(Material), start = 0.7, end = 0.1)
The mysterious line Material <- as.character(Material) coerces the R object Material to a vector of character strings. This is surprising, because from looking at the data file it seems that Material already is a vector of character strings. What's going on?

The short answer is that you don't want to know, because it is complicated.

The long answer is that R is trying to be helpful in a way that isn't actually helpful here.
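When data are read into Rweb, they go into a data.frame called X, which is why names(X) shows the variable names. When R puts data in a data.frame, older versions of R coerce character strings to so-called factors by default. This is very helpful if you are going to analyze the data with regression or analysis of variance (Chapters 21 and 22 in the textbook), but factors just get in the way right now. (Since R 4.0.0, the default is stringsAsFactors = FALSE, so character strings stay character strings and the as.character call is harmless but no longer needed.)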
In order to defeat R's unwanted helpfulness, we need to convert Material back to what it was originally, before R tried to help.
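The sketch below reproduces the old behavior on purpose by asking for stringsAsFactors = TRUE, so it works the same in any version of R (the three-row data frame is a made-up stand-in for the Rweb data). It shows that a factor is stored as integer codes for its alphabetically sorted levels, and that as.character turns it back into the original strings.

```r
# Ask explicitly for the old (before R 4.0.0) default behavior
X <- data.frame(Material = c("Paper", "Glass", "Metal"),
                Weight = c(70, 12, 18),
                stringsAsFactors = TRUE)

print(class(X$Material))       # "factor", not character strings
print(levels(X$Material))      # levels are sorted alphabetically
print(as.integer(X$Material))  # the integer codes R actually stores

Material <- as.character(X$Material)
print(class(Material))         # back to "character"
```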
The textbook doesn't seem to give a suitable example for the subject of this section, which is bar plots in which the bars are subdivided to show subcategories. So we will take an example from the barplot on-line help, which uses VADeaths, one of the datasets that come with R. It gives death rates per 1000 in Virginia in 1940.

This is a bad example, because the heights of the bars are meaningless. The death rates for the different age groups do not add, because they have different denominators (the number of people in each age group). They should be compared side by side, as we show below. We only give this example to illustrate this kind of bar plot.

One way to see that the heights of the bars make no sense: the death rate for a combined group (say, all rural males aged 50 to 74) would be a weighted average of the five age-specific rates, so it could be no larger than the largest of them. Yet every stacked bar is taller than its tallest segment.
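A sketch of the stacked plot, using only the default arguments of barplot (the help example may add more). The last lines check the argument above numerically: each stacked bar's height is its column sum, which exceeds the largest single rate in that column.

```r
# VADeaths comes with R (package datasets): rows are age groups,
# columns are population groups, entries are death rates per 1000
print(VADeaths)

# Barplot of a matrix: by default, one stacked bar per column
barplot(VADeaths)

stacked_heights <- colSums(VADeaths)      # what the stacked bars show
largest_rate <- apply(VADeaths, 2, max)   # tallest segment in each bar
print(rbind(stacked_heights, largest_rate))
```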
The argument VADeaths is an object of class "matrix", which our web pages don't explain. Think of it as a bunch of numbers in a rectangular array. Each column of the matrix specifies the heights of the subbars of one bar. Matrices are usually made by the matrix, cbind, or rbind function.
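The sketch below makes the same small matrix in all three ways (the numbers are made up for illustration) and then draws it. Note that matrix fills its entries column by column unless told otherwise, cbind glues vectors together as columns, and rbind glues them together as rows.

```r
# Three ways to build the same 2 x 3 matrix (made-up numbers)
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # filled column by column
m2 <- cbind(c(1, 2), c(3, 4), c(5, 6))        # vectors become columns
m3 <- rbind(c(1, 3, 5), c(2, 4, 6))           # vectors become rows
print(m1)

# Each column becomes one bar; the two rows become its two subbars
barplot(m1, names = c("first", "second", "third"))
```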
The barplot on-line help example following the one in the preceding section makes much more sense. It is fairly complicated, so we copy it verbatim from the help.
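A sketch of such a call, assembled from the arguments discussed in the following paragraphs (beside, col, legend, ylim, and title with font.main). The five color names are valid R colors but should be treated as an assumption; they may differ from the ones in the current help example. With beside = TRUE, barplot returns the bar midpoints as a matrix with one entry per bar.

```r
# Side-by-side bars: one group of five bars (age groups) per column
mp <- barplot(VADeaths, beside = TRUE,
              col = c("lightblue", "mistyrose", "lightcyan",
                      "lavender", "cornsilk"),
              legend = rownames(VADeaths),  # partial match for legend.text
              ylim = c(0, 100))             # headroom for the legend box
title(main = "Death Rates in Virginia", font.main = 4)
```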
This is a good example. It clearly shows the death rate going up with age. One can still see that some population groups have higher death rates than others. The legend (the box showing what the colors mean) adds extra information.
The argument beside = TRUE says to make this kind of side-by-side bar plot.
The argument legend = rownames(VADeaths) (legend is an abbreviation of the argument name legend.text) says to draw the legend box and gives the legend labels. If you hadn't known that rownames(VADeaths) gives those labels, legend = c("50-54", "55-59", "60-64", "65-69", "70-74") would do exactly the same thing.
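A quick check of that claim: the row names of VADeaths are exactly the five age-group labels, so the two forms of the legend argument pass identical vectors to barplot.

```r
labels_from_data <- rownames(VADeaths)
labels_typed <- c("50-54", "55-59", "60-64", "65-69", "70-74")
print(identical(labels_from_data, labels_typed))
```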
The argument ylim = c(0, 100) makes the y-axis run from 0 to 100 (by default the axis would just cover the data, in this case roughly 0 to 70). This makes room for the legend box.
The title function adds a main title. The argument font.main = 4 is, IMHO, gratuitous: it just changes the font of the title (4 means bold italic).
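To judge for yourself, this sketch draws the same plot twice, once with the default title font and once with font.main = 4. Font codes in R graphics are 1 (plain), 2 (bold, the default for main titles), 3 (italic), and 4 (bold italic). Setting font.main in the title call affects only that title, not the global graphics settings.

```r
op <- par(mfrow = c(1, 2))  # two plots side by side; remember old settings
barplot(VADeaths, beside = TRUE)
title(main = "Default font")
barplot(VADeaths, beside = TRUE)
title(main = "font.main = 4", font.main = 4)
par(op)                     # restore the previous layout
```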
R can make publication quality graphics, but it requires some work and reading the documentation (especially looking at the examples and trying them out).