Statistics 3011 (Geyer and Jones, Spring 2006) Examples: Pie and Bar Charts

Pie Charts

For our example we will use the data from Example 1.2 in the textbook (Moore), which is at the URL

http://www.stat.umn.edu/geyer/3011/mdata/chap01/eg01-02.dat

so we can use the URL external data entry method.

Comments

The R function pie (on-line help) draws pie charts.

The first argument is a vector of positive numbers that indicate the areas of the pie slices, and the second argument is a vector of labels for the pie slices (which are character strings). There are many other arguments described in the help linked above, but you don't need to know about them for this class.
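A minimal sketch of the call described above. Rweb reads the data file into a data frame called X; here we build a stand-in X by hand so the example runs on its own. The materials and weights below are made-up placeholders, not the values in the textbook data file.

```r
# Stand-in for the data frame Rweb creates from the data URL.
# ASSUMPTION: these categories and numbers are placeholders for illustration,
# not the contents of eg01-02.dat.
X <- data.frame(
  Material = c("Paper", "Glass", "Metals", "Plastics"),
  Weight   = c(40, 10, 15, 20)
)

# First argument: the slice areas. Second argument: the slice labels.
pie(X$Weight, as.character(X$Material))
```

With the real data the call has the same shape; only the contents of Weight and Material change.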

Looking at the data file at the URL above, we see that the numbers are in the variable named Weight and the category labels are in the variable named Material. That is where we got those variable names. Note that the variable names in the data are also printed at the beginning of the Rweb output:

Rweb:> names(X) 
[1] "Material" "Weight"

so they can be found there too. Also note that the names are case-sensitive. They must be used exactly as they appear in the data file.
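A small sketch of what case sensitivity means in practice, again using a hand-made stand-in for X with placeholder values.

```r
# ASSUMPTION: placeholder data, not the textbook data file.
X <- data.frame(Material = c("Paper", "Glass"), Weight = c(40, 10))

names(X)                               # the variable names, as Rweb prints them

has_exact <- "Weight" %in% names(X)    # TRUE: spelled exactly as in the data
has_lower <- "weight" %in% names(X)    # FALSE: wrong case, so no match
```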

Interestingly, the help linked above also says the following

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements. This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.

(the cited work of Cleveland is given in the help linked above).
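For comparison, here is a sketch of the dot chart that the quoted help recommends, drawn with the R function dotchart. The data are made-up placeholders, not the textbook values.

```r
# ASSUMPTION: placeholder categories and weights for illustration only.
Material <- c("Paper", "Glass", "Metals", "Plastics")
Weight   <- c(40, 10, 15, 20)

# Each category gets one dot positioned along a common horizontal scale.
dotchart(Weight, labels = Material, xlab = "Weight")
```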

Bar Plots

Simple

For our example we will use the same data as we used above.

Comments

The R function barplot (on-line help) draws bar plots.

As with pie the first argument is a vector of positive numbers indicating the heights of the bars.

Unlike with pie, the other arguments we want to use are farther along in the argument list, so we must use their names.

The argument names.arg, which we abbreviate to names as explained on the R Intro page, gives the names of the bars (categories).

The argument col specifies colors for the bars. There are many possible specifications (any color you want that a computer can represent). Here we use rainbow(length(Material)), which produces a vector of rainbow colors of the same length as Material and Weight, one color per bar.

Of course, you don't have to specify colors if you don't want them. In particular, if you are color blind (or anyone in your audience is), then these particular colors are no good. But R has many other options, for example

col = rainbow(length(Material), start = 0.7, end = 0.1)
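Putting the arguments described above together, a sketch of the bar plot call might look like this. The data are made-up placeholders, not the textbook values, and we spell out names.arg in full although the text abbreviates it to names.

```r
# ASSUMPTION: placeholder categories and weights for illustration only.
Material <- c("Paper", "Glass", "Metals", "Plastics")
Weight   <- c(40, 10, 15, 20)

# Bar heights first, then named arguments for the bar labels and colors.
# barplot returns the horizontal midpoints of the bars, which we keep in mids.
mids <- barplot(Weight, names.arg = Material,
                col = rainbow(length(Material)))

# The same plot with the alternative palette shown above.
barplot(Weight, names.arg = Material,
        col = rainbow(length(Material), start = 0.7, end = 0.1))
```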

The mysterious Material <- as.character(Material) says to coerce the R object Material to be a vector of character strings. This is surprising because it seems from looking at the data file that it is originally a vector of character strings. What's going on?

The short answer is you don't want to know because it is complicated.

The long answer is that R is trying to be helpful in a way that isn't actually helpful here. When data are read into Rweb, they go into a data.frame called X, which is why names(X) shows the variable names. When R puts data in a data.frame it coerces character strings to so-called factors. This is very helpful if you are going to analyze the data with regression or analysis of variance (Chapters 21 and 22 in the textbook) but just gets in the way right now. In order to defeat R's unwanted helpfulness, we need to convert Material back to what it was originally, before R tried to help.
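The coercion can be seen in a small sketch. We ask for it explicitly with stringsAsFactors = TRUE, which was the default in data.frame before R 4.0.0; in later versions character columns stay character unless you ask otherwise. The data are placeholders.

```r
# ASSUMPTION: placeholder data; stringsAsFactors = TRUE mimics the old default.
X <- data.frame(Material = c("Paper", "Glass", "Metals"),
                Weight   = c(40, 10, 15),
                stringsAsFactors = TRUE)

is_factor_before <- is.factor(X$Material)    # TRUE: strings became a factor

# Undo the coercion, as the Rweb example does.
Material <- as.character(X$Material)
is_char_after <- is.character(Material)      # TRUE: plain character strings again
```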

Complicated, Stacked

The textbook doesn't seem to give a suitable example for the subject of this section, which is bar plots in which the bars are subdivided to show subcategories. So we will take an example from the barplot on-line help, which uses VADeaths, one of the datasets that come with R. It gives death rates per 1000 in Virginia in 1940.

Comments

This is a bad example, because the heights of the bars are meaningless. The death rates in the different categories do not add because they have different denominators. They should be compared side by side, as in the next section. We only give this example to illustrate this kind of bar plot.

One way to see that the heights of the bars make no sense is to ask what a total height means. Each rate is per 1000 people in its own age group, so the stacked height for rural males (about 164) is not the death rate of any actual group of people.
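A sketch of the stacked plot and of the bar heights it produces. VADeaths ships with R, and its documentation gives the rates per 1000 population. The call is a simplified version written for illustration, not necessarily the exact call used in the help example.

```r
# Rows of VADeaths are age groups, columns are population groups.
# Passing the matrix to barplot stacks the rows within each column.
barplot(VADeaths, legend.text = rownames(VADeaths))

# The total height of each bar is the column sum of the rates.
heights <- colSums(VADeaths)
print(heights)
```

The heights come out between roughly 126 and 202, which are sums of rates with different denominators and not rates themselves.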

The argument VADeaths is an object of type "matrix", which our web pages don't explain. Think of it as a bunch of numbers in a rectangular array. Each column of the matrix specifies the heights of the subbars of one bar. Matrices are usually made by the matrix function or the cbind function or the rbind function.
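As a sketch of the three ways of making a matrix mentioned above, the following builds the same small array of made-up numbers each way. The names m1, m2, and m3 are ours, chosen for illustration.

```r
# matrix() fills column by column by default.
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# cbind() glues vectors together as columns (here with column names).
m2 <- cbind(a = c(1, 2), b = c(3, 4), c = c(5, 6))

# rbind() glues vectors together as rows.
m3 <- rbind(c(1, 3, 5), c(2, 4, 6))

# Each column becomes one bar; its two entries become the two subbars.
barplot(m2)
```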

Complicated, Side by Side

The barplot on-line help example following the one in the preceding section makes much more sense. It is fairly complicated, so we copy it verbatim from the example.

Comments

This is a good example. It clearly shows the death rate going up with age. One can still see that some population groups have higher death rates than others. The legend (the box showing what the colors mean) adds extra information.

The argument beside = TRUE says to make this kind of side by side bar plot.

The argument legend = rownames(VADeaths) says to make the legend box and gives the legend labels. If you hadn't known that rownames(VADeaths) does that,

legend = c("50-54", "55-59", "60-64", "65-69", "70-74")
would do exactly the same thing.

The argument ylim = c(0, 100) says to make the y-axis go from 0 to 100 (the default would be some round numbers that contain the data, in this case 0 to 70). This makes room for the legend box.

The title function adds a main title. The font.main = 4 is IMHO gratuitous. It just changes the font.
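The arguments discussed above fit together roughly as follows. This is a sketch along the lines of the help example, not a guaranteed verbatim copy: the particular colors are an assumption, and we write legend.text in full where the text abbreviates it to legend.

```r
# beside = TRUE draws the age groups side by side within each population group.
# ASSUMPTION: the five colors are illustrative choices.
mids <- barplot(VADeaths, beside = TRUE,
                col = c("lightblue", "mistyrose", "lightcyan",
                        "lavender", "cornsilk"),
                legend.text = rownames(VADeaths),
                ylim = c(0, 100))   # extra headroom for the legend box

# Add a main title; font.main = 4 only changes the title font.
title(main = "Death Rates in Virginia", font.main = 4)
```

With beside = TRUE, barplot returns a matrix of bar midpoints with one entry per bar, which is handy if you later want to add text above individual bars.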

The Moral of the Story

R can make publication quality graphics, but it requires some work and reading the documentation (especially looking at the examples and trying them out).