Statistics 5601 (Geyer, Spring 2006) Examples: Survival Analysis

Exponential versus IFR or DFR

Hypothesis Tests

Example 11.1 in Hollander and Wolfe.

IFR Point Estimate

Source: Marshall and Proschan (1965), Annals of Mathematical Statistics 36:69-77, pp. 70-71, esp. equations (3.6) and (3.2).

Summary

failure rate   lower limit   upper limit
0.0000         -∞            42
0.0210         42            61
0.0333         61            66
0.0566         66            81
0.5000         81            82
∞              82            ∞

The IFR point estimate is a function (the failure rate function). As in many cases, the nonparametric function estimate is a step function (like the empirical c.d.f.). A failure rate of infinity past x = 82 means that all individuals surviving to that time fail immediately. Similarly, a failure rate of zero before x = 42 means that no failures occur before then.

Thus the failure time distribution is concentrated on the observed range of the data, 42 < x < 82. For comparison, the estimator that assumes a constant failure rate on (0, ∞), that is, an exponential failure time distribution, gives failure rate 0.0154.
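The step-function estimate can be sketched in Python, assuming complete (uncensored) data. Each gap between consecutive order statistics x(i) < x(i+1) carries one failure and an exposure of (n - i)(x(i+1) - x(i)). The pool-adjacent-violators algorithm then merges adjacent gaps until the rates (failures divided by exposure) are nondecreasing. We believe this reproduces the min-max formula of Marshall and Proschan, but the function names and the sample are hypothetical and are not Example 11.1. The exponential estimate, n divided by the sum of the observations, is included for comparison.

```python
import math


def ifr_failure_rate(data):
    """Step-function IFR failure-rate estimate for complete (uncensored) data.

    Returns rows (rate, lower, upper): rate 0 before the smallest observation,
    pooled nondecreasing rates between order statistics, infinity after the largest.
    """
    xs = sorted(data)
    n = len(xs)
    # Each block is [failures, exposure, first_gap, last_gap].
    blocks = []
    for i in range(n - 1):
        # Gap i (0-based) lies between xs[i] and xs[i + 1]; n - 1 - i units are at risk.
        blocks.append([1, (n - 1 - i) * (xs[i + 1] - xs[i]), i, i])
        # Pool adjacent violators while the previous rate exceeds the last one.
        # f1/e1 > f2/e2 is tested as f1*e2 > f2*e1, so zero exposure (ties) acts as infinity.
        while len(blocks) > 1:
            f1, e1 = blocks[-2][0], blocks[-2][1]
            f2, e2 = blocks[-1][0], blocks[-1][1]
            if f1 * e2 <= f2 * e1:  # rates already nondecreasing
                break
            last = blocks.pop()
            blocks[-1][0] += last[0]
            blocks[-1][1] += last[1]
            blocks[-1][3] = last[3]
    rows = [(0.0, -math.inf, xs[0])]
    for failures, exposure, first, last in blocks:
        rate = failures / exposure if exposure > 0 else math.inf
        rows.append((rate, xs[first], xs[last + 1]))
    rows.append((math.inf, xs[-1], math.inf))
    return rows


def exponential_failure_rate(data):
    """Constant-rate (exponential) MLE for complete data: n / sum(x) = 1 / mean."""
    return len(data) / sum(data)


# Hypothetical sample, not the Hollander and Wolfe data.
sample = [3, 5, 6, 9, 10]
for rate, lower, upper in ifr_failure_rate(sample):
    print(f"{rate:8.4f}  {lower:>5}  {upper:>5}")
print("exponential rate:", exponential_failure_rate(sample))
```

In this sample the raw rate on (6, 9) is smaller than the one on (5, 6), so those two gaps are pooled into a single rate. That pooling is what keeps the estimated failure rate increasing.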

DFR Point Estimate

There is a similar DFR point estimate, also given by Marshall and Proschan (1965) cited above. Since we have decided that this example is IFR rather than DFR, we will skip it.

Kaplan-Meier

Point Estimate (Survival Curve)

The Kaplan-Meier survival curve is estimated using the survfit function in the survival library in R (see its on-line help).

Example 11.7 in Hollander and Wolfe.
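The curve is the product-limit estimate S(t) = product over failure times t_j ≤ t of (1 - d_j / n_j). Here d_j is the number of failures at t_j and n_j is the number at risk just before t_j. Below is a minimal Python sketch of that formula. It is not the survfit interface, and the function name and data are hypothetical. It follows the usual convention that an observation censored at a failure time still counts as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times:  follow-up times.
    events: 1 if the failure was observed, 0 if right-censored.
    Returns a list of (t, n_at_risk, n_failures, S(t)) at each distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Gather every observation (failed or censored) recorded at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            # Observations censored at t still count as at risk at t.
            surv *= 1 - failures / at_risk
            table.append((t, at_risk, failures, surv))
        at_risk -= removed
    return table


# Hypothetical data (days), not Example 11.7.
times = [5, 8, 8, 12, 15, 21, 30]
events = [1, 1, 0, 1, 0, 1, 0]
for row in kaplan_meier(times, events):
    print(row)
```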

Confidence Interval

Example 11.7 in Hollander and Wolfe.

Comment

This is a pointwise, not (!) simultaneous, confidence interval for the curve. Hollander and Wolfe describe simultaneous confidence bands for the curve, but apparently the survival package in R does not implement them. (I have no idea why.)
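"Pointwise" means that each interval is computed separately at each failure time. Below is a Python sketch of one common construction. We assume, without checking against the current package, that this is survfit's default conf.type = "log": an interval for log S(t) using Greenwood's variance, exponentiated back. The function name and data are hypothetical, and the upper limit is capped at 1 here.

```python
import math
from statistics import NormalDist


def km_pointwise_ci(times, events, level=0.95):
    """Kaplan-Meier estimate with pointwise 'log' confidence limits.

    Returns (t, S(t), lower, upper) at each distinct failure time. The limits are
    S * exp(-/+ z * se), where se**2 = sum d / (n * (n - d)) is Greenwood's
    variance of log S. They are separate intervals, not a simultaneous band.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, var_log = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = removed = 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            removed += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            if d < at_risk:
                var_log += d / (at_risk * (at_risk - d))  # Greenwood term
                se = math.sqrt(var_log)
                lower = surv * math.exp(-z * se)
                upper = min(1.0, surv * math.exp(z * se))  # capped at 1 here
            else:
                # S(t) = 0: log S is undefined, so no interval is reported.
                lower = upper = math.nan
            out.append((t, surv, lower, upper))
        at_risk -= removed
    return out


# Hypothetical data, not Example 11.7.
for row in km_pointwise_ci([5, 8, 8, 12, 15, 21, 30], [1, 1, 0, 1, 0, 1, 0]):
    print(row)
```

At a nominal 95% level, each interval covers S(t) at its own time about 95% of the time. The probability that all of them cover simultaneously is lower, which is why a simultaneous band has to be wider.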

Single Confidence Interval

Sometimes you just want the interval at one time, say 1000 days. The summary.survfit function (see its on-line help) gives it when that time is supplied in its times argument, as shown in the last line of the example above.
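The Kaplan-Meier curve is a right-continuous step function. Its value and pointwise limits at 1000 days are therefore those at the last failure time at or before 1000. Below is a Python sketch of that lookup. The function name and the table rows are hypothetical stand-ins for real output.

```python
import bisect


def survival_at(table, t):
    """Value of a Kaplan-Meier step function (and its limits) at time t.

    table: rows (event_time, S, lower, upper) sorted by event_time.
    Before the first failure time the curve is 1. Times beyond the last
    follow-up are not checked here; the last row is simply carried forward.
    """
    event_times = [row[0] for row in table]
    k = bisect.bisect_right(event_times, t)  # number of failure times <= t
    if k == 0:
        return (1.0, 1.0, 1.0)
    _, s, lower, upper = table[k - 1]
    return (s, lower, upper)


# Hypothetical table rows, not real output.
table = [(100, 0.90, 0.80, 0.99), (800, 0.70, 0.55, 0.85), (1200, 0.50, 0.35, 0.70)]
print(survival_at(table, 1000))
```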

Hypothesis Test

The log-rank or Mantel-Haenszel test of whether there is a difference between two or more survival curves is performed using the survdiff function in the survival library in R (see its on-line help).

Example 11.7 in Hollander and Wolfe.
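At each distinct failure time, the two-group statistic compares the observed number of failures in one group with the number expected if both groups shared a single survival curve. It uses a hypergeometric variance that accounts for ties. Below is a Python sketch that assumes two groups labelled 0 and 1. We believe it matches survdiff's default (rho = 0) chi-square for two groups. The function name and data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank (Mantel-Haenszel) test.

    times: follow-up times; events: 1 = failure, 0 = right-censored;
    groups: 0 or 1 for each subject.
    Returns (observed, expected, variance, chisq, p) for group 1.
    """
    data = sorted(zip(times, events, groups))
    at_risk = len(data)
    at_risk1 = sum(g for _, _, g in data)
    observed = expected = variance = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:
            _, event, group = data[i]
            d += event
            d1 += event * group
            removed += 1
            removed1 += group
            i += 1
        if d > 0:
            share = at_risk1 / at_risk  # fraction of the risk set in group 1
            observed += d1
            expected += d * share
            if at_risk > 1:
                # Hypergeometric variance of d1 given the margins (handles ties).
                variance += d * share * (1 - share) * (at_risk - d) / (at_risk - 1)
        at_risk -= removed
        at_risk1 -= removed1
    chisq = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square with 1 df
    return observed, expected, variance, chisq, p


# Hypothetical data, not Example 11.7.
times = [3, 5, 7, 9, 12, 4, 10, 14, 18, 20]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(logrank_test(times, events, groups))
```

The chi-square has one degree of freedom for two groups. Its P-value is two-tailed in the underlying normal statistic, because squaring discards the direction of the difference.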

Summary

P = 0.00115 (Mantel-Haenszel test).

Comment

The reason this disagrees with the book (Hollander and Wolfe, Section 11.7, page 553) is that Hollander and Wolfe do a one-tailed test, and the survdiff function only does two-tailed tests.

Of course, one can convert between the two. When the observed difference lies in the direction of the one-tailed alternative, the two-tailed P-value is twice the one-tailed one. Indeed, Hollander and Wolfe's P-value is half of R's.
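With two groups, the log-rank chi-square is the square of an approximately standard normal statistic Z = (O - E)/sqrt(V). The conversion can be sketched in Python as below. Which sign of Z favours the one-tailed alternative depends on how that alternative is stated. The `agrees` flag is a hypothetical stand-in for that check.

```python
import math


def normal_p_values(z):
    """Return (one-tailed, two-tailed) P-values for a standard normal statistic z.

    The one-tailed alternative is taken to be 'z is large'.
    """
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    two_tailed = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|)
    return one_tailed, two_tailed


def one_tailed_from_two_tailed(p_two, agrees):
    """Convert a two-tailed P-value to a one-tailed one.

    agrees: True if the observed difference lies in the direction of the
    one-tailed alternative (hypothetical flag; decide it from the data).
    """
    return p_two / 2 if agrees else 1 - p_two / 2


# R's two-tailed P = 0.00115; halving it gives 0.000575 when the difference
# is in the direction of the one-tailed alternative.
print(one_tailed_from_two_tailed(0.00115, agrees=True))
```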