
lowess()

Usage:
result <- lowess(x, y [,xpred:xp] [,fract:f] [,iter:m] [, delta:Delta])
  REAL nonMISSING vectors x, y, xp; x and xp nondecreasing; length(x) =
  length(y); REAL scalars f and Delta, 0 < f <= 1, Delta >= 0; integer
  iter > 0; result is structure(x:x, y:yfit [,xpred:xp, ypred:yp])



Keywords: regression, descriptive statistics, plotting
                              Introduction
lowess() uses the LOWESS smoother algorithm to summarize the dependence
of a REAL vector y on a non-decreasing REAL vector x of the same length.

The assumed model is y = g(x) + e, where g(x) is a "smooth" unknown
function of a predictor x and e is a random "error" with E[e] = 0.  The
smoothed output values are estimates of g(x).

Optionally lowess() also estimates E(yp|xp) = g(xp), where xp may differ
from all x[i], assuming approximately linear dependence near xp.

LOWESS is a resistant (to outliers) locally linear smoother.  See below
for more information.

                                  Usage
Result <- lowess(x,y) and Result <- lowess(structure(x,y)) use the
LOWESS smoother to find a vector ys with ys[i] = smoothed y value
corresponding to x[i].  x and y must be REAL vectors with the same
length and with no MISSING elements.  Also, x must be non-decreasing,
that is x[i] <= x[i+1].

Result = structure(x:x, y:ys), where x is identical to the input x
vector and ys is a REAL vector the same length as x and y.

Result <- lowess(x,y, xpred:xp) and Result <- lowess(structure(x,y),
xpred:xp) do the same but also compute a vector yp of fitted values
corresponding to REAL vector xp using weighted least squares.  xp must
have no MISSING values and must be nondecreasing.

With xpred:xp, Result = structure(x:x, y:ys, xpred:xp, ypred:yp), where
yp is a REAL vector the same length as xp with yp[i] the predicted value
corresponding to xp[i].  When xp[i] = x[j], yp[i] = ys[j].  If xp[i] <
x[1] or xp[i] > x[n], where n = length(x), yp[i] is computed by
extrapolation using the straight line used to compute ys[1] or ys[n].

When there are tied x[i]'s, say x[i] = x[i+1] = ... = x[i+k-1], then
ys[i] = ys[i+1] = ... = ys[i+k-1].

You can modify the behavior of lowess() using keywords 'fract', 'delta'
and 'iter'.

When x is not sorted, you need to use lowess(sort(x),y[grade(x)]).
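As an illustration only (Python, not MacAnova), sort(x) and y[grade(x)]
amount to sorting x and reordering y by the same permutation:

```python
# Python analogue of lowess(sort(x), y[grade(x)]): sort x and
# carry y along in the same order.  Data values are made up.
x = [5.2, 4.4, 4.9, 4.6]
y = [3.1, 2.9, 3.6, 3.4]

order = sorted(range(len(x)), key=lambda i: x[i])  # like grade(x)
x_sorted = [x[i] for i in order]                   # like sort(x)
y_reordered = [y[i] for i in order]                # like y[grade(x)]

print(x_sorted)      # [4.4, 4.6, 4.9, 5.2]
print(y_reordered)   # [2.9, 3.4, 3.6, 3.1]
```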

                                Keywords
The following summarizes the lowess() keyword phrases which modify the
smoothing algorithm.  All have REAL scalar values and can be used with
'xpred'.
  Keyword      Value          Default      Description
  -----------------------------------------------------------------------
  fract:f      0 < f <= 1     2/3          Fraction of points used to
                                           compute each smoothed value
  delta:Delta  Delta >= 0     x_range/100  x values within Delta of other
                                           x values have smoothed values
                                           computed by interpolation
                                           rather than by additional
                                           regressions
  iter:m       integer m > 0  3            Number of iterations of robust
                                           fitting

These defaults are the same as those in R.

The larger f is, the smoother ys will be.  This is because each weighted
regression is based on r data pairs, (x[j], y[j]), j = j1, j1+1, ..., j2
= j1+r-1, where r = max(2, round(f*n)) and n = length(x).
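The window-size computation can be sketched in Python (for illustration
only; the function name is ours, not MacAnova's):

```python
def window_size(f, n):
    # Number of points in each local regression: r = max(2, round(f * n))
    return max(2, round(f * n))

# With the default f = 2/3 and n = 50 observations:
print(window_size(2 / 3, 50))  # 33
```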

Non-zero values of delta can result in faster fitting.  See below.

When there are tied x's, the value of ys can differ for different
orderings of the tied x's.

'fract' and 'delta' can be abbreviated to 'f' and 'del', respectively.

                                 Method
lowess() uses iteratively reweighted least squares to fit a straight
line to data near each x[i].  The weight for x[j] depends on |x[j]-x[i]|
and, after the first iteration, on |y[j]-y_smooth[j]|, where y_smooth[j]
is the smoothed value from the previous iteration.  Larger distances and
larger residuals result in lower weights.  See the reference below for
details on the weights used.
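Following Cleveland (1979), the distance weights are tricube and the
robustness weights are bisquare functions of scaled residuals.  A
Python sketch of these weight functions (illustrative only; the
function names are ours):

```python
import statistics

def tricube(u):
    # Distance weight: nonzero only inside the local window (|u| < 1).
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0

def bisquare(u):
    # Robustness weight: downweights points with large residuals.
    return (1 - u ** 2) ** 2 if abs(u) < 1 else 0.0

def robustness_weights(residuals):
    # Cleveland scales residuals by 6 * median(|residual|).
    s = statistics.median(abs(e) for e in residuals)
    return [bisquare(e / (6 * s)) for e in residuals]

print(tricube(0.0))  # 1.0: full weight at zero distance
print(tricube(1.0))  # 0.0: no weight at the window edge
```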

ys[i] is computed from r pairs (x[j], y[j]), j = j1, ..., j2 = j1 + r -
1, where r = max(2, round(f*n)) and x[j1] <= x[i] <= x[j2]; j1, with 1
<= j1 <= n+1-r, is chosen to minimize max(x[i] - x[j1], x[j2] - x[i]).
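The window choice can be sketched in Python (0-based indices; an
illustration under our own naming, not the MacAnova source):

```python
def window(x, i, r):
    # Choose j1 so the window [j1, j1 + r - 1] contains i and
    # minimizes max(x[i] - x[j1], x[j1 + r - 1] - x[i]).
    n = len(x)
    candidates = range(max(0, i - r + 1), min(i, n - r) + 1)
    j1 = min(candidates,
             key=lambda j: max(x[i] - x[j], x[j + r - 1] - x[i]))
    return j1, j1 + r - 1

x = [1.0, 2.0, 3.0, 10.0, 11.0]
print(window(x, 2, 3))  # (0, 2): 1, 2, 3 are closer to x[2] than 10, 11
```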

When these x[j]'s don't vary enough to estimate a slope reliably, ys[i]
is a weighted average of y[j], j1 <= j <= j2.
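The weighted-average fallback is the usual weighted mean; a minimal
Python sketch (function name and data are ours):

```python
def weighted_average(ys, ws):
    # Fallback when the x[j]'s are (nearly) constant: weighted mean of y.
    return sum(y * w for y, w in zip(ys, ws)) / sum(ws)

print(weighted_average([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # 2.75
```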

Some shortcuts are made:

When there are k tied x's, that is x[i] = x[i+1] = ... = x[i+k-1], the
regression computation is done only for x[i] and then ys[i+1], ...,
ys[i+k-1] are set equal to ys[i].

When Delta > 0 and there are near ties, that is x[i1] - x[i] <= Delta
with i1 > i, ys[i1] is computed by linear interpolation between ys[i]
and ys[i2], where i2 is the smallest index with x[i2] - x[i] > Delta
(or between ys[i] and ys[n] when x[n] - x[i] <= Delta).
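The interpolation step is ordinary linear interpolation between the two
bracketing fitted points; a small Python sketch (illustrative, with our
own function name):

```python
def interpolate(x, x0, y0, x1, y1):
    # Linear interpolation of the smoothed value at x between the
    # already-fitted points (x0, y0) and (x1, y1).
    if x1 == x0:
        return y0
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

print(interpolate(1.5, 1.0, 3.0, 2.0, 5.0))  # 4.0, halfway between 3 and 5
```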

                                Reference
You can find more information about the lowess() method in the following
reference:
  Cleveland, W. S. (1979) Robust locally weighted regression and
  smoothing scatterplots. J. Amer. Statist. Assoc. 74, 829-836.

The code used is adapted from the Ratfor subroutines lowess() and
lowest() by W. S. Cleveland, obtained from StatLib.

                                 Example
  Cmd> irisdata <- getdata(irisdata, quiet:T) # Fisher Iris Data
  Read from file "/usr/libs/macanova/auxfiles/macanova.dat"

  Cmd> variety <- irisdata[,1]

  Cmd> x <- irisdata[variety==1,2] # I. setosa sepal length

  Cmd> y <- irisdata[variety==1,3] # I. setosa sepal width

  Cmd> plot(x,y,symbols:"\1",show:F)

  Cmd> addlines(lowess(sort(x),y[grade(x)]), show:F)

  Cmd> showplot(dumb:T,width:70,xlab:"Sepal length",ylab:"Sepal width",\
       title:"Iris Setosa Sepal width vs Sepal length with smooth")
                Iris Setosa Sepal width vs Sepal length with smooth
             +---+-------+------+------+-------+------+------+-------++
          4.5+                                                   o    +
             |                                                        |
             |                                            o           |
             |                                 o                      |
   S        4+                                                  .....o+
   e         |                                        o     ....      |
   p         |                             o      o ..o.....     o    |
   a         |           o          o  o     .......                  |
   l      3.5+                         o ..o.  o          o           +
             |           o      o     .o.  o   o      o               |
   w         |                ........ o   o                          |
   i         |   o ......o..o.         o                              |
   d        3+o..o.      o      o   o  o                              +
   t         |   o                                                    |
   h         |                                                        |
             |                                                        |
             |                                                        |
          2.5+                                                        +
             |       o                                                |
             +---+-------+------+------+-------+------+------+-------++
                4.4     4.6    4.8     5      5.2    5.4    5.6    5.8
                                   Sepal length

                            Cross references
See also regress().


Gary Oehlert 2006-01-30