Federal laws require that local units of government find alternatives to disposing of solid waste (also known as garbage) in landfills. Some local units of government have considered using solid waste as a fuel for energy plants. The idea is that the solid waste would be collected, then processed into a fuel, and finally the fuel would be used to produce electricity. Although the garbage is free, the cost of transportation and processing makes it much more costly than the usual energy sources (hydroelectric, coal, etc.). Consequently, burning garbage requires a subsidy.
This problem uses data collected by a local unit of government on 147 randomly selected business properties to try to understand how the amount of garbage a business produces varies with its characteristics. The eventual goal is to get a prediction equation for all businesses, and these businesses will be taxed in proportion to their estimated garbage production.
The data are contained in the file waste.lsp, which can be obtained from the class web site; go to the data page. On Linux, you can load the data file by typing the Arc command (load "waste"). Here is a listing of the variables:
Variable  Type     N    Description
FTE       Variate  147  Number of full-time equivalent employees
ImprV     Variate  147  Value of the improvements to the parcel, in dollars
LandV     Variate  147  Land value, in dollars
Size      Variate  147  Total size of all buildings on the parcel, sq. ft.
Use       Variate  147  Type of commercial use
Waste     Variate  147  Waste production, in tons per year

Use is a coded variable with Use = 2 if manufacturing, 3 if warehouse or storage, 4 if office building, 5 if retail, 6 if restaurant or entertainment.

Your goal is to produce (and defend) (1) an understanding of how Waste depends on the other variables and (2) a prediction method.
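Because Use is a code rather than a measurement, its numeric values 2 through 6 should not be treated as a quantity. The following minimal Python sketch (not Arc, and not part of the assignment files) shows one way to decode the codes and turn them into 0/1 indicator variables. The function name use_indicators, the choice of manufacturing as the baseline level, and the example codes are all assumptions made for illustration; only the code-to-label mapping comes from the listing above.

```python
# Code-to-label mapping taken from the variable listing.
USE_LABELS = {
    2: "manufacturing",
    3: "warehouse or storage",
    4: "office building",
    5: "retail",
    6: "restaurant or entertainment",
}


def use_indicators(use_code, baseline=2):
    """Return 0/1 indicators for every Use level except the baseline.

    The baseline level (assumed here to be manufacturing) is represented
    by all indicators being 0.
    """
    if use_code not in USE_LABELS:
        raise ValueError(f"unknown Use code: {use_code}")
    return {
        f"Use{level}": int(use_code == level)
        for level in sorted(USE_LABELS)
        if level != baseline
    }


# Hypothetical codes for illustration only; these are not rows from waste.lsp.
example_codes = [2, 5, 6]
for code in example_codes:
    print(code, USE_LABELS[code], use_indicators(code))
```

Whatever software is used for the analysis, the same idea applies: Use enters a model as a factor with five levels, not as a single numeric predictor.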
What to Turn In. Your solution should consist of two parts, a "Summary" and "Supporting Evidence."
The summary will consist of: (1) a statement of your conclusions, with relevant summary statistics and probability statements. This should be at most 300 words. Your conclusions may be equivocal: for example, they might depend on whether or not a specific case is treated as an outlier. (2) AT MOST two graphical or numerical displays that are designed to convince someone familiar with statistical analysis that your analysis is sound, and that your conclusions are justified. Just giving a graph is NOT enough: you must explain what the graph shows and why it is interesting. The summary should be understandable by an intelligent public official.
Your supporting evidence will consist of AT MOST 500 words explaining how you got your answer, with as much computer output (text/figures) as you think is necessary to support your text. Unlabeled or unreferenced computer output will count against you.