# Stepwise Regression & Modeling

Homework on Multiple Regression

a. Fit the full model with all predictors;
b. Plot the studentized residuals and Cook's D and determine if any influential observations exist;
c. Use stepwise regression to select variables important to the model;
d. Interpret the final model.

Solution

```sas
/* Read the Excel sheet into WORK.IMPORT, using the first row as variable names */
PROC IMPORT DATAFILE="/folders/myfolders/sasuser.v94/13-69.xlsx"
    DBMS=XLSX
    OUT=WORK.IMPORT;
    GETNAMES=YES;
RUN;
```

a. Fit the full model with all predictors;

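```sas
ODS GRAPHICS ON;
/* Full model with all six predictors. ONLY suppresses the default
   diagnostic panel; LABEL marks extreme points using the ID variable. */
PROC REG DATA=WORK.IMPORT
    PLOTS(label only)=(cooksd rstudentbypredicted);
    ID OBS;   /* assumes the data set has an OBS identifier column */
    MODEL PULSE = RUN SMOKE WEIGHT HEIGHT PHYS1 PHYS2 / INFLUENCE;
RUN;
QUIT;
ODS GRAPHICS OFF;
```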

The REG Procedure

Model: MODEL1

Dependent Variable: PULSE PULSE

| Number of Observations Read | Number of Observations Used |
|---|---|
| 30 | 30 |

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 6 | 1869.88177 | 311.64696 | 6.71 | 0.0003 |
| Error | 23 | 1068.81823 | 46.47036 | | |
| Corrected Total | 29 | 2938.70000 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 6.81692 | R-Square | 0.6363 |
| Dependent Mean | -9.9 | Adj R-Sq | 0.5414 |
| Coeff Var | -68.8577 | | |

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | -26.36485 | 36.53646 | -0.72 | 0.4778 |
| RUN | RUN | 1 | 11.76733 | 2.67552 | 4.40 | 0.0002 |
| SMOKE | SMOKE | 1 | -7.02414 | 2.70636 | -2.60 | 0.0162 |
| WEIGHT | WEIGHT | 1 | 0.05721 | 0.08294 | 0.69 | 0.4972 |
| HEIGHT | HEIGHT | 1 | -0.02090 | 0.59180 | -0.04 | 0.9721 |
| PHYS1 | PHYS1 | 1 | 13.55492 | 4.21419 | 3.22 | 0.0038 |
| PHYS2 | PHYS2 | 1 | 7.89397 | 3.94366 | 2.00 | 0.0573 |
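As a cross-check, the summary statistics above follow directly from the ANOVA sums of squares. The short Python sketch below is an aside and not part of the SAS workflow. It recomputes R-Square, Root MSE, Coeff Var, and the model F value for the full model from the printed numbers.

```python
import math

# Values copied from the full-model ANOVA table and fit statistics above
ss_model, ss_error, ss_total = 1869.88177, 1068.81823, 2938.70000
df_model, df_error = 6, 23
dep_mean = -9.9

ms_model = ss_model / df_model          # model mean square
ms_error = ss_error / df_error          # error mean square (MSE)
r_square = ss_model / ss_total          # share of total variation explained
root_mse = math.sqrt(ms_error)          # estimated error standard deviation
coeff_var = 100 * root_mse / dep_mean   # negative because mean PULSE is negative
f_value = ms_model / ms_error           # overall F test for the model

print(f"R-Square={r_square:.4f}  Root MSE={root_mse:.5f}  "
      f"Coeff Var={coeff_var:.4f}  F={f_value:.2f}")
```

The negative Coeff Var comes from the negative dependent mean (-9.9), not from anything wrong with the fit.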

b. Plot the studentized residuals and Cook's D and determine if any influential observations exist;

Influential observations: observation 13 exceeds the cutoff line in the Cook's D plot, so it is flagged as influential.
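The diagnostics behind these plots can be computed by hand. The NumPy sketch below illustrates the standard formulas and is not SAS's implementation. For any OLS design matrix it returns leverage, externally studentized residuals (what SAS labels RStudent), and Cook's D. The 4/n cutoff used in the demo is a common rule of thumb and is assumed here. The toy data are made up and are not the homework data.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, externally studentized residuals (RStudent) and Cook's D for OLS.

    X must already contain a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    # Leverage h_i = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta                      # ordinary residuals
    mse = (e @ e) / (n - p)
    # Cook's D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2
    cooks_d = e**2 / (p * mse) * h / (1 - h) ** 2
    # Error variance with observation i deleted, then RStudent_i
    s2_del = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)
    rstudent = e / np.sqrt(s2_del * (1 - h))
    return h, rstudent, cooks_d

# Toy data (made up for illustration): y = 2x + 1 plus small noise,
# with the last point pushed far above the line
x = np.arange(10.0)
y = 2 * x + 1 + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2, 0.0])
y[9] = 40.0
X = np.column_stack([np.ones_like(x), x])

h, rstudent, cooks_d = influence_measures(X, y)
n = len(y)
flagged = np.where(cooks_d > 4 / n)[0]   # rule-of-thumb cutoff 4/n
print("flagged observations (0-based):", flagged)
```

For the homework data, X would be a column of ones plus RUN, SMOKE, WEIGHT, HEIGHT, PHYS1, and PHYS2 (30 rows, 7 columns). Under the same rule of thumb, the cutoff would be 4/30, about 0.133.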

c. Use stepwise regression to select variables important to the model;

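```sas
/* SELECTION=FORWARD performs forward selection (default entry level
   SLENTRY=0.50). SELECTION=STEPWISE would also let variables leave the
   model after entering (defaults SLENTRY=0.15, SLSTAY=0.15). */
PROC REG DATA=WORK.IMPORT;
    MODEL PULSE = RUN SMOKE WEIGHT HEIGHT PHYS1 PHYS2 / SELECTION=FORWARD;
RUN;
QUIT;
```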
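The logic behind forward selection can be sketched in Python. This is an illustrative reimplementation under stated assumptions, not SAS's code. At each step it tries adding every remaining candidate and computes that candidate's partial F statistic. It then adds the candidate with the smallest p-value, and stops when no p-value is below the entry level. The entry level defaults to 0.50 here, matching the level reported in the output. The sketch assumes NumPy and SciPy (`scipy.stats.f.sf`) are available, and its demo uses simulated data.

```python
import numpy as np
from scipy import stats

def _sse(design, y):
    """Residual sum of squares from an OLS fit of y on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_select(X, y, slentry=0.50):
    """Forward selection by partial F tests.

    X holds one column per candidate predictor (no intercept column).
    Returns the column indices in the order they entered.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    selected, remaining = [], list(range(k))
    while remaining:
        current = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
        sse_current = _sse(current, y)
        best_j, best_p = None, 1.0
        for j in remaining:
            trial = np.column_stack([current, X[:, j]])
            sse_trial = _sse(trial, y)
            df_error = n - trial.shape[1]
            # Partial F for adding column j: 1 numerator df, df_error denominator df
            f_stat = (sse_current - sse_trial) / (sse_trial / df_error)
            p_value = stats.f.sf(f_stat, 1, df_error)
            if best_j is None or p_value < best_p:
                best_j, best_p = j, p_value
        if best_p >= slentry:   # no candidate meets the entry level
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Simulated data (not the homework data): y depends strongly on column 0,
# moderately on column 1, and not at all on column 2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 5 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.normal(size=50)
print("entry order:", forward_select(X, y))
```

Because a variable never leaves once it has entered, this is forward selection and not full stepwise selection. A stepwise variant would also re-test the selected variables against a stay level after each entry.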

Forward Selection: Step 5

Variable WEIGHT Entered: R-Square = 0.6363 and C(p) = 5.0012

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 5 | 1869.82379 | 373.96476 | 8.40 | 0.0001 |
| Error | 24 | 1068.87621 | 44.53651 | | |
| Corrected Total | 29 | 2938.70000 | | | |

| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
|---|---|---|---|---|---|
| Intercept | -27.58559 | 11.60120 | 251.81104 | 5.65 | 0.0257 |
| RUN | 11.75976 | 2.61083 | 903.55898 | 20.29 | 0.0001 |
| SMOKE | -7.02214 | 2.64887 | 312.99291 | 7.03 | 0.0140 |
| WEIGHT | 0.05550 | 0.06578 | 31.69457 | 0.71 | 0.4072 |
| PHYS1 | 13.55250 | 4.12503 | 480.73062 | 10.79 | 0.0031 |
| PHYS2 | 7.91508 | 3.81615 | 191.59111 | 4.30 | 0.0490 |

No other variable met the 0.5000 significance level for entry into the model.

Summary of Forward Selection

| Step | Variable Entered | Label | Number Vars In | Partial R-Square | Model R-Square | C(p) | F Value | Pr > F |
|---|---|---|---|---|---|---|---|---|
| 1 | RUN | RUN | 1 | 0.3969 | 0.3969 | 12.1359 | 18.43 | 0.0002 |
| 2 | PHYS1 | PHYS1 | 2 | 0.0815 | 0.4785 | 8.9816 | 4.22 | 0.0498 |
| 3 | SMOKE | SMOKE | 3 | 0.0880 | 0.5665 | 5.4142 | 5.28 | 0.0299 |
| 4 | PHYS2 | PHYS2 | 4 | 0.0590 | 0.6255 | 3.6833 | 3.94 | 0.0583 |
| 5 | WEIGHT | WEIGHT | 5 | 0.0108 | 0.6363 | 5.0012 | 0.71 | 0.4072 |
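The F Value and C(p) columns in the summary can be reproduced from the printed R-square values. The plain-Python sketch below is a cross-check, not part of the SAS run. It uses the partial F formula F = ΔR² / ((1 - R²)/(n - k - 1)) for the k-th variable entered. It also applies Mallows' C(p) = SSE_p / MSE_full - (n - 2p), where p counts the intercept and MSE_full (46.47036) comes from the full six-predictor model. The R-square values are printed to four decimals, so the recomputed F values agree with the table only to about two decimals.

```python
# Summary of Forward Selection, copied from the output above:
# (variable entered, partial R-square, model R-square)
steps = [
    ("RUN", 0.3969, 0.3969),
    ("PHYS1", 0.0815, 0.4785),
    ("SMOKE", 0.0880, 0.5665),
    ("PHYS2", 0.0590, 0.6255),
    ("WEIGHT", 0.0108, 0.6363),
]
n = 30  # observations used

def partial_f(partial_r2, model_r2, k):
    """F-to-enter for the k-th variable: 1 numerator df, n - k - 1 error df."""
    return partial_r2 / ((1 - model_r2) / (n - k - 1))

def mallows_cp(sse_p, mse_full, n_params):
    """Mallows' C(p) = SSE_p / MSE_full - (n - 2p), with p counting the intercept."""
    return sse_p / mse_full - (n - 2 * n_params)

f_values = []
for k, (name, pr2, mr2) in enumerate(steps, start=1):
    f_values.append(partial_f(pr2, mr2, k))
    print(f"step {k}: {name:<6} F = {f_values[-1]:.2f}")

# Step 5: SSE from the step-5 ANOVA table, MSE from the full six-variable model
cp_step5 = mallows_cp(1068.87621, 46.47036, 6)
print(f"C(p) at step 5 = {cp_step5:.4f}")
```

A C(p) close to the number of parameters (6 at step 5) suggests little bias. The step 4 model has the smallest C(p) (3.6833), which is one reason WEIGHT's entry at step 5 adds little.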

d. Interpret the final model.

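Final model: HEIGHT is not included, giving PULSE = -27.59 + 11.76 RUN - 7.02 SMOKE + 0.0555 WEIGHT + 13.55 PHYS1 + 7.92 PHYS2. Dropping HEIGHT leaves R-squared essentially unchanged (0.6363 in both models), so the meaningful comparison is adjusted R-squared. It rises from 0.5414 for the full model to about 0.5605 for the final model, computed as 1 - 44.53651/(2938.7/29). Over the same change, the error mean square falls from 46.47 to 44.54. The final model is therefore slightly better than the original.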
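The adjusted R-squared comparison can be reproduced from the two ANOVA tables. The Python sketch below is a cross-check, not part of the original SAS workflow, and applies 1 - MSE/(SST/(n - 1)) to each model. Adjusted R-squared penalizes extra predictors, so it can rise when HEIGHT is removed even though plain R-squared does not.

```python
def adjusted_r2(ss_error, df_error, ss_total, df_total):
    """Adjusted R-square = 1 - MSE / (SST / (n - 1))."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)

ss_total, df_total = 2938.70000, 29   # same response, same 30 observations

full = adjusted_r2(1068.81823, 23, ss_total, df_total)   # all six predictors
final = adjusted_r2(1068.87621, 24, ss_total, df_total)  # HEIGHT dropped

print(f"full model adj R-sq  = {full:.4f}")
print(f"final model adj R-sq = {final:.4f}")
```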

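In the final model, RUN (p = 0.0001), PHYS1 (p = 0.0031), and SMOKE (p = 0.0140) are the significant predictors. PHYS2 is borderline: p = 0.0490 in the final model and 0.0573 in the full model, both close to 0.05. WEIGHT is not significant (p = 0.4072). It entered only because the entry level for forward selection was 0.50. Holding the other predictors fixed, a one-unit increase in RUN raises predicted PULSE by about 11.76, a one-unit increase in PHYS1 by about 13.55, and a one-unit increase in PHYS2 by about 7.92. A one-unit increase in SMOKE lowers it by about 7.02.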
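To make the coefficient interpretation concrete, the Python sketch below turns the final-model estimates into a prediction function. The predictor values in the demo are hypothetical, because the original does not give the coding or units of RUN, SMOKE, PHYS1, and PHYS2. The demo only shows that changing one predictor by one unit, with the others held fixed, shifts the prediction by exactly that predictor's coefficient.

```python
# Coefficients of the final (step 5) model, copied from the output above
COEF = {
    "Intercept": -27.58559,
    "RUN": 11.75976,
    "SMOKE": -7.02214,
    "WEIGHT": 0.05550,
    "PHYS1": 13.55250,
    "PHYS2": 7.91508,
}

def predict_pulse(run, smoke, weight, phys1, phys2):
    """Fitted PULSE from the final model (HEIGHT is not used)."""
    return (COEF["Intercept"] + COEF["RUN"] * run + COEF["SMOKE"] * smoke
            + COEF["WEIGHT"] * weight + COEF["PHYS1"] * phys1
            + COEF["PHYS2"] * phys2)

# Hypothetical predictor values, chosen only to illustrate the effect of RUN
base = dict(run=0, smoke=0, weight=150, phys1=0, phys2=0)
changed = dict(base, run=1)
# The difference matches the RUN coefficient (up to floating-point rounding)
print(predict_pulse(**changed) - predict_pulse(**base))
```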