Stepwise Regression & Modeling

Homework on Multiple Regression

    a. Fit the full model with all predictors;
    b. Plot the studentized residuals and Cook’s D and determine if any influential observations exist;
    c. Use Stepwise Regression to select variables important to the model;
    d. Interpret the final model.

Solution

PROC IMPORT DATAFILE="/folders/myfolders/sasuser.v94/13-69.xlsx"
    DBMS=XLSX
    OUT=WORK.IMPORT;
    GETNAMES=YES;
RUN;

a. Fit the full model with all predictors;

ODS GRAPHICS ON;
PROC REG DATA=WORK.IMPORT
    PLOTS(LABEL ONLY)=(COOKSD RSTUDENTBYPREDICTED);
    ID OBS;
    MODEL PULSE = RUN SMOKE WEIGHT HEIGHT PHYS1 PHYS2 / INFLUENCE;
RUN;
QUIT;
ODS GRAPHICS OFF;

The REG Procedure
Model: MODEL1
Dependent Variable: PULSE

Number of Observations Read    30
Number of Observations Used    30

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6        1869.88177      311.64696       6.71    0.0003
Error              23        1068.81823       46.47036
Corrected Total    29        2938.70000

Root MSE            6.81692    R-Square    0.6363
Dependent Mean     -9.90000    Adj R-Sq    0.5414
Coeff Var         -68.85775

Parameter Estimates

Variable     Label        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept     1             -26.36485          36.53646      -0.72      0.4778
RUN          RUN           1              11.76733           2.67552       4.40      0.0002
SMOKE        SMOKE         1              -7.02414           2.70636      -2.60      0.0162
WEIGHT       WEIGHT        1               0.05721           0.08294       0.69      0.4972
HEIGHT       HEIGHT        1              -0.02090           0.59180      -0.04      0.9721
PHYS1        PHYS1         1              13.55492           4.21419       3.22      0.0038
PHYS2        PHYS2         1               7.89397           3.94366       2.00      0.0573

b. Plot the studentized residuals and Cook’s D and determine if any influential observations exist;

 

Influential observations: observation 13 exceeds the cutoff value.
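
The visual check can be backed with numbers by writing the diagnostics to a data set and filtering on explicit cutoffs. The sketch below is one possible way to do that. The source does not say which cutoff was applied, so the thresholds used here (Cook’s D above 4/n with n = 30, or an absolute studentized residual above 2) are assumptions, and the data set names DIAG and FLAGGED and the variable names COOKSD and RSTUD are illustrative.

```sas
/* Sketch: save influence diagnostics, then list observations past assumed cutoffs */
PROC REG DATA=WORK.IMPORT NOPRINT;
    MODEL PULSE = RUN SMOKE WEIGHT HEIGHT PHYS1 PHYS2;
    /* COOKD= and RSTUDENT= name the new diagnostic variables (names are illustrative) */
    OUTPUT OUT=WORK.DIAG COOKD=COOKSD RSTUDENT=RSTUD;
RUN;
QUIT;

DATA WORK.FLAGGED;
    SET WORK.DIAG;
    /* Assumed cutoffs: Cook's D > 4/n with n = 30, or |studentized residual| > 2 */
    IF COOKSD > 4/30 OR ABS(RSTUD) > 2;
RUN;

PROC PRINT DATA=WORK.FLAGGED;
    VAR OBS PULSE COOKSD RSTUD;
RUN;
```

Any observation listed by PROC PRINT is worth inspecting before deciding whether to keep it in the model.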

c. Use Stepwise Regression to select variables important to the model;

PROC REG DATA=WORK.IMPORT;
    MODEL PULSE = RUN SMOKE WEIGHT HEIGHT PHYS1 PHYS2 / SELECTION=FORWARD;
RUN;
QUIT;

Forward Selection: Step 5

Variable WEIGHT Entered: R-Square = 0.6363 and C(p) = 5.0012

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5        1869.82379      373.96476       8.40    0.0001
Error              24        1068.87621       44.53651
Corrected Total    29        2938.70000

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept             -27.58559          11.60120     251.81104       5.65    0.0257
RUN                    11.75976           2.61083     903.55898      20.29    0.0001
SMOKE                  -7.02214           2.64887     312.99291       7.03    0.0140
WEIGHT                  0.05550           0.06578      31.69457       0.71    0.4072
PHYS1                  13.55250           4.12503     480.73062      10.79    0.0031
PHYS2                   7.91508           3.81615     191.59111       4.30    0.0490

No other variable met the 0.5000 significance level for entry into the model.

Summary of Forward Selection

Step    Variable Entered    Label     Number Vars In    Partial R-Square    Model R-Square       C(p)    F Value    Pr > F
1       RUN                 RUN                    1              0.3969            0.3969    12.1359      18.43    0.0002
2       PHYS1               PHYS1                  2              0.0815            0.4785     8.9816       4.22    0.0498
3       SMOKE               SMOKE                  3              0.0880            0.5665     5.4142       5.28    0.0299
4       PHYS2               PHYS2                  4              0.0590            0.6255     3.6833       3.94    0.0583
5       WEIGHT              WEIGHT                 5              0.0108            0.6363     5.0012       0.71    0.4072
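
The output above comes from SELECTION=FORWARD, which never removes a variable once it has entered and uses an entry level of 0.50. Since the assignment asks for stepwise regression, a stepwise run might look like the sketch below. The SLENTRY= and SLSTAY= values of 0.15 are an assumption (they match the documented PROC REG defaults for stepwise selection). Its output is not shown here and may differ from the forward-selection result.

```sas
/* Sketch: stepwise selection, where variables can enter and later be removed */
PROC REG DATA=WORK.IMPORT;
    MODEL PULSE = RUN SMOKE WEIGHT HEIGHT PHYS1 PHYS2
          / SELECTION=STEPWISE
            SLENTRY=0.15   /* assumed significance level to enter */
            SLSTAY=0.15;   /* assumed significance level to stay  */
RUN;
QUIT;
```

With a stricter entry level than 0.50, a weak predictor such as WEIGHT (p = 0.4072 at entry) would not be expected to enter the model.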

d. Interpret the final model.

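Final model: HEIGHT is not included. The final model is not a stronger fit than the full model: both have an R-square of 63.63%. The 54.14% figure is the full model's adjusted R-square, so the two numbers are not directly comparable. The final model achieves the same fit with one fewer predictor, and its overall F value rises from 6.71 to 8.40.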
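
To compare the two models on an equal footing, the adjusted R-square of each can be recomputed from the ANOVA tables above. The worked calculation below uses the standard definition, with n = 30 observations and p predictors; the symbols are notation introduced here, not taken from the SAS output.

```latex
R^2_{\text{adj}} = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)},
\qquad \frac{SST}{n-1} = \frac{2938.70000}{29} \approx 101.33448

\text{Full model } (p = 6):\quad
R^2_{\text{adj}} = 1 - \frac{46.47036}{101.33448} \approx 0.5414

\text{Final model } (p = 5):\quad
R^2_{\text{adj}} = 1 - \frac{44.53651}{101.33448} \approx 0.5605
```

On this measure the final model is slightly better (0.5605 vs 0.5414), because dropping HEIGHT frees one error degree of freedom at almost no cost in explained variation.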

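The significant predictors are RUN (p = 0.0001), PHYS1 (p = 0.0031) and SMOKE (p = 0.0140). PHYS2 is borderline significant (p = 0.0490, close to 0.05). WEIGHT is not significant (p = 0.4072); it entered only because forward selection used a 0.50 entry level.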
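
Written out with the parameter estimates from the Step 5 table, the fitted final model is:

```latex
\widehat{\text{PULSE}} = -27.58559 + 11.75976\,\text{RUN} - 7.02214\,\text{SMOKE}
  + 0.05550\,\text{WEIGHT} + 13.55250\,\text{PHYS1} + 7.91508\,\text{PHYS2}
```

Each coefficient is the estimated change in PULSE for a one-unit increase in that predictor with the others held fixed. The coding of RUN, SMOKE, PHYS1 and PHYS2 is not given here, so reading them as indicator variables would be an assumption.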