Regression Interpretation

In R, open the IQ.RDATA file. We are primarily interested in the relationship between IQ and wage (monthly earnings).

1) Run the regression of wage on IQ:

  1. Interpret the regression. i.e., provide a full interpretation of all the coefficients (note: this includes the intercept…which is a coefficient on the constant).
  2. Are the coefficients statistically significant? Explain.
  3. Do the results make sense? Explain.

2) Run the regression of wage on IQ and educ:

  1. Interpret the regression.
  2. Explain in plain English the interpretation of the coefficient on IQ.
  3. Are the coefficients statistically significant? Explain.
  4. How did the coefficient on IQ change from (1)? Based on our discussion in class, explain why you think this happened (e.g., explain how and why the coefficient in (1) was biased).

3) Use the two-step regression method that we discussed in class to estimate the relationship between wage and IQ, net of the effects of education (e.g., show how you estimate the same coefficient on IQ as in (2) using the two-step process).
a. Explain your steps and provide results from each step.

4) Run the regression of wage on IQ, educ, and exper:

  1. Interpret the regression.
  2. Explain in plain English the interpretation of the coefficient on IQ.
  3. Are the coefficients statistically significant? Explain.
  4. How did the coefficient on IQ change from (2)? Based on our discussion in class, explain why you think this happened.

5) Run the regression of wage on IQ, educ, exper, and KWW:

  1. Interpret the regression.
  2. Explain in plain English the interpretation of the coefficient on IQ.
  3. Are the coefficients statistically significant? Explain.
  4. How did the coefficient on IQ change from (2)? Based on our discussion in class, explain why you think this happened.
  5. Write a function to calculate the R2 that uses only the regression object as the input. E.g.,

    mylastname.rsquare <- function(reg){
      …your code here…
    }

    Hint: recall the equations for SST, SSE and SSR and how we accessed the objects that are saved within the regression object.
  6. Interpret the R2.
  7. Calculate the adjusted R2 using a function similar to the one you wrote for part (e). Explain why the R2 and adjusted R2 differ.
  8. Perform an F-test on the joint significance of the regression.
    1. What are H0 and H1?
    2. What is the critical value for your test? Explain.
    3. Draw the F-distribution and highlight the relevant rejection region for this test.
    4. What is your conclusion? Explain.

In Stata, open the wage.dta data. These are different data with which we will examine the relationship between wage, education, experience, and job tenure.

6) Run the regression of wage on Educ:

  1. Interpret the regression.
  2. Are the coefficients statistically significant? Explain.
  3. Do the results make sense? Explain.

7) Run the regression of wage on Educ and Tenure:

  1. Interpret the regression.
  2. Explain in plain English the interpretation of the coefficient on Educ.
  3. Are the coefficients statistically significant? Explain.
  4. How did the coefficient on Educ change from (6)? Based on our discussion in class, explain why you think this happened.

8) DON’T run, but examine the regression of wage on Educ, Tenure, and Tenure squared:

  1. Use a scatter plot and LOESS (lowess in Stata) fit line to explain your expectations for the signs of the coefficients on Tenure and Tenure squared.

Solution

1) Run the regression of wage on IQ:

  1. Interpret the regression. i.e., provide a full interpretation of all the coefficients (note: this includes the intercept…which is a coefficient on the constant).

The R script that estimates the regression model and its results follow:

> reg1 <- lm(data$wage~data$IQ)
> summary(reg1)

Call:
lm(formula = data$wage ~ data$IQ)

Residuals:
   Min     1Q Median     3Q    Max
-898.7 -256.5  -47.3  201.1 2072.6

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 116.9916    85.6415   1.366    0.172
data$IQ       8.3031     0.8364   9.927   <2e-16 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 384.8 on 933 degrees of freedom
Multiple R-squared:  0.09554, Adjusted R-squared:  0.09457
F-statistic: 98.55 on 1 and 933 DF,  p-value: < 2.2e-16

The regression provides the best fitting line for the average wage as a linear function of IQ. The intercept, 116.9916, indicates the predicted monthly earnings of a person with an IQ of zero. Since an IQ of zero lies outside the data set, the intercept is merely the extension of the OLS line to the vertical axis, which represents wage here. The t statistic of 1.366 and its associated p value, 0.172, indicate that there is only weak sample evidence that the intercept differs from zero, i.e., that the regression line does not go through the origin.

The slope on IQ is 8.3031, which indicates that the average monthly wage increases by 8.3031 with each 1 point increase in IQ. The t statistic of 9.927 has an associated p value below 2e-16, which implies that there is overwhelming sample evidence of a positive relationship between IQ and wage.

  1. Are the coefficients statistically significant? Explain.

The slope on IQ is statistically significantly different from zero (t statistic = 9.927, p value < 2e-16), but the intercept is not statistically significantly different from zero at conventional levels of statistical significance such as 0.1 or 0.05 (t statistic = 1.366, p value = 0.172).

  1. Do the results make sense? Explain.

Most people would anticipate a statistically significantly positive relationship between IQ and wage, and the results support that prior belief. However, there are no other variables in the regression, which implies that the estimated effect of IQ may also be picking up the effects of omitted variables that are correlated with IQ, such as education.
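The significance checks in question 1 can be read directly from the coefficient table stored in the summary object. The following is a minimal sketch on simulated data; the data frame `sim`, its values, and the object names are assumptions made for illustration and are not the course's IQ data.

```r
# Simulated stand-in for the IQ data (hypothetical values, not the course file)
set.seed(1)
n    <- 500
IQ   <- rnorm(n, mean = 100, sd = 15)
wage <- 100 + 8 * IQ + rnorm(n, sd = 380)
sim  <- data.frame(wage, IQ)

fit  <- lm(wage ~ IQ, data = sim)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
ctab <- coef(summary(fit))
ctab

# Each t value is the estimate divided by its standard error
ctab[, "Estimate"] / ctab[, "Std. Error"]

# Flag coefficients that are significant at the 5% level
signif_5pct <- ctab[, "Pr(>|t|)"] < 0.05
signif_5pct

# 95% confidence intervals for the intercept and slope
confint(fit, level = 0.95)
```

A coefficient whose 95% confidence interval excludes zero is significant at the 5% level, so the last two results always agree.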

2) Run the regression of wage on IQ and educ:

  1. Interpret the regression.

The R script and its results follow:

> reg2 <- lm(data$wage~data$IQ+data$educ)
> summary(reg2)

Call:
lm(formula = data$wage ~ data$IQ + data$educ)

Residuals:
    Min      1Q  Median      3Q     Max
-860.29 -251.00  -35.31  203.98 2110.38

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -128.8899    92.1823  -1.398    0.162
data$IQ        5.1380     0.9558   5.375 9.66e-08 ***
data$educ     42.0576     6.5498   6.421 2.15e-10 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 376.7 on 932 degrees of freedom
Multiple R-squared:  0.1339,   Adjusted R-squared:  0.132
F-statistic: 72.02 on 2 and 932 DF,  p-value: < 2.2e-16

The regression provides the best fitting linear function (a plane rather than a line, since there are two independent variables) for the average wage in terms of IQ and education. The estimated intercept, -128.8899, has a t statistic of -1.398 associated with a p value of 0.162. Therefore, at conventional levels of significance, there is only very weak evidence that the intercept differs from zero.

The estimated coefficient on IQ, 5.138, indicates that the average monthly wage increases by 5.138 with each one point increase in IQ, holding the years of education constant. The t statistic for IQ is 5.375 with an associated p value of 9.66e-08, which implies that the null hypothesis that the slope is zero can be overwhelmingly rejected.

The estimated coefficient on educ (i.e., years of education) implies that the average monthly wage increases by 42.0576 with each additional year of education, holding IQ constant. The t statistic 6.421, associated with a p value of 2.15e-10, indicates that the null hypothesis that this slope is equal to zero can be overwhelmingly rejected.

The conclusion is that the average monthly wage is positively related to both IQ and years of education when both of these variables are included in the regression model.

  1. Explain in plain English the interpretation of the coefficient on IQ.

The estimated coefficient on IQ (5.138) indicates that the average monthly wage increases by 5.138 with each one point increase in IQ, holding the years of education constant.

  1. Are the coefficients statistically significant? Explain.

As discussed in part a, the intercept is not statistically significantly different than zero (p value = 0.162) but both slopes are statistically significantly different than zero.

  1. How did the coefficient on IQ change from (1)? Based on our discussion in class, explain why you think this happened (e.g., explain how and why (1) was biased).

In question 1, the wage is regressed on IQ only and the estimated coefficient is 8.3031. When years of education is added to the model, the estimated effect of IQ declines to 5.1380. It is very likely that IQ and years of education are positively correlated, because a higher IQ makes additional education both easier and more rewarding. Because both IQ and education are positively related to the monthly wage, and IQ and years of education are positively correlated with each other, omitting either variable from the regression equation causes the variable that is included to pick up some of the effect on wage of the other variable. Thus when years of education is omitted in question 1, the estimated effect of IQ on the average monthly wage includes part of the effect of years of education on the average monthly wage, and the coefficient on IQ is biased upward.
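The size of this omitted variable bias can be computed exactly in sample: the coefficient on IQ in the short regression equals the coefficient on IQ in the long regression plus the coefficient on education in the long regression times the slope from regressing education on IQ. The sketch below checks that identity on simulated data; all values and the object names (`short`, `long`, `aux`) are assumptions for illustration, not the course data.

```r
# Simulated stand-in for the IQ data (hypothetical values)
set.seed(2)
n    <- 500
IQ   <- rnorm(n, mean = 100, sd = 15)
educ <- 6 + 0.075 * IQ + rnorm(n, sd = 2)       # education rises with IQ
wage <- -130 + 5 * IQ + 42 * educ + rnorm(n, sd = 375)

short <- lm(wage ~ IQ)          # education omitted, as in question 1
long  <- lm(wage ~ IQ + educ)   # education included, as in question 2
aux   <- lm(educ ~ IQ)          # auxiliary regression of the omitted variable on IQ

b_short <- coef(short)[["IQ"]]
b_long  <- coef(long)[["IQ"]]

# Bias = (effect of educ on wage) x (slope of educ on IQ)
bias <- coef(long)[["educ"]] * coef(aux)[["IQ"]]

c(short = b_short, long = b_long, bias = bias, long_plus_bias = b_long + bias)
```

Both factors in the bias term are positive here, so the short regression overstates the coefficient on IQ, which is the direction of change seen between questions 1 and 2.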

3) Use the two-step regression method that we discussed in class to estimate the relationship between wage and IQ, net of the effects of education (e.g., show how you estimate the same coefficient on IQ as in (2)).

  1. Explain your steps and provide results from each step.

To net out the effect of years of education on the average monthly wage so that the effect of IQ on the average monthly wage can be estimated without years of education being included, the following steps are undertaken:

Step 1: Estimate the regression of monthly wage on years of education and save the residuals as y_resids. The residuals from this regression, y_resids, are by construction of the least squares estimator uncorrelated with years of education; they are the part of the variation in monthly wages that cannot be explained by years of education. The R commands that do this follow:

> reg3_educ <- lm(data$wage~data$educ)
> y_resids <- residuals(reg3_educ)

Step 2: Estimate the regression of IQ on years of education and save the residuals as IQ_resids. The residuals from this regression, IQ_resids, are by construction of the least squares estimator uncorrelated with years of education; they are the part of the variation in IQ that cannot be explained by years of education. The R commands that do this follow:

> reg3_IQbyEDUC <- lm(data$IQ~data$educ)
> IQ_resids <- residuals(reg3_IQbyEDUC)

Step 3: Regress the residuals from step 1, y_resids, on the residuals from step 2, IQ_resids. Because both of these variables are uncorrelated with years of education by construction, the estimated slope coefficient is the effect of IQ on monthly wage, controlling for years of education. The R commands to accomplish this and their results follow:

> reg3 <- lm(y_resids~IQ_resids)
> summary(reg3)

Call:
lm(formula = y_resids ~ IQ_resids)

Residuals:
    Min      1Q  Median      3Q     Max
-860.29 -251.00  -35.31  203.98 2110.38

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.702e-14  1.231e+01   0.000        1
IQ_resids   5.138e+00  9.553e-01   5.378 9.51e-08 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 376.5 on 933 degrees of freedom
Multiple R-squared:  0.03007,  Adjusted R-squared:  0.02903
F-statistic: 28.93 on 1 and 933 DF,  p-value: 9.509e-08

Observe that as claimed, the estimated coefficient on IQ_resids, 5.138, is exactly the same as the estimated coefficient on IQ in question 2, part a, where monthly wage was regressed on IQ and years education.
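The two-step result can be checked on any data set. The sketch below uses simulated data (all values and object names are assumptions for illustration) and also shows why the standard error in the two-step output (0.9553) differs slightly from the one in question 2 (0.9558): the residuals are identical, but the two-step regression counts n - 2 residual degrees of freedom instead of n - 3.

```r
# Simulated stand-in for the IQ data (hypothetical values)
set.seed(3)
n    <- 500
educ <- rnorm(n, mean = 13.5, sd = 2.2)
IQ   <- 50 + 3.5 * educ + rnorm(n, sd = 13)
wage <- -130 + 5 * IQ + 42 * educ + rnorm(n, sd = 375)

# One-step regression with both variables
full <- lm(wage ~ IQ + educ)

# Two-step method: partial education out of wage and out of IQ
y_res    <- residuals(lm(wage ~ educ))
x_res    <- residuals(lm(IQ ~ educ))
two_step <- lm(y_res ~ x_res)

# The slope coefficients agree
c(full = coef(full)[["IQ"]], two_step = coef(two_step)[["x_res"]])

# The standard errors differ only by a degrees-of-freedom factor
se_full <- coef(summary(full))["IQ", "Std. Error"]
se_two  <- coef(summary(two_step))["x_res", "Std. Error"]
c(se_full = se_full, se_two_rescaled = se_two * sqrt((n - 2) / (n - 3)))
```

With the document's sample, the same rescaling is 0.9553 * sqrt(933 / 932), which is approximately 0.9558.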

4) Run the regression of wage on IQ, educ, and exper:

  1. Interpret the regression.

The R commands to perform the regression and the regression results follow:

> reg4 <- lm(data$wage~data$IQ+data$educ+data$exper)
> summary(reg4)

Call:
lm(formula = data$wage ~ data$IQ + data$educ + data$exper)

Residuals:
    Min      1Q  Median      3Q     Max
-922.10 -240.77  -44.92  191.56 2128.11

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -539.4111   116.7171  -4.622 4.35e-06 ***
data$IQ        5.0688     0.9408   5.388 9.03e-08 ***
data$educ     58.1038     7.0562   8.234 6.07e-16 ***
data$exper    17.4171     3.1155   5.590 2.98e-08 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 370.8 on 931 degrees of freedom
Multiple R-squared:  0.162,    Adjusted R-squared:  0.1593
F-statistic: 59.99 on 3 and 931 DF,  p-value: < 2.2e-16

The regression provides the best fit in the least squares sense to monthly wages as a linear function of IQ, years of education, and years of experience. The estimated intercept is negative, -539.4111, with a t statistic equal to -4.622 and an associated p value of 4.35e-06, which implies that the intercept is statistically significantly different from zero. If it were possible for all of the independent variables to be equal to 0, the intercept would be the average monthly wage in that case. Because it is not possible for the independent variables to all simultaneously be equal to 0, the intercept is merely the extension of the “line” to the axis of the dependent variable, monthly wage.

The other estimated coefficients provide the slopes for each of the independent variables, holding all other independent variables constant.

The estimated coefficient on IQ, 5.0688, implies that the average monthly wage increases by 5.0688 for each 1 point increase in IQ, holding all other variables constant. The t statistic for the estimate, 5.388, and its associated p value, 9.03e-08, imply that the sample provides overwhelming evidence of a positive relationship between average monthly wage and IQ, controlling for years of education and years of experience.

The estimated coefficient on years of education, 58.1038, implies that the average monthly wage increases by 58.1038 for each 1 year increase in years of education, holding all other variables constant. The t statistic for the estimate, 8.234, and its associated p value, 6.07e-16, imply that the sample provides overwhelming evidence of a positive relationship between average monthly wage and years of education, controlling for IQ and years of experience.

The estimated coefficient on years of experience, 17.4171, implies that the average monthly wage increases by 17.4171 for each 1 year increase in experience, holding all other variables constant. The t statistic for the estimate, 5.590, and its associated p value, 2.98e-08, imply that the sample provides overwhelming evidence of a positive relationship between average monthly wage and years of experience, controlling for IQ and years of education.

  1. Explain in plain English the interpretation of the coefficient on IQ.

The estimated coefficient on IQ, 5.0688, implies that the average monthly wage increases by 5.0688 for each 1 point increase in IQ, holding years of education and years of experience constant.

  1. Are the coefficients statistically significant? Explain.

All of the coefficients are statistically significant at conventional levels. The p value is the probability of obtaining a t statistic as extreme (i.e., as large or larger in absolute value) as the t statistic obtained if the null hypothesis is true. For example, the p value for IQ, 9.03e-08 = 0.0000000903, is the probability of obtaining a t statistic as large or larger than 5.388 in absolute value if the null hypothesis that the population slope on IQ is zero is true. Rather than believing that a miraculously rare event has occurred, the null hypothesis should be rejected. The p values on the remaining variables are similarly very low probabilities if the null hypothesis is true.
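The p value described above can be reproduced from the t statistic and the residual degrees of freedom. In the sketch below the first calculation uses the t statistic and degrees of freedom reported for IQ in the output for question 4; the second part checks the same formula against `summary()` on simulated data whose values and object names are assumptions for illustration.

```r
# t statistic and residual degrees of freedom for IQ, copied from the output above
t_stat   <- 5.388
resid_df <- 931

# Two-sided p value: probability of |t| at least this large when the null is true
p_value <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)
p_value

# The same formula reproduces the p values that summary() reports
set.seed(4)
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)
fit  <- lm(y ~ x)
ctab <- coef(summary(fit))
manual_p <- 2 * pt(abs(ctab[, "t value"]), df = df.residual(fit), lower.tail = FALSE)
all.equal(manual_p, ctab[, "Pr(>|t|)"])
```

Because the t statistic in the output is rounded to three decimals, the first result matches the reported p value only approximately.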

  1. How did the coefficient on IQ change from (2)? Based on our discussion in class, explain why you think this happened.

The estimated coefficient on IQ in question 2 is 5.138, and it decreases slightly to 5.069 when years of experience is added to the same regression. This is consistent with years of experience being positively related to IQ in our sample once years of education is held constant. When years of experience is omitted, IQ gets attributed some of the effect of years of experience on monthly wage through its positive relationship with years of experience and the positive relationship between years of experience and monthly wage.

5) Run the regression of wage on IQ, educ, exper, and KWW:

  1. Interpret the regression.

The R commands that estimate the regression and display the results follow, with the results:

> reg5 <- lm(data$wage~data$IQ+data$educ+data$exper+data$KWW)
> summary(reg5)

Call:
lm(formula = data$wage ~ data$IQ + data$educ + data$exper + data$KWW)

Residuals:
    Min      1Q  Median      3Q     Max
-858.21 -235.69  -38.74  183.98 2239.67

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -536.8475   115.3662  -4.653 3.74e-06 ***
data$IQ        3.7950     0.9671   3.924 9.35e-05 ***
data$educ     47.4723     7.3190   6.486 1.43e-10 ***
data$exper    13.7330     3.1740   4.327 1.68e-05 ***
data$KWW       8.7356     1.8234   4.791 1.93e-06 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 366.5 on 930 degrees of freedom
Multiple R-squared:  0.1822,   Adjusted R-squared:  0.1787
F-statistic: 51.79 on 4 and 930 DF,  p-value: < 2.2e-16

The regression model finds the linear function of IQ, years of experience, years of education and KWW that best fits monthly wages in the sense of minimizing the sum of squared residuals. The estimated intercept, -536.8475, has a p value of 3.74e-06, and is therefore statistically significantly different from zero. The intercept indicates the value of the average monthly wage when all of the included independent variables are equal to zero.

The estimated coefficient on IQ is 3.795, which has a p value of 9.35e-05, which implies it is also statistically significantly different from zero. The estimated coefficient 3.795 indicates that the average monthly wage increases by 3.795 for each one point increase in IQ, holding years of education, years of experience and KWW constant.

The estimated coefficient on years of education is 47.4723, which has a p value of 1.43e-10, which implies it is also statistically significantly different from zero. The estimated coefficient 47.4723 indicates that the average monthly wage increases by 47.4723 for each one year increase in years of education, holding IQ, years of experience and KWW constant.

The estimated coefficient on years of experience is 13.733, which has a p value of 1.68e-05, which implies it is also statistically significantly different from zero. The estimated coefficient 13.733 indicates that the average monthly wage increases by 13.733 for each one year increase in years of experience, holding IQ, years of education and KWW constant.

The estimated coefficient on KWW is 8.7356, which has a p value of 1.93e-06, which implies it is also statistically significantly different from zero. The estimated coefficient 8.7356 indicates that the average monthly wage increases by 8.7356 for each one unit increase in KWW, holding IQ, years of education and years of experience constant.

  1. Explain in plain English the interpretation of the coefficient on IQ.

The estimated coefficient on IQ of 3.795 indicates that the average monthly wage increases by 3.795 for each one point increase in IQ, holding years of education, years of experience and KWW constant.

  1. Are the coefficients statistically significant? Explain.

Each of the estimated coefficients is statistically significant, with p values less than 0.0001, implying less than a 1 in ten thousand chance of obtaining a t statistic as extreme as was obtained if the null hypothesis of a zero population slope coefficient (or intercept) were true.

  1. How did the coefficient on IQ change from (2)? Based on our discussion in class, explain why you think this happened.

The estimated coefficient on IQ has decreased from 5.138 in question 2 to 3.795 here, which is consistent with a positive relationship between IQ and years of experience and between IQ and KWW, holding the other included variables constant. In short, the estimated coefficient on IQ in question 2 includes some of the effects of years of experience and KWW that are more properly attributed to years of experience and KWW when those variables are included in the regression in this question.

  1. Write a function to calculate the R2 that uses only the regression object as the input.
  2. E.g.,

mylastname.rsquare<- function(reg){ …your code here…

}

Hint: recall the equations for SST, SSE and SSR and how we accessed the objects that are saved within the regression object.

The R commands for the function and the output from an invocation of the function follow:

kloepfer.rsquare <- function(depvar, resids){
  # Total sum of squares of the dependent variable
  SST <- sum((depvar-mean(depvar))^2)
  # Error (residual) sum of squares
  SSE <- sum(resids^2)
  R_squared <- 1 - (SSE/SST)
  return (R_squared)
}

> rsquare <- kloepfer.rsquare(data$wage, residuals(reg5))
> rsquare
[1] 0.1821676
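The question asks for a function that takes only the regression object as its input, whereas the function above takes the dependent variable and the residuals separately. A minimal sketch of a single-argument variant follows. The name `rsquare_from_fit` is hypothetical, and the sketch assumes an unweighted `lm` fit that includes an intercept; it is checked here on simulated data rather than on the course file.

```r
# Hypothetical single-argument variant: everything is pulled from the lm object
rsquare_from_fit <- function(reg) {
  y   <- model.response(model.frame(reg))  # dependent variable stored in the fit
  SST <- sum((y - mean(y))^2)              # total sum of squares
  SSE <- sum(residuals(reg)^2)             # error (residual) sum of squares
  1 - SSE / SST
}

# Check against the value reported by summary() on simulated data
set.seed(5)
x1  <- rnorm(300)
x2  <- rnorm(300)
y   <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(300, sd = 2)
fit <- lm(y ~ x1 + x2)

c(by_hand = rsquare_from_fit(fit), from_summary = summary(fit)$r.squared)
```

With the course data loaded, the call would be `rsquare_from_fit(reg5)`.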

  1. Interpret the R2.

The R2 provides the proportion of the total variation in monthly wages that is explained by the regression. Thus the R2 of 0.1822 implies that 18.22% of the variation in monthly wages is explained by the regression model.

  1. Calculate the adjusted R2 using a function similar to the one you wrote for part (e). Explain why the R2 and adjusted R2 differ.

The R commands for the adjusted R squared function along with its output follow:

kloepfer.adjrsquare <- function(Rsquare,n,K){
  # n = number of observations, K = number of estimated coefficients (including the intercept)
  Adj_Rsquared <- 1 - (n-1)/(n-K)*(1-Rsquare)
  return (Adj_Rsquared)
}

> adjrsquare <- kloepfer.adjrsquare(kloepfer.rsquare(data$wage, residuals(reg5)), 935,5)
> adjrsquare
[1] 0.17865

The adjusted R2 will never be higher than the raw R2 because it penalizes the raw R2 for the number of coefficients estimated in the model. The penalty is needed because adding a variable never lowers the raw R2, even when the variable has no real explanatory power. Letting n be the number of observations in the sample and K be the number of estimated coefficients (the independent variables plus the intercept, so K = 5 here), the relationship between R2 and adjusted R2 is given by

$$\bar{R}^2 = 1 - \frac{n-1}{n-K}\,\left(1 - R^2\right)$$

Because $1 - R^2$ is the unexplained share of the variation in the dependent variable and $(n-1)/(n-K) > 1$ for all K > 1, adjusted R2 is less than R2, and the gap between them widens as independent variables are added to the model, all other factors held constant.
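The adjusted R2 can also be computed from the regression object alone, since the object stores both the number of observations and the number of estimated coefficients. The sketch below first reproduces the value reported above from n = 935, K = 5 and R2 = 0.1821676, then defines a hypothetical helper `adj_rsquare_from_fit` (an illustrative name, assuming an unweighted `lm` fit with an intercept) and checks it on simulated data.

```r
# Arithmetic check using the numbers reported in the text
adj_from_text <- 1 - (935 - 1) / (935 - 5) * (1 - 0.1821676)
adj_from_text

# Hypothetical single-argument helper
adj_rsquare_from_fit <- function(reg) {
  y  <- model.response(model.frame(reg))
  n  <- nobs(reg)            # number of observations
  K  <- length(coef(reg))    # estimated coefficients, including the intercept
  r2 <- 1 - sum(residuals(reg)^2) / sum((y - mean(y))^2)
  1 - (n - 1) / (n - K) * (1 - r2)
}

# Check against summary() on simulated data
set.seed(6)
x1  <- rnorm(300)
x2  <- rnorm(300)
y   <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(300, sd = 2)
fit <- lm(y ~ x1 + x2)

c(by_hand = adj_rsquare_from_fit(fit), from_summary = summary(fit)$adj.r.squared)
```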

  1. Perform an F-test on the joint significance of the regression.
  2. What are H0 and H1?

The null hypothesis, H0 is that all population slope coefficients  are simultaneously equal to zero, or equivalently, that the model with the independent variables does no better at explaining monthly wage than the sample mean of monthly wage. The alternative hypothesis is that at least one population slope coefficient is not equal to zero, or that the model does a statistically significantly better job of explaining the monthly wage than the sample mean of the monthly wage.

  1. What is the critical value for your test? Explain.

The F statistic has 4 and 930 degrees of freedom. The value of the F statistic that puts 0.05 in the right tail of the distribution is the critical value, and can be shown to be 2.381. In other words, a value of  F greater than or equal to 2.381 would be expected to happen 5 times in 100 trials if the null hypothesis is true. Given the rarity of the outcome that F is greater than 2.381, the null hypothesis is rejected if F is greater than 2.381.

     iii. Draw the F-distribution and highlight the relevant rejection region for this test.

  1. What is your conclusion? Explain.

Here the F statistic is 51.79, which is far greater than the critical value of 2.381, so the null hypothesis is rejected. The conclusion is that the regression model does a better job of explaining the monthly wage than the sample mean of the monthly wage, or equivalently, that at least one of the population slope coefficients is different from zero.
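The critical value, the F statistic, and the picture of the rejection region can all be produced in base R. The sketch below uses only numbers reported in the text (4 and 930 degrees of freedom, R2 = 0.1821676); the variable names are illustrative, and the plotting range of 0 to 6 is an arbitrary choice, so the observed F statistic lies far off the right edge of the plot.

```r
num_df   <- 4           # number of slope coefficients being tested
resid_df <- 930         # n - K = 935 - 5
r2       <- 0.1821676   # R squared from question 5

# Critical value that leaves 5% in the right tail
crit <- qf(0.95, df1 = num_df, df2 = resid_df)

# F statistic for joint significance, written in terms of R squared
F_stat <- (r2 / num_df) / ((1 - r2) / resid_df)

# p value of the observed F statistic
p_val <- pf(F_stat, num_df, resid_df, lower.tail = FALSE)
c(critical_value = crit, F_statistic = F_stat, p_value = p_val)

# Density of the F(4, 930) distribution with the rejection region shaded
curve(df(x, num_df, resid_df), from = 0, to = 6,
      xlab = "F", ylab = "Density", main = "F(4, 930) with 5% rejection region")
xs <- seq(crit, 6, length.out = 200)
polygon(c(crit, xs, 6), c(0, df(xs, num_df, resid_df), 0),
        col = "grey", border = NA)
abline(v = crit, lty = 2)
```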

In Stata, open the wage.dta data. These are different data with which we will examine the relationship between wage, education, experience, and job tenure.

6) Run the regression of wage on Educ:

  1. Interpret the regression.

The Stata command and its results follow:

. regress wage Educ, cformat(%9.3f) pformat(%5.3f) sformat(%8.3f)

      Source |       SS       df       MS              Number of obs =     505
-------------+------------------------------           F(  1,   503) =   94.26
       Model |  593.593502     1  593.593502           Prob > F      =  0.0000
    Residual |  3167.68001   503  6.29757457           R-squared     =  0.1578
-------------+------------------------------           Adj R-squared =  0.1561
       Total |  3761.27351   504  7.46284427           Root MSE      =  2.5095

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Educ |      0.396      0.041    9.709   0.000        0.316       0.476
       _cons |      0.474      0.520    0.911   0.363       -0.548       1.495
------------------------------------------------------------------------------

The regression model finds the best fit of the monthly wage data to a linear function of years of education, Educ. The estimated coefficient for the intercept, 0.474,  would be the average wage for a person with 0 years of education if such a person existed. In the absence of such a person, it is just the extension of the line to the vertical wage axis when the independent variable Educ is equal to zero. The estimate of the intercept, 0.474, has a t statistic of 0.911 and an associated p value of 0.363,  which implies that there is only very weak evidence that the population intercept is different than zero. The estimated coefficient on Educ, 0.396, has a t statistic of 9.709 and an associated p value less than 0.001, which implies that we can reject the null hypothesis and conclude that the population slope coefficient for Educ is different than zero. The estimate 0.396 implies that each 1 year increase in the number of years of school completed raises the average wage by 0.396.

  1. Are the coefficients statistically significant? Explain. 

As discussed in part a, the estimated intercept is not statistically significantly different from zero, but the slope is statistically significantly different from zero. This is indicated by the p value, which provides the probability of obtaining a t statistic as extreme as the t statistic obtained if the null hypothesis that the population parameter being estimated is equal to zero were true. For the intercept, the p value of 0.363 implies that there is a 36.3% chance of obtaining a t statistic this large even when the population intercept is zero. For the slope, the p value of 0.000 implies that there is less than 1 chance in a thousand (i.e., less than 0.001 probability) of obtaining a t statistic this large if the population slope is equal to zero. Therefore we are not able to reject the null hypothesis that the intercept is equal to zero, but can reject the null hypothesis that the slope is equal to zero.

  1. Do the results make sense? Explain.

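The estimate for Educ is positive, which implies that the wage rises with education, and it is statistically significantly different from zero (p value less than 0.001), as most people would expect. The intercept is not statistically significantly different from zero, which is of little concern because the intercept describes the average wage of a person with zero years of education.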

7) Run the regression of wage on Educ and Tenure:

  1. Interpret the regression.

The Stata command and its results follow:

. regress wage Educ Tenure, cformat(%9.3f) pformat(%5.3f) sformat(%8.3f)

      Source |       SS       df       MS              Number of obs =     505
-------------+------------------------------           F(  2,   502) =   77.31
       Model |  885.698074     2  442.849037           Prob > F      =  0.0000
    Residual |  2875.57544   502  5.72823792           R-squared     =  0.2355
-------------+------------------------------           Adj R-squared =  0.2324
       Total |  3761.27351   504  7.46284427           Root MSE      =  2.3934

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Educ |      0.431      0.039   10.999   0.000        0.354       0.508
      Tenure |      0.121      0.017    7.141   0.000        0.088       0.154
       _cons |     -0.515      0.515   -1.000   0.318       -1.527       0.496
------------------------------------------------------------------------------

The regression model finds the linear combination of years of education (Educ) and years of tenure (Tenure) that best fits, in the least sum of squares sense, the wage data. The estimated constant, -0.515, is the extension of the best fit line to the wage axis where Educ and Tenure both equal zero. The t statistic for the estimated intercept, -1.0, has an associated p value of 0.318, which implies that there is only weak evidence that the intercept is different from zero. The estimated coefficient on Educ, 0.431, implies that each additional year of school completed adds 0.431 to the average wage, holding Tenure constant. The t statistic for the estimate, 10.999, is associated with a p value less than 0.001, which implies that we can reject the null hypothesis that the population parameter is equal to zero. The estimated coefficient on Tenure, 0.121, implies that each additional year of tenure adds 0.121 to the average wage, holding education constant. The t statistic for the estimate, 7.141, is associated with a p value less than 0.001, which implies that we can reject the null hypothesis that the population parameter is equal to zero.

  1. Explain in plain English the interpretation of the coefficient on Educ.

The estimated coefficient on Educ, 0.431, implies that each additional year of school completed adds 0.431 to the average wage, holding Tenure constant.

  1. Are the coefficients statistically significant? Explain.

As discussed in part a, the estimated slopes on Educ and Tenure are statistically significant but the estimated intercept is not statistically significantly different than zero. The p value for an estimate indicates the probability of obtaining as extreme a t statistic as was obtained if the corresponding population parameter is equal to zero. The p values for Educ and Tenure, both less than 0.001, indicate that we would be very unlikely to obtain such large t statistics if the corresponding population parameters were equal to zero.

  1. How did the coefficient on Educ change from (6)? Based on our discussion in class, explain why you think this happened.

The coefficient on Educ increased from 0.396 to 0.431 when Tenure was added to the model. This may be explained by negative correlation between Educ and Tenure (i.e., newer workers tend to be more educated than longer tenured workers) and positive correlation between Tenure and wage. When Tenure is omitted, some of the positive effect of Tenure on wages is picked up by Educ, but since Educ tends to be lower when Tenure is higher, this makes the coefficient on Educ smaller when Tenure is not in the equation.

8) DON’T run, but examine the regression of wage on Educ, Tenure, and Tenure squared:

  1. Use a scatter plot and LOESS (lowess in Stata) fit line to explain your expectations for the signs of the coefficients on Tenure and Tenure squared.

To examine the relationship between the wage and tenure and tenure squared, we first regress the wage on education and save the residuals as resids_y, then regress tenure on education and save the residuals as resids_Tenure. The rationale for this procedure is similar to the reasoning given for question 3, part a. By focusing on these residuals, we can examine the relationship between the wage and tenure in a two dimensional graph while abstracting from the effects of education. The Stata code to do this follows:

regress wage Educ
predict resids_y, residuals
regress Tenure Educ
predict resids_Tenure, residuals
lowess resids_y resids_Tenure, addplot((scatter resids_y resids_Tenure)) xtitle("Residuals from Tenure on Education regression") ytitle("Residuals from wage on Education regression")

The best quadratic fit of the wage on tenure appears to have a u shape, which implies a negative linear term and a positive square term. In other words, at low levels of tenure, the negative linear term outweighs the positive square term and the lowess line is decreasing. But as tenure increases, the positive square term dominates the negative linear term and the relationship between wage and tenure is a positive one. Finally, observe that the lowess curve actually suggests a cubic relationship between the wage and tenure may fit better than a quadratic, as the wage reaches a local minimum at a value of the tenure residual of about -5 and then a local maximum at a value of the tenure residual of about 15.
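For readers working in R rather than Stata, the same partial-residual plot can be sketched with base R's `lowess()`. The data below are simulated stand-ins (the values, the linear tenure effect, and the object names are all assumptions for illustration), so the shape of the curve will not match the graph described above; the point is only to show the steps.

```r
# Simulated stand-in for wage.dta (hypothetical values, not the course data)
set.seed(8)
n      <- 400
Educ   <- round(rnorm(n, mean = 13, sd = 2.5))
Tenure <- pmax(0, round(8 - 0.3 * (Educ - 13) + rnorm(n, sd = 6)))
wage   <- 0.4 * Educ + 0.12 * Tenure + rnorm(n, sd = 2.4)

# Partial education out of both wage and tenure
resids_y      <- residuals(lm(wage ~ Educ))
resids_Tenure <- residuals(lm(Tenure ~ Educ))

# Scatter plot of the residuals with a lowess fit line
plot(resids_Tenure, resids_y,
     xlab = "Residuals from Tenure on Education regression",
     ylab = "Residuals from wage on Education regression")
smooth <- lowess(resids_Tenure, resids_y)
lines(smooth, lwd = 2)

# Quadratic specification, to compare coefficient signs with the lowess shape
quad <- lm(wage ~ Educ + Tenure + I(Tenure^2))
coef(quad)[c("Tenure", "I(Tenure^2)")]
```

A lowess curve that first falls and then rises would correspond to a negative coefficient on `Tenure` and a positive coefficient on `I(Tenure^2)`; a curve that rises and then flattens would correspond to the opposite signs.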

Appendix 

A1 R script for results in text 

## The working directory is "C:/R"
## Load the data file
setwd("C:/R")
load(file="IQ.RDATA.RData")
View(data)

## Question 1. Run the regression of wage on IQ.
reg1 <- lm(data$wage~data$IQ)
summary(reg1)

## Question 2. Run the regression of wage on IQ and education.
reg2 <- lm(data$wage~data$IQ+data$educ)
summary(reg2)

## Question 3. Provide the steps to estimate the effect of IQ
## on the wage without the effect of years of education.
## Step 1: Regress monthly wage on years of education and
## save the residuals.
reg3_educ <- lm(data$wage~data$educ)
y_resids <- residuals(reg3_educ)

## Step 2: Regress IQ on education and save the residuals.
reg3_IQbyEDUC <- lm(data$IQ~data$educ)
IQ_resids <- residuals(reg3_IQbyEDUC)

## Step 3: Regress the residuals from step 1 on the
## residuals from step 2.
reg3 <- lm(y_resids~IQ_resids)
summary(reg3)

## Question 4. Estimate the regression of monthly wage
## on IQ, years of education and experience.
reg4 <- lm(data$wage~data$IQ+data$educ+data$exper)
summary(reg4)

## Question 5, estimate the regression of monthly wage on
## IQ, years of education and experience and KWW.
reg5 <- lm(data$wage~data$IQ+data$educ+data$exper+data$KWW)
summary(reg5)

## Question 5e, write a function to compute R squared.
## R squared is defined as the regression sum of squares
## divided by the total sum of squares, or equivalently as
## 1 - (error sum of squares/total sum of squares). We will also
## compute the adjusted R squared in a second function,
## where adjusted R squared is given by
## adjusted R squared = 1 - [(n-1)/(n-K)]*(1 - R squared)
kloepfer.rsquare <- function(depvar, resids){
  SST <- sum((depvar-mean(depvar))^2)
  SSE <- sum(resids^2)
  R_squared <- 1 - (SSE/SST)
  return (R_squared)
}

kloepfer.adjrsquare <- function(Rsquare,n,K){
  Adj_Rsquared <- 1 - (n-1)/(n-K)*(1-Rsquare)
  return (Adj_Rsquared)
}

rsquare <- kloepfer.rsquare(data$wage, residuals(reg5))
rsquare

adjrsquare <- kloepfer.adjrsquare(kloepfer.rsquare(data$wage, residuals(reg5)), 935,5)
adjrsquare

A2 Stata commands to produce results in text

/* Allow the command window to scroll uninhibited. */
set more off

/* Question 6, Regress wage on education.          */
regress wage Educ, cformat(%9.3f) pformat(%5.3f) sformat(%8.3f)

/* Question 7, add tenure to the regression in    */
/* question 6.                                    */
regress wage Educ Tenure, cformat(%9.3f) pformat(%5.3f) sformat(%8.3f)

/* Question 8, lowess graph of wage on tenure,    */
/* tenure squared.                                */
regress wage Educ
predict resids_y, residuals
regress Tenure Educ
predict resids_Tenure, residuals

/* Plot of resids_y on resids_Tenure.             */
lowess resids_y resids_Tenure, addplot((scatter resids_y resids_Tenure)) xtitle("Residuals from Tenure on Education regression") ytitle("Residuals from wage on Education regression")