**Gauss Markov Assumption**

**Question 1**

The file Wage 2017.sav contains data collected from a sample of 500 workers. The variables are:

| Variable | Description |
|---|---|
| Lwage_{i} | Natural log of monthly earnings of worker i |
| IQ_{i} | IQ score of worker i |
| educ_{i} | Years of education of worker i |
| exper_{i} | Years of experience of worker i |
| tenure_{i} | Years with current employer of worker i |
| white_{i} | = 1 if the worker is white; = 0 otherwise |
| married_{i} | = 1 if the worker is married/living as married; = 0 otherwise |

- Estimate the following model of monthly earnings:

$$Lwage_i = \beta_0 + \beta_1 IQ_i + \beta_2 educ_i + \beta_3 exper_i + \beta_4 tenure_i + u_i$$

Report the results of your estimated model, indicating the level of significance using stars (** denotes strong significance p<0.01 and * denotes significance p<0.05).

- Interpret the estimated coefficients.

- Comment on the explanatory power of this model.

- Add dummy variables for white and married to the model in (a). Estimate the new model; report the results and interpret the estimated coefficients associated with the __dummy variables only__. Test for their individual significance.

- Using the Park test, investigate whether there is heteroskedasticity in the model you estimated in (d). Assume that tenure is the variable causing the problem of heteroskedasticity. Provide an intuitive explanation for your findings.

- Does the presence of heteroskedasticity violate one of the Gauss Markov assumptions? Explain your answer. What implications would heteroskedasticity have for the results you reported in (d)?

**Question 2**

Nitrogen Dioxide (NO_{2}) is a pollutant that attacks the human respiratory system and increases the likelihood of respiratory illness. One common cause of nitrogen dioxide is car exhaust.

The file **pollution.sav** contains data from 500 observations made from October 2001 to August 2003 in the US (data from Carnegie Mellon University archive).

The variables are:

| Variable | Description |
|---|---|
| LNO_{2} | Natural log of the concentration of NO_{2} (particles) |
| LCARS | Natural log of the number of cars per hour |
| TEMP | Temperature 2 metres above the ground (degrees C) |
| TCHNG23 | Temperature difference between 25 metres and 2 metres above the ground (degrees C) |
| WNDSPD | Wind speed (metres per second) |
| WNDDIR | Wind direction (degrees between 0 and 360) |
| HOUR | Hour of day |
| DAYS | Number of the day in the sequence of 500 days |

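- Estimate the following model:

$$LNO_{2,t} = \beta_0 + \beta_1 LCARS_t + \beta_2 TEMP_t + \beta_3 TCHNG23_t + \beta_4 WNDSPD_t + \beta_5 WNDDIR_t + \beta_6 WNDDIR_t^2 + u_t$$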

Report the results of your estimated model, indicating the level of significance using stars (** denotes strong significance p<0.01 and * denotes significance p<0.05).

- Interpret the estimated coefficients associated with the explanatory variables and conduct an appropriate test of their individual significance.

- Calculate the value of wind direction that optimises LNO_{2}.

- Is there evidence that temperature is an important determinant of pollution?

- Using the model you estimated in (a), conduct a Durbin Watson test for serial correlation. If serial correlation were detected, what implications would it have for your results?

- Create a one period lag of *LNO_{2}* and estimate the following model:

Report the results of your estimated model.

(g) Using the model you estimated in (f), conduct a Breusch-Godfrey test for serial correlation.

**Solution**

**Question 1**

- Estimate the following model of monthly earnings:

Report the results of your estimated model, indicating the level of significance using stars (** denotes strong significance p<0.01 and * denotes significance p<0.05).

Coefficients^{a}

| Model | Variable | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.092 | .170 | | 30.012 | .0005** |
| | IQ | .005 | .001 | .192 | 4.013 | .0005** |
| | educ | .061 | .010 | .304 | 5.928 | .0005** |
| | exper | .020 | .005 | .191 | 3.860 | .0005** |
| | tenure | .013 | .005 | .126 | 2.786 | .006** |

a. Dependent Variable: Lwage

- Interpret the estimated coefficients.

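All four slope coefficients are statistically significant at the 1% level (p < 0.01 in every case, including tenure with p = .006). Because the dependent variable is the natural log of monthly earnings, each coefficient measures the approximate proportional change in earnings for a one-unit increase in the regressor, holding the other variables constant: one additional IQ point is associated with about 0.5% higher monthly earnings, one additional year of education with about 6.1%, one additional year of experience with about 2.0%, and one additional year with the current employer with about 1.3%.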
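Since the dependent variable is in logs, multiplying a coefficient by 100 only approximates the percentage effect; the exact figure is 100(e^b - 1). The short sketch below compares the two for the coefficients in the table above. The helper name `percent_effect` is an illustrative assumption and is not part of the SPSS output.

```python
import math

# Unstandardized coefficients (B) copied from the Coefficients table.
coefficients = {"IQ": 0.005, "educ": 0.061, "exper": 0.020, "tenure": 0.013}


def percent_effect(b):
    """Exact percentage change in y for a one-unit rise in x when the
    dependent variable is ln(y). Hypothetical helper for illustration."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in coefficients.items():
    approx = 100.0 * b            # usual "100 times b" reading
    exact = percent_effect(b)     # exact transformation
    print(f"{name}: approx {approx:.1f}%, exact {exact:.2f}%")
```

For coefficients this small the two readings are close, which is why the approximate interpretation is usually acceptable.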

- Comment on the explanatory power of this model.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .427^{a} | .182 | .176 | .38433 |

a. Predictors: (Constant), exper, IQ, tenure, educ

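The explanatory power of the model is low: the R Square of 0.182 means that IQ, education, experience and tenure together explain only 18.2% of the variation in the log of monthly earnings (17.6% after adjusting for the number of regressors), leaving most of the variation unexplained.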
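As a rough check on the explanatory power, the adjusted R Square and the overall F statistic can be recomputed from the reported R Square. The sketch below assumes that all 500 sampled workers enter the regression (n = 500) and that there are four regressors; both figures are assumptions, and because R Square is rounded to three decimals the result may differ slightly from the SPSS value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """F statistic for the null that all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


r2 = 0.182  # R Square from the Model Summary table
n = 500     # assumption: every sampled worker is used in the regression
k = 4       # IQ, educ, exper, tenure

adj = adjusted_r2(r2, n, k)
f_stat = overall_f(r2, n, k)
print(f"adjusted R2 = {adj:.3f}, F = {f_stat:.2f}")
```

A low R Square and a jointly significant set of regressors are not contradictory: the regressors can matter on average while most individual variation in wages remains unexplained.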

- Add dummy variables for white and married to the model in (a). Estimate the new model; report the results and interpret the estimated coefficients associated with the __dummy variables only__.

Coefficients^{a}

| Model | Variable | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 4.979 | .169 | | 29.377 | .000** |
| | IQ | .004 | .001 | .149 | 3.030 | .003** |
| | educ | .061 | .010 | .304 | 6.025 | .000** |
| | tenure | .013 | .005 | .121 | 2.727 | .007** |
| | exper | .018 | .005 | .169 | 3.450 | .001** |
| | white | .118 | .053 | .095 | 2.220 | .027* |
| | married | .183 | .052 | .144 | 3.530 | .000** |

a. Dependent Variable: Lwage

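Both dummy variables are statistically significant, but at different levels. The coefficient on white (0.118, t = 2.220, p = .027) is significant at the 5% level but not at the 1% level, so it earns one star rather than two. The coefficient on married (0.183, t = 3.530, p < .001) is significant at the 1% level. Holding IQ, education, tenure and experience constant, white workers earn approximately 11.8% more per month than non-white workers, and workers who are married or living as married earn approximately 18.3% more than those who are not.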
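The individual significance test for each dummy is a two-sided t test of the null that its coefficient is zero. The sketch below maps the reported t statistics to p-values and then to the star convention used in this assignment. It uses the standard normal distribution as an approximation to the t distribution, which is an assumption that is reasonable only because the sample is large; the function names are illustrative.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a t statistic, using the standard normal
    approximation (assumes a large number of degrees of freedom)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def stars(p):
    """Star convention from the assignment: ** for p<0.01, * for p<0.05."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (coefficient, t statistic) copied from the Coefficients table.
dummies = {"white": (0.118, 2.220), "married": (0.183, 3.530)}

for name, (b, t) in dummies.items():
    p = two_sided_p(t)
    print(f"{name}: B = {b}, t = {t}, p = {p:.4f} {stars(p)}")
```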

- Using the Park test, investigate whether there is heteroskedasticity in the model you estimated in (d). Assume that tenure is the variable causing the problem of heteroskedasticity. Provide an intuitive explanation for your findings.

The model estimated in (d) is taken to be heteroskedastic: the variance of the error term is not constant but differs across the values of an explanatory variable, here assumed to be tenure. The Park test examines this by regressing the natural log of the squared residuals on the natural log of tenure; a statistically significant slope coefficient is evidence of heteroskedasticity.
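A minimal sketch of the Park test is given below: the log of the squared residuals is regressed on the log of the suspect variable and the t statistic of the slope is inspected. The residuals and tenure values are simulated purely for illustration and are not taken from Wage 2017.sav, and `simple_ols` and `park_test` are hypothetical helper names.

```python
import math
import random


def simple_ols(x, y):
    """OLS of y on a constant and one regressor x.
    Returns (intercept, slope, t statistic of the slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


def park_test(residuals, z):
    """Park test: regress ln(e^2) on ln(z).
    z must be strictly positive; observations with z = 0 (for example
    zero tenure) would need separate handling because ln(0) is undefined."""
    ln_e2 = [math.log(e * e) for e in residuals]
    ln_z = [math.log(v) for v in z]
    return simple_ols(ln_z, ln_e2)


# Simulated data (assumption): error standard deviation grows with tenure.
random.seed(0)
tenure = [random.uniform(1.0, 20.0) for _ in range(500)]
resid = [random.gauss(0.0, 0.1 * t) for t in tenure]

_, slope, t_stat = park_test(resid, tenure)
print(f"Park slope = {slope:.2f}, t = {t_stat:.2f}")
```

In practice the residuals would come from the regression in (d) rather than from a simulation, and a slope t statistic beyond the chosen critical value would point to heteroskedasticity related to tenure.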

- Does the presence of heteroskedasticity violate one of the Gauss Markov assumptions? Explain your answer. What implications would heteroskedasticity have for the results you reported in (d)?

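Yes. The Gauss Markov assumptions require the error term to be homoskedastic, that is, to have the same variance for every observation. Under heteroskedasticity the variance is not constant and instead depends on the value of one or more independent variables, so this assumption is violated.

The OLS coefficient estimates reported in (d) remain unbiased, but OLS is no longer the best (minimum variance) linear unbiased estimator, and the usual formula for the standard errors is wrong. Because of this, the t statistics, significance levels and confidence intervals reported in (d) cannot be relied on.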

**Question 2**

Nitrogen Dioxide (NO_{2}) is a pollutant that attacks the human respiratory system and increases the likelihood of respiratory illness. One common cause of nitrogen dioxide is car exhaust.

The file pollution.sav contains data from 500 observations made from October 2001 to August 2003 in the US (data from Carnegie Mellon University archive).


- Estimate the following model:

Coefficients^{a}

| Model | Variable | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .768 | .188 | | 4.094 | .0005** |
| | LCARS | .427 | .023 | .619 | 18.619 | .0005** |
| | TEMP | -.024 | .004 | -.211 | -5.595 | .0005** |
| | TCHNG23 | .121 | .025 | .171 | 4.737 | .0005** |
| | WNDSPD | -.124 | .014 | -.295 | -8.781 | .0005** |
| | WNDDIR | .005 | .001 | .545 | 3.500 | .001** |
| | Winddir2 | -1.239E-005 | .000 | -.470 | -3.059 | .002** |

a. Dependent Variable: LNO2

- Interpret the estimated coefficients associated with the explanatory variables and conduct an appropriate test of their individual significance.

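All six slope coefficients are individually significant at the 1% level: each t statistic exceeds 2.58 in absolute value and every p-value is below 0.01. The dependent variable is the natural log of the NO_{2} concentration, so the coefficients are interpreted as follows, holding the other variables constant. LCARS is also in logs, so its coefficient is an elasticity: a 1% increase in the number of cars per hour is associated with a 0.427% increase in NO_{2} concentration. A 1 degree C rise in TEMP is associated with about 2.4% lower NO_{2}, a 1 degree C rise in TCHNG23 with about 12.1% higher NO_{2}, and a 1 metre per second rise in WNDSPD with about 12.4% lower NO_{2}. Wind direction enters as a quadratic: the positive coefficient on WNDDIR (0.005) and the negative coefficient on WNDDIR^{2} (-1.239E-005) imply that NO_{2} first rises and then falls as wind direction increases.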
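Because wind direction enters the model with both a linear and a squared term, the value of WNDDIR at which predicted LNO_{2} turns can be found by setting the derivative to zero, giving -b1 / (2 b2). The sketch below applies this to the reported coefficients. The WNDDIR coefficient is rounded to three decimals in the table, so the result should be read as approximate; the function name is illustrative.

```python
b_linear = 0.005         # WNDDIR coefficient (rounded in the table)
b_squared = -1.239e-05   # Winddir2 coefficient


def turning_point(b1, b2):
    """Value of x where b1*x + b2*x**2 has zero slope."""
    return -b1 / (2.0 * b2)


wnddir_star = turning_point(b_linear, b_squared)
is_maximum = b_squared < 0  # negative squared term means a maximum

kind = "maximum" if is_maximum else "minimum"
print(f"Turning point at about {wnddir_star:.0f} degrees ({kind})")
```

With these rounded inputs the turning point is close to 200 degrees, and the negative squared coefficient indicates that it is a maximum of predicted LNO_{2}.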

- Is there evidence that temperature is an important determinant of pollution?

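Yes. The coefficient on TEMP is -0.024 with a t statistic of -5.595 (p < 0.01), so temperature is a statistically significant determinant of pollution. The relationship is negative: each additional degree C is associated with about 2.4% lower NO_{2} concentration, holding the other variables constant, which is a modest effect per degree. TCHNG23, the temperature difference between 25 metres and 2 metres, is also significant (0.121, t = 4.737).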

- Using the model you estimated in (a), conduct a Durbin Watson test for serial correlation. If serial correlation were detected, what implications would it have for your results?

Model Summary^{b}

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson |
|---|---|---|---|---|---|
| 1 | .383^{a} | .147 | .138 | .697 | 1.352 |

a. Predictors: (Constant), Winddir2, TCHNG23, WNDSPD, TEMP, WNDDIR

b. Dependent Variable: LNO2

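The Durbin Watson statistic is 1.352. A value of 2 indicates no first-order serial correlation and values below 2 indicate positive serial correlation, so the residuals appear to be positively correlated over time; the formal conclusion comes from comparing 1.352 with the tabulated lower and upper critical bounds for this sample size and number of regressors. If serial correlation is present, the OLS coefficient estimates remain unbiased but are no longer efficient, and the reported standard errors are unreliable (with positive serial correlation they are typically too small), so the t tests and significance levels in (a) may overstate significance.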
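For reference, the Durbin Watson statistic is the sum of squared changes in successive residuals divided by the sum of squared residuals, and it is approximately 2(1 - rho), where rho is the first-order autocorrelation of the residuals. The sketch below shows the calculation and backs out the rho implied by the reported value of 1.352. The function names are illustrative, and the short residual list is a made-up example rather than output from pollution.sav.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def implied_rho(dw):
    """Approximate first-order autocorrelation from DW = 2 * (1 - rho)."""
    return 1.0 - dw / 2.0


# Made-up residuals that alternate in sign (negative serial correlation).
example = [1.0, -1.0, 1.0, -1.0]
print(f"example DW = {durbin_watson(example):.2f}")

rho_hat = implied_rho(1.352)  # reported Durbin-Watson value
print(f"implied rho = {rho_hat:.3f}")
```

An implied rho of roughly 0.32 is consistent with moderate positive serial correlation in the residuals.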

- Create a one period lag of *LNO_{2}* and estimate the following model:
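Creating a one period lag means pairing each observation with the value from the previous period, which leaves the first observation without a lagged value. The sketch below illustrates this on a few made-up numbers; the function name `lag` and the sample values are assumptions for illustration and are not drawn from pollution.sav.

```python
def lag(series, periods=1):
    """Shift a time-ordered series forward by `periods` observations.
    The first `periods` entries have no lagged value and are set to None."""
    return [None] * periods + list(series[: len(series) - periods])


# Made-up LNO2 values in time order (illustrative only).
lno2 = [3.7, 3.9, 3.4, 4.1]
lno2_lag1 = lag(lno2)

# Rows usable in a regression that includes the lag: drop missing lags.
usable = [(y, y_prev) for y, y_prev in zip(lno2, lno2_lag1) if y_prev is not None]
print(usable)
```

Including the lag as a regressor therefore reduces the estimation sample by one observation.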