Application of Econometrics
In a population, if an observation is treated we have $D_i = 1$, and $D_i = 0$ otherwise. The outcome of interest is Y. We denote by $Y_0$ the outcome when an observation is not treated and by $Y_1$ the outcome when it is treated. $Y_0$ is given by:
$Y_0 = 0.2 + 1.2X$
where X is a variable that takes the value 1 with probability 0.6 and the value 0 with probability 0.4.
$Y_1$ is given by:
$Y_1 = 0.3 + 0.9X$
where X is distributed as above.
(a) Express the observed outcome as a function of the constants, D and X.
(b) Derive the average causal effect of D.
(c) Consider the statement:
In this example, the causal effect is constant for all individuals.
Is this statement TRUE or FALSE? Explain.
Suppose that treatment is a matter of personal choice, and all those who have $X = 1$ choose to be treated (i.e. have $D_i = 1$), while all those who have $X = 0$ choose not to be treated (i.e. have $D_i = 0$).
(d) Derive the observed average difference between treated and untreated, i.e.
$E(Y_1 \mid D = 1) - E(Y_0 \mid D = 0)$.
(e) Derive the selection bias of the result in (d).
(f) Discuss the relationship between your answers in (b), (d) and (e).
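The quantities in (b), (d) and (e) can be checked numerically. The sketch below assumes the outcome equations read Y0 = 0.2 + 1.2X and Y1 = 0.3 + 0.9X (the plus signs are an assumption of this sketch) and that D = 1 exactly when X = 1. All function and variable names are illustrative.

```python
# Assumed potential-outcome equations; the plus signs are an assumption.
P_X1 = 0.6  # P(X = 1), as stated in the question


def y0(x: float) -> float:
    """Outcome without treatment (assumed form)."""
    return 0.2 + 1.2 * x


def y1(x: float) -> float:
    """Outcome with treatment (assumed form)."""
    return 0.3 + 0.9 * x


# (b) Average causal effect: average Y1 - Y0 over the distribution of X.
ate = P_X1 * (y1(1) - y0(1)) + (1 - P_X1) * (y1(0) - y0(0))

# (d) Under self-selection, D = 1 if and only if X = 1.
observed_diff = y1(1) - y0(0)   # E(Y1 | D=1) - E(Y0 | D=0)

# (e) Selection bias: gap in untreated outcomes between the two groups.
selection_bias = y0(1) - y0(0)  # E(Y0 | D=1) - E(Y0 | D=0)

# Average effect on the treated, used in the decomposition for (f).
att = y1(1) - y0(1)             # E(Y1 - Y0 | D=1)

print(f"ATE            = {ate:.2f}")
print(f"Observed diff  = {observed_diff:.2f}")
print(f"Selection bias = {selection_bias:.2f}")
print(f"ATT            = {att:.2f}")
```

Under these assumptions the observed difference equals the effect on the treated plus the selection bias, and the effect on the treated differs from the average causal effect because Y1 - Y0 varies with X.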
2. [30%]
A recent report on the effects of postgraduate education on earnings included results from OLS regressions of log annual earnings at age 35 on educational attainment indicators and various personal characteristics. The estimates of the coefficient of the Master’s degree indicator (i.e. a variable taking the value 1 if a person’s highest educational attainment is a Master’s degree and zero otherwise) are given in the table below. Note that although the overall sample size was large, it was not large enough to allow estimates to vary by subject and university.
Dependent variable: log earnings
| | (1) | (2) | (3) | (4) |
| Women | 0.069∗∗∗ (0.003) | 0.056∗∗∗ (0.003) | 0.030∗∗∗ (0.003) | 0.015∗∗∗ (0.003) |
| Men | 0.071∗∗∗ (0.003) | 0.023∗∗∗ (0.003) | 0.009∗∗∗ (0.003) | -0.023∗∗∗ (0.003) |
| Controls | | | | |
| Age and year | x | x | x | x |
| Background | x | x | x | x |
| UG degree class | | x | x | x |
| UG subject | | | x | x |
| UG university | | | | x |
Standard Errors
Standard errors are given in parentheses. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01
The estimates are from separate regressions on a sample of women and a sample of men with postgraduate degrees.
‘x’ in a cell indicates that the corresponding variables are included in the regression. UG stands for undergraduate. Age and year controls include tax year of earnings, year graduated from the undergraduate degree and the age started undergraduate degree.
Background variables include: ethnicity, region of domicile when applying for university, year graduated from undergraduate degree, and a variable measuring the proportion of university participation rates of people from the local area an individual lived in when they applied to university for their undergraduate degree. The Master’s indicator took the value of 1 if the highest degree attained was a Master’s degree and zero otherwise. Other postgraduate degrees included in the regressions were Post Graduate Certificate in Education (PGCE) and PhD. Estimated coefficients for these degrees are not reported in the table.
(a) The regressions in the above table provide a range of estimates. In percentage terms, calculate the highest estimated effects of a Master’s degree on earnings at age 35 for men and women. Then calculate the lowest estimated effects for men and women. Explain. [10%]
(b) In your view, why do the estimates change in the way they do as controls are added? Explain. [25%]
(c) In your view, which, if any, of these estimates can be interpreted as causal? Explain, particularly in light of the differential gender effects in column (4). [25%]
(d) Suppose you have a £1 million grant to research the effects of postgraduate education on earnings in the UK. Ethical considerations do not allow experiments but you can survey students and collect data from universities and tax authorities. Discuss what data you would like to collect and how you would use such data to generate plausibly causal estimates in OLS regressions. [40%]
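Because the dependent variable is log earnings, a coefficient b on the Master's indicator corresponds to a proportional earnings difference of exp(b) - 1, which is close to b itself when b is small. The following Python sketch applies that conversion to the column (1) and column (4) coefficients from the table; the helper name `log_points_to_percent` is illustrative.

```python
import math


def log_points_to_percent(beta: float) -> float:
    """Convert a coefficient from a log-earnings regression to a percent effect."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients copied from columns (1) and (4) of the table.
estimates = {
    "Women": {"(1)": 0.069, "(4)": 0.015},
    "Men": {"(1)": 0.071, "(4)": -0.023},
}

for group, columns in estimates.items():
    for column, beta in columns.items():
        exact = log_points_to_percent(beta)
        approx = 100.0 * beta  # first-order approximation, fine for small beta
        print(f"{group} {column}: exact {exact:.2f}%, approximate {approx:.1f}%")
```

The exact and approximate figures differ only in the second decimal place here, so either is a reasonable way to report effects of this size.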
OLS Regressions
The following table presents results from 3 OLS regressions. Panel A shows the results from a regression of a university degree indicator on a female indicator. Panel B shows the results from a regression of a female indicator on the university degree indicator. Panel C shows the results from a regression of weekly earnings on the university degree indicator. All data are from the 2017 UK Labour Force Survey. The sample includes all those aged 25 to 45 with positive earnings.
| Panel A | Dependent variable: university degree | | |
| female | 0.024 (0.010) | TSS | 1982.24 |
| intercept | 0.300 (0.007) | RSS | 1980.97 |
| R squared | 0.0006 | | |
| Panel B | Dependent variable: female | | |
| University degree | 0.027 (0.011) | TSS | 2298.84 |
| intercept | 0.526 (0.006) | RSS | 2297.37 |
| R squared | 0.0006 | | |
| Panel C | Dependent variable: weekly earnings | | |
| University degree | 213.36 (9.154) | TSS | 16,246×10^{5} |
| intercept | 220.12 (5.111) | RSS | 15,344×10^{5} |
| R squared | 0.0555 | | |
| N = 9,239 | | | |
Note: Standard errors (in parentheses) are not heteroscedasticity robust.
Consider whether the statistics and estimates listed below can be derived using only the information provided in the table. If you believe the answer is yes, then derive them. Make sure you carefully explain your derivation. If you believe the answer is no, explain why and what additional information you would need to derive them.
(a) The sample correlation between the university degree and female indicators.
(b) The coefficients of the regression of weekly earnings on university degree and female.
(c) The omitted variable bias of the coefficient of university degree in the regression of weekly earnings on university degree.
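For part (a), one standard route uses the fact that in a simple regression the squared sample correlation equals the product of the two reverse-regression slopes, and also equals R squared = 1 - RSS/TSS. The Python sketch below applies both identities to the rounded figures in Panels A and B; the variable names are illustrative, and the results are only as precise as the rounding in the table allows.

```python
import math

# Slopes from Panel A (degree on female) and Panel B (female on degree).
slope_degree_on_female = 0.024
slope_female_on_degree = 0.027

# Route 1: r^2 is the product of the two slopes; r takes the sign of the slopes.
r_squared_from_slopes = slope_degree_on_female * slope_female_on_degree
r_from_slopes = math.copysign(math.sqrt(r_squared_from_slopes),
                              slope_degree_on_female)

# Route 2: r^2 equals R squared = 1 - RSS/TSS (Panel A figures).
tss_a, rss_a = 1982.24, 1980.97
r_squared_from_fit = 1.0 - rss_a / tss_a
r_from_fit = math.copysign(math.sqrt(r_squared_from_fit),
                           slope_degree_on_female)

print(f"r from slopes : {r_from_slopes:.4f}")
print(f"r from RSS/TSS: {r_from_fit:.4f}")
```

The two routes agree to about three decimal places; the small gap comes from the slopes being reported to three decimals.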
4. [15%]
The conditional expectation function of Y on X is given by
$E(Y \mid X) = \beta_0 + \beta_1 X$.
We have a random sample of size N from the bivariate population (X, Y).
Let $y = Y - \bar{Y}$ and $x = X - \bar{X}$, where $\bar{Y}$ and $\bar{X}$ are the sample means of Y and X respectively.
Using this sample we run an OLS regression of y on x and obtain: $\hat{y} = \hat{\alpha}_0 + \hat{\alpha}_1 x$
Using the same sample we run an OLS regression of Y on X and obtain: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
Is the following statement TRUE or FALSE?
We must have $\hat{\alpha}_0 = \hat{\beta}_0$ and $\hat{\alpha}_1 = \hat{\beta}_1$.
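A small numerical check can make the comparison concrete. The Python sketch below fits a simple OLS regression in levels and again in deviations from sample means, then compares the two intercepts and the two slopes. The data are made up for illustration and the helper `ols` is a hypothetical name, not part of the question.

```python
from statistics import mean


def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Regression in levels.
levels_intercept, levels_slope = ols(X, Y)

# Regression in deviations from sample means.
x_dev = [xi - mean(X) for xi in X]
y_dev = [yi - mean(Y) for yi in Y]
dev_intercept, dev_slope = ols(x_dev, y_dev)

print(f"levels:     intercept {levels_intercept:.4f}, slope {levels_slope:.4f}")
print(f"deviations: intercept {dev_intercept:.4f}, slope {dev_slope:.4f}")
```

In this example the slopes coincide, while the intercept of the demeaned regression is zero up to floating-point error and the intercept in levels is not.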