# Regression Slope Coefficient Estimation

Documentation for CPS08 Data

Each month the Bureau of Labor Statistics in the U.S. Department of Labor conducts the “Current Population Survey” (CPS), which provides data on labor force characteristics of the population, including the level of employment, unemployment, and earnings. Approximately 65,000 randomly selected U.S. households are surveyed each month. The sample is chosen by randomly selecting addresses from a database comprised of addresses from the most recent decennial census augmented with data on new housing units constructed after the last census. The exact random sampling scheme is rather complicated.

The survey conducted each March is more detailed than in other months and asks questions about earnings during the previous year. The file CPS08 contains the data for 2008 (from the March 2009 survey). These data are for full-time workers, defined as workers employed more than 35 hours per week for at least 48 weeks in the previous year. Data are provided for workers whose highest educational achievement is (1) a high school diploma, and (2) a bachelor’s degree.

Series in Data Set:

FEMALE: 1 if female; 0 if male

YEAR: Year

AHE: Average Hourly Earnings

BACHELOR: 1 if worker has a bachelor’s degree; 0 if worker has a high school degree

**Solution**

Q1.

(a)

Regressing average hourly earnings (AHE) on Age gives the estimated regression line:

AHE = 1.08228 + 0.604986 * Age
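For reference, the intercept and slope of a single-regressor least-squares fit follow from the usual closed-form formulas. The sketch below is a minimal illustration on made-up numbers (not the CPS08 data); the function name `ols_fit` is hypothetical, and the software used to produce the estimates in this solution is not stated.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sxy / sxx
    # The fitted line passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustration data, NOT taken from CPS08.
ages = [25, 27, 29, 31, 33]
earnings = [15.0, 17.5, 18.0, 20.5, 21.0]
b0, b1 = ols_fit(ages, earnings)
print(f"AHE = {b0:.4f} + {b1:.4f} * Age")
```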

(i)

The p-value for testing the null hypothesis that average hourly earnings do not differ by age (that is, the slope coefficient is zero) against the alternative that they do is 2.63e-051, which is far below 0.05.

Hence, at the 5% level of significance, we reject the null hypothesis that average hourly earnings do not differ by age.

Thus, the estimated regression slope coefficient is statistically significant at the 5% level.

(ii)

The 95% confidence interval for the slope coefficient is [0.526861, 0.683111].

The lower end of the 95% confidence interval for the slope coefficient is 0.526861

(iii)

The 99% confidence interval for the slope coefficient is [0.502303, 0.707669].

The lower end of the 99% confidence interval for the slope coefficient is 0.502303.
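As a consistency check on the two intervals, the sketch below backs out the standard error implied by the reported 95% interval and rebuilds the 99% interval from it. It assumes large-sample normal critical values (about 1.96 and 2.58) rather than the exact t critical values the regression software presumably used, so agreement is only expected to roughly four decimal places. The variable names are illustrative.

```python
from statistics import NormalDist

slope = 0.604986
ci95 = (0.526861, 0.683111)   # reported in (ii)
ci99 = (0.502303, 0.707669)   # reported in (iii)

# Two-sided normal critical values (an assumption; see the note above).
z95 = NormalDist().inv_cdf(0.975)   # about 1.96
z99 = NormalDist().inv_cdf(0.995)   # about 2.58

# Half-width of the 95% interval divided by its critical value
# gives the implied standard error of the slope.
se = (ci95[1] - ci95[0]) / (2 * z95)

# Rebuild the 99% interval from the slope and the implied standard error.
rebuilt99 = (slope - z99 * se, slope + z99 * se)

# t-statistic for the null hypothesis that the slope is zero.
t_stat = slope / se

print(f"implied SE     = {se:.5f}")
print(f"rebuilt 99% CI = [{rebuilt99[0]:.4f}, {rebuilt99[1]:.4f}]")
print(f"t-statistic    = {t_stat:.2f}")
```

A t-statistic of this size (roughly 15) is in line with the very small p-value reported in (i).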

(b)

Regressing average hourly earnings (AHE) on Age, gender (Female), and education (Bachelor) gives:

AHE = – 0.635698 + 0.585214 * age – 3.66403 * female + 8.08300 * bachelor

Here, female = 1 if the worker is female and 0 if male, and bachelor = 1 if the worker has a bachelor’s degree and 0 if the worker has a high school diploma.

(i)

The coefficient on age in this regression is 0.585214. Holding gender and education constant, a one-year increase in age is therefore associated with an estimated increase of 0.585214 units in average hourly earnings.

(ii)

Bob is a 30-year-old male worker with a high school diploma.

Hence, age = 30, female = 0, and bachelor = 0.

Using the estimated regression line, Bob’s predicted average hourly earnings are

AHE = – 0.635698 + 0.585214 * 30 – 3.66403 * 0 + 8.08300 * 0

= 16.920722

(iii)

Susan is a 35-year-old female worker with a bachelor’s degree.

Hence, age = 35, female = 1, and bachelor = 1.

Using the estimated regression line, Susan’s predicted average hourly earnings are

AHE = – 0.635698 + 0.585214 * 35 – 3.66403 * 1 + 8.08300 * 1

= 24.265762
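The two predictions can be reproduced with a few lines of code. The sketch below hard-codes the coefficients from the estimated regression in (b); the function name `predict_ahe` and the constant names are hypothetical.

```python
# Coefficients copied from the estimated regression in (b).
INTERCEPT = -0.635698
B_AGE = 0.585214
B_FEMALE = -3.66403
B_BACHELOR = 8.08300


def predict_ahe(age, female, bachelor):
    """Predicted average hourly earnings from the fitted line in (b).

    female and bachelor are 0/1 indicators as defined in the data set.
    """
    return INTERCEPT + B_AGE * age + B_FEMALE * female + B_BACHELOR * bachelor


bob = predict_ahe(age=30, female=0, bachelor=0)
susan = predict_ahe(age=35, female=1, bachelor=1)
print(f"Bob:   {bob:.6f}")
print(f"Susan: {susan:.6f}")
```

Because the model is linear, the gap between the two predictions splits into an age part (5 × 0.585214), the female coefficient, and the bachelor coefficient.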