Regression Slope Coefficient Estimation

Documentation for CPS08 Data

Each month the Bureau of Labor Statistics in the U.S. Department of Labor
conducts the “Current Population Survey” (CPS), which provides data on labor force
characteristics of the population, including the level of employment, unemployment, and
earnings. Approximately 65,000 randomly selected U.S. households are surveyed each
month. The sample is chosen by randomly selecting addresses from a database
comprised of addresses from the most recent decennial census augmented with data on
new housing units constructed after the last census. The exact random sampling scheme
is rather complicated.
The survey conducted each March is more detailed than in other months and asks
questions about earnings during the previous year. The file CPS08 contains the data for
2008 (from the March 2009 survey). These data are for full-time workers, defined as
workers employed more than 35 hours per week for at least 48 weeks in the previous
year. Data are provided for workers whose highest educational achievement is either
(1) a high school diploma or (2) a bachelor’s degree.
Series in Data Set:
FEMALE: 1 if female; 0 if male
YEAR: Year
AHE: Average Hourly Earnings
BACHELOR: 1 if worker has a bachelor’s degree; 0 if worker has a high school degree
AGE: Age (in years)

Solution  

Q1.

(a)

Regressing average hourly earnings (AHE) on Age gives the estimated regression line:

AHE = 1.08228 + 0.604986 * Age
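
For reference, the slope and intercept of a one-regressor OLS line follow from the usual sample covariance and variance formulas. The sketch below is illustrative only: the function name `ols_fit` and the toy data are hypothetical and are not drawn from CPS08.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # Intercept: the fitted line passes through the point of sample means.
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy data (hypothetical, NOT from CPS08): points lying exactly on y = 1 + 0.5*x.
toy_age = [25, 27, 30, 32, 34]
toy_ahe = [1 + 0.5 * a for a in toy_age]
intercept, slope = ols_fit(toy_age, toy_ahe)
print(intercept, slope)
```

Applied to the actual AGE and AHE columns, the same formulas would be expected to reproduce the intercept and slope reported above.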

(i)

The p-value for testing the null hypothesis that the slope coefficient on Age is zero (that is, average hourly earnings do not vary with age) against the two-sided alternative that it is nonzero is 2.63e-51, which is extremely small.

Hence, at the 5% significance level, we reject the null hypothesis, since the p-value is far below 0.05.

Thus, the estimated regression slope coefficient is statistically significant at the 5% level.

(ii)

The 95% confidence interval for the slope coefficient is (0.526861, 0.683111).

The lower end of the 95% confidence interval for the slope coefficient is 0.526861.

(iii)

The 99% confidence interval for the slope coefficient is (0.502303, 0.707669).

The lower end of the 99% confidence interval for the slope coefficient is 0.502303.
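
Both intervals have the form estimate ± critical value × standard error. The standard error is not reported above, so the sketch below backs it out from the 95% interval, assuming large-sample normal critical values (the software that produced these results may have used Student's t critical values instead, which would differ slightly). It then rebuilds the 99% interval and the t-statistic for the slope as a consistency check. All variable names are illustrative.

```python
from statistics import NormalDist

slope = 0.604986                  # estimated coefficient on Age
lo95, hi95 = 0.526861, 0.683111   # reported 95% interval

# Two-sided normal critical values (assumption: large-sample approximation).
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)

# Back out the standard error implied by the 95% interval's half-width.
se = (hi95 - lo95) / 2 / z95

# Rebuild the 99% interval and the t-statistic for H0: slope = 0.
lo99, hi99 = slope - z99 * se, slope + z99 * se
t_stat = slope / se

print(f"implied SE = {se:.5f}")
print(f"99% CI = ({lo99:.6f}, {hi99:.6f})")
print(f"t-statistic = {t_stat:.2f}")
```

Under this approximation the rebuilt interval agrees with the reported 99% interval to within about 0.00001, and the implied t-statistic of roughly 15 is consistent with the very small p-value in part (i).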

(b)

Regressing average hourly earnings (AHE) on Age, gender (Female), and education (Bachelor) gives:

AHE = – 0.635698 + 0.585214 * age – 3.66403 * female + 8.08300 * bachelor

where female = 1 if the worker is female and 0 if male, and bachelor = 1 if the worker has a bachelor’s degree and 0 if the worker has a high school degree.

(i)

The coefficient on age in this regression is 0.585214. Holding gender and education constant, a one-year increase in Age is therefore associated with an estimated increase of 0.585214 units in average hourly earnings.

(ii)

Bob is a 30-year-old male worker with a high school diploma.

Hence, age = 30, female = 0, and bachelor = 0.

So, using the estimated regression line, Bob’s predicted average hourly earnings are

AHE = – 0.635698 + 0.585214 * 30 – 3.66403 * 0 + 8.08300 * 0

= 16.920722

(iii)

Susan is a 35-year-old female worker with a bachelor’s degree.

Hence, age = 35, female = 1, and bachelor = 1.

So, using the estimated regression line, Susan’s predicted average hourly earnings are

AHE = – 0.635698 + 0.585214 * 35 – 3.66403 * 1 + 8.08300 * 1

= 24.265762
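
The two predictions can be checked by plugging the values into the estimated regression line. The sketch below hard-codes the coefficients reported in part (b); the function name `predict_ahe` is illustrative.

```python
def predict_ahe(age, female, bachelor):
    """Predicted average hourly earnings from the part (b) regression line."""
    return -0.635698 + 0.585214 * age - 3.66403 * female + 8.08300 * bachelor

bob = predict_ahe(age=30, female=0, bachelor=0)     # 30-year-old male, high school diploma
susan = predict_ahe(age=35, female=1, bachelor=1)   # 35-year-old female, bachelor's degree
print(round(bob, 6), round(susan, 6))
```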