R-squared Value Regression
Documentation for CollegeDistance Data
These data are taken from the High School and Beyond survey conducted by the
Department of Education in 1980, with a follow-up in 1986. The survey included
students from approximately 1100 high schools.
The data used here were supplied by Professor Cecilia Rouse of Princeton University and
were used in her paper “Democratization or Diversion? The Effect of Community
Colleges on Educational Attainment,” Journal of Business and Economic Statistics, April
1995, Vol. 13, No. 2, pp. 217-224.
The data in CollegeDistance exclude students in the western states. The data in
CollegeDistanceWest include only those students in the western states.
Series in Data Set
Name  Description
ed  Years of Education Completed (See below)
female  1 = Female / 0 = Male
black  1 = Black / 0 = Not Black
Hispanic  1 = Hispanic / 0 = Not Hispanic
bytest  Base Year Composite Test Score (these are achievement tests given to high school seniors in the sample)
dadcoll  1 = Father is a College Graduate / 0 = Father is not a College Graduate
momcoll  1 = Mother is a College Graduate / 0 = Mother is not a College Graduate
incomehi  1 = Family Income > $25,000 per year / 0 = Income ≤ $25,000 per year
ownhome  1 = Family Owns Home / 0 = Family Does not Own Home
urban  1 = School in Urban Area / 0 = School not in Urban Area
cue80  County Unemployment Rate in 1980
stwmfg80  State Hourly Wage in Manufacturing in 1980
dist  Distance from 4-year College in tens of miles
tuition  Avg. State 4-year College Tuition in $1000s
Years of Education: Rouse computed years of education by assigning 12 years to all
members of the senior class. Each additional year of secondary education was counted as
one year. Students with vocational degrees were assigned 13 years, AA degrees were
assigned 14 years, BA degrees were assigned 16 years, those with some graduate
education were assigned 17 years, and those with a graduate degree were assigned 18
years.
Solution
Q1.
Yes, the last statement is true. Comparing average sales in the markets with an increased marketing budget to average sales in the remaining markets gives an unbiased estimate of the true causal effect of increased marketing spending on sales. Half of the markets were selected at random to receive the larger budget, so the assignment is independent of the other factors that determine sales, and the number of regional markets is large.
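The unbiasedness claim can be checked with a small simulation. The sketch below is illustrative only: the baseline sales distribution, the true effect of 5 units, the number of markets, and the function name simulate_difference_in_means are all assumptions, not part of the question. It assigns half of the markets to the larger budget at random, compares average sales in the two groups, and repeats the experiment many times.

```python
import random
import statistics


def simulate_difference_in_means(n_markets, true_effect, rng):
    """Run one hypothetical experiment and return the difference in mean sales."""
    # Hypothetical baseline sales for each market (assumed, not from the source).
    baseline = [rng.gauss(100.0, 15.0) for _ in range(n_markets)]
    # Randomly choose half of the markets to receive the larger budget.
    treated_ids = set(rng.sample(range(n_markets), n_markets // 2))
    treated = [baseline[i] + true_effect for i in range(n_markets) if i in treated_ids]
    control = [baseline[i] for i in range(n_markets) if i not in treated_ids]
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(0)  # fixed seed so the run is reproducible
TRUE_EFFECT = 5.0       # assumed causal effect of the larger budget on sales
estimates = [simulate_difference_in_means(1000, TRUE_EFFECT, rng) for _ in range(200)]
average_estimate = statistics.mean(estimates)
print(f"average estimate over 200 experiments: {average_estimate:.2f}")
```

Individual experiments scatter around the assumed true effect, but their average sits close to it, which is what unbiasedness means.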
Q2.
(a)
(b)
The estimated intercept is 13.95586
The estimated slope is -0.07337
The average value of years of completed schooling increases by 0.07337 years if colleges are built 1 unit (10 miles) closer to where the students go to high school.
(c)
Bob’s high school was 20 miles from the nearest college. Using the estimated regression, Bob’s predicted years of completed education are 13.95586 – 0.07337 * 2 = 13.80912 years.
If Bob lived 10 miles from the nearest college, the prediction would increase by 0.07337, so the predicted years of completed education would be 13.88249 years.
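The prediction arithmetic in part (c) can be reproduced with a few lines of Python. This is a minimal sketch that plugs in the coefficients reported in part (b), with the slope entered as negative to match the subtraction used above; the function name predicted_education is a hypothetical helper, not something from the original solution.

```python
INTERCEPT = 13.95586   # estimated intercept from part (b)
SLOPE = -0.07337       # estimated slope per 10 miles, sign as used in part (c)


def predicted_education(distance_miles):
    """Predicted years of completed education at a given distance in miles."""
    dist_units = distance_miles / 10.0  # dist is measured in tens of miles
    return INTERCEPT + SLOPE * dist_units


for miles in (20, 10):
    print(f"{miles} miles: {predicted_education(miles):.5f} years")
```

Converting miles to units of 10 miles inside the function keeps the call sites in the same units as the question.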
(d)
The R-squared value for the regression model is 0.00745
Hence, distance to college explains less than 1% of the variation in educational attainment across individuals.
(e)
The standard error of the regression is 1.807 years.
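For reference, both summary statistics in parts (d) and (e) come from the residuals: R-squared is 1 - SSR/TSS, and the standard error of the regression is the square root of SSR divided by the degrees of freedom. The sketch below assumes the usual n - 2 degrees of freedom for a regression with an intercept and one regressor. The function name r_squared_and_ser and the four-point example are made up for illustration and are not taken from the CollegeDistance data.

```python
import math


def r_squared_and_ser(y, y_hat, n_coefficients=2):
    """Return (R-squared, standard error of the regression)."""
    n = len(y)
    y_mean = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_mean) ** 2 for yi in y)              # total sum of squares
    r2 = 1.0 - ssr / tss
    ser = math.sqrt(ssr / (n - n_coefficients))            # degrees-of-freedom adjusted
    return r2, ser


# Made-up example: y and its OLS fitted values from regressing on x = 1, 2, 3, 4.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
r2, ser = r_squared_and_ser(y, y_hat)
print(f"R-squared = {r2:.2f}, SER = {ser:.4f}")
```

The SER is in the units of the dependent variable, which is why part (e) reports it in years.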
Q3.
X  Y  (X - Xbar)  (Y - Ybar)  (X - Xbar)(Y - Ybar)  (Y - Ybar)^2  (X - Xbar)^2
22.69  49.08  -35.2208  -64.6842  2278.231  4184.046  1240.507
12.28  61.86  -45.6308  -51.9042  2368.432  2694.046  2082.173
97.59  167.19  39.67917  53.4258  2119.891  2854.316  1574.437
86.15  161.09  28.23917  47.3258  1336.441  2239.731  797.4507
110.21  111.82  52.29917  -1.9442  -101.68  3.779914  2735.203
80.72  190.05  22.80917  76.2858  1740.016  5819.523  520.2582
95.96  156.04  38.04917  42.2758  1608.559  1787.243  1447.739
17.48  29.51  -40.4308  -84.2542  3406.467  7098.77  1634.652
18  88.01  -39.9108  -25.7542  1027.871  663.2788  1592.874
8.72  12.15  -49.1908  -101.614  4998.487  10325.45  2419.738
67.5  153.29  9.58917  39.5258  379.0196  1562.289  91.95218
77.63  185.08  19.71917  71.3158  1406.288  5085.943  388.8457
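The deviation, cross-product, and squared-deviation columns in this table are the working normally used to compute an OLS slope by hand, as the sum of cross products divided by the sum of squared deviations of the regressor. The sketch below recomputes those sums from the first two columns. The question text for Q3 is not reproduced here, so treating the first column as the regressor x and the second as the dependent variable y is an assumption.

```python
# First two columns of the Q3 table, labelled x and y here as an assumption.
x = [22.69, 12.28, 97.59, 86.15, 110.21, 80.72, 95.96, 17.48, 18.0, 8.72, 67.5, 77.63]
y = [49.08, 61.86, 167.19, 161.09, 111.82, 190.05, 156.04, 29.51, 88.01, 12.15, 153.29, 185.08]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Column sums of the cross products and squared deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

slope = s_xy / s_xx                 # OLS slope of y on x
intercept = y_bar - slope * x_bar   # OLS intercept

print(f"x_bar = {x_bar:.4f}, y_bar = {y_bar:.4f}")
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Recomputing the sums from the raw columns is also a quick way to check the signs of the deviations, since each one must equal the observation minus its column mean.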