SocioDemographic Factors
Data Assignment in 25 Easy Steps
These data stem from the Economic Action Plan conducted in 2009-2011 by the Canadian government to stimulate the economy. The file “Raw EAP Data” contains all 14,932 projects that were advertised on the government website (the data were scraped). The file “EAP Data” contains information on the 308 Canadian ridings and originates from this website: http://www12.statcan.gc.ca/fedcef/index.cfm?Lang=E&SID=1. The objective of this assignment is to better understand the factors that influenced the allocation of projects and to practice coding skills. Write a .do file with comments showing the different steps and a report with explanations and outputs (use .png files in your .doc document).
 1. Import the file “Raw EAP Data” into Stata.
 2. Browse through the data.
 3. Determine how many types of initiatives there are.
 4. Draw a kernel density of the amount spent by the federal government. What shape does it have?
 5. Is there an outlier? What is it? Where is it located?
 6. Drop the outlier identified in question 5 and all projects with a negative value.
 7. How many projects were undertaken in the riding of Halifax?
 8. List all the projects in the riding of Saint-Leonard—Saint-Michel.
 9. For each riding, create variables that calculate the number of projects, the total value of projects and the total federal contribution. (Hint: use “egen”)
 10. For each riding, keep only one row with the total number of projects, the total value of projects and the total federal contribution. (Hint: use “duplicates drop”)
 11. Draw a kernel density of the number of projects per riding. What shape does it have?
 12. Create a new variable with the logarithm value. Draw a kernel density to make sure the distribution is more or less normal.
 13. Save the file.
 14. Import the file “EAP Data” into Stata.
 15. Browse through the data.
 16. Create dummies for regions. (Hint: group the prairies, the Atlantic provinces and the Northern territories together). Quebec, Ontario and BC are big enough to be regions.
 17. Determine how many projects each region received.
 18. Merge this file with the one you saved in step 13 (Hint: use “merge”).
 19. Conduct a t-test to determine whether ridings in Ontario received more projects than ridings in Quebec. (Hint: compare the log variables)
 20. Get the same result as in question 19 by running a regression.
 21. The difference between the percentage of the vote obtained by the conservative candidate and the best non-conservative candidate could explain the amount of money received by a riding. Draw a scatter plot of both variables with a linear (Hint: lfit) and a quadratic (Hint: qfit) trend. What do you think?
 22. Conduct a regression with relevant sociodemographic control variables to confirm the intuition from question 19.
 23. Conduct a regression to determine whether ridings that voted conservative obtained more projects. (Hint: control for relevant sociodemographic factors)
 24. Explain why you are keeping or dropping certain sociodemographic factors from your regression.
 25. Explain in words the coefficients on a few significant variables.
Solution
/* Set more off and set working directory to data location. */
set more off
cd "C:\STATA"
/* 1. Import the file "Raw EAP Data" into Stata. */
import excel "C:\STATA\Raw Data EAP.xlsx", sheet("Sheet1") firstrow
/* 2. Browse through the data. */
browse
/* 3. Determine how many types of initiatives there are. */
tab1 Initiative_Description, missing
/* 4. Draw a kernel density of the amount spent by the */
/* federal government. What shape does it have? */
/* 5. Is there an outlier? What is it? Where is it located? */
kdensity Fed_Amount, bwidth(1000) recast(connected)
/* The tabstat command provides the mean and standard deviation */
/* of Fed_Amount, which are used to tag an outlier. */
tabstat Fed_Amount, statistics(mean sd min p25 p50 p75 max) columns(statistics)
/* 6. Drop the outlier identified in question 5 and all */
/* projects with a negative value. */
drop if Fed_Amount < 0 | Fed_Amount == 350000000
/* 7. How many projects were undertaken in the riding of */
/* Halifax? */
tab1 Riding_Name if Riding_Name == "Halifax" | Riding_Name == "Halifax West"
/* 8. List all the projects in the riding of */
/* SaintLeonard—SaintMichel. */
list Project_Description Total_Amount if Riding_Name == "Saint-Leonard–Saint-Michel"
/* 9. For each riding, create a variable that calculates the */
/* number of projects, the total value of projects and the */
/* total federal contribution. (Hint: use “egen”). */
by Riding_Number, sort: egen float TotalNumberByRiding = count(Riding_Number)
by Riding_Number, sort: egen float TotalAmountByRiding = total(Total_Amount)
by Riding_Number, sort: egen float TotalFedAmountByRiding = total(Fed_Amount)
label variable TotalNumberByRiding "Total number by Riding"
label variable TotalAmountByRiding "Total amount by Riding"
label variable TotalFedAmountByRiding "Total federal amount by Riding"
/* 10. For each riding, keep only one row with the total */
/* number of projects, the total value of projects and the */
/* total federal contribution. */
/* (Hint: use "duplicates drop") */
duplicates drop TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding, force
/* 11. Draw a kernel density of the number of projects per */
/* riding. What shape does it have? */
histogram TotalNumberByRiding, width(20) start(0) fcolor(navy8) lcolor(bluishgray8) kdensity kdenopts(lcolor(maroon) lwidth(thick)) xlabel(0(20)160)
/* 12. Create a new variable with the logarithm value. Draw a */
/* kernel density to make sure the distribution is more or */
/* less normal. */
generate log_TotalValue = log(TotalFedAmountByRiding)
label variable log_TotalValue "log of total value by Riding"
/* 13. Save the file. */
save "C:\STATA\total projects and amounts by riding.dta"
/* 14. Import the file “EAP Data” into Stata. */
/* 15. Browse through the data. */
insheet using "C:\STATA\EAP Data.csv", comma clear
drop v40 httpwww12statcancacensusrecensem
/* 16. Create dummies for regions. (Hint: group the prairies, */
/* the Atlantic provinces and the Northern territories */
/* together). Quebec, Ontario and BC are big enough to be */
/* regions. */
/* This is a series of generate and replace commands to create */
/* the regional dummy variables. */
generate BC_region = 1 if province == "British Columbia"
replace BC_region = 0 if province != "British Columbia"
generate Ontario_region = 1 if province == "Ontario"
replace Ontario_region = 0 if province != "Ontario"
generate Quebec_region = 1 if province == "Quebec"
replace Quebec_region = 0 if province != "Quebec"
generate Atlantic_region = 1 if province == "New Brunswick" | province == "Newfoundland and Labrador" | province == "Nova Scotia" | province == "Prince Edward Island"
replace Atlantic_region = 0 if !(province == "New Brunswick" | province == "Newfoundland and Labrador" | province == "Nova Scotia" | province == "Prince Edward Island")
generate Prairie_region = 1 if province == "Alberta" | province == "Saskatchewan" | province == "Manitoba"
replace Prairie_region = 0 if !(province == "Alberta" | province == "Saskatchewan" | province == "Manitoba")
generate Northern_region = 1 if province == "Northwest Territories" | province == "Nunavut" | province == "Yukon"
replace Northern_region = 0 if !(province == "Northwest Territories" | province == "Nunavut" | province == "Yukon")
label variable BC_region "Dummy, 1 if British Columbia"
label variable Ontario_region "Dummy, 1 if Ontario"
label variable Quebec_region "Dummy, 1 if Quebec"
label variable Atlantic_region "Dummy, 1 if New Brunswick, Newfoundland and Labrador, Nova Scotia, or PEI"
label variable Prairie_region "Dummy, 1 if Alberta, Saskatchewan, Manitoba"
label variable Northern_region "Dummy, 1 if Northwest Territories, Nunavut or Yukon"
/* 17. Determine how many projects each region received. */
/* 18. Merge this file with the one you saved in step 13 */
/* (Hint: use "merge"). */
/* Q18 */
rename riding_number Riding_Number
merge 1:1 Riding_Number using "C:\STATA\total projects and amounts by riding.dta", keepusing(TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding log_TotalValue) generate(_merge_totalsbyriding)
/* Q17 */
/* Create a categorical variable for region. */
generate region = 1 if BC_region==1
replace region = 2 if Prairie_region==1
replace region = 3 if Ontario_region==1
replace region = 4 if Quebec_region==1
replace region = 5 if Atlantic_region==1
replace region = 6 if Northern_region==1
label define REGIONS 1 "British Columbia" 2 "Prairie provinces" 3 "Ontario" 4 "Quebec" 5 "Atlantic provinces" 6 "Northern provinces"
label values region REGIONS
/* Save the current file, then collapse the file by region and */
/* take the sum of the number of projects by riding. */
save "C:\STATA\merged EAP data.dta"
collapse (sum) TotalProjectsByRegion = TotalNumberByRiding, by(region)
list
/* Reopen the saved file. */
use "C:\STATA\merged EAP data.dta", clear
/* 19. Conduct a ttest to determine whether ridings in */
/* Ontario received more projects than ridings in Quebec. */
/* (Hint: compare the log variables) */
/* First generate the log of the number of projects. */
generate log_NumberProjects = log(TotalNumberByRiding)
/* Conduct the t test on the log of the number of projects for */
/* each riding */
ttest log_NumberProjects if region == 3 | region == 4, by(region)
/* 20. Get the same result as in question 19 running a */
/* regression. */
regress log_NumberProjects i.region if region == 3 | region == 4, cformat(%9.5f) pformat(%5.4f) sformat(%8.3f)
/* 21. The difference between the percentage of the vote */
/* obtained by the conservative candidate and the best */
/* non-conservative candidate could explain the amount of */
/* money received by a riding. Draw a scatter plot of both */
/* variables with a linear (Hint: lfit) and a quadratic */
/* (Hint: qfit) trend. What do you think? */
generate log_FedFunds = log(TotalFedAmountByRiding)
twoway (scatter diff_cons_best_08 log_FedFunds) (lfit diff_cons_best_08 log_FedFunds) (qfit diff_cons_best_08 log_FedFunds), ytitle(% votes Conservative minus % votes best opponent) xtitle(log of federal funds received by riding)
/* 22. Conduct a regression with relevant sociodemographic */
/* control variables to confirm the intuition from question 19.*/
regress log_NumberProjects i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown if region == 3 | region == 4
/* 23. Conduct a regression to determine whether ridings that */
/* voted conservative obtained more projects. */
/* (Hint: control for relevant sociodemographic factors) */
regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown
/* 24. Explain why you are keeping or dropping certain */
/* sociodemographic factors from your regression. */
regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate downtown
Data Assignment in 25 Easy Steps
Econ 4410
 Import the file “Raw EAP Data” into Stata.
 Browse through the data.
The Stata commands to import the data follow:
import excel "C:\STATA\Raw Data EAP (2).xlsx", sheet("Sheet1") firstrow
 Determine how many types of initiatives there are.
The Stata command to provide a frequency count of each different initiative follows, along with its results:
tab1 Initiative_Description, missing
-> tabulation of Initiative_Description
                 Initiative_Description |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
Aboriginal Skills and Employment Partne |         20        0.13        0.13
Aboriginal Skills and Training Strategi |         93        0.62        0.76
Accelerating Approval Processes for Bui |         98        0.66        1.41
Accelerating the Federal Contaminated S |        269        1.80        3.21
Addressing First Nations' Housing Needs |        355        2.38        5.59
Agricultural Flexibility Program        |          7        0.05        5.64
Arctic Research Infrastructure Fund     |         37        0.25        5.89
Blue Water Bridge Canadian Plaza and Br |          1        0.01        5.89
Building a Small Craft Harbour in Pangn |          1        0.01        5.90
Canada Cultural Spaces Fund             |        120        0.80        6.70
Champlain Bridge                        |          1        0.01        6.71
Clean Energy Fund Program               |         11        0.07        6.78
Communities Component of the Building C |      1,386        9.28       16.07
Community Adjustment Fund               |      1,120        7.50       23.57
Enhancing Rail Safety                   |        154        1.03       24.60
Enhancing the Accessibility of Federal  |        194        1.30       25.90
Federal Economic Development Agency for |        404        2.71       28.60
First Nations Schools                   |         13        0.09       28.69
First Nations Water and Wastewater Proj |         18        0.12       28.81
Green Infrastructure Fund               |         21        0.14       28.95
Helping Municipalities Build Stronger C |        138        0.92       29.88
Housing for Low-Income Seniors          |        168        1.13       31.00
Housing for Persons with Disabilities   |         29        0.19       31.19
Improvements to National Capital Area B |          2        0.01       31.21
Improving Parks Canada's National Histo |        131        0.88       32.09
Improving the Peace Bridge Plaza        |          1        0.01       32.09
Industrial Research Assistance Program  |         36        0.24       32.33
Infrastructure Stimulus Fund            |      3,859       25.84       58.18
Infrastructure at Ports of Entry        |          4        0.03       58.20
Infrastructure at Remote Ports of Entry |          3        0.02       58.22
Institute for Quantum Computing         |          1        0.01       58.23
Investing in Federal Bridges            |          4        0.03       58.26
Investing in Federal Buildings          |        297        1.99       60.25
Investing in First Nations and Inuit He |         39        0.26       60.51
Investing in Inter-City Passenger Rail  |          1        0.01       60.51
Investing in Remote Rail Passenger Serv |          2        0.01       60.53
Knowledge Infrastructure Program        |        508        3.40       63.93
Marquee Tourism Events Program          |        108        0.72       64.65
Modernizing Federal Laboratories        |        134        0.90       65.55
National Historic Sites of Canada Cost  |         79        0.53       66.08
National Recreational Trails            |          1        0.01       66.09
New Environmental Response Barges for t |          4        0.03       66.11
New Inshore Fisheries Science Vessels   |          1        0.01       66.12
New Small Boats for the Coast Guard     |          7        0.05       66.17
Northern Housing                        |        103        0.69       66.86
Policing Infrastructure in First Nation |         18        0.12       66.98
Recreational Infrastructure Canada      |      1,922       12.87       79.85
Refitting Large Coast Guard Vessels     |        122        0.82       80.67
Renovation and Retrofit of Social Housi |      2,568       17.20       97.86
Small Craft Harbours Program            |        260        1.74       99.60
Strategic Investments in Northern Econo |         26        0.17       99.78
Strengthening the Competitiveness of Ca |          8        0.05       99.83
Supporting the Development of Internati |         24        0.16       99.99
Twinning the Trans-Canada Highway Throu |          1        0.01      100.00
----------------------------------------+-----------------------------------
                                  Total |     14,932      100.00
The 14,932 projects are allocated among 54 different initiatives, with the Infrastructure Stimulus Fund (3,859) and Renovation and Retrofit of Social Housing (2,568) the most common.
 Draw a kernel density of the amount spent by the federal government. What shape does it have?
The Stata command and its results follow:
kdensity Fed_Amount, bwidth(1000) recast(connected)
To provide additional detail, the default bandwidth of 62,000 was reduced to 1,000. Roughly speaking, the graph looks like a rectangular hyperbola, with many small amounts and a very small number of large amounts. The additional detail afforded by the relatively narrow bandwidth shows that the relatively few large projects are clustered around what may be “focal point” amounts: 50,000,000, 100,000,000 and 250,000,000.
 Is there an outlier? What is it? Where is it located?
The largest amount, 350,000,000, can certainly be considered an outlier, as it lies about 49 standard deviations (7,113,769) above the mean (1,134,993).
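As a rough check on the "how many standard deviations" figure, the sketch below plugs the mean and standard deviation quoted above into the usual standardization formula. The scalar names are illustrative only; on the actual data one would presumably take the values from r(mean) and r(sd) after summarize Fed_Amount rather than typing them in.

```stata
* Standardized distance of the largest amount from the mean,
* using the summary statistics quoted in the text (not recomputed here).
scalar fed_mean = 1134993
scalar fed_sd   = 7113769
scalar fed_max  = 350000000
scalar z_max    = (fed_max - fed_mean) / fed_sd
display "Standard deviations above the mean: " %6.2f z_max
```

The result is a little over 49, which is what motivates treating this single observation separately from the rest of the distribution.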
 Drop the outlier identified in question 5 and all projects with a negative value.
The Stata command to drop negative observations and the observation of 350,000,000 and its results follow:
drop if Fed_Amount < 0 | Fed_Amount == 350000000
(4 observations deleted)
 How many projects were undertaken in the riding of Halifax?
The Stata command to obtain a frequency count for Halifax and Halifax West and its results follow:
tab1 Riding_Name if Riding_Name == "Halifax" | Riding_Name == "Halifax West"
-> tabulation of Riding_Name if Riding_Name == "Halifax" | Riding_Name == "Halifax West"
 Riding_Name |      Freq.     Percent        Cum.
-------------+-----------------------------------
     Halifax |         63       75.90       75.90
Halifax West |         20       24.10      100.00
-------------+-----------------------------------
       Total |         83      100.00
There were 63 projects in Halifax and 20 projects in Halifax West.
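If only the single riding is of interest, an exact string match with count is a slightly more direct route than a tabulation. The sketch below uses a small made-up dataset (the four rows are placeholders, not the real project file) to show that the exact match leaves out "Halifax West".

```stata
* Toy data standing in for the project file (riding names only).
clear
input str20 Riding_Name
"Halifax"
"Halifax"
"Halifax West"
"Ottawa Centre"
end
* An exact string match excludes "Halifax West".
count if Riding_Name == "Halifax"
display "Projects in Halifax: " r(N)
```

On the real data the same count command should return the 63 shown in the table above, assuming the riding name is stored exactly as "Halifax".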
 List all the projects in the riding of Saint-Leonard—Saint-Michel.
The Stata command that lists each project description and its total amount follows, along with its results:
list Project_Description Total_Amount if Riding_Name == "Saint-Leonard–Saint-Michel"
+-------------------------------------------------------------------------------+
| Project_Description                                                  Total_~t |
|-------------------------------------------------------------------------------|
| Renovation of existing social housing – Societe d'Entraide SOS OSBL     60317 |
| Development of a soccer field                                         1356900 |
| Hebert Stadium retrofit                                               4082445 |
| Construction of a community centre                                     965000 |
+-------------------------------------------------------------------------------+
 For each riding, create a variable that calculates the number of projects, the total value of projects and the total federal contribution. (Hint: use “egen”)
The Stata commands that create these three variables, named TotalNumberByRiding, TotalAmountByRiding and TotalFedAmountByRiding, follow:
by Riding_Number, sort: egen float TotalNumberByRiding = count(Riding_Number)
by Riding_Number, sort: egen float TotalAmountByRiding = total(Total_Amount)
by Riding_Number, sort: egen float TotalFedAmountByRiding = total(Fed_Amount)
 For each riding, keep only one row with the total number of projects, the total value of projects and the total federal contribution. (Hint: use “duplicates drop”)
The Stata command to drop duplicate observations for the totals and its results follow:
. duplicates drop TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding, force
Duplicates in terms of TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding
(14620 observations deleted)
This results in a file with 308 observations.
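The same riding-level file can also be built in one step with collapse, which is not what the assignment hint asks for but may be a useful cross-check. The sketch below uses three made-up project rows; the variable names match the ones above, while the values are purely illustrative.

```stata
* Toy project-level data: two projects in riding 1, one in riding 2.
clear
input Riding_Number Total_Amount Fed_Amount
1 100 50
1 200 80
2 300 120
end
* One row per riding: project count plus the two totals.
collapse (count) TotalNumberByRiding = Fed_Amount ///
         (sum) TotalAmountByRiding = Total_Amount ///
               TotalFedAmountByRiding = Fed_Amount, by(Riding_Number)
list
```

One design difference worth noting: collapse groups by Riding_Number, so two ridings that happened to share identical totals would still be kept as separate rows, whereas dropping duplicates on the three totals alone would merge them.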
 Draw a kernel density of the number of projects per riding. What shape does it have?
The Stata command to superimpose a kernel density over a histogram and the figure it produces follow:
histogram TotalNumberByRiding, width(20) start(0) fcolor(navy) lcolor(dknavy) kdensity kdenopts(lcolor(maroon) lwidth(thick)) xlabel(0(20)160)
(bin=9, start=0, width=20)
The graph looks much like a truncated normal distribution, with a lower limit of 0 because a negative number of projects is not possible, and an extended right tail and a single mode somewhere between 20 and 40 projects.
 Create a new variable with the logarithm value. Draw a kernel density to make sure the distribution is more or less normal.
The Stata commands to create the new variable and produce a kernel density superimposed on a histogram and their results follow:
generate log_TotalValue = log(TotalFedAmountByRiding)
label variable log_TotalValue "log of total value by Riding"
histogram log_TotalValue, width(2) start(0) fcolor(navy8) lcolor(bluishgray8) kdensity kdenopts(lcolor(maroon) lwidth(thick) width(2)) xlabel(0(4)20)
(bin=11, start=0, width=2)
The density appears acceptably close to normal.
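A numerical companion to the visual check is to compare skewness before and after taking logs. The sketch below does this on simulated data that are log-normal by construction (the seed, the 308 observations and the parameters 3.5 and 0.7 are arbitrary assumptions, not estimates from the EAP file), so it only illustrates the mechanics.

```stata
clear
set seed 12345
set obs 308
* Hypothetical right-skewed amounts: log-normal by construction.
generate amount = exp(rnormal(3.5, 0.7))
generate log_amount = log(amount)
* summarize, detail stores the sample skewness in r(skewness).
quietly summarize amount, detail
scalar skew_raw = r(skewness)
quietly summarize log_amount, detail
scalar skew_log = r(skewness)
display "Skewness, raw: " %5.2f skew_raw "   log: " %5.2f skew_log
```

A skewness near zero for the logged variable is consistent with, though not proof of, approximate normality; on the real data the same two summarize calls could be run on TotalFedAmountByRiding and log_TotalValue.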
 Save the file.
The Stata command and its results follow:
. save "C:\STATA\total projects and amounts by riding.dta"
file C:\STATA\total projects and amounts by riding.dta saved
 Import the file “EAP Data” into Stata
 Browse through the data.
The Stata commands and their results follow:
insheet using "C:\STATA\EAP Data.csv", comma clear
(41 vars, 308 obs)
Browsing through the data revealed a column of missing values under the variable name “v40”, along with the source website in the 41st column. These two variables were dropped with the following Stata command:
. drop v40 httpwww12statcancacensusrecensem
 Create dummies for regions. (Hint: group the prairies, the Atlantic provinces and the Northern territories together). Quebec, Ontario and BC are big enough to be regions.
The Stata code to create the regional dummies and label them follows:
generate BC_region = 1 if province == "British Columbia"
replace BC_region = 0 if province != "British Columbia"
generate Ontario_region = 1 if province == "Ontario"
replace Ontario_region = 0 if province != "Ontario"
generate Quebec_region = 1 if province == "Quebec"
replace Quebec_region = 0 if province != "Quebec"
generate Atlantic_region = 1 if province == "New Brunswick" | province == "Newfoundland and Labrador" | province == "Nova Scotia" | province == "Prince Edward Island"
replace Atlantic_region = 0 if !(province == "New Brunswick" | province == "Newfoundland and Labrador" | province == "Nova Scotia" | province == "Prince Edward Island")
generate Prairie_region = 1 if province == "Alberta" | province == "Saskatchewan" | province == "Manitoba"
replace Prairie_region = 0 if !(province == "Alberta" | province == "Saskatchewan" | province == "Manitoba")
generate Northern_region = 1 if province == "Northwest Territories" | province == "Nunavut" | province == "Yukon"
replace Northern_region = 0 if !(province == "Northwest Territories" | province == "Nunavut" | province == "Yukon")
label variable BC_region "Dummy, 1 if British Columbia"
label variable Ontario_region "Dummy, 1 if Ontario"
label variable Quebec_region "Dummy, 1 if Quebec"
label variable Atlantic_region "Dummy, 1 if New Brunswick, Newfoundland and Labrador, Nova Scotia, or PEI"
label variable Prairie_region "Dummy, 1 if Alberta, Saskatchewan, Manitoba"
label variable Northern_region "Dummy, 1 if Northwest Territories, Nunavut or Yukon"
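A more compact way to build the same kind of dummy is to assign a logical expression directly, since Stata evaluates a true comparison to 1 and a false one to 0. The sketch below shows this with inlist() on four made-up rows; it is an alternative to the generate/replace pairs above, not the code used for the reported results.

```stata
* Toy data: one province per row.
clear
input str30 province
"British Columbia"
"Ontario"
"Nova Scotia"
"Yukon"
end
* A comparison returns 1 when true and 0 when false, so no replace is needed.
generate byte BC_region = province == "British Columbia"
generate byte Atlantic_region = inlist(province, "New Brunswick", ///
    "Newfoundland and Labrador", "Nova Scotia", "Prince Edward Island")
list
```

One caveat under this approach: a riding with an empty province string would be coded 0 rather than missing, so it assumes the province variable is complete.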
 Determine how many projects each region received.
 Merge this file with the one you saved in step 13 (Hint: use “merge”).
The file saved earlier as “total projects and amounts by riding.dta” has the total number of projects by riding, and the current file has the region dummies. The two files are merged first (question 18), and then project totals are computed by region. The following code merges the two files using a one-to-one merge by riding number (note that the variable riding_number in the current file is renamed Riding_Number to match the name in the earlier file).
rename riding_number Riding_Number
. merge 1:1 Riding_Number using "C:\STATA\total projects and amounts by riding.dta", keepusing(TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding log_TotalValue) generate(_merge_totalsbyriding)
    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                               308  (_merge_totalsbyriding==3)
    -----------------------------------------
The results indicate that all of the records were successfully merged, which is anticipated because there is one record for each riding in each file.
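When every record is expected to match, as here, merge can be asked to enforce that expectation with its assert(match) option, which stops with an error if any observation fails to match. The sketch below demonstrates this on two tiny made-up files held in a tempfile; the riding numbers and values are placeholders.

```stata
* Toy "totals by riding" file, saved to a temporary file.
clear
input Riding_Number TotalNumberByRiding
1 10
2 20
end
tempfile totals
save `totals'
* Toy riding-level file with the same two ridings.
clear
input Riding_Number str10 province
1 "Ontario"
2 "Quebec"
end
* assert(match) makes the merge fail loudly if any riding is unmatched.
merge 1:1 Riding_Number using `totals', assert(match) generate(_merge_totalsbyriding)
```

This is only a safeguard: it does not change the merged data when all 308 ridings match, but it would flag a problem if a later version of either file lost or gained a riding.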
To obtain the number of projects in each region, I first construct and label the values of a region categorical variable (which could also be used for the dummy variables using the “i.” prefix), region, and then collapse the file by region, taking the sum of number of projects, and then list the resulting file’s contents. The Stata commands and their results follow:
/* Create a categorical variable for region. */
generate region = 1 if BC_region==1
replace region = 2 if Prairie_region==1
replace region = 3 if Ontario_region==1
replace region = 4 if Quebec_region==1
replace region = 5 if Atlantic_region==1
replace region = 6 if Northern_region==1
label define REGIONS 1 "British Columbia" 2 "Prairie provinces" 3 "Ontario" 4 "Quebec" 5 "Atlantic provinces" 6 "Northern provinces"
label values region REGIONS
/* Collapse the file by region and take sum. */
collapse (sum) TotalProjectsByRegion = TotalNumberByRiding, by(region)
list
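+-------------------------------+
|             region   TotalP~n |
|-------------------------------|
|   British Columbia       1631 |
|  Prairie provinces       3106 |
|            Ontario       5068 |
|             Quebec       2554 |
| Atlantic provinces       2237 |
|-------------------------------|
| Northern provinces        332 |
+-------------------------------+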
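Because collapse replaces the data in memory, the riding-level file has to be reloaded before the later questions. Two ways around that are sketched below on made-up numbers: tabstat reports the regional sums without touching the data, and preserve/restore brings the riding-level data back after a collapse. Both are offered as possible alternatives, not as the approach used for the table above.

```stata
* Toy riding-level data: two ridings in region 3, one in region 4.
clear
input region TotalNumberByRiding
3 40
3 60
4 25
end
* Option 1: report sums by region without altering the data in memory.
tabstat TotalNumberByRiding, by(region) statistics(sum)
* Option 2: collapse inside preserve/restore so the riding-level data return.
preserve
collapse (sum) TotalProjectsByRegion = TotalNumberByRiding, by(region)
list
restore
```

After restore, the three riding-level rows are back in memory, so no intermediate save and use is needed.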
 Conduct a t-test to determine whether ridings in Ontario received more projects than ridings in Quebec. (Hint: compare the log variables)
The Stata commands and their results follow:
generate log_NumberProjects = log(TotalNumberByRiding)
ttest log_NumberProjects if region == 3 | region == 4, by(region)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Ontario |     106    3.704091    .0580758    .5979264    3.588938    3.819245
  Quebec |      75    3.257182    .0939262    .8134248     3.07003    3.444334
---------+--------------------------------------------------------------------
combined |     181    3.518908    .0540771    .7275323    3.412201    3.625614
---------+--------------------------------------------------------------------
    diff |            .4469091     .104892                .2399252    .6538931
------------------------------------------------------------------------------
    diff = mean(Ontario) - mean(Quebec)                           t =   4.2607
Ho: diff = 0                                     degrees of freedom =      179
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
The null hypothesis that the population average of the log of the number of projects per riding is the same for ridings in Ontario and Quebec can be rejected against the two-tailed alternative at any conventional level of significance (the reported p value is 0.0000). There is therefore very strong evidence that the average number of projects per riding differs between Quebec and Ontario, with Ontario ridings having the larger average.
 Get the same result as in question 19 by running a regression.
The Stata command and its results follow:
regress log_NumberProjects i.region if region == 3 | region == 4, cformat(%9.5f) pformat(%5.4f) sformat(%8.3f)
      Source |       SS       df       MS              Number of obs =     181
-------------+------------------------------           F(  1,   179) =   18.15
       Model |  8.77257404     1  8.77257404           Prob > F      =  0.0000
    Residual |  86.5020188   179  .483251501           R-squared     =  0.0921
-------------+------------------------------           Adj R-squared =  0.0870
       Total |  95.2745928   180  .529303293           Root MSE      =  .69516
------------------------------------------------------------------------------
log_Number~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    4.region |   -0.44691    0.10489   -4.261   0.0000    -0.65389    -0.23993
       _cons |    3.70409    0.06752   54.859   0.0000     3.57085     3.83733
------------------------------------------------------------------------------
Note that the estimate for the constant in the regression, 3.70409, is the average of the log of the number of projects for ridings in region 3, Ontario, whereas the estimated coefficient on 4.region (Quebec), -0.44691, equals the difference between the means shown as “diff” in question 19 with its sign reversed. The t statistic for the estimated coefficient on 4.region is -4.261, which matches the t statistic in question 19 to 3 decimal places apart from its sign: question 19 looks at the difference Ontario minus Quebec, and question 20 looks at Quebec minus Ontario. The order of the difference affects only the sign of the t statistic, and since its magnitude and degrees of freedom are the same in each question, the significance level and therefore the conclusion are the same as well.
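The equivalence between an equal-variance two-sample t test and a regression on a single dummy can be checked on any dataset. The sketch below uses the auto dataset that ships with Stata, with mpg and foreign standing in for the log number of projects and the Quebec dummy; it is an illustration of the identity, not a re-analysis of the EAP data.

```stata
sysuse auto, clear
* Equal-variance two-sample t test: r(t) is for Domestic minus Foreign.
quietly ttest mpg, by(foreign)
scalar t_ttest = r(t)
* Regression on the dummy: the coefficient is Foreign minus Domestic.
quietly regress mpg i.foreign
scalar t_reg = _b[1.foreign] / _se[1.foreign]
display "t from ttest:   " %8.4f t_ttest
display "t from regress: " %8.4f t_reg
```

The two statistics should agree in magnitude and differ only in sign, for the same reason as in questions 19 and 20: the two commands subtract the group means in opposite orders.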
 The difference between the percentage of the vote obtained by the conservative candidate and the best non-conservative candidate could explain the amount of money received by a riding. Draw a scatter plot of both variables with a linear (Hint: lfit) and a quadratic (Hint: qfit) trend. What do you think?
I investigate the log of the federal funds received by the riding. The Stata commands to produce the graph and the graph follow:
generate log_FedFunds = log(TotalFedAmountByRiding)
twoway (scatter diff_cons_best_08 log_FedFunds) (lfit diff_cons_best_08 log_FedFunds) (qfit diff_cons_best_08 log_FedFunds), ytitle(% votes Conservative minus % votes best opponent) xtitle(log of federal funds received by riding)
There is a positive relationship of modest strength between the percentage margin for the conservative and the log of federal funds received. The quadratic fit, shown by the green line, is slightly better than the linear fit, shown by the red line.
 Conduct a regression with relevant sociodemographic control variables to confirm the intuition from question 19.
The Stata command and results follow:
regress log_NumberProjects i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown if region == 3 | region == 4
      Source |       SS       df       MS              Number of obs =     181
-------------+------------------------------           F(  6,   174) =   24.37
       Model |  43.5052401     6  7.25087335           Prob > F      =  0.0000
    Residual |  51.7693527   174  .297525015           R-squared     =  0.4566
-------------+------------------------------           Adj R-squared =  0.4379
       Total |  95.2745928   180  .529303293           Root MSE      =  .54546
--------------------------------------------------------------------------------------
  log_NumberProjects |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
            4.region |  -.9301546   .1036139    -8.98   0.000    -1.134657   -.7256526
   unemployment_rate |   4.286596   2.914278     1.47   0.143    -1.465289    10.03848
percentage_immigrant |  -2.581618   .3038018    -8.50   0.000    -3.181229   -1.982007
  participation_rate |  -2.271657   1.233366    -1.84   0.067     -4.70594    .1626263
   high_school_lower |  -.7279007   .6812482    -1.07   0.287    -2.072474    .6166731
            downtown |   1.044341   .2392536     4.36   0.000     .5721283    1.516554
               _cons |   5.957877   1.062268     5.61   0.000     3.861288    8.054466
--------------------------------------------------------------------------------------
The included variables are the region dummy for region 4 (Quebec), the unemployment rate, the percentage of the population that is immigrant, the labor force participation rate, the percentage of the population with high school or less education, and a dummy indicating that the riding is downtown. All of these variables except high school education and the unemployment rate are significant at the 0.10 level of significance and have the expected signs. Even after these variables are controlled for, the average of the log number of projects received by ridings in region 4, Quebec, remains lower than for ridings in Ontario.
 Conduct a regression to determine whether ridings that voted conservative obtained more projects. (Hint: control for relevant sociodemographic factors)
The Stata command and results follow:
regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown
      Source |       SS       df       MS              Number of obs =     308
-------------+------------------------------           F( 11,   296) =   29.02
       Model |   85.977467    11  7.81613336           Prob > F      =  0.0000
    Residual |  79.7228817   296   .26933406           R-squared     =  0.5189
-------------+------------------------------           Adj R-squared =  0.5010
       Total |  165.700349   307   .53974055           Root MSE      =  .51897
--------------------------------------------------------------------------------------
  log_NumberProjects |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
   1.conservative_08 |   .0135065    .077975     0.17   0.863    -.1399492    .1669621
                     |
              region |
                  2  |   .1158196   .1239356     0.93   0.351     -.128087    .3597262
                  3  |   .2398577   .1018492     2.36   0.019     .0394174     .440298
                  4  |  -.6902288   .1238108    -5.57   0.000    -.9338899   -.4465678
                  5  |  -.1946797   .1630347    -1.19   0.233    -.5155338    .1261744
                  6  |   .5623275   .3432132     1.64   0.102    -.1131198    1.237775
                     |
   unemployment_rate |   3.424306   1.476758     2.32   0.021     .5180295    6.330582
percentage_immigrant |   -2.75948   .2532239   -10.90   0.000    -3.257827   -2.261133
  participation_rate |  -2.027376   .8167286    -2.48   0.014    -3.634707   -.4200456
   high_school_lower |   .0086662    .498141     0.02   0.986    -.9716807     .989013
            downtown |    1.03105    .176185     5.85   0.000     .6843156    1.377784
               _cons |   5.290594   .6777025     7.81   0.000     3.956868     6.62432
--------------------------------------------------------------------------------------
This question was put to the full sample, rather than just regions 3 and 4, and therefore the full set of regional dummies was added to the regression along with the dummy for whether the riding went Conservative in 2008. The remaining independent variables are the same as for question 22.
Although the estimated coefficient on conservative_08, 0.0135065, is positive, the p value for the estimate is 0.863, implying that there is very little evidence here to conclude that the population effect of going conservative is different from 0.
 Explain why you are keeping or dropping certain sociodemographic factors from your regression.
Because its p value is 0.986 (i.e., there is no evidence that its coefficient is different from zero), I dropped the high school education variable, which gives the following regression:
regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate downtown
      Source |       SS       df       MS              Number of obs =     308
-------------+------------------------------           F( 10,   297) =   32.03
       Model |  85.9773855    10  8.59773855           Prob > F      =  0.0000
    Residual |  79.7229633   297  .268427486           R-squared     =  0.5189
-------------+------------------------------           Adj R-squared =  0.5027
       Total |  165.700349   307   .53974055           Root MSE      =   .5181
--------------------------------------------------------------------------------------
  log_NumberProjects |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
   1.conservative_08 |   .0136077   .0776265     0.18   0.861    -.1391599    .1663754
                     |
              region |
                  2  |   .1162433   .1213143     0.96   0.339    -.1225013    .3549878
                  3  |   .2399847   .1014162     2.37   0.019     .0403994    .4395701
                  4  |   -.690597   .1217838    -5.67   0.000    -.9302654   -.4509285
                  5  |  -.1951041   .1609273    -1.21   0.226    -.5118064    .1215981
                  6  |   .5624509    .342562     1.64   0.102    -.1117054    1.236607
                     |
   unemployment_rate |   3.431265    1.41916     2.42   0.016     .6383816    6.224148
percentage_immigrant |   -2.76117     .23346   -11.83   0.000    -3.220615   -2.301725
  participation_rate |   -2.02994   .8019674    -2.53   0.012    -3.608199   -.4516815
            downtown |   1.030233    .169525     6.08   0.000     .6966103    1.363855
               _cons |   5.296436   .5876487     9.01   0.000     4.139954    6.452919
--------------------------------------------------------------------------------------
Dropping the education variable does not change the previous conclusion, as the p value for the Conservative in 2008 dummy is now 0.861. There is still very little evidence here to conclude that the population effect of going conservative is different from 0.
 Explain in words the coefficients on a few significant variables.
The estimated coefficient on the unemployment rate is 3.43. The rate is measured as a proportion, so a 3% unemployment rate appears in the data as 0.03. The coefficient gives the change in the average of the log of the number of projects that would result from an increase of 1 in the unemployment rate (that is, 100 percentage points), holding all other factors constant. Because the model is linear, dividing the coefficient by 100 gives the more useful interpretation: the average of the log of the number of projects increases by 0.0343 when the unemployment rate increases by 1 percentage point (e.g., from 5% to 6%), all other factors held constant.
The estimated coefficient on region 4, Quebec, is -0.691. It implies that a riding in Quebec would have an average log of number of projects that is 0.691 lower than a riding in the omitted region, British Columbia, if all other variables took the same values in both ridings.
The estimated coefficient for downtown indicates that a riding that is in an urban area will have an average log of number of projects that is 1.03 higher than a riding that is not in an urban area, all other factors held constant.
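Because the dependent variable is a logarithm, each coefficient can also be restated as an approximate percentage change in the number of projects via exp(b) - 1. The sketch below applies that transformation to three coefficients copied from the final regression; the scalar names are illustrative, nothing is re-estimated, and the percentages describe the typical (geometric-mean) riding rather than the arithmetic average.

```stata
* Coefficients copied from the final regression; nothing is re-estimated here.
scalar b_unemp    = 3.431265
scalar b_quebec   = -0.690597
scalar b_downtown = 1.030233
* Percentage change in projects implied by a change in log(projects).
scalar pct_unemp    = 100 * (exp(b_unemp * 0.01) - 1)
scalar pct_quebec   = 100 * (exp(b_quebec) - 1)
scalar pct_downtown = 100 * (exp(b_downtown) - 1)
display "Unemployment rate +1 point: " %6.1f pct_unemp " percent"
display "Quebec vs British Columbia: " %6.1f pct_quebec " percent"
display "Downtown vs not downtown:   " %6.1f pct_downtown " percent"
```

Under these assumptions, a one-point rise in the unemployment rate corresponds to roughly 3.5 percent more projects, a Quebec riding to roughly half as many projects as a comparable British Columbia riding, and a downtown riding to roughly 2.8 times as many as a comparable riding elsewhere.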