# Socio-Demographic Factors

Data Assignment in 25 Easy Steps

This data stems from the Economic Action Plan conducted in 2009-2011 by the Canadian government to stimulate the economy. The file “Raw EAP Data” contains all 14,932 projects that were advertised on the government website (the data was scraped). The file “EAP Data” contains information on the 308 Canadian ridings and originates from this website: https://www12.statcan.gc.ca/fed-cef/index.cfm?Lang=E&SID=1. The objective of this assignment is to better understand the factors that influenced the allocation of projects and to practice coding skills. Write a .do file with comments showing the different steps and a report with explanations and outputs (use .png files in your .doc document).

1. Import the file “Raw EAP Data” into Stata.
2. Browse through the data.
3. Determine how many types of initiatives there are.
4. Draw a kernel density of the amount spent by the federal government. What shape does it have?
5. Is there an outlier? What is it? Where is it located?
6. Drop the outlier identified in question 5 and all projects with a negative value.
7. How many projects were undertaken in the riding of Halifax?
8. List all the projects in the riding of Saint-Leonard—Saint-Michel.
9. For each riding, create a variable that calculates the number of projects, the total value of projects and the total federal contribution. (Hint: use “egen”)
10. For each riding, keep only one row with the total number of projects, the total value of projects and the total federal contribution. (Hint: use “duplicates drop”)
11. Draw a kernel density of the number of projects per riding. What shape does it have?
12. Create a new variable with the logarithm value. Draw a kernel density to make sure the distribution is more or less normal.
13. Save the file.
14. Import the file “EAP Data” into Stata.
15. Browse through the data.
16. Create dummies for regions. (Hint: group the prairies, the Atlantic provinces and the Northern territories together). Quebec, Ontario and BC are big enough to be regions.
17. Determine how many projects each region received.
18. Merge this file with the one you saved in step 13 (Hint: use “merge”).
19. Conduct a t-test to determine whether ridings in Ontario received more projects than ridings in Quebec. (Hint: compare the log variables)
20. Get the same result as in question 19 running a regression.
21. The difference between the percentage of the vote obtained by the conservative candidate and the best non-conservative candidate could explain the amount of money received by a riding. Draw a scatter plot of both variables with a linear (Hint: lfit) and a quadratic (Hint: qfit) trend. What do you think?
22. Conduct a regression with relevant socio-demographic control variables to confirm the intuition from question 19.
23. Conduct a regression to determine whether ridings that voted conservative obtained more projects. (Hint: control for relevant socio-demographic factors)
24. Explain why you are keeping or dropping certain socio-demographic factors from your regression.
25. Explain in words the coefficients on a few significant variables.

Solution

/* Set more off and set working directory to data location.    */

set more off

cd "C:\STATA"

/* 1. Import the file “Raw EAP Data” into Stata.               */

/* 2. Browse through the data.                                 */

import excel "C:\STATA\Raw Data EAP.xlsx", sheet("Sheet1") firstrow

/*  3. Determine how many types of initiatives there are.      */

tab1 Initiative_Description, missing

/*  4.      Draw a kernel density of the amount spent by the       */

/*  federal government. What shape does it have?               */

/*  5.      Is there an outlier? What is it? Where is it located?  */

kdensity Fed_Amount, bwidth(1000) recast(connected)

/* The tabstat command provides the mean and stddev of         */

/* Fed_Amount, which are used to tag an outlier.               */

tabstat Fed_Amount, statistics( mean sd min p25 p50 p75 max ) columns(statistics)

/* 6. Drop the outlier identified in question 5 and all        */

/*    projects with a negative value.                          */

drop if Fed_Amount<0 |  Fed_Amount ==350000000

/* 7. How many projects were undertaken in the riding of       */

/*    Halifax?                                                 */

tab1 Riding_Name if Riding_Name=="Halifax" | Riding_Name=="Halifax West"

/* 8. List all the projects in the riding of                   */

/*    Saint-Leonard—Saint-Michel.                              */

list Project_Description Total_Amount if Riding_Name=="Saint-Leonard–Saint-Michel"

/* 9. For each riding, create a variable that calculates the   */

/*    number of projects, the total value of projects and the  */

/*    total federal contribution. (Hint: use “egen”).          */

by Riding_Number, sort : egen float TotalNumberByRiding = count(Riding_Number)

by Riding_Number, sort : egen float TotalAmountByRiding = total(Total_Amount)

by Riding_Number, sort : egen float TotalFedAmountByRiding = total(Fed_Amount)

label variable TotalNumberByRiding "Total number by Riding"

label variable TotalAmountByRiding "Total amount by Riding"

label variable TotalFedAmountByRiding "Total federal amount by Riding"

/* 10. For each riding, keep only one row with the total       */

/*     number of projects, the total value of projects and the */

/*     total federal contribution.                             */

/*     (Hint: use “duplicates drop”)                           */

duplicates drop TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding, force

/* 11. Draw a kernel density of the number of projects per     */

/* riding. What shape does it have?                            */

histogram TotalNumberByRiding, width(20) start(0) fcolor(navy8) lcolor(bluishgray8) kdensity kdenopts(lcolor(maroon) lwidth(thick)) xlabel(0(20)160)

/* 12. Create a new variable with the logarithm value. Draw a  */

/* kernel density to make sure the distribution is more or     */

/* less normal.                                                */

generate log_TotalValue = log( TotalFedAmountByRiding)

label variable log_TotalValue "log of total value by Riding"

/* 13. Save the file.                                          */

save "C:\STATA\total projects and amounts by riding.dta"

/* 14. Import the file “EAP Data” into Stata.                  */

/* 15. Browse through the data.                                */

insheet using "C:\STATA\EAP Data.csv", comma clear

drop v40 httpwww12statcancacensusrecensem

/* 16. Create dummies for regions. (Hint: group the prairies,  */

/* the Atlantic provinces and the Northern territories         */

/* together). Quebec, Ontario and BC are big enough to be      */

/* regions.                                                    */

/* This is a series of generate and replace commands to create */

/* the regional dummy variables.                               */

generate BC_region = 1 if province=="British Columbia"

replace BC_region = 0 if province!="British Columbia"

generate Ontario_region = 1 if province=="Ontario"

replace Ontario_region = 0 if province!="Ontario"

generate Quebec_region = 1 if province=="Quebec"

replace Quebec_region = 0 if province!="Quebec"

generate Atlantic_region = 1 if province=="New Brunswick" | province=="Newfoundland and Labrador" | province=="Nova Scotia" | province=="Prince Edward Island"

replace Atlantic_region = 0 if !(province=="New Brunswick" | province=="Newfoundland and Labrador" | province=="Nova Scotia" | province=="Prince Edward Island")

generate Prairie_region = 1 if province=="Alberta" | province=="Saskatchewan" | province=="Manitoba"

replace Prairie_region = 0 if !(province=="Alberta" | province=="Saskatchewan" | province=="Manitoba")

generate Northern_region = 1 if province=="Northwest Territories" | province=="Nunavut" | province=="Yukon"

replace Northern_region = 0 if !(province=="Northwest Territories" | province=="Nunavut" | province=="Yukon")

label variable BC_region "Dummy, 1 if British Columbia"

label variable Ontario_region "Dummy, 1 if Ontario"

label variable Quebec_region "Dummy, 1 if Quebec"

label variable Atlantic_region "Dummy, 1 if New Brunswick, Newfoundland and Labrador, Nova Scotia, or PEI"

label variable Prairie_region "Dummy, 1 if Alberta, Saskatchewan, Manitoba"

label variable Northern_region "Dummy, 1 if Northwest territories, Nunavut or Yukon"

/* 17. Determine how many projects each region received.       */

/* 18. Merge this file with the one you saved in step 13       */

/* (Hint: use “merge”).                                        */

/* Q18                                                         */

rename riding_number Riding_Number

merge 1:1 Riding_Number using "C:\STATA\total projects and amounts by riding.dta", keepusing(TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding log_TotalValue) generate(_merge_totalsbyriding)

/* Q17                                                         */

/* Create a categorical variable for region.                   */

generate region = 1 if BC_region==1

replace region = 2 if Prairie_region==1

replace region = 3 if Ontario_region==1

replace region = 4 if Quebec_region==1

replace region = 5 if Atlantic_region==1

replace region = 6 if Northern_region==1

label define REGIONS 1 "British Columbia" 2 "Prairie provinces" 3 "Ontario" 4 "Quebec" 5 "Atlantic provinces" 6 "Northern provinces"

label values region REGIONS

/* Save the current file, then collapse the file by region and */

/* take the sum of the number of projects by riding.            */

save "C:\STATA\merged EAP data.dta"

collapse (sum) TotalProjectsByRegion = TotalNumberByRiding, by(region)

list

/* Re-open the saved file.                                     */

use "C:\STATA\merged EAP data.dta", clear

/* 19. Conduct a t-test to determine whether ridings in        */

/* Ontario received more projects than ridings in Quebec.      */

/* (Hint: compare the log variables)                           */

/* First generate the log of the number of projects.           */

generate log_NumberProjects = log( TotalNumberByRiding)

/* Conduct the t test on the log of the number of projects for */

/* each riding                                                 */

ttest log_NumberProjects if region == 3 | region == 4, by(region)

/* 20. Get the same result as in question 19 running a         */

/* regression.                                                 */

regress log_NumberProjects i.region if region==3|region==4, cformat(%9.5f) pformat(%5.4f) sformat(%8.3f)

/* 21. The difference between the percentage of the vote       */

/* obtained by the conservative candidate and the best         */

/* non-conservative candidate could explain the amount of      */

/* money received by a riding. Draw a scatter plot of both     */

/* variables with a linear (Hint: lfit) and a quadratic.       */

/* (Hint: qfit) trend. What do you think?                      */

generate log_FedFunds = log( TotalFedAmountByRiding)

twoway (scatter diff_cons_best_08 log_FedFunds) (lfit diff_cons_best_08 log_FedFunds) (qfit diff_cons_best_08 log_FedFunds), ytitle(% votes Conservative minus % votes best opponent) xtitle(log of federal funds received by riding)

/* 22. Conduct a regression with relevant socio-demographic    */

/* control variables to confirm the intuition from question 19.*/

regress log_NumberProjects i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown if region==3 | region==4

/* 23. Conduct a regression to determine whether ridings that  */

/* voted conservative obtained more projects.                  */

/* (Hint: control for relevant socio-demographic factors)      */

regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown

/* 24. Explain why you are keeping or dropping certain         */

/* socio-demographic factors from your regression.             */

regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate downtown

Data Assignment in 25 Easy Steps

Econ 4410

This data stems from the Economic Action Plan conducted in 2009-2011 by the Canadian government to stimulate the economy. The file “Raw EAP Data” contains all 14,932 projects that were advertised on the government website (the data was scraped). The file “EAP Data” contains information on the 308 Canadian ridings and originates from this website: https://www12.statcan.gc.ca/fed-cef/index.cfm?Lang=E&SID=1. The objective of this assignment is to better understand the factors that influenced the allocation of projects and to practice coding skills. Write a .do file with comments showing the different steps and a report with explanations and outputs (use .png files in your .doc document).

1. Import the file “Raw EAP Data” into Stata.
2. Browse through the data.

The Stata commands to import the data follow:

import excel "C:\STATA\Raw Data EAP (2).xlsx", sheet("Sheet1") firstrow

3. Determine how many types of initiatives there are.

The Stata command to provide a frequency count of each different initiative follows, along with its results:

tab1 Initiative_Description, missing

->tabulation of Initiative_Description

Initiative_Description |      Freq.     Percent        Cum.

----------------------------------------+-----------------------------------

Aboriginal Skills and Employment Partne |         20        0.13        0.13

Aboriginal Skills and Training Strategi |         93        0.62        0.76

Accelerating Approval Processes for Bui |         98        0.66        1.41

Accelerating the Federal Contaminated S |        269        1.80        3.21

Addressing First Nations’ Housing Needs |        355        2.38        5.59

Agricultural Flexibility Program |          7        0.05        5.64

Arctic Research Infrastructure Fund |         37        0.25        5.89

Blue Water Bridge Canadian Plaza and Br |          1        0.01        5.89

Building a Small Craft Harbour in Pangn |          1        0.01        5.90

Canada Cultural Spaces Fund |        120        0.80        6.70

Champlain Bridge |          1        0.01        6.71

Clean Energy Fund Program |         11        0.07        6.78

Communities Component of the Building C |      1,386        9.28       16.07

Community Adjustment Fund |      1,120        7.50       23.57

Enhancing Rail Safety |        154        1.03       24.60

Enhancing the Accessibility of Federal  |        194        1.30       25.90

Federal Economic Development Agency for |        404        2.71       28.60

First Nations Schools |         13        0.09       28.69

First Nations Water and Wastewater Proj |         18        0.12       28.81

Green Infrastructure Fund |         21        0.14       28.95

Helping Municipalities Build Stronger C |        138        0.92       29.88

Housing for Low-Income Seniors |        168        1.13       31.00

Housing for Persons with Disabilities |         29        0.19       31.19

Improvements to National Capital Area B |          2        0.01       31.21

Improving Parks Canada’s National Histo |        131        0.88       32.09

Improving the Peace Bridge Plaza |          1        0.01       32.09

Industrial Research Assistance Program  |         36        0.24       32.33

Infrastructure Stimulus Fund |      3,859       25.84       58.18

Infrastructure at Ports of Entry |          4        0.03       58.20

Infrastructure at Remote Ports of Entry |          3        0.02       58.22

Institute for Quantum Computing |          1        0.01       58.23

Investing in Federal Bridges |          4        0.03       58.26

Investing in Federal Buildings |        297        1.99       60.25

Investing in First Nations and Inuit He |         39        0.26       60.51

Investing in Inter-City Passenger Rail  |          1        0.01       60.51

Investing in Remote Rail Passenger Serv |          2        0.01       60.53

Knowledge Infrastructure Program |        508        3.40       63.93

Marquee Tourism Events Program |        108        0.72       64.65

Modernizing Federal Laboratories |        134        0.90       65.55

National Historic Sites of Canada Cost- |         79        0.53       66.08

National Recreational Trails |          1        0.01       66.09

New Environmental Response Barges for t |          4        0.03       66.11

New Inshore Fisheries Science Vessels |          1        0.01       66.12

New Small Boats for the Coast Guard |          7        0.05       66.17

Northern Housing |        103        0.69       66.86

Policing Infrastructure in First Nation |         18        0.12       66.98

Recreational Infrastructure Canada |      1,922       12.87       79.85

Refitting Large Coast Guard Vessels |        122        0.82       80.67

Renovation and Retrofit of Social Housi |      2,568       17.20       97.86

Small Craft Harbours Program |        260        1.74       99.60

Strategic Investments in Northern Econo |         26        0.17       99.78

Strengthening the Competitiveness of Ca |          8        0.05       99.83

Supporting the Development of Internati |         24        0.16       99.99

Twinning the Trans-Canada Highway Throu |          1        0.01      100.00

----------------------------------------+-----------------------------------

Total |     14,932      100.00

The 14,932 projects are allocated among 54 different initiatives, with the Infrastructure Stimulus Fund (3,859 projects) and Renovation and Retrofit of Social Housing (2,568 projects) the most common.
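The number of distinct initiatives can also be obtained programmatically instead of counting the rows of the table. The following is a minimal sketch on a small made-up dataset (the three rows are illustrative only, not the EAP file); it assumes nothing beyond a string variable named Initiative_Description, as in the command above.

```stata
* Toy data standing in for the EAP file (rows are made up for illustration).
clear
input str40 Initiative_Description
"Champlain Bridge"
"Northern Housing"
"Champlain Bridge"
end

* tag() flags exactly one observation per distinct value of the variable.
egen byte first_obs = tag(Initiative_Description)

* Counting the flagged observations gives the number of distinct initiatives.
count if first_obs
display "Distinct initiatives: " r(N)
```

On the real data the same two commands should reproduce the row count of the tabulation.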

4. Draw a kernel density of the amount spent by the federal government. What shape does it have?

The Stata command and its results follow:

kdensity Fed_Amount, bwidth(1000) recast(connected)

To provide additional detail, the default bandwidth of 62,000 was reduced to 1,000. Roughly speaking, the graph looks like a rectangular hyperbola, with many small amounts and a very small number of large amounts. The additional detail afforded by the relatively narrow bandwidth suggests that the relatively few large projects are clustered around what may be “focal point” amounts: 50,000,000, 100,000,000 and 250,000,000.

5. Is there an outlier? What is it? Where is it located?

The largest amount, 350,000,000, can certainly be considered an outlier: it lies about 49 standard deviations (one standard deviation is 7,113,769) above the mean (1,134,993).
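The distance of the largest amount from the mean can be checked by hand. The sketch below simply plugs the three summary statistics quoted above into scalars; the scalar names are hypothetical and no dataset is needed.

```stata
* Summary statistics copied from the text above (not recomputed from the data).
scalar fed_mean = 1134993
scalar fed_sd   = 7113769
scalar fed_max  = 350000000

* Number of standard deviations between the maximum and the mean.
scalar z_max = (fed_max - fed_mean) / fed_sd
display "Largest amount lies " %5.2f z_max " standard deviations above the mean"
```

The result is roughly 49, which is the basis for treating the observation as an outlier.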

6. Drop the outlier identified in question 5 and all projects with a negative value.

The Stata command to drop negative observations and the observation of 350,000,000 and its results follow:

drop if Fed_Amount<0 |  Fed_Amount ==350000000

(4 observations deleted)
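Before dropping, it can help to count how many observations satisfy the condition, and to remember that Stata treats a missing value as larger than any number, so the condition above leaves missing amounts in the file. The following sketch uses four made-up values to illustrate both points; only the variable name Fed_Amount is taken from the document.

```stata
* Toy data: one negative value, one ordinary value, one missing, one outlier.
clear
input double Fed_Amount
-5
10
.
350000000
end

* Preview how many rows the condition will remove.
count if Fed_Amount < 0 | Fed_Amount == 350000000

* Same condition as in the report; the missing value is NOT dropped.
drop if Fed_Amount < 0 | Fed_Amount == 350000000
list
```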

7. How many projects were undertaken in the riding of Halifax?

The Stata command to obtain a frequency count for Halifax and Halifax West and its results follow:

tab1 Riding_Name if Riding_Name=="Halifax" | Riding_Name=="Halifax West"

-> tabulation of Riding_Name if Riding_Name=="Halifax" | Riding_Name=="Halifax West"

Riding_Name |      Freq.     Percent        Cum.

----------------------------------------+-----------------------------------

Halifax |         63       75.90       75.90

Halifax West |         20       24.10      100.00

----------------------------------------+-----------------------------------

Total |         83      100.00

There were 63 projects in Halifax and 20 projects in Halifax West.
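If only the riding named exactly “Halifax” is wanted, an exact string match with count is enough, because == does not match “Halifax West”. A minimal sketch on made-up rows (the riding names other than the two Halifax ridings are illustrative):

```stata
* Toy data with two Halifax projects, one Halifax West, one elsewhere.
clear
input str20 Riding_Name
"Halifax"
"Halifax West"
"Halifax"
"Ottawa Centre"
end

* Exact match: "Halifax West" is not counted.
count if Riding_Name == "Halifax"
display "Projects in Halifax: " r(N)
```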

8. List all the projects in the riding of Saint-Leonard—Saint-Michel.

The Stata command that lists the project type and its total amount along with its results follows:

list Project_Description Total_Amount if Riding_Name=="Saint-Leonard–Saint-Michel"

+--------------------------------------------------------------------------------+

|                                                Project_Description   Total_~t |

|--------------------------------------------------------------------------------|

1. | Renovation of existing social housing – Societe d’Entraide SOS OSBL 60317 |
2. | Development of a soccer field 1356900 |
3. | Hebert Stadium retrofit 4082445 |
4. | Construction of a community centre 965000 |

+--------------------------------------------------------------------------------+

9. For each riding, create a variable that calculates the number of projects, the total value of projects and the total federal contribution. (Hint: use “egen”)

The Stata commands that create these three variables, named TotalNumberByRiding, TotalAmountByRiding and TotalFedAmountByRiding, follow:

by Riding_Number, sort : egen float TotalNumberByRiding = count(Riding_Number)

by Riding_Number, sort : egen float TotalAmountByRiding = total(Total_Amount)

by Riding_Number, sort : egen float TotalFedAmountByRiding = total(Fed_Amount)

10. For each riding, keep only one row with the total number of projects, the total value of projects and the total federal contribution. (Hint: use “duplicates drop”)

The Stata command to drop duplicate observations for the totals and its results follow:

. duplicates drop TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding, force

Duplicates in terms of TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding

(14620 observations deleted)

This results in a file with 308 observations.
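Dropping duplicates on the three totals works here because no two ridings share identical totals; if two ridings did, they would be collapsed into one row. A variant that keys on the riding identifier avoids that risk. The sketch below uses three made-up project rows; only the variable names come from the document.

```stata
* Toy data: riding 1 has two projects, riding 2 has one (amounts are made up).
clear
input Riding_Number Total_Amount Fed_Amount
1 100 50
1 200 80
2 300 150
end

* Per-riding totals, as in question 9.
bysort Riding_Number: egen TotalNumberByRiding    = count(Riding_Number)
bysort Riding_Number: egen TotalAmountByRiding    = total(Total_Amount)
bysort Riding_Number: egen TotalFedAmountByRiding = total(Fed_Amount)

* Keep the first row of each riding, regardless of the values of the totals.
bysort Riding_Number: keep if _n == 1
list Riding_Number TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding
```

On the assignment data this should leave the same 308 rows as the command used in the report.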

11. Draw a kernel density of the number of projects per riding. What shape does it have?

The Stata command to superimpose a kernel density over a histogram and the figure it produces follow:

histogram TotalNumberByRiding, width(20) start(0) fcolor(navy) lcolor(dknavy) kdensity kdenopts(lcolor(maroon) lwidth(thick)) xlabel(0(20)160)

(bin=9, start=0, width=20)

The graph looks much like a truncated normal distribution, with a lower limit of 0 because a negative number of projects is not possible, and an extended right tail and a single mode somewhere between 20 and 40 projects.

12. Create a new variable with the logarithm value. Draw a kernel density to make sure the distribution is more or less normal.

The Stata commands to create the new variable and produce a kernel density superimposed on a histogram and their results follow:

generate log_TotalValue = log( TotalFedAmountByRiding)

label variable log_TotalValue "log of total value by Riding"

histogram log_TotalValue, width(2) start(0) fcolor(navy8) lcolor(bluishgray8) kdensity kdenopts(lcolor(maroon) lwidth(thick) width(2)) xlabel(0(4)20)

(bin=11, start=0, width=2)

The density appears acceptably close to normal.

13. Save the file.

The Stata command and its results follow:

. save "C:\STATA\total projects and amounts by riding.dta"

file C:\STATA\total projects and amounts by riding.dta saved

14. Import the file “EAP Data” into Stata.
15. Browse through the data.

The Stata commands and their results follow:

insheet using "C:\STATA\EAP Data.csv", comma clear

(41 vars, 308 obs)

Browsing through the data indicated the presence of a column of missing values under the variable name “v40” along with the source website in the 41st column. These two variables were dropped with the following Stata command:

. drop v40 httpwww12statcancacensusrecensem

16. Create dummies for regions. (Hint: group the prairies, the Atlantic provinces and the Northern territories together). Quebec, Ontario and BC are big enough to be regions.

The Stata code to create the regional dummies and label them follows:

generate BC_region = 1 if province=="British Columbia"

replace BC_region = 0 if province!="British Columbia"

generate Ontario_region = 1 if province=="Ontario"

replace Ontario_region = 0 if province!="Ontario"

generate Quebec_region = 1 if province=="Quebec"

replace Quebec_region = 0 if province!="Quebec"

generate Atlantic_region = 1 if province=="New Brunswick" | province=="Newfoundland and Labrador" | province=="Nova Scotia" | province=="Prince Edward Island"

replace Atlantic_region = 0 if !(province=="New Brunswick" | province=="Newfoundland and Labrador" | province=="Nova Scotia" | province=="Prince Edward Island")

generate Prairie_region = 1 if province=="Alberta" | province=="Saskatchewan" | province=="Manitoba"

replace Prairie_region = 0 if !(province=="Alberta" | province=="Saskatchewan" | province=="Manitoba")

generate Northern_region = 1 if province=="Northwest Territories" | province=="Nunavut" | province=="Yukon"

replace Northern_region = 0 if !(province=="Northwest Territories" | province=="Nunavut" | province=="Yukon")

label variable BC_region "Dummy, 1 if British Columbia"

label variable Ontario_region "Dummy, 1 if Ontario"

label variable Quebec_region "Dummy, 1 if Quebec"

label variable Atlantic_region "Dummy, 1 if New Brunswick, Newfoundland and Labrador, Nova Scotia, or PEI"

label variable Prairie_region "Dummy, 1 if Alberta, Saskatchewan, Manitoba"

label variable Northern_region "Dummy, 1 if Northwest territories, Nunavut or Yukon"
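Each generate/replace pair above can be written as a single command, because a logical expression in Stata evaluates to 1 when true and 0 when false. The sketch below shows this alternative on four made-up rows, using inlist() for the multi-province regions; it is an equivalent formulation under the assumption that province is never missing, not the code used for the report.

```stata
* Toy data: one row per province of interest (illustrative only).
clear
input str30 province
"British Columbia"
"Nova Scotia"
"Yukon"
"Ontario"
end

* A logical expression yields 1 (true) or 0 (false) directly.
generate byte BC_region = province == "British Columbia"

* inlist() tests membership in a list of string values.
generate byte Atlantic_region = inlist(province, "New Brunswick", "Newfoundland and Labrador", "Nova Scotia", "Prince Edward Island")
generate byte Northern_region = inlist(province, "Northwest Territories", "Nunavut", "Yukon")
list
```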

17. Determine how many projects each region received.
18. Merge this file with the one you saved in step 13 (Hint: use “merge”).

The file saved earlier as “total projects and amounts by riding.dta” has the total number of projects by riding and the current file has the region dummies. The two files are merged first (question 18) and then project totals are computed by region dummy. The following code merges the two files using a one-to-one merge by riding_number (note that the variable riding_number in the current file is renamed Riding_Number to match the name in the earlier file).

rename riding_number Riding_Number

. merge 1:1 Riding_Number using "C:\STATA\total projects and amounts by riding.dta", keepusing(TotalNumberByRiding TotalAmountByRiding TotalFedAmountByRiding log_TotalValue) generate(_merge_totalsbyriding)

Result                           # of obs.

-----------------------------------------

not matched                             0

matched                               308  (_merge_totalsbyriding==3)

-----------------------------------------

The results indicate that all of the records were successfully merged, which is anticipated because there is one record for each riding in each file.
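The expectation that every riding matches can be enforced in the do-file itself, so that the run stops if a later version of either file breaks the one-to-one correspondence. A minimal sketch with two made-up ridings follows; the tempfile stands in for the saved totals file, and the values are illustrative.

```stata
* Toy "using" file: totals by riding (values are made up).
clear
input Riding_Number TotalNumberByRiding
1 10
2 20
end
tempfile totals
save `totals'

* Toy "master" file: riding characteristics.
clear
input Riding_Number str10 province
1 "Ontario"
2 "Quebec"
end

* One-to-one merge on the riding identifier, then insist on full matching.
merge 1:1 Riding_Number using `totals', generate(_merge_totalsbyriding)
assert _merge_totalsbyriding == 3
```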

To obtain the number of projects in each region, I first construct and label the values of a region categorical variable (which could also be used for the dummy variables using the “i.” prefix), region, and then collapse the file by region, taking the sum of number of projects, and then list the resulting file’s contents. The Stata commands and their results follow:

/* Create a categorical variable for region.                   */

generate region = 1 if BC_region==1

replace region = 2 if Prairie_region==1

replace region = 3 if Ontario_region==1

replace region = 4 if Quebec_region==1

replace region = 5 if Atlantic_region==1

replace region = 6 if Northern_region==1

label define REGIONS 1 "British Columbia" 2 "Prairie provinces" 3 "Ontario" 4 "Quebec" 5 "Atlantic provinces" 6 "Northern provinces"

label values region REGIONS

/* Collapse the file by region and take sum.                   */

collapse (sum) TotalProjectsByRegion = TotalNumberByRiding, by(region)

list

+-------------------------------+

|             region   TotalP~n |

|-------------------------------|

1. | British Columbia       1631 |
2. | Prairie provinces       3106 |
3. | Ontario       5068 |
4. | Quebec       2554 |
5. | Atlantic provinces       2237 |

|-------------------------------|

6. | Northern provinces        332 |

+-------------------------------+
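As a consistency check, the six regional totals should add up to the number of projects left after question 6, that is, the original 14,932 minus the 4 observations deleted. The sketch below only redoes that arithmetic with the figures reported in this document; the scalar names are hypothetical.

```stata
* Regional totals copied from the listing above.
scalar region_sum = 1631 + 3106 + 5068 + 2554 + 2237 + 332

* Projects remaining after the four deletions in question 6.
scalar projects_left = 14932 - 4

display "Sum over regions: " region_sum "   Projects after drop: " projects_left
```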

19. Conduct a t-test to determine whether ridings in Ontario received more projects than ridings in Quebec. (Hint: compare the log variables)

The Stata commands and their results follow:

generate log_NumberProjects = log( TotalNumberByRiding)

ttest log_NumberProjects if region == 3 | region == 4, by(region)

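Two-sample t test with equal variances

------------------------------------------------------------------------------

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

 Ontario |     106    3.704091    .0580758    .5979264    3.588938    3.819245

  Quebec |      75    3.257182    .0939262    .8134248     3.07003    3.444334

---------+--------------------------------------------------------------------

combined |     181    3.518908    .0540771    .7275323    3.412201    3.625614

---------+--------------------------------------------------------------------

    diff |            .4469091     .104892                .2399252    .6538931

------------------------------------------------------------------------------

diff = mean(Ontario) - mean(Quebec)                           t =   4.2607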

Ho: diff = 0                                     degrees of freedom =      179

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

The null hypothesis that the population mean of the log of the number of projects per riding is the same in Ontario and Quebec can be rejected against the two-tailed alternative at any conventional level of significance: the two-sided p-value is reported as 0.0000, that is, below 0.00005. There is therefore very strong evidence that the average (log) number of projects per riding differs between the two provinces, with Ontario ridings receiving more projects on average than Quebec ridings.
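The t statistic and its two-sided p-value can be reproduced from the “diff” row of the output. The sketch below plugs the reported difference, its standard error and the degrees of freedom into scalars (the scalar names are hypothetical) and uses Stata's ttail() function for the upper-tail probability.

```stata
* Values copied from the ttest output above.
scalar mean_diff = .4469091
scalar se_diff   = .104892
scalar df        = 179

* t statistic and two-sided p-value.
scalar t_stat = mean_diff / se_diff
scalar p_two  = 2 * ttail(df, abs(t_stat))
display "t = " %6.4f t_stat "   two-sided p = " %10.8f p_two
```

Displaying the p-value with more decimals than the default output makes clear that it is small but not literally zero.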

20. Get the same result as in question 19 running a regression.

The Stata command and its results follow:

regress log_NumberProjects i.region if region==3|region==4, cformat(%9.5f) pformat(%5.4f) sformat(%8.3f)

      Source |       SS       df       MS              Number of obs =     181

-------------+------------------------------           F(  1,   179) =   18.15

       Model |  8.77257404     1  8.77257404           Prob > F      =  0.0000

    Residual |  86.5020188   179  .483251501           R-squared     =  0.0921

       Total |  95.2745928   180  .529303293           Root MSE      =  .69516

------------------------------------------------------------------------------

log_Number~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

    4.region |   -0.44691    0.10489   -4.261   0.0000     -0.65389    -0.23993

       _cons |    3.70409    0.06752   54.859   0.0000      3.57085     3.83733

------------------------------------------------------------------------------

Note that the estimated constant, 3.70409, is the average of the log of the number of projects per riding in region 3 (Ontario), the omitted base category. The estimated coefficient on 4.region (Quebec), -0.44691, equals the difference between the means shown as “diff” in question 19 with the sign reversed: question 19 computes Ontario minus Quebec, whereas the regression coefficient measures Quebec minus Ontario. The t statistic on 4.region, -4.261, likewise matches the t statistic from question 19 (4.2607) to three decimal places apart from its sign. Because the order of the difference affects only the sign, and the t statistic has the same magnitude and degrees of freedom (179) in each question, the significance level and therefore the conclusion are the same.
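The equivalence between a two-sample t test with equal variances and a regression on a single dummy is general and can be verified on any dataset. The sketch below uses Stata's bundled auto dataset as a stand-in (an assumption made for illustration; the variables mpg and foreign are unrelated to the EAP data) and compares the absolute t statistics from the two commands.

```stata
* Stand-in data shipped with Stata.
sysuse auto, clear

* Two-sample t test with equal variances; r(t) holds the t statistic.
ttest mpg, by(foreign)
scalar t_ttest = r(t)

* Regression on the group dummy; the t statistic is coefficient / std. error.
regress mpg i.foreign
scalar t_reg = _b[1.foreign] / _se[1.foreign]

* The two statistics agree in magnitude and differ only in sign.
display "ttest: " %8.4f t_ttest "   regress: " %8.4f t_reg
```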

21. The difference between the percentage of the vote obtained by the conservative candidate and the best non-conservative candidate could explain the amount of money received by a riding. Draw a scatter plot of both variables with a linear (Hint: lfit) and a quadratic (Hint: qfit) trend. What do you think?

I investigate the log of the federal funds received by the riding. The Stata commands to produce the graph and the graph follow:

generate log_FedFunds = log( TotalFedAmountByRiding)

twoway (scatter diff_cons_best_08 log_FedFunds) (lfit diff_cons_best_08 log_FedFunds) (qfit diff_cons_best_08 log_FedFunds), ytitle(% votes Conservative minus % votes best opponent) xtitle(log of federal funds received by riding)

There is a positive relationship of modest strength between the percentage margin for the conservative and the log of federal funds received. The quadratic fit, shown by the green line, is slightly better than the linear fit, shown by the red line.
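Whether a quadratic trend fits meaningfully better than a linear one can be checked with a regression rather than by eye. The sketch below illustrates the idea on Stata's bundled auto dataset (a stand-in chosen for illustration; mpg and weight are not EAP variables): it compares the R-squared of the two fits and tests the squared term.

```stata
* Stand-in data shipped with Stata.
sysuse auto, clear

* Linear fit, the counterpart of lfit.
regress mpg weight
scalar r2_linear = e(r2)

* Quadratic fit, the counterpart of qfit, using factor-variable notation.
regress mpg c.weight##c.weight
scalar r2_quadratic = e(r2)

* A significant squared term indicates curvature beyond the linear trend.
test c.weight#c.weight
display "R2 linear = " %6.4f r2_linear "   R2 quadratic = " %6.4f r2_quadratic
```

The quadratic R-squared can never be lower than the linear one, so the test on the squared term is the more informative of the two comparisons.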

22. Conduct a regression with relevant socio-demographic control variables to confirm the intuition from question 19.

The Stata command and results follow:

regress log_NumberProjects i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown if region==3 | region==4

      Source |       SS       df       MS              Number of obs =     181

-------------+------------------------------           F(  6,   174) =   24.37

       Model |  43.5052401     6  7.25087335           Prob > F      =  0.0000

    Residual |  51.7693527   174  .297525015           R-squared     =  0.4566

       Total |  95.2745928   180  .529303293           Root MSE      =  .54546

--------------------------------------------------------------------------------------

  log_NumberProjects |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

---------------------+----------------------------------------------------------------

            4.region |  -.9301546   .1036139    -8.98   0.000    -1.134657   -.7256526

   unemployment_rate |   4.286596   2.914278     1.47   0.143    -1.465289    10.03848

percentage_immigrant |  -2.581618   .3038018    -8.50   0.000    -3.181229   -1.982007

  participation_rate |  -2.271657   1.233366    -1.84   0.067     -4.70594    .1626263

   high_school_lower |  -.7279007   .6812482    -1.07   0.287    -2.072474    .6166731

            downtown |   1.044341   .2392536     4.36   0.000     .5721283    1.516554

               _cons |   5.957877   1.062268     5.61   0.000     3.861288    8.054466

--------------------------------------------------------------------------------------

The included variables are the region dummy for region 4, the unemployment rate, the percentage of the population that is immigrant, the labor force participation rate, the percentage of the population with high school or less education, and a dummy indicating that the riding is downtown. All of these variables except high school education and the unemployment rate are significant at the 0.10 level and have the expected signs. Even after these variables are controlled for, the average log number of projects received by ridings in region 4, Quebec, remains lower than for ridings in Ontario.
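Since two of the controls are individually insignificant, a joint test is a natural follow-up. This is a hedged sketch, assuming the same data are in memory. It reruns the question's regression, using inlist() as an equivalent way to write the region restriction, and then tests both coefficients together.

```stata
* Sketch only: same model as above, restricted to Ontario (3) and Quebec (4).
regress log_NumberProjects i.region unemployment_rate percentage_immigrant ///
    participation_rate high_school_lower downtown if inlist(region, 3, 4)

* Joint F test that both coefficients are zero.
test unemployment_rate high_school_lower
```

A large p value on the joint test would suggest that neither variable adds much once the other controls are included.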

1. Conduct a regression to determine whether ridings that voted conservative obtained more projects. (Hint: control for relevant socio-demographic factors)

The Stata command and results follow:

regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate high_school_lower downtown

| Source | SS | df | MS |
| --- | --- | --- | --- |
| Model | 85.977467 | 11 | 7.81613336 |
| Residual | 79.7228817 | 296 | .26933406 |
| Total | 165.700349 | 307 | .53974055 |

Number of obs = 308, F(11, 296) = 29.02, Prob > F = 0.0000, R-squared = 0.5189, Root MSE = .51897

| log_NumberProjects | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
| --- | --- | --- | --- | --- | --- | --- |
| 1.conservative_08 | .0135065 | .077975 | 0.17 | 0.863 | -.1399492 | .1669621 |
| 2.region | .1158196 | .1239356 | 0.93 | 0.351 | -.128087 | .3597262 |
| 3.region | .2398577 | .1018492 | 2.36 | 0.019 | .0394174 | .440298 |
| 4.region | -.6902288 | .1238108 | -5.57 | 0.000 | -.9338899 | -.4465678 |
| 5.region | -.1946797 | .1630347 | -1.19 | 0.233 | -.5155338 | .1261744 |
| 6.region | .5623275 | .3432132 | 1.64 | 0.102 | -.1131198 | 1.237775 |
| unemployment_rate | 3.424306 | 1.476758 | 2.32 | 0.021 | .5180295 | 6.330582 |
| percentage_immigrant | -2.75948 | .2532239 | -10.90 | 0.000 | -3.257827 | -2.261133 |
| participation_rate | -2.027376 | .8167286 | -2.48 | 0.014 | -3.634707 | -.4200456 |
| high_school_lower | .0086662 | .498141 | 0.02 | 0.986 | -.9716807 | .989013 |
| downtown | 1.03105 | .176185 | 5.85 | 0.000 | .6843156 | 1.377784 |
| _cons | 5.290594 | .6777025 | 7.81 | 0.000 | 3.956868 | 6.62432 |

This question was put to the full sample rather than just regions 3 and 4, so the full set of regional dummies was added to the regression, along with a dummy indicating that the riding elected a Conservative in ’08. The remaining independent variables are the same as for question 22.

Although the estimated coefficient on conservative_08, 0.0135065, is positive, the p value for the estimate is 0.863, implying that there is very little evidence here to conclude that the population effect of going Conservative is different from 0.

1. Explain why you are keeping or dropping certain socio-demographic factors from your regression.

Because its p value is 0.986 (i.e., there is no evidence that its coefficient is different from zero), I dropped the high school education variable from the following regression:

regress log_NumberProjects i.conservative_08 i.region unemployment_rate percentage_immigrant participation_rate downtown

| Source | SS | df | MS |
| --- | --- | --- | --- |
| Model | 85.9773855 | 10 | 8.59773855 |
| Residual | 79.7229633 | 297 | .268427486 |
| Total | 165.700349 | 307 | .53974055 |

Number of obs = 308, F(10, 297) = 32.03, Prob > F = 0.0000, R-squared = 0.5189, Root MSE = .5181

| log_NumberProjects | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
| --- | --- | --- | --- | --- | --- | --- |
| 1.conservative_08 | .0136077 | .0776265 | 0.18 | 0.861 | -.1391599 | .1663754 |
| 2.region | .1162433 | .1213143 | 0.96 | 0.339 | -.1225013 | .3549878 |
| 3.region | .2399847 | .1014162 | 2.37 | 0.019 | .0403994 | .4395701 |
| 4.region | -.690597 | .1217838 | -5.67 | 0.000 | -.9302654 | -.4509285 |
| 5.region | -.1951041 | .1609273 | -1.21 | 0.226 | -.5118064 | .1215981 |
| 6.region | .5624509 | .342562 | 1.64 | 0.102 | -.1117054 | 1.236607 |
| unemployment_rate | 3.431265 | 1.41916 | 2.42 | 0.016 | .6383816 | 6.224148 |
| percentage_immigrant | -2.76117 | .23346 | -11.83 | 0.000 | -3.220615 | -2.301725 |
| participation_rate | -2.02994 | .8019674 | -2.53 | 0.012 | -3.608199 | -.4516815 |
| downtown | 1.030233 | .169525 | 6.08 | 0.000 | .6966103 | 1.363855 |
| _cons | 5.296436 | .5876487 | 9.01 | 0.000 | 4.139954 | 6.452919 |

Dropping the education variable does not change the previous conclusion, as the p value for the Conservative in ’08 dummy is now 0.861. There is still very little evidence here to conclude that the population effect of going Conservative is different from 0.
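The two specifications can also be compared side by side instead of by reading two separate outputs. The following is a minimal sketch, assuming the same data are in memory. The stored-estimate names full and reduced are hypothetical labels chosen for illustration.

```stata
* Sketch only: fit both models quietly and store the results.
quietly regress log_NumberProjects i.conservative_08 i.region unemployment_rate ///
    percentage_immigrant participation_rate high_school_lower downtown
estimates store full

quietly regress log_NumberProjects i.conservative_08 i.region unemployment_rate ///
    percentage_immigrant participation_rate downtown
estimates store reduced

* Coefficients with p values, plus sample size and adjusted R-squared.
estimates table full reduced, b(%9.4f) p(%9.3f) stats(N r2_a)
```

Reporting the adjusted R-squared is a design choice: unlike the plain R-squared, it does not mechanically fall when a regressor is removed, so it gives a fairer comparison of the two models.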

1. Explain in words the coefficients on a few significant variables.

The estimated coefficient on the unemployment rate is 3.43, and the rate is measured as a proportion, so that a 3% unemployment rate appears in the data as 0.03. The estimated coefficient indicates the change in the average of the log of the number of projects that would result from an increase of 1 in the unemployment rate (that is, 100 percentage points), holding all other factors constant. Given that the model is linear and the unemployment rate is expressed as a proportion, we should divide the estimated coefficient by 100 to obtain the more useful interpretation: the average of the log of the number of projects increases by 0.0343 when the unemployment rate increases by 1 percentage point (e.g., from 5% to 6%), all other factors held constant.

The estimated coefficient on region 4, Quebec, is -0.691. It implies that a riding in Quebec would have an average log of number of projects that is 0.691 less than a riding in the omitted region, British Columbia, if all other variables were at the same values for the two ridings.

The estimated coefficient for downtown indicates that a riding that is in an urban area will have an average log of number of projects that is 1.03 higher than a riding that is not in an urban area, all other factors held constant.
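Because the dependent variable is a logarithm, these coefficients can also be expressed as approximate percentage changes in the number of projects by exponentiating them. The lines below are a sketch of that conversion, assuming the question 24 regression (the one without high_school_lower) is the most recent model in memory. Strictly, the result describes the geometric mean of the number of projects, so it should be read as an approximation.

```stata
* Sketch only: run immediately after the final regression so _b[] refers to it.
* Percentage difference for Quebec relative to the omitted region.
display 100 * (exp(_b[4.region]) - 1)

* Percentage difference for downtown ridings.
display 100 * (exp(_b[downtown]) - 1)

* Percentage change for a one-point rise in unemployment (0.01 on the proportion scale).
display 100 * (exp(0.01 * _b[unemployment_rate]) - 1)
```

With the coefficients reported above, these work out to roughly 50% fewer projects for Quebec, about 180% more for downtown ridings, and about 3.5% more per additional point of unemployment.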