LR Logistic Regression Analysis

Conduct a Forward: LR logistic regression analysis with the following variables. IVs: age, educ, hrs1, sibs, rincom91, life2 (categorical). DV: satjob2. Note: The variable life2 is categorical such that dull = 1, routine/exciting = 2, and all other values are system missing.
1. Develop a research question for the preceding scenario.
2. Conduct a preliminary Linear Regression to identify outliers and evaluate multicollinearity among the five continuous variables. Complete the following:
a. Using the Chi-Square table in Appendix B near the end of this book, identify the critical value at p < .001 for identifying outliers. Use Explore to determine if there are outliers. Which cases should be eliminated?
b. Is multicollinearity a problem among the five continuous variables?
3. Conduct Binary Logistic Regression using the Forward: LR method. IVs: age, educ, hrs1, sibs, rincom91, life2 (categorical; last is the reference category). DV: satjob2. Note: Make sure that any outliers identified in Exercise 2.a. are removed from the data before running the logistic regression. Also, designating life2 as a categorical covariate with the last category as the reference essentially makes routine/exciting = 0 and dull = 1, so interpret the results accordingly.
a. Which variables were entered into the model?
b. To what degree does the model fit the data? Explain.
c. Is the generated model significantly different from the constant-only model?
d. How accurate is the model in predicting job satisfaction?
e. What are the odds ratios for the model variables? Explain.

Solution

Main report

Conduct a Forward: LR logistic regression analysis with the following variables. IVs: age, educ, hrs1, sibs, rincom91, life2 (categorical). DV: satjob2. Note: The variable life2 is categorical such that dull = 1, routine/exciting = 2, and all other values are system missing.

1. Develop a research question for the preceding scenario.
The main research question of this report concerns the connection between self-assessment of life as dull or exciting (life2) and job satisfaction. More precisely, we hypothesize that respondents who rate life as routine/exciting are more likely to be satisfied with their job, controlling for typical determinants such as age, education, income, hours worked, and number of siblings.

2. Conduct a preliminary Linear Regression to identify outliers and evaluate multicollinearity among the five continuous variables. Complete the following:
a. Using the Chi-Square table in Appendix B near the end of this book, identify the critical value at p < .001 for identifying outliers. Use Explore to determine if there are outliers. Which cases should be eliminated?
Using the graphical representation (the boxplots), there are no outliers for income, but every other continuous IV has several cases with extreme values, far from the central tendency. The appendix to this report lists the extreme cases for each IV. However, these are not necessarily outliers; the boxplots reveal which cases are.
As indicated in Mertler & Reinhart (2016: 41-43), even more information is provided by the stem-and-leaf plots (not shown). Among those who are very satisfied, outliers include five cases aged 77 or older, two cases with 5 or fewer years of education, 30 cases with 8 or more siblings, 22 cases with 67 or more hours worked in the preceding week, and 56 cases with fewer than 18 hours worked in the preceding week.
Among those who are not very satisfied, outliers include four cases aged 73 or older, one case with 2 or fewer years of education, 39 cases with 8 or more siblings, 17 cases with 70 or more hours worked in the preceding week, and 31 cases with 18 or fewer hours worked in the preceding week.
Figure 1. Boxplot for independent continuous variables in the model

All the cases described above will be excluded from further analysis. They number 230, or 15% of all cases, which is within the missingness threshold given by Allison (2001) below which an imputation method is not necessary.
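
The critical value requested in part a can also be checked numerically. The sketch below is not part of the original SPSS analysis. It assumes that the degrees of freedom equal the number of continuous predictors (five), the usual convention when a chi-square criterion is used to screen Mahalanobis distances, and it uses only the Python standard library. The helper names chi2_sf and chi2_critical are illustrative. Under these assumptions the result should agree with the tabled value of about 20.515.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        sf = math.erfc(math.sqrt(half))   # closed form for df = 1
        k = 1
    else:
        sf = math.exp(-half)              # closed form for df = 2
        k = 2
    while k < df:
        # Recurrence: Q(x | k + 2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        sf += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return sf

def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Find x such that P(X > x) = alpha by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid      # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0

# Assumption: df = 5 (five continuous predictors), alpha = .001
critical = chi2_critical(0.001, 5)
print(round(critical, 3))
```

A case whose Mahalanobis distance exceeds this value would be flagged as a multivariate outlier under that convention.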

b. Is multicollinearity a problem among the five continuous variables?
To assess multicollinearity, all IVs were entered into an OLS model. The variable life was first recoded into a dummy variable called life01. As shown in Table 1, the collinearity statistics are within acceptable limits: all tolerances are high, the smallest being .683 (rincom91), and the largest VIF is 1.464. We conclude that multicollinearity is not a problem.
Table 1. Collinearity statistics for the independent variables
Variable                                       Tolerance   VIF
age (Age of Respondent)                        .898        1.113
educ (Highest Year of School Completed)        .880        1.136
hrs1 (Number of Hours Worked Last Week)        .829        1.207
sibs (NUMBER OF BROTHERS AND SISTERS)          .972        1.029
rincom91 (RESPONDENTS INCOME)                  .683        1.464
life01                                         .974        1.027
Note: Dependent variable: id (Arbitrary Id numbers).
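
The two columns of Table 1 carry the same information, since VIF is the reciprocal of tolerance. The short sketch below recomputes VIF from the reported tolerances as a consistency check. Small differences in the third decimal are expected because the tolerances are already rounded. A commonly cited rule of thumb treats tolerance below about .1 (VIF above about 10) as a warning sign; that cut-off is a convention, not something taken from this report.

```python
# Tolerance and VIF values copied from Table 1.
tolerances = {
    "age": 0.898, "educ": 0.880, "hrs1": 0.829,
    "sibs": 0.972, "rincom91": 0.683, "life01": 0.974,
}
reported_vif = {
    "age": 1.113, "educ": 1.136, "hrs1": 1.207,
    "sibs": 1.029, "rincom91": 1.464, "life01": 1.027,
}

for name, tol in tolerances.items():
    vif = 1.0 / tol  # VIF is defined as 1 / tolerance
    print(f"{name:9s} tolerance={tol:.3f}  VIF={vif:.3f}  (reported {reported_vif[name]:.3f})")

# Largest gap between recomputed and reported VIF (rounding only).
max_gap = max(abs(1.0 / tolerances[k] - reported_vif[k]) for k in tolerances)
# Rule-of-thumb flag (assumed cut-off of .1, see lead-in).
flagged = [k for k, tol in tolerances.items() if tol < 0.1]
print("max gap:", round(max_gap, 4), "flagged:", flagged)
```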

3. Conduct Binary Logistic Regression using the Forward: LR method. IVs: age, educ, hrs1, sibs, rincom91, life2 (categorical; last is the reference category). DV: satjob2. Note: Make sure that any outliers identified in Exercise 2.a. are removed from the data before running the logistic regression. Also, designating life2 as a categorical covariate with the last category as the reference essentially makes routine/exciting = 0 and dull = 1, so interpret the results accordingly.
a. Which variables were entered into the model?
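The Forward: LR procedure entered only income (rincom91) into the model. None of the remaining variables met the entry criterion at Step 1 (Table 2); the closest was life2, with p = .052.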
Table 2. Variables not in the equation (Step 1)
Variable                              Score   df   Sig.
Age of Respondent                     2.498   1    .114
Highest Year of School Completed      1.660   1    .198
Number of Hours Worked Last Week      .048    1    .826
NUMBER OF BROTHERS AND SISTERS        .067    1    .795
life2(1)                              3.785   1    .052
Overall Statistics                    9.206   5    .101

b. To what degree does the model fit the data? Explain.
The model explains very little of the variation in satjob2. The Cox & Snell R² is only .020 and the Nagelkerke pseudo-R² is .027, so the model accounts for less than 3% of the variation in the dependent variable.
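
Both pseudo-R² values can be reproduced from figures reported elsewhere in this report. The sketch below assumes that the sample size is the total of the classification table (Table 3) and that the constant-only -2 Log Likelihood equals the model -2 Log Likelihood plus the model chi-square from part c. It is a check on the reported values, not a replacement for the SPSS output.

```python
import math

n = 39 + 163 + 37 + 230      # cases in Table 3 (assumed to be the analysis sample)
model_chi2 = 9.694           # model chi-square reported in part c
neg2ll_model = 631.44        # -2 Log Likelihood of the fitted model
neg2ll_null = neg2ll_model + model_chi2   # -2LL of the constant-only model

# Cox & Snell: 1 - exp(-chi2 / n)
cox_snell = 1.0 - math.exp(-model_chi2 / n)
# Nagelkerke rescales Cox & Snell by its maximum attainable value
max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```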
c. Is the generated model significantly different from the constant-only model?
Yes. The overall model with one predictor (rincom91) was statistically reliable in distinguishing between those very satisfied and those not very satisfied [-2 Log Likelihood = 631.44, χ²(1) = 9.694, p = .002 < .05]. With a single predictor entered, the model chi-square has one degree of freedom.
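
The reported p-value can be checked against the chi-square distribution. The sketch below uses the closed-form upper-tail probabilities for 1 and 2 degrees of freedom (standard library only). With one predictor in the model the test has 1 degree of freedom, and that is the case that reproduces the reported p = .002; the 2 df value is shown only for comparison.

```python
import math

chi2 = 9.694  # model chi-square from the omnibus test

# Upper-tail probability of a chi-square variable:
p_df1 = math.erfc(math.sqrt(chi2 / 2.0))  # 1 degree of freedom
p_df2 = math.exp(-chi2 / 2.0)             # 2 degrees of freedom

print(round(p_df1, 3), round(p_df2, 3))
```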

d. How accurate is the model in predicting job satisfaction?
The model correctly classified only 57.4% of the cases. For those who are "very satisfied" the percentage is as low as 19.3%, against 86.1% for those who are "not very satisfied" (Table 3). Predictive accuracy is therefore poor.
Table 3. Classification table (Step 1)
                                        Predicted satjob2 (Job Satisfaction)
Observed satjob2 (Job Satisfaction)     1 Very satisfied   2 Not very satisfied   Percentage correct
1 Very satisfied                        39                 163                    19.3
2 Not very satisfied                    37                 230                    86.1
Overall percentage                                                                57.4
Note: The cut value is .500.
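
The percentages in Table 3 follow directly from the four cell counts. The sketch below recomputes them and adds one comparison that is not in the SPSS output: the accuracy obtained by always predicting the larger observed group. That baseline is included as an assumption about what counts as a useful reference point.

```python
# Counts copied from Table 3: outer keys are observed groups, inner keys are predicted groups.
observed = {
    "very satisfied": {"very satisfied": 39, "not very satisfied": 163},
    "not very satisfied": {"very satisfied": 37, "not very satisfied": 230},
}

total = sum(sum(row.values()) for row in observed.values())
correct = sum(observed[g][g] for g in observed)   # diagonal cells
overall_pct = 100.0 * correct / total

# Percentage correct within each observed group
group_pct = {g: 100.0 * observed[g][g] / sum(observed[g].values()) for g in observed}

# Baseline: always predict the larger observed group
baseline_pct = 100.0 * max(sum(row.values()) for row in observed.values()) / total

print(round(overall_pct, 1), {g: round(p, 1) for g, p in group_pct.items()}, round(baseline_pct, 1))
```

On these counts the model's overall accuracy is only about half a percentage point above the baseline, which is consistent with the weak fit reported in part b.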

e. What are the odds ratios for the model variables? Explain.
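The odds ratio for rincom91 is very close to 1 (OR = 1.063, 95% CI 1.022 to 1.105), indicating a weak effect. The confidence interval excludes 1, so the effect is statistically significant, but it is small in magnitude: there is little change in the likelihood of being very satisfied as income changes.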
Table 4. Regression coefficients (Step 1)
Variable             B        S.E.   Wald     df   Sig.   Exp(B)   95% C.I. lower   95% C.I. upper
RESPONDENTS INCOME   .061     .020   9.334    1    .002   1.063    1.022            1.105
Constant             -1.128   .296   14.530   1    .000   .324
Note: Variable entered on step 1: RESPONDENTS INCOME.
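
The odds ratio and its confidence interval are transformations of B and its standard error: Exp(B) is the exponential of the coefficient, and the 95% interval is the exponential of B plus or minus 1.96 standard errors. The sketch below recomputes them from the rounded values in Table 4, so the Wald statistic differs slightly from the reported 9.334.

```python
import math

b, se = 0.061, 0.020   # coefficient and standard error for RESPONDENTS INCOME (Table 4)
z = 1.96               # normal quantile for a 95% interval

odds_ratio = math.exp(b)
ci_low = math.exp(b - z * se)
ci_high = math.exp(b + z * se)
wald = (b / se) ** 2   # Wald chi-square with 1 df

constant_exp = math.exp(-1.128)  # Exp(B) for the constant

print(round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3),
      round(wald, 2), round(constant_exp, 3))
```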

References
Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.
Mertler, C. A., & Reinhart, R. V. (2016). Advanced and multivariate statistical methods: Practical application and interpretation. London: Routledge.

Appendix. Extreme cases for each continuous variable
Extreme Values, by satjob2 (Job Satisfaction)
                        1 Very satisfied            2 Not very satisfied
Extreme   Rank   Case no.   id      Value    Case no.   id      Value

age (Age of Respondent)
Highest   1      679        679     82       401        401     78
Highest   2      1129       1129    79       1451       1451    75
Highest   3      407        407     78       780        780     74
Highest   4      501        501     77       570        570     73
Highest   5      805        805     77       112        112     72h
Lowest    1      1116       1116    19       1393       1393    19
Lowest    2      310        310     19       306        306     19
Lowest    3      1413       1413    20       1027       1027    20
Lowest    4      1055       1055    20       1018       1018    20
Lowest    5      732        732     20a      619        619     20a

educ (Highest Year of School Completed)
Highest   1      133        133     20       64         64      20
Highest   2      139        139     20       149        149     20
Highest   3      151        151     20       152        152     20
Highest   4      192        192     20       285        285     20
Highest   5      450        450     20b      449        449     20b
Lowest    1      406        406     0        689        689     2
Lowest    2      25         25      5        1432       1432    6
Lowest    3      868        868     6        766        766     6
Lowest    4      466        466     6        708        708     6
Lowest    5      351        351     6c       122        122     6c

hrs1 (Number of Hours Worked Last Week)
Highest   1      41         41      89       26         26      80
Highest   2      1214       1214    89       795        795     80
Highest   3      100        100     80       898        898     80
Highest   4      127        127     80       1073       1073    80
Highest   5      272        272     80d      1132       1132    80
Lowest    1      1129       1129    2        808        808     2
Lowest    2      388        388     2        687        687     3
Lowest    3      1150       1150    4        915        915     5
Lowest    4      1170       1170    5        1423       1423    7
Lowest    5      736        736     5        1210       1210    8i

sibs (NUMBER OF BROTHERS AND SISTERS)
Highest   1      121        121     19       50         50      21
Highest   2      406        406     19       632        632     14
Highest   3      268        268     16       1343       1343    13
Highest   4      264        264     13       1458       1458    13
Highest   5      750        750     13       596        596     12j
Lowest    1      1394       1394    0        1441       1441    0
Lowest    2      1302       1302    0        1327       1327    0
Lowest    3      1223       1223    0        1322       1322    0
Lowest    4      1173       1173    0        1269       1269    0
Lowest    5      1170       1170    0e       1132       1132    0e

rincom91 (RESPONDENTS INCOME)
Highest   1      108        108     22       4          4       22
Highest   2      127        127     22       18         18      22
Highest   3      148        148     22       91         91      22
Highest   4      223        223     22       136        136     22
Highest   5      226        226     22f      149        149     22f
Lowest    1      1348       1348    1        1423       1423    1
Lowest    2      1055       1055    1        1334       1334    1
Lowest    3      736        736     1        1236       1236    1
Lowest    4      548        548     1        1192       1192    1
Lowest    5      388        388     1g       1063       1063    1g

a. Only a partial list of cases with the value 20 are shown in the table of lower extremes.
b. Only a partial list of cases with the value 20 are shown in the table of upper extremes.
c. Only a partial list of cases with the value 6 are shown in the table of lower extremes.
d. Only a partial list of cases with the value 80 are shown in the table of upper extremes.
e. Only a partial list of cases with the value 0 are shown in the table of lower extremes.
f. Only a partial list of cases with the value 22 are shown in the table of upper extremes.
g. Only a partial list of cases with the value 1 are shown in the table of lower extremes.
h. Only a partial list of cases with the value 72 are shown in the table of upper extremes.
i. Only a partial list of cases with the value 8 are shown in the table of lower extremes.
j. Only a partial list of cases with the value 12 are shown in the table of upper extremes.