Multiple Regression Equations
In this homework you will examine regression assumptions in multiple regression equations using the dataset posted to Moodle for this week. Use the notes and other resources posted in Moodle for this week to support your investigation. Report all relevant tables and charts to justify your work, and be sure they are formatted to class expectations.
Response variable: The response variable, also known as the dependent variable, is the variable that depends on the independent (explanatory) variables. For the purpose of this study the selected response variable is total expenditure, because it is driven by all the other variables: an increase in admissions will raise total expenditure, and the number of births, payroll expense, and personnel similarly influence it.
Descriptive Statistics
Table 1: Descriptive statistics
Admission | Births | Payroll Exp | Personnel | Tot Exp | |
---|---|---|---|---|---|
Mean | 6831.84 | 874.05 | 30500.89 | 861.5 | 67139.81 |
Median | 4777 | 480 | 20739.5 | 589.5 | 43364.5 |
Variance | 44171146 | 1131385 | 1.07E+09 | 675021.6 | 4.95E+09 |
Std deviation | 6646.138 | 1063.67 | 32715.84 | 821.597 | 70386.44 |
Skewness | 1.6109 | 1.5898 | 2.2307 | 1.7909 | 2.0073 |
Kurtosis | 3.0956 | 2.5195 | 6.0754 | 3.1891 | 4.5156 |
Minimum | 111 | 0 | 1053 | 50 | 2082 |
Maximum | 37375 | 5691 | 188865 | 4087 | 367706 |
A normal bell-shaped distribution has a kurtosis of exactly 3. A distribution with kurtosis less than 3 is called platykurtic: compared to a normal distribution, its tails are shorter and thinner, and its central peak is often lower and broader. From the descriptive table above, admissions has the kurtosis value closest to 3 (3.0956), so its distribution is the most likely to be bell-shaped; payroll expense, by contrast, has a kurtosis of 6.0754, indicating a heavy-tailed distribution.
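The statistics in Table 1 can be reproduced with a short script. This is a minimal sketch on simulated data standing in for the Moodle dataset; the `payroll_exp` sample and its distribution parameters are invented for illustration. Note that Table 1 uses the "normal = 3" kurtosis convention, which corresponds to `fisher=False` in scipy.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for one column of the hospital dataset;
# the lognormal parameters are illustrative, not the actual data.
rng = np.random.default_rng(42)
payroll_exp = rng.lognormal(mean=10.0, sigma=0.9, size=200)

def describe(x):
    """Descriptive statistics matching Table 1. Kurtosis is reported on
    the 'normal = 3' convention (fisher=False), as in the table."""
    return {
        "mean": np.mean(x),
        "median": np.median(x),
        "variance": np.var(x, ddof=1),   # sample variance
        "std": np.std(x, ddof=1),        # sample standard deviation
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x, fisher=False),
        "min": np.min(x),
        "max": np.max(x),
    }

summary = describe(payroll_exp)
```

A right-skewed sample like this one yields positive skewness and kurtosis above 3, the same pattern every column of Table 1 shows.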
Correlation matrix
Table 2: Correlation with the response variable
Admissions | Births | Payroll Exp. | Personnel | Tot. Exp. | |
---|---|---|---|---|---|
Admissions | 1 | ||||
Births | 0.855624 | 1 | |||
Payroll Exp. | 0.848209 | 0.659576079 | 1 | ||
Personnel | 0.879457 | 0.697463374 | 0.95187 | 1 | |
Tot. Exp. | 0.90249 | 0.713219085 | 0.982541 | 0.964709 | 1 |
From the correlation table above, payroll expense has the highest correlation with the response variable (r = 0.9825). This indicates a strong positive correlation between total expenditure and payroll expense.
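A Pearson coefficient like the r = 0.9825 above can be computed with `np.corrcoef`, which is also what produces the full matrices in Tables 2 and 3. The arrays below are simulated stand-ins loosely scaled to Table 1, not the Moodle data.

```python
import numpy as np

# Illustrative stand-ins for two dataset columns; the real values
# come from the Moodle file.
rng = np.random.default_rng(0)
payroll = rng.normal(30500, 32700, size=200)
tot_exp = 2.2 * payroll + rng.normal(0, 5000, size=200)  # strongly related

# np.corrcoef returns the full Pearson correlation matrix;
# the off-diagonal entry is r between the two variables.
r = np.corrcoef(payroll, tot_exp)[0, 1]
```

Passing a 2-D array (one row per variable) to `np.corrcoef` yields the whole matrix at once, with ones on the diagonal as in Table 2.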
Scatter plot between Total expenditure and payroll expense.
Evidence of Multicollinearity
Table 3: Correlation matrix of independent variables
Admissions | Births | Payroll Exp. | Personnel | |
---|---|---|---|---|
Admissions | 1 | |||
Births | 0.855624455 | 1 | ||
Payroll Exp. | 0.848209291 | 0.659576 | 1 | |
Personnel | 0.879456785 | 0.697463 | 0.951870085 | 1 |
Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. From the table above, there is multicollinearity between payroll expense and personnel. Similarly, there is multicollinearity between admissions and births, admissions and payroll expense, and admissions and personnel, with the correlation coefficients between these pairs greater than 0.80.
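The pairwise correlations in Table 3 catch two-variable overlap; a variance inflation factor (VIF) additionally catches multicollinearity involving several predictors at once. The report does not compute VIFs, so the sketch below is a hypothetical follow-up diagnostic using only numpy; the `vif` helper and the simulated predictors are assumptions, not part of the assignment's output.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress column j
    on the remaining columns (with an intercept) and return 1/(1 - R^2_j).
    A common rule of thumb flags VIF > 10 as problematic."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # add intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS fit
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out

# Demo: x1 and x2 are nearly collinear, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
```

With correlations above 0.95, as between payroll expense and personnel in Table 3, the corresponding VIFs would land well above the usual cutoff of 10.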
Multiple Regression Assumptions
No multicollinearity: A key assumption of multiple regression analysis is that the independent variables are not highly correlated. Table 3 above, which shows the correlation matrix of all independent variables in this study, reveals high correlations between several pairs of independent variables, which means this assumption is violated.
Linearity: This assumption states that there must be a linear relationship between the response variable and each independent variable. The scatter plot above shows such a relationship, so this assumption is not violated.
Homoscedasticity: This assumption states that the variance of the error terms is constant across the values of the independent variables. It can be checked by plotting residuals against fitted values: a uniform spread of points across all fitted values supports the assumption. This assumption is also not violated.
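The residuals-versus-fitted check described above can be sketched numerically as well as graphically: fit the model, split the residuals at the median fitted value, and compare the spread of the two halves. The simulated data below is a stand-in with deliberately constant error variance; the split-and-compare ratio is an illustrative shortcut, not a formal test.

```python
import numpy as np

# Simulated data with homoscedastic (constant-variance) errors.
rng = np.random.default_rng(7)
x = rng.uniform(0, 100, 300)
y = 5.0 + 2.0 * x + rng.normal(scale=3.0, size=300)

# OLS fit via least squares, then residuals = observed - fitted.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
resid = y - fitted

# Compare residual spread below vs. above the median fitted value;
# roughly equal standard deviations are consistent with homoscedasticity.
low = resid[fitted <= np.median(fitted)]
high = resid[fitted > np.median(fitted)]
ratio = np.std(high, ddof=1) / np.std(low, ddof=1)
```

A ratio near 1 mirrors the even band one expects to see in the residual plot; a funnel-shaped plot would push the ratio well away from 1.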
Multiple regression equation
Table 4: Multiple regression model
Coefficient | B (Std. Err.) | t-value | p-value |
---|---|---|---|
Intercept | -2607.73 (934.288) | -2.791 | 0.006 |
Admissions | 2.493 (0.285) | 8.748 | 1.02E-15 |
Births | -1.781 (1.179) | -1.510 | 0.133 |
Payroll Exp | 1.409 (0.063) | 22.261 | 1.92E-55 |
Personnel | 13.101 (2.793) | 4.691 | 5.11E-06 |
R-Square | 0.9844 | | |
Adjusted R-Square | 0.9841 | | |
Multiple R | 0.9922 | | |
F-Value | 3076.655 | | |
Pr(>F) | 6.5E-175 | | |
Table 4 above shows the multiple regression model between the dependent variable and the independent variables. The regression model is significant (F(4, 195) = 3076.655, p-value = 6.5E-175); since the p-value of the model is less than the 0.05 level of significance, we establish that the model is significant. The coefficient of determination, R-square, is the proportion of the variability in the response variable that is explained by the independent variables in the model. R-square was computed to be 0.984, which means 98.4% of the variation in total expenditure can be accounted for by the independent variables. Furthermore, the tests of significance of the independent variables indicate that all variables are significant except births, which has a p-value greater than the 0.05 level of significance. Lastly, the multiple regression equation for this model can be written as: Total Exp = -2607.73 + 2.493(Admissions) - 1.781(Births) + 1.409(Payroll Exp) + 13.101(Personnel)
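The fitted equation can be turned into a small prediction function. This is an illustrative sketch using the coefficients reported in Table 4 (with the payroll coefficient taken as positive, consistent with its positive t-value); the example inputs are the sample medians from Table 1, not an actual facility in the dataset.

```python
def predict_total_exp(admissions, births, payroll_exp, personnel):
    """Fitted multiple regression equation for total expenditure,
    using the coefficients reported in Table 4."""
    return (-2607.73
            + 2.493 * admissions
            - 1.781 * births
            + 1.409 * payroll_exp
            + 13.101 * personnel)

# Evaluate at the sample medians from Table 1 as a rough sanity check.
est = predict_total_exp(4777, 480, 20739.5, 589.5)
```

Evaluating at the medians gives an estimate in the mid-40,000s, which is of the same order as the sample median total expenditure of 43,364.5 in Table 1, a reassuring sign that the fitted equation is on the right scale.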