# Multicollinearity Solution

1. Provide a correlation matrix of the variables under investigation and comment on the patterns of correlation you see among the variables as well as in relation to the predicted variable of interest.

With the exception of Depression, all variables were significantly correlated with Startleblink. The highest correlation was between Startleblink and Antisaccade (r = 0.740, p < 0.01), followed by Startleblink and Anxiety (r = 0.543, p < 0.01) and Startleblink and PTSD (r = 0.349, p < 0.01). Among the predictors, significant correlations were observed between PTSD and Anxiety (r = 0.331, p < 0.01), PTSD and Antisaccade (r = 0.311, p < 0.01), Anxiety and Antisaccade (r = 0.380, p < 0.01), and Anxiety and Depression (r = 0.303, p < 0.05).

Correlations

| | | Startleblink | PTSD | Anxiety | Depression | Antisaccade |
|---|---|---|---|---|---|---|
| Startleblink | Pearson Correlation | 1 | .349** | .543** | .185 | .740** |
| | Sig. (2-tailed) | | .003 | .000 | .125 | .000 |
| | N | 70 | 70 | 70 | 70 | 70 |
| PTSD | Pearson Correlation | .349** | 1 | .331** | -.121 | .311** |
| | Sig. (2-tailed) | .003 | | .005 | .317 | .009 |
| | N | 70 | 70 | 70 | 70 | 70 |
| Anxiety | Pearson Correlation | .543** | .331** | 1 | .303* | .380** |
| | Sig. (2-tailed) | .000 | .005 | | .011 | .001 |
| | N | 70 | 70 | 70 | 70 | 70 |
| Depression | Pearson Correlation | .185 | -.121 | .303* | 1 | .079 |
| | Sig. (2-tailed) | .125 | .317 | .011 | | .518 |
| | N | 70 | 70 | 70 | 70 | 70 |
| Antisaccade | Pearson Correlation | .740** | .311** | .380** | .079 | 1 |
| | Sig. (2-tailed) | .000 | .009 | .001 | .518 | |
| | N | 70 | 70 | 70 | 70 | 70 |

Note: ** correlation is significant at the 0.01 level (2-tailed); * correlation is significant at the 0.05 level (2-tailed).
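The significance flags in the correlation table can be checked by hand. The sketch below converts each r into a t statistic with n - 2 = 68 degrees of freedom and compares it with two-tailed critical values. The critical values (1.995 and 2.650) are approximate table values assumed here, not SPSS output, and the function names are illustrative.

```python
import math

# Approximate two-tailed critical t values for df = 68 (assumed, from t tables).
T_CRIT_05 = 1.995
T_CRIT_01 = 2.650

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def stars(r, n=70):
    """Return the significance flag SPSS would print for a correlation r."""
    t = abs(correlation_t(r, n))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return ""

# Correlations with Startleblink, copied from the table above.
for name, r in [("PTSD", 0.349), ("Anxiety", 0.543),
                ("Depression", 0.185), ("Antisaccade", 0.740)]:
    print(f"{name:12s} r = {r:.3f}  t = {correlation_t(r, 70):6.3f} {stars(r)}")
```

The flags produced this way agree with the asterisks in the table, including the non-significant Depression correlation.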

Startleblink is the response variable or dependent variable

Independent variables: PTSD, Anxiety, Depression and Antisaccade

Would you be concerned about multicollinearity?

Multicollinearity refers to strong correlation among several independent variables. For a good statistical model, it is desirable that the independent variables have low correlations with one another but high correlations with the dependent variable. In the present study, no high correlations were observed among the independent variables (the largest is r = 0.380, between Anxiety and Antisaccade), so multicollinearity is not a major concern. There was, however, a strong correlation between Startleblink and Antisaccade and a moderate correlation between Startleblink and Anxiety.

How and when does multicollinearity bias a regression model?

Moderate multicollinearity may not be problematic. Severe multicollinearity, however, is a problem because it can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. Fortunately, in our dataset the correlations among the independent variables are low, which indicates low multicollinearity.

The variance inflation factors (VIF) indicate the extent to which multicollinearity is present in a regression analysis. A VIF of 5 or greater is a reason to be concerned about multicollinearity.
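As a rough check, the VIFs can be obtained from the predictor correlations alone, because the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix. The sketch below uses the rounded correlations from the table above and a small hand-written Gauss-Jordan inversion; the helper names are illustrative, and the values will differ slightly from SPSS's own VIF output because the inputs are rounded.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

names = ["PTSD", "Anxiety", "Depression", "Antisaccade"]
# Correlations among the independent variables only (rounded table values).
R = [
    [1.000, 0.331, -0.121, 0.311],
    [0.331, 1.000, 0.303, 0.380],
    [-0.121, 0.303, 1.000, 0.079],
    [0.311, 0.380, 0.079, 1.000],
]

inverse = invert(R)
vif = {name: inverse[i][i] for i, name in enumerate(names)}
for name, value in vif.items():
    flag = "concern" if value >= 5 else "ok"
    print(f"{name:12s} VIF = {value:.3f} ({flag})")
```

With correlations this small, every VIF stays well below the threshold of 5.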

2. Perform a standard regression analysis with the IVs predicting startle blink response. Report on the effects observed, and comment on the contribution of the variables in predicting startle blink response.

Variables Entered/Removed (a)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Antisaccade, Depression, PTSD, Anxiety (b) | . | Enter |

a. Dependent Variable: Startleblink. b. All requested variables entered.

The summary of the fitted model is shown in the table below. R² = 0.635, that is, the model explains about 63.5% of the variance in the startle blink response. The contribution of each variable can be judged from the standardized coefficients (Beta) and significance values in the Coefficients table: Antisaccade (Beta = .609, p < .001) and Anxiety (Beta = .265, p = .004) contribute significantly, whereas PTSD (p = .342) and Depression (p = .413) do not.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .797 (a) | .635 | .612 | 1.94082 |

a. Predictors: (Constant), Antisaccade, Depression, PTSD, Anxiety
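The Adjusted R Square in the model summary follows from R Square, the sample size and the number of predictors. A minimal sketch, assuming n = 70 (the N in the correlation table) and k = 4 predictors; because R Square is entered rounded to three decimals, the result can differ from the SPSS value in the last digit.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Standard (Enter) model: R Square = .635, n = 70, k = 4 predictors.
value = adjusted_r2(0.635, n=70, k=4)
print(f"Adjusted R Square = {value:.4f} (SPSS reports .612)")
```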

The fitted model is shown below.

Startleblink = -6.888 + 0.112 * PTSD + 0.088 * Anxiety + 0.034 * Depression + 0.018 * Antisaccade

R² = 0.635 and SEE = 1.94.

Coefficients (a)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -6.888 | 1.341 | | -5.138 | .000 |
| | PTSD | .112 | .117 | .080 | .957 | .342 |
| | Anxiety | .088 | .030 | .265 | 2.974 | .004 |
| | Depression | .034 | .042 | .067 | .824 | .413 |
| | Antisaccade | .018 | .003 | .609 | 7.336 | .000 |

a. Dependent Variable: Startleblink

3. Perform a stepwise regression analysis and compare your results with that of 2 above. How has the stepwise method altered the contribution of the independent variables? How can the regression analysis check for multicollinearity? And are there causes for concern?

The model developed using the stepwise regression analysis method is shown below.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .740 (a) | .547 | .540 | 2.11259 |
| 2 | .792 (b) | .627 | .616 | 1.93046 |

a. Predictors: (Constant), Antisaccade. b. Predictors: (Constant), Antisaccade, Anxiety.

As the model summary above shows, the stepwise model does not improve on the standard model: R² falls slightly, from 0.635 to 0.627. It does, however, retain only two predictors, Antisaccade and Anxiety.

R² = 0.627 and SEE = 1.93.

Compared with the previous model, the stepwise model uses fewer independent variables and performs essentially the same (adjusted R² = 0.616 vs. 0.612). It is therefore the more parsimonious model for predicting Startleblink.

Coefficients (a)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -3.306 | 1.061 | | -3.117 | .003 |
| | Antisaccade | .022 | .002 | .740 | 9.061 | .000 |
| 2 | (Constant) | -6.592 | 1.299 | | -5.075 | .000 |
| | Antisaccade | .019 | .002 | .623 | 7.727 | .000 |
| | Anxiety | .102 | .027 | .306 | 3.800 | .000 |

a. Dependent Variable: Startleblink
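The gain from adding Anxiety at step 2 can be expressed as an R² change and an F change computed from the two R Square values in the stepwise Model Summary. The sketch below assumes n = 70 and uses the rounded R Square values, so the result is approximate; the function name is illustrative. For a single added predictor, the F change should be close to the square of that predictor's t statistic (3.800² ≈ 14.44).

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the increase in R squared when k_added predictors
    are added, giving a full model with k_full predictors."""
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator

r2_step1, r2_step2 = 0.547, 0.627  # Antisaccade only; Antisaccade + Anxiety
delta_r2 = r2_step2 - r2_step1
f = f_change(r2_step1, r2_step2, n=70, k_full=2, k_added=1)
print(f"R Square change = {delta_r2:.3f}, F change = {f:.2f} on (1, 67) df")
```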

Checking for multicollinearity: a condition index greater than 15 indicates a possible problem, and an index greater than 30 suggests a serious problem with collinearity. As the Condition Index column of the table below shows, there were no worrisome cases (the largest value is 12.374).

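Collinearity Diagnostics (a)

| Model | Dimension | Eigenvalue | Condition Index | Variance Proportion: (Constant) | Variance Proportion: Antisaccade | Variance Proportion: Anxiety |
|---|---|---|---|---|---|---|
| 1 | 1 | 1.971 | 1.000 | .01 | .01 | |
| | 2 | .029 | 8.280 | .99 | .99 | |
| 2 | 1 | 2.948 | 1.000 | .00 | .01 | .00 |
| | 2 | .033 | 9.507 | .14 | .99 | .19 |
| | 3 | .019 | 12.374 | .86 | .00 | .80 |

a. Dependent Variable: Startleblink
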
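Each condition index is the square root of the largest eigenvalue divided by the eigenvalue of that dimension. The sketch below recomputes the indices for model 2 from the rounded eigenvalues in the table, so the results differ slightly from the SPSS column, and applies the thresholds of 15 and 30 quoted above. The function name is illustrative.

```python
import math

def condition_indices(eigenvalues):
    """Condition index per dimension: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Eigenvalues for stepwise model 2 (Constant, Antisaccade, Anxiety).
model2 = condition_indices([2.948, 0.033, 0.019])
for dimension, ci in enumerate(model2, start=1):
    if ci > 30:
        verdict = "serious problem"
    elif ci > 15:
        verdict = "possible problem"
    else:
        verdict = "no concern"
    print(f"dimension {dimension}: condition index = {ci:.3f} ({verdict})")
```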
4. Assume that you are interested in assessing the contribution of PTSD first and Antisaccade (inhibitory control) on the last step. Comment on the contribution of each model and the F-change and R² change on every step. How do the results of this model compare with the Stepwise model?

PTSD was included only in the first model. It was excluded from the model developed with the stepwise method, which indicates that it contributes little to the model. Antisaccade was present in all models and makes the greatest contribution to explaining Startleblink.

5. Write a brief summary on the role of PTSD in explaining startle blink response in soldiers when exposed to upsetting scenes. What did the analysis identify as the most important contributor?

PTSD showed a low correlation with Startleblink (r = 0.349) and was excluded from the model by the stepwise method. The most important contributor to the model was Antisaccade (r = 0.740, p < 0.01).

6. Using the regression equation of the final model of the hierarchical method (as 4 above), if a soldier scores 8 on PTSD, 46 on Anxiety, 20 on Depression, and has an antisaccade latency of 301 msec, what would the model predict for his startle blink response?

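Startleblink = -6.888 + 0.112 * PTSD + 0.088 * Anxiety + 0.034 * Depression + 0.018 * Antisaccade

R² = 0.635 and SEE = 1.94.

Startleblink = -6.888 + 0.112 * 8 + 0.088 * 46 + 0.034 * 20 + 0.018 * 301 = -6.888 + 0.896 + 4.048 + 0.680 + 5.418 = 4.154

The model therefore predicts a startle blink response of about 4.15 for this soldier.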
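The substitution can be reproduced with a few lines of code. This is a minimal sketch that hard-codes the rounded coefficients of the four-predictor equation; the dictionary and function names are illustrative.

```python
INTERCEPT = -6.888
COEFFICIENTS = {
    "PTSD": 0.112,
    "Anxiety": 0.088,
    "Depression": 0.034,
    "Antisaccade": 0.018,
}

def predict_startleblink(scores):
    """Predicted startle blink response for one soldier's predictor scores."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in scores.items())

soldier = {"PTSD": 8, "Anxiety": 46, "Depression": 20, "Antisaccade": 301}
prediction = predict_startleblink(soldier)
print(f"Predicted Startleblink = {prediction:.3f}")  # 4.154
```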

Logistic model

1.

In the exploratory data analysis, no missing cases were observed, as shown in the “Case Processing Summary” table.

Case Processing Summary

| Unweighted Cases (a) | | N | Percent |
|---|---|---|---|
| Selected Cases | Included in Analysis | 30 | 100.0 |
| | Missing Cases | 0 | .0 |
| | Total | 30 | 100.0 |
| Unselected Cases | | 0 | .0 |
| Total | | 30 | 100.0 |

a. If weight is in effect, see classification table for the total number of cases.

The table below shows that 1 = relapsed and 0 = no relapse.

Dependent Variable Encoding

| Original Value | Internal Value |
|---|---|
| no relapse | 0 |
| relapsed | 1 |

The classification table below cross-tabulates the observed and predicted values of the dependent variable, coded 0 (no relapse) and 1 (relapsed). In the dataset, 17 of the 30 patients (56.7%) relapsed. Because the null model predicts the more frequent category (relapsed) for every case, its overall percentage of correctly predicted cases is also 56.7%.

Block 0: Beginning Block

Classification Table (a, b)

| | Observed | Predicted: no relapse | Predicted: relapsed | Percentage Correct |
|---|---|---|---|---|
| Step 0 | no relapse | 0 | 13 | .0 |
| | relapsed | 0 | 17 | 100.0 |
| | Overall Percentage | | | 56.7 |

a. Constant is included in the model. b. The cut value is .500.

The null model is shown in the next table. It contains only the constant (intercept), whose coefficient is given in column B together with its standard error (S.E.). The Wald chi-square statistic tests the null hypothesis that the constant equals 0. Here the null hypothesis is not rejected, because the p-value (0.467) is greater than the .05 significance level, so there is no evidence that the intercept differs from 0. This finding is usually not of interest to researchers. Exp(B) is the exponentiated B coefficient; for the constant it gives the odds of relapse in the sample (17/13 = 1.308). This value is reported by default because odds can be easier to interpret than the coefficient itself.

Variables in the Equation

| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 0 | Constant | .268 | .368 | .530 | 1 | .467 | 1.308 |
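Every number in the null-model row can be derived from the 17 relapsed and 13 non-relapsed cases in the classification table. The sketch below does this with the standard library only; it uses the fact that a chi-square variable with 1 degree of freedom has upper-tail probability erfc(sqrt(x / 2)). Variable names are illustrative.

```python
import math

n_relapsed, n_no_relapse = 17, 13

odds = n_relapsed / n_no_relapse           # Exp(B) of the constant
b0 = math.log(odds)                        # B: log-odds of relapse
se = math.sqrt(1 / n_relapsed + 1 / n_no_relapse)  # S.E. of B
wald = (b0 / se) ** 2                      # Wald chi-square, df = 1
p_value = math.erfc(math.sqrt(wald / 2))   # upper tail of chi-square(1)

print(f"B = {b0:.3f}, S.E. = {se:.3f}, Wald = {wald:.3f}, "
      f"Sig. = {p_value:.3f}, Exp(B) = {odds:.3f}")
```

The printed values match the Step 0 row (.268, .368, .530, .467, 1.308).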

In the next table, the Score test is used to assess whether each independent variable would be a significant predictor if added to the model. The p-values in the “Sig.” column show that only WMC (p < 0.001) and FE (p = 0.003) are statistically significant predictors. The Overall Statistics row shows the result of including all predictors in the model.

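Variables not in the Equation

| | Variable | Score | df | Sig. |
|---|---|---|---|---|
| Step 0 | WMC | 20.865 | 1 | .000 |
| | FE | 8.967 | 1 | .003 |
| | Severity | 7.805 | 9 | .554 |
| | Severity(1) | 2.802 | 1 | .094 |
| | Severity(2) | 2.802 | 1 | .094 |
| | Severity(3) | .136 | 1 | .713 |
| | Severity(4) | .027 | 1 | .869 |
| | Severity(5) | .084 | 1 | .773 |
| | Severity(6) | .632 | 1 | .427 |
| | Severity(7) | .039 | 1 | .844 |
| | Severity(8) | .136 | 1 | .713 |
| | Severity(9) | .136 | 1 | .713 |
| | Overall Statistics | 23.335 | 11 | .016 |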

Block 1: Method = Forward Stepwise

Omnibus Tests of Model Coefficients

| | | Chi-square | df | Sig. |
|---|---|---|---|---|
| Step 1 | Step | 31.059 | 3 | .000 |
| | Block | 31.059 | 3 | .000 |
| | Model | 31.059 | 3 | .000 |

1. Comment on the significance of the model(s) tested, the amount of variation explained by the model(s), and the -2LLs of interest.

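The -2 log likelihood (-2LL) is shown in the table below, together with the Cox & Snell and Nagelkerke R Square values (.645 and .865), which indicate the amount of variation explained by the model.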

Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 9.995 (a) | .645 | .865 |

a. Estimation terminated at iteration number 8 because parameter estimates changed by less than .001.
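The two pseudo R Square values can be reconstructed from the -2LL in this table and the -2LL of the null model, which depends only on the 17/13 split of the outcome. The sketch below is a hand check under that assumption, not SPSS's own routine; variable names are illustrative. The difference between the two -2LL values also reproduces the chi-square of 31.059 in the first Omnibus Tests table.

```python
import math

n = 30
n_relapsed, n_no_relapse = 17, 13
neg2ll_model = 9.995  # from the Model Summary table

# -2LL of the intercept-only model, from the observed outcome proportions.
neg2ll_null = -2 * (n_relapsed * math.log(n_relapsed / n)
                    + n_no_relapse * math.log(n_no_relapse / n))

chi_square = neg2ll_null - neg2ll_model
cox_snell = 1 - math.exp(-chi_square / n)
max_cox_snell = 1 - math.exp(-neg2ll_null / n)  # upper bound of Cox & Snell
nagelkerke = cox_snell / max_cox_snell

print(f"null -2LL = {neg2ll_null:.3f}, chi-square = {chi_square:.3f}")
print(f"Cox & Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
```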

The next table shows the omnibus test for the forward stepwise model with its predictor. The value in the Sig. column is the probability of obtaining a chi-square statistic at least this large if the null hypothesis is true. There is just one step, with a high chi-square statistic (29.521). The p-value is compared with a significance level (.05 or .01) to determine whether the overall model is statistically significant. In this case the model is statistically significant, because the p-value (< .001) is below the significance level.

Omnibus Tests of Model Coefficients

| | | Chi-square | df | Sig. |
|---|---|---|---|---|
| Step 1 | Step | 29.521 | 1 | .000 |
| | Block | 29.521 | 1 | .000 |
| | Model | 29.521 | 1 | .000 |
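SPSS prints the significance as .000 because the p-value is smaller than .0005. For a chi-square statistic with 1 degree of freedom, the exact upper-tail probability is erfc(sqrt(x / 2)), so it can be computed with the standard library. A minimal sketch; the function name is illustrative.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

p_model = chi2_sf_df1(29.521)
print(f"p = {p_model:.2e}")
print("significant at .01" if p_model < 0.01 else "not significant at .01")
```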

The Variables in the Equation table gives the coefficients of the logistic regression equation for predicting the dependent variable from the independent variable. They are in log-odds units. The prediction equation is:

log(p/(1-p)) = 12.077 - 4.241 * WMC

where p is the probability that a patient relapses.
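To turn the log-odds equation into a predicted probability of relapse, apply the inverse logit. The sketch below uses the rounded coefficients from the equation above; the WMC values in the loop are hypothetical illustrations, not cases from the dataset.

```python
import math

B0, B_WMC = 12.077, -4.241

def relapse_probability(wmc):
    """Predicted probability of relapse for a given WMC score."""
    log_odds = B0 + B_WMC * wmc
    return 1 / (1 + math.exp(-log_odds))

# WMC score at which the predicted probability crosses the .500 cut value.
wmc_cutoff = -B0 / B_WMC

for wmc in (2.0, 2.5, 3.0, 3.5):  # hypothetical scores
    print(f"WMC = {wmc:.1f} -> p(relapse) = {relapse_probability(wmc):.3f}")
print(f"p = .5 at WMC = {wmc_cutoff:.3f}")
```

Patients with WMC below the crossover point are classified as relapsed under the .500 cut value, and those above it as not relapsed.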

1. Which variable(s) predict the likelihood of relapse significantly? Provide an interpretation of the significance of Exp(B) of the significant variable(s).

Because these coefficients are in log-odds units, they are often difficult to interpret, so they are often converted into odds ratios. You can do this by hand by exponentiating the coefficient, or by looking at the column labelled “Exp(B)”.

The constant is the expected value of the log-odds of relapse when all of the predictor variables equal zero.

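For every one-unit increase in WMC score, we expect the log-odds of relapse to decrease by 4.241. Equivalently, Exp(B) = .014 means that each one-unit increase in WMC multiplies the odds of relapse by about 0.014. WMC is the only predictor entered by the forward stepwise procedure, and it is statistically significant (Wald = 7.307, p = .007).

Variables in the Equation

| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1 (a) | WMC | -4.241 | 1.569 | 7.307 | 1 | .007 | .014 |
| | Constant | 12.077 | 4.764 | 6.428 | 1 | .011 | 175795.687 |

a. Variable(s) entered on step 1: WMC.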
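The Exp(B) and Wald entries for WMC can be recomputed from B and S.E. A short sketch using the rounded table values, so the last digit may differ from SPSS; variable names are illustrative.

```python
import math

b_wmc, se_wmc = -4.241, 1.569

odds_ratio = math.exp(b_wmc)            # Exp(B)
wald = (b_wmc / se_wmc) ** 2            # Wald chi-square, df = 1
percent_drop = (1 - odds_ratio) * 100   # reduction in odds per unit of WMC

print(f"Exp(B) = {odds_ratio:.3f}, Wald = {wald:.3f}")
print(f"Each extra WMC unit lowers the odds of relapse by {percent_drop:.1f}%")
```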

ROC curve

A measure of goodness of fit often used to evaluate a logistic regression model is based on the simultaneous measurement of sensitivity (true positive rate) and specificity (true negative rate) for all possible cutoff points. First, we calculate the sensitivity and specificity pair for each possible cutoff point and plot sensitivity on the y axis against (1 - specificity) on the x axis. This curve is called the receiver operating characteristic (ROC) curve. For a useful model, the area under the ROC curve ranges from 0.5 (no better than chance) to 1.0 (perfect classification), with larger values indicating better fit.

Area Under the Curve

Test Result Variable(s): Predicted probability

| Area | Std. Error (a) | Asymptotic Sig. (b) | Asymptotic 95% CI Lower Bound | Asymptotic 95% CI Upper Bound |
|---|---|---|---|---|
| .950 | .049 | .000 | .854 | 1.000 |

a. Under the nonparametric assumption. b. Null hypothesis: true area = 0.5.

The SPSS output reports the area under the ROC curve as 0.950, with a 95% confidence interval of (.854, 1.000). The area is significantly different from 0.5 (p < .001), meaning that the logistic regression classifies the groups significantly better than chance.
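The area under the ROC curve equals the proportion of (relapsed, non-relapsed) pairs in which the relapsed case receives the higher predicted probability, with ties counted as one half. The sketch below implements that definition on a small hypothetical set of labels and scores, since the patient-level predictions are not reproduced here, and then rebuilds the reported confidence interval from the area and standard error in the table, capping the upper bound at 1.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical example data (not from the study).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90]
toy_auc = auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")

# 95% confidence interval from the reported area and standard error.
area, std_error = 0.950, 0.049
lower = area - 1.96 * std_error
upper = min(1.0, area + 1.96 * std_error)
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```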