Multicollinearity
Solution
 Provide a correlation matrix of the variables under investigation and comment on the patterns of correlation you see amongst the variables, as well as in relation to the predicted variable of interest.
 With the exception of Depression, all variables showed a significant correlation with Startleblink. The highest correlation was between Startleblink and Antisaccade (r = 0.740, p < .01), followed by Startleblink and Anxiety (r = 0.543, p < .01) and Startleblink and PTSD (r = 0.349, p < .01). Other significant correlations were observed between PTSD and Anxiety (r = 0.331, p < .01), between PTSD and Antisaccade (r = 0.311, p < .01), and between Anxiety and Antisaccade (r = 0.380, p < .01).
Correlations (N = 70 for all pairs)

              Startleblink  PTSD     Anxiety  Depression  Antisaccade
Startleblink  r     1       .349**   .543**   .185        .740**
              Sig.  --      .003     .000     .125        .000
PTSD          r     .349**  1        .331**   .121        .311**
              Sig.  .003    --       .005     .317        .009
Anxiety       r     .543**  .331**   1        .303*       .380**
              Sig.  .000    .005     --       .011        .001
Depression    r     .185    .121     .303*    1           .079
              Sig.  .125    .317     .011     --          .518
Antisaccade   r     .740**  .311**   .380**   .079        1
              Sig.  .000    .009     .001     .518        --

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
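Each entry in the table can be reproduced with the standard Pearson formula. A minimal pure-Python sketch (the data below are illustrative only, not the study's 70 soldiers):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative values only -- not the study's data:
startle = [3.0, 5.0, 4.0, 8.0, 7.0]
latency = [210.0, 250.0, 240.0, 300.0, 280.0]
r = pearson_r(startle, latency)  # strongly positive for this made-up sample
```

SPSS additionally reports the two-tailed p-value for each r; the coefficient itself is all the formula above produces.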
Startleblink is the response variable or dependent variable
Independent variables: PTSD, Anxiety, Depression and Antisaccade
Would you be concerned about multicollinearity?
Multicollinearity refers to strong correlations among the independent variables. For a good statistical model, the independent variables should show low correlations with one another but high correlations with the dependent variable. In the present study, no high correlations were observed between the independent variables. The strong correlation between Startleblink and Antisaccade and the moderate correlation between Startleblink and Anxiety both involve the dependent variable, and are therefore desirable rather than a collinearity concern.
How and when does multicollinearity bias a regression model?
Moderate multicollinearity may not be problematic. Severe multicollinearity is a problem, however, because it inflates the variance of the coefficient estimates and makes them very sensitive to minor changes in the model. Fortunately, the correlations among the independent variables in our dataset are low, indicating low multicollinearity.
Variance inflation factors (VIFs) indicate the extent to which multicollinearity is present in a regression analysis. A VIF of 5 or greater is a common reason to be concerned about multicollinearity.
 Perform a standard regression analysis with the IVs predicting startle blink response. Report on the effects observed, and comment on the contribution of the variables in predicting startle blink response.
Variables Entered/Removed^{a}  
Model  Variables Entered  Variables Removed  Method 
1  Antisaccade, Depression, PTSD, Anxiety^{b}  .  Enter 
a. Dependent Variable: Startleblink  
b. All requested variables entered. 
The summary of the fitted model is shown in the table below. R^{2} = 0.635 was observed; that is, the model explains 63.5% of the variance in startle blink response (adjusted R^{2} = 0.612). The contribution of each variable can be judged from its coefficients together with the correlations between the independent variables and the dependent variable.
Model Summary  
Model  R  R Square  Adjusted R Square  Std. Error of the Estimate 
1  .797^{a}  .635  .612  1.94082 
a. Predictors: (Constant), Antisaccade, Depression, PTSD, Anxiety 
The fitted model is shown below.
startleblink = 6.888 + 0.112*PTSD + 0.088*Anxiety + 0.034*Depression + 0.018*Antisaccade
R^{2} = 0.635 and SEE = 1.94;
Coefficients^{a}
                   Unstandardized Coefficients  Standardized Coefficients
Model              B       Std. Error           Beta       t       Sig.
1   (Constant)     6.888   1.341                           5.138   .000
    PTSD           .112    .117                 .080       .957    .342
    Anxiety        .088    .030                 .265       2.974   .004
    Depression     .034    .042                 .067       .824    .413
    Antisaccade    .018    .003                 .609       7.336   .000
a. Dependent Variable: Startleblink
 Perform a stepwise regression analysis and compare your results with those of 2 above. How has the stepwise method altered the contribution of the independent variables? How can the regression analysis check for multicollinearity? And are there cases for concern?
The model developed using the stepwise regression analysis method is shown below.
Model Summary  
Model  R  R Square  Adjusted R Square  Std. Error of the Estimate 
1  .740^{a}  .547  .540  2.11259 
2  .792^{b}  .627  .616  1.93046 
a. Predictors: (Constant), Antisaccade
b. Predictors: (Constant), Antisaccade, Anxiety
As can be seen in the table above, the stepwise model explains about as much variance as the full model (R^{2} = 0.627 vs. 0.635), while entering the predictors one step at a time.
startleblink = 6.592 + 0.019*Antisaccade + 0.102*Anxiety
R^{2} = 0.627 and SEE = 1.93;
Compared to the previous model, this one uses fewer independent variables for essentially the same performance, and is therefore a more parsimonious model for predicting Startleblink.
Coefficients^{a}
                   Unstandardized Coefficients  Standardized Coefficients
Model              B       Std. Error           Beta       t       Sig.
1   (Constant)     3.306   1.061                           3.117   .003
    Antisaccade    .022    .002                 .740       9.061   .000
2   (Constant)     6.592   1.299                           5.075   .000
    Antisaccade    .019    .002                 .623       7.727   .000
    Anxiety        .102    .027                 .306       3.800   .000
a. Dependent Variable: Startleblink
Checking for Multicollinearity – A condition index greater than 15 indicates a possible problem; an index greater than 30 suggests a serious problem with collinearity. As can be seen in the Condition Index column of the table below, there were no worrisome cases.
Collinearity Diagnostics^{a}
                                                 Variance Proportions
Model  Dimension  Eigenvalue  Condition Index  (Constant)  Antisaccade  Anxiety
1      1          1.971       1.000            .01         .01
       2          .029        8.280            .99         .99
2      1          2.948       1.000            .00         .01          .00
       2          .033        9.507            .14         .99          .19
       3          .019        12.374           .86         .00          .80
a. Dependent Variable: Startleblink
 Assume that you are interested in assessing the contribution of PTSD first and Antisaccade (inhibitory control) on the last step. Comment on the contribution of each model and the F-change and R^{2} change at every step. How do the results of this model compare with the stepwise model?
PTSD was included only in the first model of the hierarchical analysis. The stepwise procedure removed it, so it contributes little to the prediction. Antisaccade was present in all models and makes the greatest contribution to explaining the startle blink response.
 Write a brief summary on the role of PTSD in explaining startle blink response in soldiers when exposed to upsetting scenes. What did the analysis identify as the most important contributor?
PTSD showed a comparatively low correlation with Startleblink (r = 0.349) and was removed from the model by the stepwise method. The most important contributor to the model was Antisaccade (r = 0.740, p < .01).
 Using the regression equation of the final model of the hierarchical method (as in 4 above), if a soldier scores 8 on PTSD, 46 on Anxiety, 20 on Depression, and has an antisaccade latency of 301 msec, what would the model predict for his startle blink response?
startleblink = 6.888 + 0.112*PTSD + 0.088*Anxiety + 0.034*Depression + 0.018*Antisaccade
(R^{2} = 0.635 and SEE = 1.94)
startleblink = 6.888 + 0.112*8 + 0.088*46 + 0.034*20 + 0.018*301
startleblink = 6.888 + 0.896 + 4.048 + 0.680 + 5.418 = 17.93
The model predicts a startle blink response of approximately 17.93, with SEE = 1.94 as the typical prediction error.
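As a cross-check, the substitution can be carried out in a few lines of Python, with the coefficients taken from the coefficients table above:

```python
# Coefficients from the standard-regression coefficients table:
intercept = 6.888
coef = {"PTSD": 0.112, "Anxiety": 0.088, "Depression": 0.034, "Antisaccade": 0.018}

def predict_startle(scores):
    """Plug one soldier's scores into the fitted regression equation."""
    return intercept + sum(coef[k] * v for k, v in scores.items())

soldier = {"PTSD": 8, "Anxiety": 46, "Depression": 20, "Antisaccade": 301}
pred = predict_startle(soldier)  # 6.888 + 0.896 + 4.048 + 0.680 + 5.418 = 17.93
```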
Logistic model
1.
In the exploratory data analysis, no missing cases were observed, as shown in the "Case Processing Summary" table.
Case Processing Summary
Unweighted Cases^{a}                     N    Percent
Selected Cases   Included in Analysis    30   100.0
                 Missing Cases           0    .0
                 Total                   30   100.0
Unselected Cases                         0    .0
Total                                    30   100.0
a. If weight is in effect, see classification table for the total number of cases.

The table below shows that 1 = relapsed and 0 = no relapse.
Dependent Variable Encoding  
Original Value  Internal Value 
no relapse  0 
relapsed  1 
In the table below, we have both the observed and the predicted values of the dependent variable, which is coded as 0's and 1's. The dataset contained 17 relapsed patients (56.7%). The table also gives the percentage of cases for which the dependent variable was correctly predicted by the model.
Block 0: Beginning Block
Classification Table^{a,b}
                                   Predicted
                                   relapse               Percentage
Observed                           no relapse  relapsed  Correct
Step 0  relapse  no relapse        0           13        .0
                 relapsed          0           17        100.0
        Overall Percentage                               56.7
a. Constant is included in the model.
b. The cut value is .500
The null model is shown in the next table; it contains only the constant (intercept) coefficient, B, together with its standard error (S.E.) and the Wald chi-square test of the null hypothesis that the constant equals 0. The null hypothesis was not rejected, because the p-value (0.467) is greater than the critical value of .05, so we cannot conclude that the intercept differs from 0. Usually, this finding is not of interest to researchers. Exp(B) is the exponential of the B coefficient, an odds ratio (here 1.308); it is reported by default because odds ratios can be easier to interpret than raw coefficients.
Variables in the Equation  
B  S.E.  Wald  df  Sig.  Exp(B)  
Step 0  Constant  .268  .368  .530  1  .467  1.308 
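Two of the values in this table can be verified directly: the Wald statistic is simply (B / S.E.)^2, and Exp(B) is e^B:

```python
import math

# Values from the Step 0 "Variables in the Equation" table:
B, SE = 0.268, 0.368
wald = (B / SE) ** 2   # Wald chi-square = (B / S.E.)^2, about 0.530
odds = math.exp(B)     # Exp(B), about 1.307 (table: 1.308, from the unrounded B)
```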
In the next table, the score test is used to predict whether each independent variable would be significant if entered into the model. Looking at the p-values (in the column labelled "Sig."), only WMC (p < .001) and FE (p = .003) were statistically significant predictors. The "Overall Statistics" row shows the result of including all predictors in the model simultaneously.
Variables not in the Equation  
Score  df  Sig.  
Step 0  Variables  WMC  20.865  1  .000 
FE  8.967  1  .003  
Severity  7.805  9  .554  
Severity(1)  2.802  1  .094  
Severity(2)  2.802  1  .094  
Severity(3)  .136  1  .713  
Severity(4)  .027  1  .869  
Severity(5)  .084  1  .773  
Severity(6)  .632  1  .427  
Severity(7)  .039  1  .844  
Severity(8)  .136  1  .713  
Severity(9)  .136  1  .713  
Overall Statistics  23.335  11  .016 
Block 1: Method = Forward Stepwise
Omnibus Tests of Model Coefficients
                Chi-square  df  Sig.
Step 1  Step    31.059      3   .000
        Block   31.059      3   .000
        Model   31.059      3   .000
 Comment on the significance of the model(s) tested, the amount of variation explained by the model(s), and the -2LLs of interest.
The -2 log-likelihood (-2LL) is shown in the table below.
Model Summary  
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1  9.995^{a}  .645  .865 
a. Estimation terminated at iteration number 8 because parameter estimates changed by less than .001. 
The next table shows the omnibus test for the forward-stepwise model. The value in the Sig. column is the probability of obtaining the chi-square statistic if the null hypothesis were true. We have just one step, with a large chi-square statistic (29.521). The p-value is compared to a critical value (.05 or .01) to determine whether the overall model is statistically significant; in this case it is, because the p-value is less than the significance level.
Omnibus Tests of Model Coefficients
                Chi-square  df  Sig.
Step 1  Step    29.521      1   .000
        Block   29.521      1   .000
        Model   29.521      1   .000
The next table shows the values of the logistic regression equation for predicting the dependent variable from the independent variable, in log-odds units. (The WMC coefficient is negative, as its Exp(B) of .014 = e^{-4.241} shows.) The prediction equation is:
log(p/(1-p)) = 12.077 - 4.241*WMC
where p is the probability of relapse.
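A sketch of how this equation converts a WMC score into a predicted probability of relapse. (The WMC values used below are hypothetical; the output does not give the scale of the measure.)

```python
import math

def relapse_probability(wmc):
    """Predicted probability of relapse from the fitted log-odds equation."""
    log_odds = 12.077 - 4.241 * wmc   # logit(p) from the coefficients table
    return 1.0 / (1.0 + math.exp(-log_odds))

# Exponentiating the WMC coefficient reproduces the reported Exp(B) of .014:
or_wmc = math.exp(-4.241)
```

Because the coefficient is negative, higher WMC scores translate into sharply lower predicted probabilities of relapse.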
 Which variable(s) predict the likelihood of relapse significantly? Provide an interpretation of the significance of Exp(B) of the significant variable(s).
These coefficients are in log-odds units and are often difficult to interpret, so they are usually converted into odds ratios. You can do this by hand by exponentiating the coefficient, or by looking at the column labelled "Exp(B)".
The constant is the expected value of the log-odds of relapse when all of the predictor variables are equal to zero.
 For every oneunit increase in WMC score, we expect a 4.241 decrease in the logodds of relapse.
Variables in the Equation
                      B        S.E.    Wald    df   Sig.   Exp(B)
Step 1^{a}  WMC       -4.241   1.569   7.307   1    .007   .014
            Constant  12.077   4.764   6.428   1    .011   175795.687
a. Variable(s) entered on step 1: WMC.
ROC curve
A measure of goodness-of-fit often used to evaluate the fit of a logistic regression model is based on the simultaneous measure of sensitivity (true positives) and specificity (true negatives) for all possible cutoff points. First, we calculate sensitivity and specificity pairs for each possible cutoff point and plot sensitivity on the y axis against (1 - specificity) on the x axis. This curve is called the receiver operating characteristic (ROC) curve. The area under the ROC curve ranges from 0.5 to 1.0, with larger values indicating better fit.
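The area under the ROC curve also equals the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. A sketch using that equivalence, with hypothetical predicted probabilities rather than the study's data:

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count half) -- equivalent to the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted probabilities, not the study's data:
relapsed = [0.92, 0.81, 0.40]
no_relapse = [0.60, 0.30]
a = auc(relapsed, no_relapse)  # 5 of 6 pairs correctly ordered -> 0.8333...
```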
Area Under the Curve  
Test Result Variable(s): Predicted probability  
Area  Std. Error^{a}  Asymptotic Sig.^{b}  Asymptotic 95% Confidence Interval  
Lower Bound  Upper Bound  
.950  .049  .000  .854  1.000 
a. Under the nonparametric assumption  
b. Null hypothesis: true area = 0.5 
The SPSS output shows the ROC curve. The area under the curve is 0.950, with a 95% confidence interval of (0.854, 1.000). The area is also significantly different from 0.5 (p < .001), meaning that the logistic regression classifies the groups significantly better than chance.