A Comprehensive Statistical Analysis of the Causal Impact of Quitting Smoking on Mortality

Section 1 - Fitting Crude and Adjusted Logistic Regression Models:

Problem Statement:

In this section, we establish two distinct logistic regression models, the crude and adjusted models, to evaluate the association between quitting smoking and mortality. The crude model examines the unadjusted relationship, while the adjusted model accounts for factors like sex, race, age, education, and exercise.

Solution:

The crude association between quitting smoking and death largely disappears after adjustment, which suggests that it is confounded by these covariates. They therefore need to be taken into account when interpreting the association.

Crude Logistic Regression Model:

Model: Logistic regression model without adjusting for other variables.

Coefficients:

Intercept: -1.57174
qsmk: 0.33959
Interpretation: Individuals who quit smoking have 1.40 times the odds of death (exp(0.33959) ≈ 1.40) compared with those who did not quit, without adjusting for any other variables.

Adjusted Logistic Regression Model:

Model: Logistic regression model adjusting for sex, race, age, education, and exercise.

Coefficients:

Intercept: -3.06681
qsmk: -0.02304
Additional covariates: (sex, race, age, education, exercise)
Interpretation: After adjusting for covariates, the odds ratio for qsmk is close to 1 (exp(-0.02304) ≈ 0.98), and there is no statistically significant association between quitting smoking and the odds of death.

Comparing these models suggests that the initial association observed in the crude model is confounded by sex, race, age, education, and exercise.
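
The odds ratios quoted in the interpretations follow from exponentiating the qsmk coefficients. A minimal sketch of that conversion, using only the coefficients reported above (the function name `odds_ratio` is illustrative, not taken from the original analysis):

```python
import math

def odds_ratio(log_odds_coef: float) -> float:
    """Convert a logistic regression coefficient (a log odds ratio) to an odds ratio."""
    return math.exp(log_odds_coef)

# qsmk coefficients reported for the two models above
crude_or = odds_ratio(0.33959)
adjusted_or = odds_ratio(-0.02304)

print(f"crude OR:    {crude_or:.2f}")     # 1.40
print(f"adjusted OR: {adjusted_or:.2f}")  # 0.98
```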

Section 2 - Binned Residual Plot Analysis:

Problem Statement:

To assess the model's fit and check for potential non-linearity, a binned residual plot is employed.

Solution:

The binned residual plot is a tool for assessing the fit of a logistic regression model: observations are grouped into bins by predicted probability, and the average residual in each bin is plotted against the average predicted probability. Here the plot shows an inverted U-shape, which suggests non-linearity that the model does not capture, possibly reflecting unmeasured or residual confounding. Consistently negative residuals in some bins indicate that the model over-predicts the probability of death in those ranges. These patterns should be kept in mind when using the model for prediction or further analysis.
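
The binning behind such a plot can be sketched as follows. This is a rough illustration, not the code used for the analysis: the function name `binned_residuals`, the equal-size binning rule, and the toy data are all assumptions. The bounds are approximate, taken as twice the within-bin standard error of the residuals.

```python
import math

def binned_residuals(predicted, observed, n_bins):
    """Return (mean predicted probability, mean residual, 2*SE) for each bin.

    Observations are sorted by predicted probability and split into
    n_bins groups of roughly equal size. Residual = observed - predicted.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than observations
        resid = [y - p for p, y in chunk]
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        mean_r = sum(resid) / len(chunk)
        # Sample variance of the residuals within the bin (0 for a single point)
        if len(chunk) > 1:
            var = sum((r - mean_r) ** 2 for r in resid) / (len(chunk) - 1)
        else:
            var = 0.0
        two_se = 2 * math.sqrt(var / len(chunk))
        bins.append((mean_p, mean_r, two_se))
    return bins

# Toy data, made up purely for illustration (not the study data)
toy_pred = [0.1, 0.1, 0.2, 0.2, 0.8, 0.8, 0.9, 0.9]
toy_obs = [0, 0, 0, 1, 1, 1, 1, 0]
toy_bins = binned_residuals(toy_pred, toy_obs, n_bins=2)
for mean_p, mean_r, two_se in toy_bins:
    print(f"mean predicted = {mean_p:.2f}, mean residual = {mean_r:+.2f}, 2*SE = {two_se:.2f}")
```

A bin whose mean residual is negative and falls below the lower bound is one where the model over-predicts the outcome.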

Section 3 - Denominator and Numerator Models for IPTW Calculation:

Problem Statement:

In this section, we build two models to calculate Inverse Probability of Treatment Weights (IPTW).

Solution:

The denominator model estimates each person's probability of quitting smoking given sex, race, age, education, and exercise (the propensity score). The numerator model estimates the marginal probability of quitting smoking, without covariates. The ratio of the numerator probability to the denominator probability gives each person's weight, and these weights are then used to estimate the causal effect of quitting smoking on death.

Denominator Model:

Model: Logistic regression model to predict quitting smoking (qsmk) using sex, race, age, education, and exercise as covariates.

Coefficients:

Intercept: -1.8791
Additional covariates: (sex, race, age, education, exercise)
Interpretation: The fitted probabilities from this model (the propensity scores) provide the denominator in the calculation of IPTW weights.

Numerator Model:

Model: Logistic regression model to predict quitting smoking (qsmk) without covariates.

Coefficients:

Intercept: -1.0598
Interpretation: This model provides the numerator in the calculation of IPTW weights.
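
As a sketch of how the two models could combine into a weight, assuming the usual stabilized form (numerator probability divided by denominator probability, each taken for the treatment the person actually received): the function names and the propensity value of 0.40 below are hypothetical, and only the numerator intercept comes from the results above.

```python
import math

def expit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def stabilized_weight(quit: int, p_marginal: float, p_conditional: float) -> float:
    """Stabilized IPTW for one person (assumed form, see lead-in).

    quit:          1 if the person quit smoking, 0 otherwise
    p_marginal:    P(qsmk = 1) from the numerator model
    p_conditional: P(qsmk = 1 | covariates) from the denominator model
    """
    if quit == 1:
        return p_marginal / p_conditional
    return (1.0 - p_marginal) / (1.0 - p_conditional)

# Marginal probability of quitting implied by the numerator intercept (-1.0598)
p_quit = expit(-1.0598)
print(f"P(qsmk = 1) = {p_quit:.3f}")  # 0.257

# Hypothetical propensity score of 0.40, for illustration only
w_quitter = stabilized_weight(1, p_quit, 0.40)
w_non_quitter = stabilized_weight(0, p_quit, 0.40)
print(f"weight if quit: {w_quitter:.3f}, weight if did not quit: {w_non_quitter:.3f}")
```

People whose observed treatment was unlikely given their covariates receive weights above 1, and those whose treatment was likely receive weights below 1.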

Section 4 - Estimating the Causal Effect Using IPTW:

Problem Statement:

Using the weights generated in the previous section, we estimate the causal effect of quitting smoking on death. IPTW is used here to account for the measured confounding variables. The results show that the estimated effect is close to zero and is not statistically significant at the conventional alpha level of 0.05.

Solution:

Using Weights to Estimate Causal Effect:

Coefficients:

Intercept: -1.47409
qsmk: 0.00914
Interpretation: The coefficient for qsmk is the IPTW estimate of the causal effect of quitting smoking on death on the log-odds scale, corresponding to an odds ratio of about 1.01 (exp(0.00914)). The coefficient is not statistically significant at the conventional alpha level of 0.05.
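
Because the weighted outcome model contains only the binary indicator qsmk, its coefficient should coincide with the log odds ratio of the weighted 2x2 table of treatment by outcome. The sketch below illustrates that point estimate only; it does not reproduce the standard errors of the fitted model, and the function name and toy data are made up for illustration.

```python
import math

def weighted_log_odds_ratio(treated, died, weights):
    """Log odds ratio of death (treated vs. untreated) from a weighted 2x2 table."""
    # Accumulate the weight falling into each (treatment, outcome) cell
    table = {(a, y): 0.0 for a in (0, 1) for y in (0, 1)}
    for a, y, w in zip(treated, died, weights):
        table[(a, y)] += w
    odds_treated = table[(1, 1)] / table[(1, 0)]
    odds_untreated = table[(0, 1)] / table[(0, 0)]
    return math.log(odds_treated / odds_untreated)

# Toy data, made up for illustration (not the study data)
toy_treated = [1, 1, 1, 0, 0, 0]
toy_died = [1, 0, 0, 1, 0, 0]

unweighted = weighted_log_odds_ratio(toy_treated, toy_died, [1, 1, 1, 1, 1, 1])
reweighted = weighted_log_odds_ratio(toy_treated, toy_died, [2, 1, 1, 1, 1, 1])
print(f"unweighted log OR: {unweighted:.3f}")
print(f"reweighted log OR: {reweighted:.3f}")
```

Changing the weights shifts the estimate even though the raw data are unchanged, which is how IPTW removes the imbalance in measured covariates between quitters and non-quitters.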

Section 5 - Comparing Coefficients and Standard Errors:

Problem Statement:

This final section compares the coefficients and standard errors for quitting smoking across the crude and adjusted logistic regression models from Section 1 and the IPTW-weighted GEE model from Section 4.

Solution:

The qsmk coefficient changes sign after adjusting for covariates and is close to zero in both the adjusted and the weighted models, which changes the conclusion about statistical significance. The IPTW-weighted GEE model also yields a smaller standard error than the adjusted logistic regression model (0.14662 versus 0.16898).

Comparing Coefficients and Standard Errors:

In the crude logistic regression model (Section 1), the coefficient for qsmk was 0.33959 with a standard error of 0.14224.
In the adjusted logistic regression model (Section 1), the coefficient for qsmk was -0.02304 with a standard error of 0.16898.
In the GEE model (Section 4), the coefficient for qsmk was 0.00914 with a standard error of 0.14662.
Dividing each coefficient by its standard error gives Wald z-statistics of about 2.39, -0.14, and 0.06. Only the crude coefficient is statistically significant at the conventional alpha level of 0.05 (p ≈ 0.017); the adjusted and IPTW coefficients are not.
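
The significance of each coefficient can be checked from the reported values with a Wald test, which compares the coefficient divided by its standard error against a standard normal distribution. A minimal sketch using only the numbers listed above (the function name `wald_test` is illustrative):

```python
import math

def wald_test(coef: float, se: float):
    """Return the Wald z-statistic and its two-sided p-value (standard normal reference)."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# (coefficient, standard error) for qsmk in each model, as reported above
models = {
    "crude": (0.33959, 0.14224),
    "adjusted": (-0.02304, 0.16898),
    "IPTW (GEE)": (0.00914, 0.14662),
}

results = {name: wald_test(coef, se) for name, (coef, se) in models.items()}
for name, (z, p) in results.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: z = {z:.2f}, p = {p:.3f} ({verdict} at alpha = 0.05)")
```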
