Survival-Analysis Situation Homework Sample

Parsimonious model

The dataset consists of 11,627 observations. The mean age of the participants is 54.79 years (standard deviation 9.56 years) and the median age is 54 years; the youngest participant is 32 and the oldest is 81. The mean systolic BP is 136.32 (standard deviation 22.80) and the median is 132, with a minimum of 83.5 and a maximum of 295. The mean time to death is 21.52 years (standard deviation 4.90 years) and the median is 24.02 years; the shortest time to death is 0.07 years and the longest is 24.02 years.

stats	age	sysbp	timedth
mean	54.79	136.32	21.52
sd	9.56	22.80	4.90
p50	54	132	24.02
min	32	83.5	0.07
max	81	295	24.02
N	11627	11627	11627
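The statistics in the table above can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `summarize` and the five sample values are illustrative assumptions and are not taken from the dataset.

```python
import statistics

def summarize(values):
    """Return the statistics reported in the table: mean, sd, p50, min, max, N."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample standard deviation (n - 1 denominator)
        "p50": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "N": len(values),
    }

# Hypothetical ages, for illustration only (not the study data).
ages = [32, 45, 54, 60, 81]
summary = summarize(ages)
for name, value in summary.items():
    print(name, round(value, 2), sep="\t")
```

Applying the same function to each of age, sysbp, and timedth would give one column of the table per variable.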

The histograms for the three variables are presented below. Age is approximately normally distributed, whereas systolic BP and time to death are not.

The result below shows that 92.76% of the observations have no previous coronary heart disease while 7.24% have previous coronary heart disease. Among the 11,332 observations with a recorded education level, 41.39% have 0-11 years of education, 30.09% have a high school diploma, 16.63% have some college, and 11.89% have a college degree or more.

PREVCHD	Freq.	Percent	Cum.
0	10,785	92.76	92.76
1	842	7.24	100
Total	11,627	100
Educ	Freq.	Percent	Cum.
0-11 years	4,690	41.39	41.39
High school diploma	3,410	30.09	71.48
some college	1,885	16.63	88.11
College+	1,347	11.89	100
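The Percent and Cum. columns follow directly from the frequencies. The sketch below recomputes them for the education counts listed in the table; the function name `frequency_table` is an illustrative assumption. Note that the education counts sum to 11,332, so the percentages are taken over observations with a recorded education level.

```python
def frequency_table(counts):
    """Return rows of (label, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        rows.append((label, freq,
                     round(100 * freq / total, 2),
                     round(100 * running / total, 2)))
    return rows

# Frequencies copied from the education rows of the table.
educ_counts = {
    "0-11 years": 4690,
    "High school diploma": 3410,
    "some college": 1885,
    "College+": 1347,
}
educ_table = frequency_table(educ_counts)
for row in educ_table:
    print(*row, sep="\t")
```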

Fitting the model with all predictors and examining the coefficients for education level, I observe that all levels of education are significant (p<0.001). This contradicts the expectation that the variable is not needed.

We check the log-linearity assumption for the continuous covariates (age and systolic BP) in the model. The plot below shows that age satisfies the assumption of log-linearity, so there is no need to transform it.

Figure 1: log-linearity test for age

Similarly, the plot for systolic BP below is linear, which suggests that it also satisfies the log-linearity assumption and does not need any transformation.

There is a significant interaction of previous coronary heart disease (prevchd) with both continuous variables, age and sysbp.

Null hypothesis

The result of the proportional hazards assumption test is shown below. The assumption is not met: the global test gives chi2(5)=103.42, p<0.001, so we reject the null hypothesis of proportional hazards. The assumption is violated for age, systolic BP, and prevchd; only the interactions satisfy it.

To solve this, we include time-varying terms for age, systolic BP, and prevchd. Testing for proportional hazards in the newly estimated model, we see that the global chi2(7)=3.89, p=0.7924, which means that we cannot reject the null hypothesis of proportional hazards. Moreover, all the variables have non-significant p-values, which means that every variable satisfies the proportional hazards assumption. The result is shown below.

Using the Cox-Snell residuals to test the goodness of fit of the model, we see that the two lines intersect only at time 0, which calls the fit of the model into question; that is, the model does not fit well.
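The logic of this check is that, if the model fits, the Cox-Snell residuals behave like a censored sample from a unit exponential distribution, so their Nelson-Aalen cumulative hazard plotted against the residuals themselves should follow the 45-degree line. The following is a minimal sketch of that comparison in plain Python; the function name `nelson_aalen` and the residual values are hypothetical and are not the residuals from this model.

```python
def nelson_aalen(residuals, events):
    """Nelson-Aalen cumulative hazard at each distinct residual with an event.

    residuals: Cox-Snell residuals, treated as the "time" variable.
    events:    1 if the death was observed, 0 if the observation was censored.
    Returns a list of (residual, cumulative_hazard) pairs.
    """
    data = sorted(zip(residuals, events))
    at_risk = len(data)
    cum_hazard = 0.0
    points = []
    i = 0
    while i < len(data):
        r = data[i][0]
        deaths = 0
        ties = 0
        # Group tied residuals so that they share one risk set.
        while i < len(data) and data[i][0] == r:
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths:
            cum_hazard += deaths / at_risk
            points.append((r, cum_hazard))
        at_risk -= ties
    return points

# Hypothetical residuals and event indicators, for illustration only.
cs_residuals = [0.1, 0.4, 0.4, 0.9, 1.6]
died = [1, 1, 0, 1, 1]
curve = nelson_aalen(cs_residuals, died)

# Largest vertical distance between the estimated curve and the 45-degree line.
max_gap = max(abs(h - r) for r, h in curve)
print(curve, max_gap)
```

A curve that stays close to the 45-degree line (a small gap at every point) would indicate an adequate fit; a curve that departs from it immediately, as described above, indicates a poor fit.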

_t	Haz. Ratio
Agetvc	1.028*** (0.003)
Sysbptvc	0.997*** (0.001)
Prevchdtvc	1.092*** (0.000)
prevchd#c.age
0	0.969*** (0.003)
1	0.977** (0.009)
prevchd#c.sysbp
0	1.022*** (0.003)
1	1.020*** (0.003)
N	11,627
LR CHI2	1598.17
p	0.0000

Standard errors in parentheses; ***, **, * denote significance at 1%, 5%, and 10% respectively

From the result presented in part g, we see that age is associated with an increased risk of death from CHD: each additional year of age increases the hazard by 2.8%, and this effect is significant (p<0.001). Systolic blood pressure is associated with a slight reduction in the risk of death from CHD: a one-unit increase in systolic blood pressure reduces the hazard by 0.3% (hazard ratio 0.997, p<0.001). A history of previous CHD is associated with increased risk: a person with previous CHD has a hazard of death 9.2% higher than a person without previous CHD (p<0.001). The interactions of previous CHD with age and with systolic BP are also significant.
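The percentages in this interpretation come from the hazard ratios in the table through the relation percent change = (hazard ratio - 1) x 100. A short sketch of that arithmetic, using the three hazard ratios reported for the time-varying terms (the function name `percent_change` is an illustrative assumption):

```python
def percent_change(hazard_ratio):
    """Percent change in the hazard for a one-unit increase in the covariate."""
    return round((hazard_ratio - 1) * 100, 1)

# Hazard ratios copied from the results table.
hazard_ratios = {"Agetvc": 1.028, "Sysbptvc": 0.997, "Prevchdtvc": 1.092}
changes = {name: percent_change(hr) for name, hr in hazard_ratios.items()}
for name, change in changes.items():
    print(name, change, sep="\t")
```

A positive value is an increase in the hazard and a negative value is a reduction, so a hazard ratio of 0.997 corresponds to a 0.3% reduction per unit of systolic blood pressure.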

The survival plots for ages 40 and 60 years are presented in part h. The plots show that the risk is the same up to time 5, after which the curve for age 60 falls quickly toward the horizontal axis, signifying an increasing risk of death in the later period. This supports the estimate that greater age increases the risk of death from CHD.
