# What is survival analysis?

Survival analysis is a branch of statistics for analyzing the expected length of time until an event occurs, such as equipment failure or death. It attempts to answer questions about the probability of failure over time, and it complements other statistical approaches used to investigate the time until an event of interest occurs.

• Parsimonious model
• Null hypothesis
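To make the idea of "time until an event" concrete, here is a minimal sketch of the Kaplan-Meier estimator, a standard way to estimate a survival curve. It is not part of the analysis in this article; the function name and the toy durations and event flags are hypothetical, chosen only for illustration. An event flag of 0 marks a subject who was still alive when observation stopped.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed follow-up time for each subject
    events:    1 if the event (e.g. death) was observed, 0 if the subject
               was still event-free when observation ended
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk subjects surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy data: seven subjects, times in years.
toy_durations = [2, 3, 3, 5, 8, 8, 10]
toy_events = [1, 1, 0, 1, 0, 1, 0]

km_curve = kaplan_meier(toy_durations, toy_events)
for time, prob in km_curve:
    print(f"t={time}: S(t)={prob:.3f}")
```

Each step of the curve multiplies the running survival probability by the share of at-risk subjects who survive that time point, so subjects without an observed event still contribute to the denominator until they leave the study.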

## Parsimonious model

The dataset consists of 11,627 observations. The mean age of the subjects is 54.79 years and the median is 54 years, with a standard deviation of 9.56 years; the youngest subject is 32 and the oldest is 81. Mean systolic BP is 136.32 with a standard deviation of 22.80; the median is 132, the minimum is 83.5, and the maximum is 295. Mean time to death is 21.52 years with a standard deviation of 4.90 years; the median is 24.02 years, the minimum is 0.07 years, and the maximum is 24.02 years.

| Statistic | age | sysbp | timedth |
| --- | --- | --- | --- |
| Mean | 54.79 | 136.32 | 21.52 |
| SD | 9.56 | 22.80 | 4.90 |
| Median (p50) | 54 | 132 | 24.02 |
| Min | 32 | 83.5 | 0.07 |
| Max | 81 | 295 | 24.02 |
| N | 11627 | 11627 | 11627 |

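As a rough numerical check on the shape of each distribution, the reported means, medians, and standard deviations can be plugged into Pearson's median skewness coefficient, 3 × (mean − median) / SD. This is only a sketch based on the summary statistics; the choice of this particular coefficient is an assumption of the example, not something the original analysis used.

```python
# Summary statistics as reported for the dataset: (mean, sd, median).
reported = {
    "age": (54.79, 9.56, 54.0),
    "sysbp": (136.32, 22.80, 132.0),
    "timedth": (21.52, 4.90, 24.02),
}


def pearson_median_skewness(mean, sd, median):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


skewness = {
    name: pearson_median_skewness(*stats) for name, stats in reported.items()
}

for name, value in skewness.items():
    print(f"{name}: {value:+.2f}")
```

The coefficient is closest to zero for age, larger for systolic BP, and strongly negative for time to death, whose median coincides with its maximum.
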
The histograms for the three variables show that age is approximately normally distributed, while systolic BP and time to death are not.

The frequency tables below show that 92.76% of the observations have no previous coronary heart disease (CHD), while 7.24% do. For education, 41.39% have 0-11 years of schooling, 30.09% have a high school diploma, 16.63% have some college, and 11.89% have college or more.

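| PREVCHD | Freq. | Percent | Cum. |
| --- | --- | --- | --- |
| 0 | 10,785 | 92.76 | 92.76 |
| 1 | 842 | 7.24 | 100 |
| Total | 11,627 | 100 | |

| Educ | Freq. | Percent | Cum. |
| --- | --- | --- | --- |
| 0-11 years | 4,690 | 41.39 | 41.39 |
| High school diploma | 3,410 | 30.09 | 71.48 |
| Some college | 1,885 | 16.63 | 88.11 |
| College+ | 1,347 | 11.89 | 100 |
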
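The percentages in the frequency tables can be recomputed from the raw counts. The short sketch below does this; the helper name `percent_table` is hypothetical. The education counts sum to 11,332 rather than 11,627, so the education percentages appear to be taken over the observations with a recorded education level.

```python
# Counts as reported in the frequency tables.
prevchd_counts = {"0": 10785, "1": 842}
educ_counts = {
    "0-11 years": 4690,
    "High school diploma": 3410,
    "Some college": 1885,
    "College+": 1347,
}


def percent_table(counts):
    """Return (percentages rounded to 2 decimals, total count)."""
    total = sum(counts.values())
    percents = {label: round(100 * n / total, 2) for label, n in counts.items()}
    return percents, total


prevchd_pct, prevchd_total = percent_table(prevchd_counts)
educ_pct, educ_total = percent_table(educ_counts)

print(prevchd_total, prevchd_pct)
print(educ_total, educ_pct)
```
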
After fitting the model with all predictors and examining the coefficients for education level, I observe that all levels of education are significant (p<0.001). This contradicts the expectation that the variable is not needed.

We check the log-linearity of the continuous covariates (age and systolic BP) in the model. The plot for age shows that it satisfies the log-linearity assumption, so there is no need to transform it.

Figure 1: log-linearity test for age

Similarly, the plot for systolic BP is also linear, which suggests that it satisfies the log-linearity assumption and does not need any transformation. There is a significant interaction of previous coronary heart disease (prevchd) with both continuous variables, "age" and "sysbp".

## Null hypothesis

The test of the proportional hazards assumption shows that the assumption is not met: chi2(5)=103.42, p<0.001, so we reject the null hypothesis of proportional hazards. The assumption is violated for age, systolic BP, and prevchd; only the interactions satisfy it. To address this, we include time-varying terms for age, systolic BP, and prevchd. Testing for proportional hazards in the newly estimated model gives a global chi2(7)=3.89, p=0.7924, which means we cannot reject the null hypothesis of proportional hazards. Moreover, all the variables have non-significant p-values, which means every variable satisfies the proportional hazards assumption.

Using the Cox-Snell residuals to test the goodness of fit of the model, we see that the two lines intersect only at time 0, which calls the fit of the model into question; that is, the model does not fit well.

| _t | Haz. Ratio | Std. Err. |
| --- | --- | --- |
| Agetvc | 1.028*** | (0.003) |
| Sysbptvc | 0.997*** | (0.001) |
| Prevchdtvc | 1.092*** | (0.000) |
| prevchd#c.age (prevchd = 0) | 0.969*** | (0.003) |
| prevchd#c.age (prevchd = 1) | 0.977** | (0.009) |
| prevchd#c.sysbp (prevchd = 0) | 1.022*** | (0.003) |
| prevchd#c.sysbp (prevchd = 1) | 1.020*** | (0.003) |
| N | 11,627 | |
| LR chi2 | 1598.17 | |
| p | 0.0000 | |

Standard errors in parentheses; \*\*\*, \*\*, and \* denote significance at the 1%, 5%, and 10% levels respectively.
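The p-values quoted for the two proportional hazards tests can be checked from the chi-square statistics and their degrees of freedom. The sketch below computes the upper-tail chi-square probability with the standard library only; the function name `chi2_sf` is hypothetical, and the closed-form recurrence it uses is valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: build up from df = 0, where the tail probability is 0.
        tail, j = 0.0, 0.0
    else:
        # Odd df: start from the closed form for df = 1.
        tail, j = math.erfc(math.sqrt(half)), 0.5
    # Recurrence: Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)
    while j < df / 2.0:
        tail += half ** j * math.exp(-half) / math.gamma(j + 1.0)
        j += 1.0
    return tail


p_initial = chi2_sf(103.42, 5)  # test on the initial model
p_refit = chi2_sf(3.89, 7)      # global test after adding time-varying terms

print(f"initial model: p = {p_initial:.3g}")
print(f"refitted model: p = {p_refit:.4f}")
```

The first value is far below 0.001 and the second is close to the reported 0.7924, which agrees with rejecting proportional hazards for the initial model and not rejecting it after the time-varying terms are added.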

From the hazard ratios reported above, we see that age is associated with an increased risk of death from heart disease: a one-year increase in age increases the risk by 2.8%, and this effect is significant (p<0.001). Systolic blood pressure is associated with a slight reduction in the risk of death from CHD: a one-unit increase in systolic blood pressure reduces the risk by 0.3% (p<0.001). A history of previous CHD is associated with increased risk: a person with previous CHD has a risk of death 9.2% higher than a person without (p<0.001). The interactions of previous CHD with age and with systolic BP are both significant.
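The percentage figures in this interpretation follow from the hazard ratios by the usual conversion, percent change = (HR − 1) × 100. The sketch below applies it to the three reported ratios, reading each one as the text does, as a multiplicative change in hazard per one-unit increase; the dictionary keys are labels chosen for this example.

```python
# Hazard ratios as reported in the results table.
hazard_ratios = {
    "age": 1.028,
    "sysbp": 0.997,
    "prevchd": 1.092,
}


def percent_change(hazard_ratio):
    """Percent change in hazard implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0


changes = {name: round(percent_change(hr), 1) for name, hr in hazard_ratios.items()}

for name, pct in changes.items():
    direction = "increase" if pct > 0 else "decrease"
    print(f"{name}: {abs(pct)}% {direction} in hazard")
```

A ratio above 1 maps to an increase in hazard and a ratio below 1 to a decrease, so 0.997 corresponds to a reduction of 0.3%.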

The survival plot compares the survival curves at ages 40 and 60. The plot shows that the risk is the same up to time 5, after which the age-60 curve drops quickly toward zero, signifying an increasing risk of death in the later period. This supports the estimate that age increases the risk of death from CHD.