Survival-Analysis Situation

Consider the following situation: We are involved in the monitoring of a study based on the log-rank test (i.e., it is a survival-analysis situation where the data are times to an event).  We expect that 1,000 patients will be enrolled and, at least, 200 of them will experience the event of interest. The sample size is based on a two-sided alpha level of 0.025 and 80% power.  Answer the following:

  1. What is the relevant sample size for this study?
  2. What is δ in this situation and what is it equal to?
  3. If there are three interim analyses done after 50, 122 and 200 events have been observed, what are the trial fractions t1, t2 and t3?
  4. Using the O’Brien-Fleming spending function, what are the boundaries for the three analyses?
  5. Now suppose that, at the second interim analysis (at 122 events), you realize that, at most, you will observe only 190 events by the end of the study Adjust your study to reflect this and give the revised O-F boundary for the final analysis.
  1. Consider the following situation: A trial was planned to evaluate a new agent for reducing the incidence or progression of diabetic retinopathy. Historical and epidemiologic information suggested that about 22% (i.e., ) of diabetics could be expected to demonstrate these events over a two-year span. The expected event rate of the new treatment is hoped to be half of the 22%, (i.e., ). As there was considerable uncertainty about the true overall event rate, an interim look of the blinded results was planned, to see if the original sample sizes were adequate which occurred 25% of the patients in the trial had completed one year of experience.   Suppose that 12 total events of retinopathy were observed at this interim analysis. Do the following:
  1. Calculate the sample size at the start of the study based on an alpha level of 5% (two-sided) and power 90%.
  2. Re-calculate the sample size based on the method by Gould (Stat Med 1992), which maintains the blinding during the interim analysis.
  1. Now consider a study studying a new cancer treatment, either the control group or the treatment group. The event of interest is the death from cancer and the response is the time from randomization to death. Suppose that from past experience, the median survival time for the control group is  weeks, and the study would like to detect a median survival  weeks median survival time with a 80% power in the trial (and an alpha level 5% two-sided).  Suppose also that the study is designed based on an O’Brien Fleming boundary applied to four total analyses (three interim and one final).  Answer the following:
  1. Assuming exponential survival functions for the two groups find the hazard rates and compute the treatment effect .
  2. Generate the O’Brien-Fleming bounds for this study and compute the drift
  3. Compute the number of required events and the sample size (consider an exponential distribution of survival and uniform accrual of 10 patients per month).
  4. Given accrual time of 20 weeks, compute the follow-up time
  5. Suppose that an analysis was performed at the time when 15 events were observed and the test statistic was and another analysis was performed when 34 events were observed with a test statistic . What should be the decision at this point (i.e., continue or stop)?
  6. Provide a 95% confidence interval for the treatment effect.

In this lab, we are going to go over sample size calculation, which is required to design a clinical trial.

\section{Sample size for a pre-specified precision}

Using some notation, we want


\Pr(|\bar{X}-\mu|<d)\geq 1-\alpha.


To figure things out, we need to standardize the distribution above by dividing by the standard deviation $\sigma/\sqrt{n}$, resulting in


\Pr\left (|Z|=\left |\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right |<\frac{d}{\sigma/\sqrt{n}}=z_{1-\alpha/2}\right )\geq 1-\alpha


which in turn means that


n\geq\left(\dfrac{\sigma z_{\alpha/2}}{d}\right)^2


where $z_{1-\alpha/2}$ is the $(1-\alpha/2)\%$ quantile of the standard normal distribution (i.e. $z_{1-\alpha/2}=\Phi^{-1}(1-\alpha/2)$).

For example, if we want to estimate a mean within half a standard deviation, i.e., $d=0.5\sigma$, we get a formula that is applicable even without knowledge of $\sigma$, since


n\geq (2z_{\alpha/2})^2


In \verb|R|, you can simply code this as follows:

“`{r cv, }

# 4. Find the smallest sample size required to ensure that the error

# of estimation of the mean will be within

# d = 0.5?? of the true mean with 95% confidence



Since we cannot obtain partial units (human subjects in our context), we round up to the next integer

“`{r cv1}

# Or to round n up to the next highest integer



# Testing for a single (normal) mean

Among others, the library \verb|pwr| provides some useful function for obtaining the sample size required. \\[2ex]

As an example, consider the case of testing the (one-sided) null hypothesis $H_0: \mu\leq \mu_0=3$ versus the alternative hypothesis $H_1: \mu=\mu_1=4$, assuming that the standard deviation in both cases is $\sigma=2$.  We will use the function \verb|pwr.t.test| from the \verb|pwr| package to do this. Please, take a look at the help file of the function.

Note that, to properly invoke the function, we have to specify the effect size, which in this case is just the difference of the means under the null and the alternative hypothesis divided by the common standard deviation




as \verb|pwr.t.test| takes the effect size as its argument

“`{r singlemean}

# 5. Sample size calculation for a Normal mean

# H0: mu <= 3 vs H1: mu = 4, a = 0.05, beta = 0.1 and sigma = 2


pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,

alternative = “greater”,type = “one.sample”)


If we would like to calculate the power of a test, we can specify the sample size and leave the power unspecified.

“`{r singlemean1}

# To get the actual power of the test

pwr.t.test(d = (4-3)/2, sig.level = 0.05, n = 36,

alternative = “greater”,type = “one.sample”)


Note that the power of the test is not exactly 0.90 because some rounding took place regarding the sample size.\\[2ex]

You could do this more elegantly, by saving the sample size from the previous command as follows:

“`{r singlemean2}

pwr.output<-pwr.t.test(d = (4-3)/2, sig.level = 0.05, power = 0.90,

alternative = “greater”,type = “one.sample”)

# See what’s in pwr.output object


# Save the sample size into n$n)

# Now invoke pwr.t.test again to get the power without worrying about the n

pwr.t.test(d = (4-3)/2, sig.level = 0.05, n =,

alternative = “greater”,type = “one.sample”)


# Revisiting the Phase-II design

We want to test the null hypothesis $H_0:p\leq0.15$ vs the alternative $H_1:p=0.40$ setting $\alpha=0.10$ and $\beta=0.20$. The library \verb|clinfun| includes the function \verb|ph2single| that can give us the sample size based on the binomial distribution of the number of responders. Because there may be  a number of designs which would satisfy the $\alpha$ level and

power requirement, we can see, for example, 5 of the them by using the argument \verb|nsoln|.

“`{r phase2rev}

# 6. Revisiting the Phase II situation with H0:p = 0.15

# versus the alternative HA:p = 0.40,

# alpha = 0.1 and beta = 0.20

# (i) Using exact binomial test


ph2single(pu = 0.15, pa = 0.40, ep1 = 0.10, ep2 = 0.20, nsoln = 5)


All the designs above meet our criteria, but we choose the one with the smallest sample size (i.e. $n=16$). Recall that we found this design earlier in the discussion.

Using the function \verb|pwr.p.test| we can find the sample size based on the arcsine transformation, which is mainly used to normalise the distribution of proportions so that the normal approximation is appropriate even for small $p$, since the normal approximation for the sampling distribution of an estimated proportion is poor when $\hat{p}\rightarrow 0$ or $\hat{p}\rightarrow 1$.

“`{r singleproparcsine}

# (ii) Using the arcsine transformation of the proportion

# to normalise the distribution of the test statistic

pwr.p.test(h = ES.h(0.40, 0.15), sig.level = 0.10,

power = 0.80, alternative = “greater”)


Note that we have used the utility function \verb|ES.h| to calculate the effect size for proportions. The resulting sample size estimate is $n=14$. The two methods would be closer to each other if the sample size were larger. In any case, the previous method was exact whereas the latter method relies on the normal approximation.

# Testing for two independent means

As we discussed in the lecture, testing for two means involves the sample size formula


n=\frac{\left (z_{1-\alpha}-z_\beta \right )^2 2\sigma^2}{(\mu_2-\mu_1)^2}


where $n$ is the sample size \emph{for each group} (here we are assuming equal sample sizes in the two groups). The same can be done by considering the effect size $f=\frac{(\mu_2-\mu_1)}{2\sigma}$ as follows:


n= \frac{\left (z_{1-\alpha}-z_\beta \right )^2}{f^2}


The following calculations cycle through a sequence of effect sizes and produce a table of sample sizes.

“`{r sstable10}

# 7. Verify the calculation in Table 11.11 in the textbook

# (Piantadosi,2005, pp. 280)

table1 = data.frame(ES = seq(0.25,1.50,by = 0.25),

beta = 0.10,”Alpha = 0.05″ = NA,”Alpha = 0.10″ = NA)

names(table1)[3:4] = c(“Alpha = 0.05”, “Alpha = 0.10”)

for (i in 1:length(table1[,1]))


table1[i,3] = ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.05,

power = 0.90, alternative = “two.sided”,

type = “two.sample”)$n)

table1[i,4] = ceiling(2*pwr.t.test(d = table1[i,1], sig.level = 0.10,

power = 0.90, alternative = “two.sided”,

type = “two.sample”)$n)




Now we produce the same results with $\beta=0.20$:

“`{r sstable20}

table2 = data.frame(ES = seq(0.25,1.50,by = 0.25),

beta = 0.20,”Alpha = 0.05″ = NA,”Alpha = 0.10″ = NA)

names(table2)[3:4] = c(“Alpha = 0.05”, “Alpha = 0.10”)

for (i in 1:length(table1[,1]))


table2[i,3] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.05,

power = 0.80, alternative = “two.sided”,                                         type = “two.sample”)$n)

table2[i,4] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.10,

power = 0.80, alternative = “two.sided”,

type = “two.sample”)$n)




Note that we have used the Student’s t-distribution instead of the normal distribution in computing the sample sizes.  Now let’s unpack the previous code.  First, we created a \verb|data.frame| containing, as its first column, a sequence of effect sizes ranging from $f=0.25$ to $f=1.50$ in steps of 0.25.  The remaining columns were filled with missing values (\verb|NA|).

“`{r sstable1, eval=FALSE}

table1 = data.frame(ES = seq(0.25,1.50,by = 0.25),

beta = 0.10,”Alpha = 0.05″ = NA,”Alpha = 0.10″ = NA)


The command

“`{r sstable2, eval=FALSE}

names(table1)[3:4] = c(“Alpha = 0.05”, “Alpha = 0.10”)


assigns names to the third and fourth columns of the table.  Then we loop through all the effect sizes in the first column to produce, alternatively, the sample size under alpha levels (two-sided) of 5\% and 10\%.

“`{r sstable3, eval=FALSE}

for (i in 1:length(table1[,1]))


table1[i,3] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.05,

power = 0.80, alternative = “two.sided”,                                         type = “two.sample”)$n)

table1[i,4] = ceiling(2*pwr.t.test(d = table2[i,1], sig.level = 0.10,

power = 0.80, alternative = “two.sided”,

type = “two.sample”)$n)



Note the use of the command \tpl{\texttt{\Sexpr{‘length(table1[,1])’}}} which does not require us to know how many rows are in the table.

## Differential allocation

A slight modification in the sample size calculation occurs when the sample size in the two groups is unequal by some factor $r=1,2,\ldots,$. We are to use the function \verb|nNormal| found in the library \verb|gsDesign|. Please have a look at the help file.  Assuming that the ratio of sample sizes between the two groups is from 1 (equal allocation) to 10 and that the effect size is 0.5 (mean difference divided by the standard deviation), we get

“`{r ratio, message=FALSE}

# 8. Unequal group allocation (using normal quantiles)


ar = data.frame(ratio = 1/1:10, n = NA)

for (i in 1:10)


ar$n[i] = nNormal(delta1 = 0.5, delta0 = 0, sd = 1, sd2 = 1,

alpha = 0.05, beta = 0.10,

sided = 2, ratio = ar$ratio[i])




The sample table is as follows:

“`{r sampletable}



kable_styling(kable(ar, “html”),bootstrap_options = c(“striped”, “hover”, “condensed”, “responsive”))


Note the options `striped`, `condensed` and `responsive` which create the effects of the table. The last one, makes the table scrollable in small devices like cell phones, which may not be able to show it in its entirety.  We can also create a graph.

“`{r ratiofigure, message=FALSE, fig.width=8, fig.height=5, fig.align=”center”, fig.cap=”<b>Figure 1.</b> Effect of ratio of sample sizes on the total sample size.”}

plot(ceiling(n)~I(1/ratio),data = ar,xlim = c(1,10),ylim = c(0,550),

xlab = “Ratio of sample sizes”,

ylab = “Total sample size”,type = “l”,xaxt = “n”)

axis(1,at = 1:10)


We can easily see that the farther the allocation is from a balanced one ($r = 1$)

the larger the total sample size must be. Thus, unless a compelling counter-argument exists (e.g., making a randomized clinical trial more palatable to subjects) we suggest using equal group allocation.

In the above code consider the envocation of the `plot` command

“`{r ratiofigure1, eval=FALSE}

plot(ceiling(n)~I(1/ratio),data = ar,xlim = c(1,10),ylim = c(0,550),

xlab = “Ratio of sample sizes”,

ylab = “Total sample size”,type = “l”,xaxt = “n”)

axis(1,at = 1:10)


The same output would be possible with the following (simpler) version:

“`{r ratiofigure2, eval=FALSE}

with(ar, plot((1/ratio), ceiling(n), xlim = c(1,10),ylim = c(0,550),

xlab = “Ratio of sample sizes”,

ylab = “Total sample size”,type = “l”,xaxt = “n”))

axis(1,at = 1:10)


Note how the order of the x and y variables is reversed and the fact that the data can no longer be part of the `plot` command.

# Survival analysis

The first thing we should do is to get an idea of the hazard ratio of the new treatment compared to the standard treatment. To translate information from a statement about median survival, or proportion of patients surviving at some landmark time point, which make sense to everyone, to a statement about hazard of failure, which makes sense to no one, some simplifying assumptions about the survival distribution should be made.

  1. <b>Translating medians to hazards and hazard ratios </b>

We begin by assuming that the time to death is distributed according to the exponential distribution for both groups. Denoting by $m_0$ and $m_{1}$ the median survival times for the standard treatment and the new treatment, respectively. Using some (very simple) calculous we can solve the equation with respect to the hazard $\lambda_{0}$ in the following formula


\int_{0}^{m_{0}}\lambda_0e^{-\lambda_0 t}dt = 0.5


where $\lambda_0$ and $\mu_0$ are the hazard and the median survival time (i.e., the time point where 50\% of the subjects have experienced the event of interest) under the null hypothesis respectively. Running through the steps we get $\lambda_0 = – \dfrac{\log(0.5)}{m_0}$ and $\lambda_1 = – \dfrac{\log(0.5)}{m_1}$

in a similar manner, where $\lambda_1$ and $\mu_1$ are the hazard of failure and median survival time under the alternative hypothesis respectively. Now the hazard ratio is easily obtained. We point out that another nice property of the exponential distribution is that the ratio of hazards equals the inverse ratio of medians i.e.,




Note above that we have not specified a unit in time in any of the above calculations. You should be very careful to maintain the same unit of time (months, years, etc.) throughout your calculations. For example, the hazard ratio when the median survival times are $\mu_0=15$ and $\mu_1=25$ months (say) is

“`{r ss.survival}

# Median survival of standard therapy

m0 = 15

lambda0 = – log(0.5)/m0

# Median survival of experimental therapy

m1 = 25

lambda1 = – log(0.5)/m1

# Based on the properties of the exponetial distribution



  1. **Sample size when accrual time total follow-up duration are known**

You have to always keep in mind that when dealing with survival analysis, you are concerned about events not patients. Thus, to obtain the desired level of power in a survival study, a certain number of events must occur, irrespective of the number of patients involved. A complicating factor is that there are different combinations of numbers of patients and length of follow-up that will result in the desired number of events. WE can use the function \verb|nSurv| to help us investigate such combinations.\\[2ex]

First we give some definitions:

(1) Accrual time is the time it takes to enroll all the patients in the study

(2) Follow-up time is the duration of the follow-up period after the last patient has been enrolled

(3) Study duration is the sum of (1)+(2).

The \verb|nSurv| function has plenty of options. You are strongly advised to examine the options in depth.

“` {r ss.survival55}

# Assume 55 months of accrual; 10 months of follow up afterwards

# The function solves

nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,

R = 55, minfup = 10,T = 65,

ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)


In the above example

* \verb|lambdaC| is the event rate in the control group

* \verb|hr| is the hazard ratio under the alternative hypothesis

* \verb|hr0| is the hazard ratio under the null hypothesis

* \verb|R| is the accrual (recruitment) duration

* \verb|minfup| is the follow-up of the last patient enrolled

* \verb|T| is the maximum study duration

In this example, the sample size needed to obtain the desired power is $n=173$, whereas the accrual rate should be 3.13 patients a month. Note that $3.1351\times 55= 172.4305$ so that the sample size will be accrued in the specified accrual period of 55 months.

  1. **Sample size when the total follow-up time is known**

Since we allow for the possibility that some patients might get lost while being under follow-up, it is obvious that we implicitly inflate the variance in our estimates, so a larger sample size should be required. Indeed we see that we now need 197 patients if the annual rate of loss to follow-up is $\eta=0.009$ (see below):

“` {r ss.survival55.ltu}

# Assuming dropout rates

nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,

R = 55, minfup = 10,T = 65,eta = 0.009,etaE = 0.009,

ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)


We increase the follow-up period up to 100 months. Thus, patients would be under study for a longer period of time leading to a much greater probability of observing an event at a progressively higher proportion of them. Since the key quantity in determining the power in survival analysis is the number of events, we don’t have to enroll so many patients as almost everybody would develop the event as follow-up time is extended.

“` {r ss.survival100}

# Assuming a larger follow-up time

nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,

R = 55, minfup = 100,T = 155,

ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)


The expected sample size is now much smaller ($n=122$). It is in fact very close to the expected number of events, which means that almost all patients are going to experience the event within that period of time.

  1. **Sample size when accrual rate and duration are known

We specify the enrollment rate through \verb|gamma=5| and the accrual duration  through \verb|R=35|. The following code will produce a design under these criteria

“` {r ss.survival5.35}

# Survival study design with known accrual rate and accrual duration

nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,

gamma = 5, R = 35,

ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)


Note that the expected number of events is the virtually the same as in the previous cases. We need $n=175$ subjects and the additional follow-up time required is approximately 16.5 months.

On the following graph, we see that for a given recruitment duration of 5 months, as the follow up period increases the sample size decreases, and eventually converges to the expected number of events. The results are presented in Figure 2.

“` {r ss.survivalfig, fig.width=8, fig.height=5, fig.align=’center’, fig.cap=”<b>Figure 2.</b> Sample size as the length of follow-up is extended.”}

# Increase follow up from 5 up to 100 months

fp = seq(5,100,by=0.5)

n = rep(NA,length(fp))

ar = rep(NA,length(fp))

for (i in 1:length(fp))


x = nSurv(lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,

R = 5, minfup = fp[i], T = 5 + fp[i],

ratio = 1, alpha = 0.05, beta = 0.20, sided = 2)

n[i] = x$n

ar[i] = x$gamma


plot(fp,n,ylim = c(80,510),yaxt = “n”,ylab = “Sample size”,

xlab = “Follow up (months)”,type = “l”)

axis(2,at = c(100,120,200,300,400,500),las = 1)

abline(h = 119,col = “red”)

text(18,130,”Expected events”)




In this case we have to test whether event proportion is greater than (200/1000)=0.2 or not ,so let n be sample size let x be the no of events occurring and in this sample, if x is the no of events among n then we will take sample proportion x/n as an estimate of ,with 5% margin of error

so the sample size the estimated event rate=200/1000=0.2. the difference between margin of error=0.2*1/20=0.01(5%)


>n=0.2*(1-0.2)*(qnorm(0.975)-qnorm(0.2))/(0.01)^2 #run this code to get the ans.


(z(1-alpha)-z(b))=delta*standard deviation

This code gives the value of delta

del=(qnorm(0.975)-qnorm(0.2))*sqrt(0.2*0.8)/n # run this code to find the ans


Trian fraction=no of events ocurred/no of events expected

The trial fractions are =50/200,122/200,200/200=1


To calculate the Obrien, fleming boundaries, we have to run the following codes

so in this case ,3 interim analysis is being conducted ,there is a O brien fleming chart avialble for this purpose ,this is given below

Analysis no.







>gsBoundSummary(x).. run this code to get the bounds

a package gsDesign is built under R to calculate the obrien fleming bounds, in the code test is 2 sided test ,beta and alpha value is given in the question, timings have been previously calculated


Now, we have seen that the expected no of events =190

so the trial fractions have changed to 50/190,122/190 and the final trial fraction remains the same =1 and the other things remains the same

so the code will change to




Run these codes and get the ans youself.


The level of the test =0.05, power of the test =90%, the test in this case, null hypothesis is a=11%,alternative hypothesis

is a=22%,i.e it is the test of proportion .under this condition the sample is

>n=sqrt(.11*.89)*(qnorm(.95)-qnorm(.1))/(.11)^2. this will give the ans for normal aproximation

otherwise for bionomial approximation.


pwr.2p.test(h = ES.h(0.22, 0.11), sig.level = 0.05,

power = 0.90, alternative = “two.sided”)

will give you the sample size


we will follow the bionomial approximation

first of all ,in this case R=0.5

do find out 12/n

then the from the gould’s formula for the readjustment of the sample size ,it could written as




>N=(qnorm(0.975)*sqrt(2*r*(1-r))+qnorm(.1)*(sqrt(p1*(1-p1))+sqrt(p2(1-p2))))^2/(p1-p2)^2 N will be the re calculated size


The survival function is a exponential one

s(t)=aexp{-at} so the median =1/a*ln(2)

a=1/median*ln(2) will give you the ans, in problem the median is given clearly ,so just do the calculation to get the ans


4 interim analysis are done at equal interval, so we will assume the trial fraction to be 0.25,0.5,0.75,1


>gsBoundSummary(y) #will give all the answers


>x3 = gsSurv (k = 4, test.type = 2, alpha = 0.05, sided = 2,

lambdaC = lambda0, hr = lambda1/lambda0, hr0 = 1,

beta = 0.20, sfu = “OF”, gamma = 10,

ratio = 1, timing=c(0.25, 0.5, 0.75, 1.0))


this code will give the value of sample size and event required ,lambdao=a,lambda1=a+1


Similarly as like in the previous case there is fup function developed in gDesign in R which gives you the minimum follow up time

>fup=gsSurv(k=4, test.type =2, alpha=0.05, sided=2, lambdaC = lambda0, hr=lambda1/lambda0, hr0=1,beta=0.2,     sfu=’OF’,R=20, gamma=10, minfup=NULL)

fup$minfup    #run this code to get the ans


You have the upper bound for the interim analysis ,if you look the upper bound for the seond interim analysis ,upper bound is less than the observed value of test statistic =3.2 at the second analysis, which is a unfavorable condition, so we should abort it.


As from the ans you colud see that the number of events=n,the variance=4/n,delta=log(2) n you will find out from question no 3c