Further Considerations on Simon Two-Stage Design

Introduction

Simon’s two-stage design is used to minimize the expected sample size when the true response rate is no higher than some pre-determined uninteresting level \(\pi_0\) (Simon 1989).

The null hypothesis is \(H_0: \pi = \pi_0\) and the alternative hypothesis is \(H_1: \pi = \pi_1\), where \(\pi_1\) is some desirable level that warrants further development.

Suppose the type I and type II error rates are \(\alpha\) and \(\beta\).

Let \(n_1\) patients be treated in the first stage. If \(r_1\) or fewer respond, or more than \(r\) respond, the trial stops after stage 1. Otherwise, \(n_2\) additional patients are treated in the second stage, for a total of \(n = n_1 + n_2\).

Suppose \(X_1 \sim Bin(n_1, \pi)\) and \(X_2 \sim Bin(n_2, \pi)\). We declare the new drug a

  • Failure if \(\xi_F = \{X_1 \leq r_1\} \text{ OR } \{X_1 > r_1 \text { AND } X_1 + X_2 \leq r \}\)
  • Success if \(\xi_S = \{X_1 > r_1\} \text{ AND } \{X_1 + X_2 > r \}\)

Therefore, to control \(\alpha\) and \(\beta\), the design must satisfy

\[ P(\xi_S \mid \pi \leq \pi_0) \leq \alpha, \quad P(\xi_S \mid \pi \geq \pi_1) \geq 1- \beta \]

Moreover, we have

\[ \begin{aligned} P(\xi_S \mid \pi) & = \sum_{m_1 > r_1, m_1 + m_2 > r} b(m_1; n_1, \pi)b(m_2; n_2, \pi) \\ P(\xi_F \mid \pi) & = 1- P(\xi_S \mid \pi) \\\\ & = B(r_1; n_1, \pi) + \sum_{x = r_1 + 1}^{\min\{n_1, r\}}b(x; n_1, \pi)B(r-x; n_2, \pi) \end{aligned} \]

Equivalently, a type II error occurs when the drug is truly active but

  • \(X_1 \leq r_1\) OR
  • \(X_1 \gt r_1 \text{ AND } X_1+X_2 \le r\)

Because \(P(\xi_F \mid \pi)\) decreases in \(\pi\), the constraint only needs to hold at \(\pi = \pi_1\): \[ \begin{aligned} P(\text{type II error}) &= P(X_1 \le r_1 \mid \pi_1)+P(X_1 \gt r_1, X_1+X_2 \le r \mid \pi_1) \\ &= B(r_1; n_1, \pi_1) + \sum_{x = r_1 + 1}^{\min\{n_1, r\}}b(x; n_1, \pi_1)B(r-x; n_2, \pi_1) \le \beta \end{aligned} \]
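The probabilities above can be evaluated directly from binomial terms. The following is a minimal Python sketch, not taken from the original source: the function names and the example values (\(r_1 = 2\), \(n_1 = 16\), \(r = 10\), \(n = 50\), \(\pi_0 = 0.15\), \(\pi_1 = 0.25\)) are assumptions chosen for illustration. It computes \(P(\xi_S \mid \pi)\) by enumeration and \(P(\xi_F \mid \pi)\) from the closed form, so the two can be checked against each other.

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p): probability of exactly k responders among n patients."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """B(k; n, p): probability of at most k responders among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(j, n, p) for j in range(min(k, n) + 1))

def prob_success(r1, n1, r, n, p):
    """P(xi_S | p): X1 > r1 and X1 + X2 > r, by direct enumeration."""
    n2 = n - n1
    total = 0.0
    for m1 in range(r1 + 1, n1 + 1):
        for m2 in range(n2 + 1):
            if m1 + m2 > r:
                total += binom_pmf(m1, n1, p) * binom_pmf(m2, n2, p)
    return total

def prob_failure(r1, n1, r, n, p):
    """P(xi_F | p) from the closed form B(r1) + sum b(x) B(r - x)."""
    n2 = n - n1
    tail = sum(binom_pmf(x, n1, p) * binom_cdf(r - x, n2, p)
               for x in range(r1 + 1, min(n1, r) + 1))
    return binom_cdf(r1, n1, p) + tail

# Illustrative values only (assumed, not prescribed by the text).
design = dict(r1=2, n1=16, r=10, n=50)
p0, p1 = 0.15, 0.25
type_one = prob_success(p=p0, **design)   # P(declare success | p0)
type_two = prob_failure(p=p1, **design)   # P(declare failure | p1)
print(f"type I error = {type_one:.4f}, type II error = {type_two:.4f}")
```

Here the type I error is taken as the probability of declaring success at \(\pi_0\) and the type II error as the probability of declaring failure at \(\pi_1\).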

The expected sample size \(EN(\pi_0)\) is given by \[ n_1\left[P(X_1 \leq r_1\mid \pi = \pi_0) + P(X_1 > r\mid \pi = \pi_0) \right] + nP(r_1 + 1 \leq X_1 \leq r \mid \pi = \pi_0) \]

The probability of early termination for futility at stage 1 is \(PET(\pi_0) = P(X_1 \leq r_1 \mid \pi = \pi_0)\).
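As a rough illustration, \(PET\) and \(EN\) can be computed as below. This sketch follows the \(EN\) expression written above, in which the trial also uses only \(n_1\) patients when \(X_1 > r\); Simon's (1989) original criterion is \(n_1 + (1 - PET)\,n_2\), which can be swapped in if preferred. The function names and the value \(\pi_0 = 0.15\) are assumptions for the example.

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p) for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pet(r1, n1, p):
    """PET(p) = P(X1 <= r1 | p)."""
    return sum(binom_pmf(k, n1, p) for k in range(r1 + 1))

def expected_n(r1, n1, r, n, p):
    """EN(p): n1 patients if X1 <= r1 or X1 > r, otherwise n patients."""
    stop_low = pet(r1, n1, p)
    stop_high = sum(binom_pmf(k, n1, p) for k in range(r + 1, n1 + 1))
    go_on = sum(binom_pmf(k, n1, p) for k in range(r1 + 1, min(r, n1) + 1))
    return n1 * (stop_low + stop_high) + n * go_on

p0 = 0.15  # assumed null response rate, for illustration only
print("PET:", round(pet(2, 16, p0), 4))
print("EN: ", round(expected_n(2, 16, 10, 50, p0), 2))
```

With these assumed inputs the PET is about 0.56, so a little under half of such trials would continue to the second stage when \(\pi = \pi_0\).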

There are several criteria for choosing \((r_1, n_1, r, n)\) among the designs that satisfy the error constraints:

  • Optimal design: the design with the smallest expected sample size when \(\pi = \pi_0\).
  • Minimax design: the design with the smallest maximum (total) sample size \(n\) for the whole trial.
  • Admissible design: a compromise between the minimax and the optimal design (Jung et al. 2004).
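The optimal and minimax criteria can be illustrated with a brute-force search. The sketch below is one possible implementation under stated assumptions, not the algorithm of any particular package: the function names, the cap `n_max`, and the inputs (\(\pi_0 = 0.05\), \(\pi_1 = 0.25\), \(\alpha = 0.05\), \(\beta = 0.20\)) are chosen for the example, and \(EN\) uses the expression given earlier (written here in the equivalent form \(n_1 + n_2 P(r_1 < X_1 \le r)\)). Admissible designs are not covered.

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def pmf(n, p):
    """Tuple of b(k; n, p) for k = 0..n."""
    return tuple(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

@lru_cache(maxsize=None)
def upper_tail(n, p):
    """Tuple t with t[k] = P(X >= k) for k = 0..n+1 (t[n+1] = 0)."""
    t = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        t[k] = t[k + 1] + pmf(n, p)[k]
    return tuple(t)

def prob_success(r1, n1, r, n, p):
    """P(X1 > r1 and X1 + X2 > r | p)."""
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r + 1 - x1, 0)   # responders still needed in stage 2
        if need <= n2:
            total += pmf(n1, p)[x1] * upper_tail(n2, p)[need]
    return total

def expected_n(r1, n1, r, n, p):
    """EN(p): n1 patients if X1 <= r1 or X1 > r, otherwise n patients."""
    go_on = sum(pmf(n1, p)[r1 + 1:min(r, n1) + 1])
    return n1 + (n - n1) * go_on

def search_designs(p0, p1, alpha, beta, n_max):
    """Exhaustive search; returns (optimal, minimax) as (r1, n1, r, n, EN)."""
    optimal = minimax = None
    for n in range(2, n_max + 1):
        for n1 in range(1, n):
            for r1 in range(n1):
                for r in range(r1, n):
                    if prob_success(r1, n1, r, n, p0) > alpha:
                        continue   # type I error too large; try a larger r
                    if prob_success(r1, n1, r, n, p1) < 1 - beta:
                        break      # power only drops further as r grows
                    cand = (r1, n1, r, n, expected_n(r1, n1, r, n, p0))
                    if optimal is None or cand[4] < optimal[4]:
                        optimal = cand
                    if minimax is None or (cand[3], cand[4]) < (minimax[3], minimax[4]):
                        minimax = cand
    return optimal, minimax

optimal, minimax = search_designs(p0=0.05, p1=0.25, alpha=0.05, beta=0.20, n_max=20)
print("optimal (r1, n1, r, n, EN):", optimal)
print("minimax (r1, n1, r, n, EN):", minimax)
```

The minimax design breaks ties in \(n\) by the smaller \(EN\). The cap `n_max` matters: the reported optimal design is only optimal among designs with \(n \le\) `n_max`.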

Further Considerations on Study Design

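| | \(n\) | \(n_1\) | \(r_1\) | \(r_2\) | Type I Error | Power | Type II Error | Probability of early stopping |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Study 1 | 50 | 16 | 2 | 10 | 0.0980 | 0.6527 | 0.3473 | 0.5614 |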

This study employs Simon’s optimal two-stage design with a type I error rate of 0.1; the design yields a power of 0.65.

In stage I, 16 patients are accrued. If there are 2 or fewer responders among these 16 patients, the study ends early and the drug is considered to have no development value. No efficacy claim can be made at the end of this stage, even if there are more than 2 responders out of 16: stage I serves only as a futility gate. The probability of early termination, i.e. the probability that the trial stops after stage I when \(\pi = \pi_0\), is 56%.

Otherwise, an additional 34 patients are accrued in stage II, for a total sample size of 50. If there are 11 or more responders among these 50 patients, the null hypothesis is rejected: the study is considered successful and the drug is considered to have R&D value.

The above is the basic study decision process and logic.

A type I error rate of 0.1 means that the probability of falsely concluding that the drug is beneficial when it is not is no more than 10% (this is above 5%, which is a default level for regulatory approval).

A type I error here means that the drug has no R&D value but we wrongly judge it worthwhile to continue R&D. This type of error occurs when

  • more than 2 responders are observed in stage I and at least 11 responders are observed in total by the end of stage II.

What if we want to claim efficacy at the end of stage I?

If an efficacy claim is made in stage I of this study based on the interim results (i.e. the number of responders at the end of stage I), a portion of the type I error assigned to the entire study is spent at the interim analysis. Without adjustment, this would inflate the type I error at the end of the study and lead to treating additional patients with an ineffective drug.

However, an efficacy claim at the end of stage I becomes possible if the structure of the design is changed.

An extension of Simon's two-stage design that handles futility and efficacy interim stopping simultaneously can be found in Mander and Thompson (2010).

In this case, there are two opportunities to misjudge the drug as ineffective (increasing the type II error \(\beta\)) and two opportunities to misjudge the drug as effective (increasing the type I error \(\alpha\)), so both type I and type II errors need to be adjusted. Compared to the Simon two-stage design, this method may require a larger sample size to keep the overall type I error under control.

The implementation is the same as for the Simon two-stage design: the overall type I and type II errors are constrained, and the other parameters are searched to find the optimal solution.

Note that, compared to the Simon two-stage design, this approach adds one parameter (\(a_1\)): if the number of responders in stage I exceeds \(a_1\), the trial can be terminated early, the drug considered effective, and a confirmatory phase III trial considered. For example, if this approach is used, the corresponding optimal design compares with the Simon optimal design as follows.

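| | \(n\) | \(n_1\) | \(r_1\) | \(a_1\) | \(r_2\) | Type I Error | Type II Error | EN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Optimal Design | 41 | 10 | 3 | - | 15 | 0.022 | 0.2 | 16.9 |
| Optimal Design with Efficacy Boundary | 31 | 11 | 3 | 7 | 12 | 0.025 | 0.2 | 16.7 |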

Here the expected sample size \(EN\) (16.7) is slightly smaller than that of the Simon two-stage design (16.9) because the exact type I error is 0.022 for the Simon optimal design and 0.025 for the optimal design with the efficacy boundary. In most cases, however, adding the efficacy interim analysis results in a somewhat larger sample size than the corresponding Simon two-stage design.
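The operating characteristics of a design with an efficacy boundary can be sketched as follows. The stopping rule coded here (futility stop if \(X_1 \le r_1\), efficacy stop if \(X_1 > a_1\), final success if \(X_1 + X_2 > r_2\)) is a reading of the description above, and the function names are hypothetical. The text does not state \(\pi_0\) for the table; \(\pi_0 = 0.25\) is an assumption under which this sketch gives expected sample sizes close to the tabulated values.

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p) for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Bin(n, p); returns 0 when k > n."""
    return sum(binom_pmf(j, n, p) for j in range(max(k, 0), n + 1))

def two_stage_with_efficacy(r1, a1, n1, r, n, p):
    """Return (P(declare success), expected sample size) at response rate p.

    Stage 1: stop for futility if X1 <= r1, stop for efficacy if X1 > a1.
    Otherwise enrol n - n1 more patients and declare success if X1 + X2 > r.
    """
    n2 = n - n1
    early_success = binom_sf(a1 + 1, n1, p)
    late_success = 0.0
    p_continue = 0.0
    for x1 in range(r1 + 1, min(a1, n1) + 1):
        b = binom_pmf(x1, n1, p)
        p_continue += b
        late_success += b * binom_sf(r + 1 - x1, n2, p)
    return early_success + late_success, n1 + n2 * p_continue

p0 = 0.25  # assumed; not stated in the text
alpha_eff, en_eff = two_stage_with_efficacy(r1=3, a1=7, n1=11, r=12, n=31, p=p0)
# Setting a1 = n1 switches the efficacy stop off (plain Simon design).
alpha_simon, en_simon = two_stage_with_efficacy(r1=3, a1=10, n1=10, r=15, n=41, p=p0)
print(f"with efficacy boundary: alpha = {alpha_eff:.3f}, EN = {en_eff:.1f}")
print(f"Simon two-stage:        alpha = {alpha_simon:.3f}, EN = {en_simon:.1f}")
```

Evaluating the same function at \(\pi_1\) gives the power, so the type II error can be checked in the same way once \(\pi_1\) is known.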

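What happens if we do not stop when the number of responders in stage I is less than or equal to \(r_1\)?

In this case, the conditions for claiming efficacy or failure at the final stage become

  • Failure if \(\{X_1 + X_2 \leq r \}\)
  • Success if \(\{X_1 + X_2 > r \}\)

Both \(\alpha\) and \(\beta\) are affected: the success region now also includes outcomes with \(X_1 \leq r_1\) and \(X_1 + X_2 > r\), so the type I error increases and the type II error decreases relative to the planned design.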
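The size of this effect can be checked numerically. The sketch below compares the success probability with and without the futility stop; the function names and the inputs (\(r_1 = 2\), \(n_1 = 16\), \(r = 10\), \(n = 50\), \(\pi_0 = 0.15\), \(\pi_1 = 0.25\)) are assumptions for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p) for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def success_with_stop(r1, n1, r, n, p):
    """P(X1 > r1 and X1 + X2 > r): the futility stop is enforced."""
    n2 = n - n1
    return sum(binom_pmf(a, n1, p) * binom_pmf(b, n2, p)
               for a in range(r1 + 1, n1 + 1)
               for b in range(n2 + 1) if a + b > r)

def success_without_stop(r, n, p):
    """P(X1 + X2 > r): all n patients are always treated."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1, n + 1))

# Illustrative values only (assumed).
design = dict(r1=2, n1=16, r=10, n=50)
p0, p1 = 0.15, 0.25

alpha_stop = success_with_stop(p=p0, **design)
alpha_nostop = success_without_stop(design["r"], design["n"], p0)
beta_stop = 1 - success_with_stop(p=p1, **design)
beta_nostop = 1 - success_without_stop(design["r"], design["n"], p1)

print(f"alpha: {alpha_stop:.4f} (with stop) vs {alpha_nostop:.4f} (no stop)")
print(f"beta:  {beta_stop:.4f} (with stop) vs {beta_nostop:.4f} (no stop)")
```

Ignoring the futility rule can only add outcomes to the success region, so the first difference is never negative and the second never positive.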

What happens when we set a low power for this design?

The type II error rate (1 - power) is the probability of failing to show an effect of a drug when there actually is one. This type of error occurs when

  • 2 or fewer responders are observed in stage I, or
  • more than 2 responders are observed in stage I and fewer than 11 responders are observed in total by the end of stage II.

Accordingly, the loose type II error limit of this study (~35%) could keep us from continuing to develop a useful drug because of early stopping based on a small stage I sample. In general, a design with more patients in stage I is preferable, since the probability of falsely concluding at the end of stage I that the treatment is ineffective is then lower.

What to do if the number of evaluable subjects in stage I exceeds our expectations?

It is not uncommon for the number of subjects in stage I to exceed the planned number.

For example, suppose that in an optimal design stage I has 10 subjects and the total sample size is 41. More than 10 subjects may already have had an efficacy assessment at the time of the stage I analysis (if fewer than 10, the trial simply continues). To deal with this situation, one of the following two approaches can be pre-specified at the design stage.

  • Analyze only the first 10 subjects with an efficacy assessment; the extra subjects are not included in the stage I analysis.

  • Use all subjects with an efficacy assessment for the analysis, but update the optimal design. For example, if there are 2 extra subjects with an efficacy assessment in stage I, i.e., 12 subjects, the optimal design can be recalculated (i.e., fix \(n_1=12\) in addition to restricting \(\alpha\) and \(\beta\), and then search for the other 3 parameters to find the new optimal design). This can be achieved by simple programming.
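A minimal sketch of the second approach is given below: \(n_1\) is held fixed and the remaining three parameters are searched exhaustively for the smallest \(EN(\pi_0)\). The function names, the cap `n_max`, and the inputs (\(\pi_0 = 0.10\), \(\pi_1 = 0.30\), \(\alpha = 0.05\), \(\beta = 0.20\)) are assumptions for the example, since the text does not give the design inputs.

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def pmf(n, p):
    """Tuple of b(k; n, p) for k = 0..n."""
    return tuple(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

@lru_cache(maxsize=None)
def upper_tail(n, p):
    """Tuple t with t[k] = P(X >= k) for k = 0..n+1."""
    t = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        t[k] = t[k + 1] + pmf(n, p)[k]
    return tuple(t)

def prob_success(r1, n1, r, n, p):
    """P(X1 > r1 and X1 + X2 > r | p)."""
    n2 = n - n1
    return sum(pmf(n1, p)[a] * upper_tail(n2, p)[max(r + 1 - a, 0)]
               for a in range(r1 + 1, n1 + 1) if r + 1 - a <= n2)

def reoptimise_with_fixed_n1(n1, p0, p1, alpha, beta, n_max):
    """Search r1, r and n with n1 fixed; return (r1, r, n, EN) minimising EN(p0)."""
    best = None
    for n in range(n1 + 1, n_max + 1):
        for r1 in range(n1):
            for r in range(r1, n):
                if prob_success(r1, n1, r, n, p0) > alpha:
                    continue   # type I error too large; try a larger r
                if prob_success(r1, n1, r, n, p1) < 1 - beta:
                    break      # power only drops further as r grows
                go_on = sum(pmf(n1, p0)[r1 + 1:min(r, n1) + 1])
                en = n1 + (n - n1) * go_on
                if best is None or en < best[3]:
                    best = (r1, r, n, en)
    return best

# Stage I ended up with 12 evaluable subjects instead of the planned number.
new_design = reoptimise_with_fixed_n1(n1=12, p0=0.10, p1=0.30,
                                      alpha=0.05, beta=0.20, n_max=40)
print("re-optimised (r1, r, n, EN):", new_design)
```

Because the recalculation depends only on the realised \(n_1\) and not on the observed responses, the rule can be written into the protocol in advance.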

Statistical Inference

This section is based on the methodology in Jung and Kim (2004) for point estimation and Koyama and Chen (2008) for p-value and confidence interval.

Let the null hypothesis be \(H_0: \pi \le \pi_0\). A Simon’s two-stage design is indexed by the four numbers \((r_1, n_1, r, n)\) introduced in the section above.

Let \(m\) be the stopping stage and \(S\) be the total number of successes accumulated up to the stopping stage.

That is, \(S=X_1\) when \(m=1\) and \(S=X_t=X_1+X_2\) when \(m=2\). If \(m=2\), let \(n_2^{\ast}\) be the actual stage 2 sample size. Let \(x_1\) and \(x_2\) be the observed numbers of successes at stage 1 and stage 2 (if \(m=2\)), respectively. Let \(s\) be the total number of successes observed at the end of the trial.

Point Estimate

The maximum likelihood estimator (MLE) \(\hat{\pi}_{MLE}\) is \[ \hat{\pi}_{MLE}=\left\{\begin{matrix} \frac{x_1}{n_1} & ,m=1 \\ \frac{x_1+x_2}{n_1+n_2^{\ast}} & ,m=2 \end{matrix}\right. \]

Because of the data-dependent stopping rule, the MLE is biased. It coincides with the UMVUE below when \(m=1\), but when \(m=2\) the two differ, and the UMVUE is unbiased and preferred (Jung and Kim, 2004).

The uniformly minimum variance unbiased estimator (UMVUE) \(\hat{\pi}_{UMVUE}\) is \[ \hat{\pi}_{UMVUE}=\left\{\begin{matrix} \frac{x_1}{n_1} & ,m=1 \\ \frac{ {\textstyle \sum_{x_1=(r_1+1)\vee (s-n_2^{\ast})}^{s\wedge n_1}} \begin{pmatrix} n_1-1 \\ x_1-1 \end{pmatrix} \begin{pmatrix} n_2^{\ast} \\ s-x_1 \end{pmatrix} }{{\textstyle \sum_{x_1=(r_1+1)\vee (s-n_2^{\ast})}^{s\wedge n_1}} \begin{pmatrix} n_1 \\ x_1 \end{pmatrix} \begin{pmatrix} n_2^{\ast} \\ s-x_1 \end{pmatrix} } & ,m=2 \end{matrix}\right. \]

where \(a \wedge b = \text{min} (a,b)\) and \(a \vee b = \text{max} (a,b)\).
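Both estimators are straightforward to compute. The sketch below is an illustrative implementation of the two formulas, with hypothetical function names; `umvue_mean` is an added helper that enumerates all outcomes of a trial that stops only for futility and returns the expectation of the UMVUE, which should equal the true response rate.

```python
from math import comb

def mle(x1, n1, x2=None, n2_actual=None):
    """MLE: x1/n1 if the trial stopped at stage 1 (x2 is None), else pooled rate."""
    if x2 is None:
        return x1 / n1
    return (x1 + x2) / (n1 + n2_actual)

def umvue(r1, n1, x1, x2=None, n2_actual=None):
    """UMVUE of the response rate (formula attributed to Jung and Kim, 2004)."""
    if x2 is None:                       # m = 1
        return x1 / n1
    s = x1 + x2                          # total successes, m = 2
    lo = max(r1 + 1, s - n2_actual)
    hi = min(s, n1)
    num = sum(comb(n1 - 1, k - 1) * comb(n2_actual, s - k) for k in range(lo, hi + 1))
    den = sum(comb(n1, k) * comb(n2_actual, s - k) for k in range(lo, hi + 1))
    return num / den

def umvue_mean(r1, n1, n2, p):
    """E[UMVUE] at response rate p, enumerating every possible outcome."""
    def b(k, n):
        return comb(n, k) * p**k * (1 - p)**(n - k)
    stopped = sum(b(a, n1) * umvue(r1, n1, a) for a in range(r1 + 1))
    continued = sum(b(a, n1) * b(c, n2) * umvue(r1, n1, a, c, n2)
                    for a in range(r1 + 1, n1 + 1) for c in range(n2 + 1))
    return stopped + continued

# Illustrative data (assumed): r1 = 2, n1 = 16, 34 stage 2 patients, 5 + 6 responders.
print("MLE:  ", mle(5, 16, 6, 34))
print("UMVUE:", umvue(2, 16, 5, 6, 34))
print("E[UMVUE] at p = 0.2:", umvue_mean(2, 16, 34, 0.2))
```

The enumeration in `umvue_mean` uses the planned stage 2 size for every path, so it checks unbiasedness only for the case \(n_2^{\ast} = n_2\).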

P-value

If \(m=1\) then the p-value is \(p=P_{\pi_0}(X_1 \ge x_1)\).

If \(m=2\) and the actual sample size in stage 2 is the same as the planned (i.e. \(n_2^{\ast} = n_2\)), then the p-value is computed as

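\[ p=\sum_{x_1=r_1+1}^{n_1}{P_{\pi_0}\left [ X_1=x_1 \right ] P_{\pi_0}\left [ X_2 \ge s-x_1 \mid X_1=x_1 \right] } \]

where \(X_2 \sim \text{Binomial} (n_2,\pi_0)\), the sum runs over all stage 1 outcomes that continue to stage 2, and \(s\) is the observed total number of successes.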
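The p-value for the planned sample size can be sketched as follows. The second-stage tail is taken as \(P(X_2 \ge s - x_1)\) with \(s\) the observed total, which is an assumption about the intended reading of the formula above; the function names and the example numbers (including \(\pi_0 = 0.15\)) are also assumptions.

```python
from math import comb

def pmf(k, n, p):
    """b(k; n, p) for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sf(k, n, p):
    """P(X >= k) for X ~ Bin(n, p); 1 when k <= 0, 0 when k > n."""
    return sum(pmf(j, n, p) for j in range(max(k, 0), n + 1))

def p_value(r1, n1, n2, pi0, x1, x2=None):
    """P-value under pi0; x2 is None when the trial stopped at stage 1."""
    if x2 is None:                       # m = 1
        return sf(x1, n1, pi0)
    s = x1 + x2                          # m = 2 with the planned n2
    return sum(pmf(k, n1, pi0) * sf(s - k, n2, pi0)
               for k in range(r1 + 1, n1 + 1))

# Illustrative values (assumed): r1 = 2, n1 = 16, n2 = 34, pi0 = 0.15.
print("stopped at stage 1 with 2 responders:", p_value(2, 16, 34, 0.15, x1=2))
print("5 + 8 responders after stage 2:      ", p_value(2, 16, 34, 0.15, x1=5, x2=8))
```

With this definition, a total of exactly \(r + 1\) responders gives a p-value equal to the exact type I error of the design, so rejecting when \(p \le \alpha\) agrees with the design's decision rule.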

If \(m=2\) and the actual sample size in stage 2 is different from the planned (i.e. \(n_2^{\ast} \neq n_2\)), then the p-value is computed as

\[ p=\sum_{x_1=r_1+1}^{n_1}P_{\pi_0}\left [ X_1=x_1 \right ] A\left ( x_1,n_2,\pi^{ \ast} \right ) , \]

where

\[ \begin{aligned} A\left ( x_1,n_2,\pi \right ) &= P_{\pi}\left [ X_2 > r_t-x_1 \mid n_2 \right ] \\ &= \sum_{x_2=r_t-x_1+1}^{n_2}\begin{pmatrix} n_2 \\ x_2 \end{pmatrix}\pi^{x_2}(1-\pi)^{n_2-x_2} \end{aligned} \]

is the conditional power function at the second stage, \(r_t = r\) is the overall critical value, and \(\pi^{\ast}\) is the solution of

\[ A\left ( x_1,n_2,\pi^{\ast} \right )=P_{\pi_0}\left [ X_2^{\ast} \ge x_2 \mid n_2^{\ast} \right ] \]

where \(X_2^{\ast} \sim \text{Binomial} (n_2^{\ast},\pi_0)\)
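One way to carry out this calculation is to solve for \(\pi^{\ast}\) by bisection, since \(A\) is increasing in \(\pi\). The sketch below follows the steps as described above and is not a verified reimplementation of Koyama and Chen (2008); the function names, tolerance, and example numbers are assumptions, and it presumes \(1 \le r_t - x_1 + 1 \le n_2\) for the observed \(x_1\) so that a solution exists.

```python
from math import comb

def pmf(k, n, p):
    """b(k; n, p) for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sf(k, n, p):
    """P(X >= k) for X ~ Bin(n, p); 1 when k <= 0, 0 when k > n."""
    return sum(pmf(j, n, p) for j in range(max(k, 0), n + 1))

def cond_power(x1, n2, r, p):
    """A(x1, n2, p) = P(X2 > r - x1) for X2 ~ Bin(n2, p)."""
    return sf(r - x1 + 1, n2, p)

def solve_pi_star(x1, n2, r, target, tol=1e-12):
    """Bisection for pi* with A(x1, n2, pi*) = target (A increases in pi)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cond_power(x1, n2, r, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_value_unplanned(r1, n1, n2, r, pi0, x1, x2, n2_actual):
    """P-value when the actual stage 2 size n2_actual differs from the planned n2."""
    target = sf(x2, n2_actual, pi0)          # P_pi0[X2* >= x2 | n2*]
    pi_star = solve_pi_star(x1, n2, r, target)
    return sum(pmf(k, n1, pi0) * cond_power(k, n2, r, pi_star)
               for k in range(r1 + 1, n1 + 1))

# Illustrative values (assumed): planned n2 = 34, but only 30 patients in stage 2.
p = p_value_unplanned(r1=2, n1=16, n2=34, r=10, pi0=0.15, x1=5, x2=7, n2_actual=30)
print("p-value with n2* = 30:", p)
```

As a consistency check, when \(n_2^{\ast} = n_2\) and \(x_2 = r_t - x_1 + 1\) the solution is \(\pi^{\ast} = \pi_0\) and the p-value equals the exact type I error of the design.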

Confidence Interval

If \(m=1\), the exact 95% confidence interval is computed using the Clopper-Pearson method.

If \(m=2\), the p-value for testing \(H_0\) can be computed for any \(\pi_0\) using the approach described above. A two-sided 95% confidence interval is the set of \(\pi_0\) values whose corresponding p-value lies within \([0.025,0.975]\).
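Both intervals can be obtained by root finding, because the relevant tail probabilities increase with \(\pi_0\). The sketch below is a minimal illustration with hypothetical function names; it assumes the planned stage 2 size (\(n_2^{\ast} = n_2\)) and takes the stage 2 p-value as \(P_{\pi_0}(X_1 > r_1, X_1 + X_2 \ge s)\).

```python
from math import comb

def pmf(k, n, p):
    """b(k; n, p) for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sf(k, n, p):
    """P(X >= k) for X ~ Bin(n, p); 1 when k <= 0, 0 when k > n."""
    return sum(pmf(j, n, p) for j in range(max(k, 0), n + 1))

def solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, with f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ci_stage1(x1, n1, level=0.95):
    """Clopper-Pearson interval for x1 responders out of n1 (m = 1)."""
    a = (1 - level) / 2
    lower = 0.0 if x1 == 0 else solve_increasing(lambda p: sf(x1, n1, p), a)
    upper = 1.0 if x1 == n1 else solve_increasing(lambda p: sf(x1 + 1, n1, p), 1 - a)
    return lower, upper

def p_value_stage2(r1, n1, n2, s, pi0):
    """P_pi0(X1 > r1 and X1 + X2 >= s)."""
    return sum(pmf(k, n1, pi0) * sf(s - k, n2, pi0)
               for k in range(r1 + 1, n1 + 1))

def ci_stage2(r1, n1, n2, s, level=0.95):
    """Set of pi0 whose p-value lies in [a, 1 - a], found by inverting the test (m = 2)."""
    a = (1 - level) / 2
    f = lambda pi0: p_value_stage2(r1, n1, n2, s, pi0)
    return solve_increasing(f, a), solve_increasing(f, 1 - a)

# Illustrative values (assumed): r1 = 2, n1 = 16, n2 = 34.
print("m = 1, 2/16 responders:       ", ci_stage1(2, 16))
print("m = 2, 13 responders in total:", ci_stage2(2, 16, 34, s=13))
```

The bisection relies on the p-value being monotone in \(\pi_0\), which holds for this definition of the p-value.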

Recommended Materials in Chinese

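New Thinking on Using Simon's Two-Stage Design in Phase II Cancer Trials (original title: 癌症第二期試驗採用Simon 兩階段設計之新思維)

An Introduction to Simon's Two-Stage Trial Design - Center for Drug Evaluation, Taiwan (original title: 簡介Simon 二階段試驗設計 - 財團法人醫藥品查驗中心)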

UMVUE

Best Unbiased Estimators

SAP for Reference

Safety, Efficacy, and Tolerability Study of PF-06480605 in Subjects With Moderate to Severe Ulcerative Colitis.