A Family-Based Graphical Approach for Testing Hierarchically Ordered Families of Hypotheses

In clinical trial applications, the hypotheses under test are often grouped into multiple hierarchically ordered families. To test such structured hypotheses, various gatekeeping strategies have been developed in the literature, such as serial gatekeeping, parallel gatekeeping, and tree-structured gatekeeping strategies.

Key words: hierarchically ordered families; gatekeeping

However, these gatekeeping strategies are often either non-intuitive or less flexible when addressing increasingly complex logical relationships among families of hypotheses.

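What problem are we trying to solve? Visualizing families of hypotheses that have a hierarchical logical structure; in other words, a graphical method that blends gatekeeping strategies with the graphical approach (Bretz et al. (2009) and Dmitrienko et al. (2008)). Compared with the method of Burman et al. (2008), the graphical approach is "alpha-exhaustive".

The core of the method proposed by Bretz et al. (2009) is a "hypothesis-based graphical approach", whereas the core of the method here is a "family-based graphical approach".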

To overcome this issue, Qiu et al. (2018) developed a new family-based graphical approach, built on the approach introduced by Bretz et al. (2009), that makes it easy to derive and visualize different gatekeeping strategies.

In the proposed approach, a directed and weighted graph represents the gatekeeping strategy, with each node corresponding to a family of hypotheses. Two simple updating rules are used: one for the critical value of each family and one for the transition coefficient between any two families.

Theoretically, they have shown that the proposed graphical approach strongly controls the overall family-wise error rate at a pre-specified level.

Methodology of family-based graphical approach

When testing multiple families of hypotheses, hierarchical logical restrictions among the families are often an important aspect. It is therefore natural to focus on the logical relationships at the family level rather than at the hypothesis level when developing a graphical approach for visualizing conventional gatekeeping strategies for testing multiple ordered families of hypotheses.

Using an idea similar to that in Kordzakhia and Dmitrienko (2013), a vertex can represent a family of hypotheses instead of an individual hypothesis, and a directed edge with a pre-specified weight can represent the transition relationship between two families. This approach is termed the family-based graphical approach.

Figure - An intuitive example of hypothesis-based graphical visualization of a gatekeeping procedure, with the truncated Holm procedure with truncation parameter $\gamma$ as gatekeeper.

Basic notation

Suppose there are $N \ge 2$ hypotheses divided into $m \ge 2$ families, which are further grouped into $n$ layers, with $L_i=\{F_{i1},F_{i2},\dots,F_{il_i}\}$ being the $i$-th ordered layer consisting of $l_i$ families of hypotheses, $i=1,\dots,n$, and $\sum_{i=1}^{n} l_i=m$.

Each family $F_{ij}$ within layer $L_i$ has $n_{ij} \ge 1$ null hypotheses, denoted as $F_{ij}=\{H_{ij1},\dots,H_{ijn_{ij}}\}$, for $j=1,\dots,l_i$, such that $\sum_{i=1}^{n}\sum_{j=1}^{l_i} n_{ij}=N$.

These families $F_{ij}$ of hypotheses are to be tested based on their respective p-values $P_{ijk}$, $k=1,\dots,n_{ij}$, subject to controlling an overall measure of type I error at a pre-specified level $\alpha$.

Each true null p-value is assumed to be stochastically greater than or equal to the uniform distribution on $[0,1]$. That is, if $T_{ij}$ is the set of true null hypotheses in $F_{ij}$, then for any fixed $u \in [0,1]$, $\Pr[P_{ijk} \le u \mid H_{ijk} \in T_{ij}] \le u$, for any $i=1,\dots,n$; $j=1,\dots,l_i$ and $k=1,\dots,n_{ij}$.

Figure - Graphical representation of the general family-based graphical approach. Each vertex is associated with a family instead of a hypothesis, so we term the graph a family-based graph.

Global type I error rate control

The familywise error rate (FWER), which is the probability of incorrectly rejecting at least one true null hypothesis, is a commonly used notion of an overall measure of type I error when testing a single family of hypotheses.

Since we have multiple layers with any number of families within each layer, we consider this measure not locally for each family but globally. In other words, we define the overall FWER as the probability of incorrectly rejecting at least one true null hypothesis across all families of hypotheses for all layers.

"Layer" is an important concept in this paper: it represents the hierarchical structure over families rather than over single hypotheses.

If it is bounded above by α regardless of which and how many null hypotheses within each family are true for any layer, then this overall FWER is said to be strongly controlled at α.

Given the pre-specified $\alpha$, let $\alpha_i$ denote the initial critical value assigned to layer $L_i$, with $\sum_{i=1}^{n}\alpha_i \le \alpha$.

Remarks on error rate function introduced in Dmitrienko et al. (2008)

The error rate function was used to develop a simple stepwise approach for parallel gatekeeping strategies. In their discussion, the error rate function is required to be strictly less than $\alpha$ unless all of the hypotheses in one family are rejected, which is termed the separability condition.

Note that in applications, if the error rate function $e(\cdot)$ cannot be calculated easily, we often replace it with one of its upper bounds.

The definition of the error rate function used in this section is slightly more general. For this function, the separability condition is not required when choosing local procedures for the suggested family-based graphical approach.

In the family-based approach, each family is tested by its own local procedure and is therefore associated with a particular error rate function $e(\cdot)$.

Let $\alpha_{ij}$ denote the local critical value for testing family $F_{ij}$ and $A_{ij}$ denote the set of accepted hypotheses in $F_{ij}$.

Based on $A_{ij}$, we can calculate $e(A_{ij})$ after testing $F_{ij}$ at level $\alpha_{ij}$ and then transfer the remaining amount of its local critical value, $\alpha_{ij}-e(A_{ij})$, to the respective families in the subsequent layers according to the corresponding transition coefficients.

Testing procedures and alpha transition approach

The procedure tests $L_1$ to $L_n$ sequentially. Within each layer $L_i$, the families $F_{ij}$ are tested in any order, using any local procedures, based on their own (local) critical values.

The critical value used to locally test each family within the current layer is updated from its initially assigned value to one that incorporates certain portions of the critical values used in testing the families within the previous layers. The procedure stops when all families of the last layer $L_n$ have been tested.

The distribution of the amount of critical values transferred among families can be pre-fixed by a transition coefficient set G which is defined as follows.

Transition coefficient set G

Let $G=\{g_{ijkl}\}$ denote the set of all transition coefficients $g_{ijkl}$, which satisfy the following conditions for any $i=1,\dots,n$ and $j=1,\dots,l_i$:

$$\sum_{k=i+1}^{n}\sum_{l=1}^{l_k} g_{ijkl} \le 1;$$

and

$$0 \le g_{ijkl} \le 1; \quad g_{ijkl}=0 \text{ if } i \ge k.$$

The coefficient $g_{ijkl}$ is defined as the proportion of the local critical value that can be transferred from family $F_{ij}$ within layer $L_i$ to family $F_{kl}$ within layer $L_k$. The above figure shows the graphical representation of the general family-based approach.
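The conditions on $G$ can be checked mechanically. The following is a minimal sketch in base R, assuming (for illustration only) that the families are flattened to indices 1..m, that a hypothetical vector `layer` stores the layer of each family, and that `G[a, b]` holds the coefficient from family `a` to family `b`. The function name `is_valid_G` is not from the paper.

```r
# Check the conditions on a transition coefficient set G.
# Assumption: families are flattened to indices 1..m, layer[a] is the layer
# of family a, and G[a, b] is the coefficient from family a to family b.
is_valid_G <- function(G, layer, tol = 1e-12) {
  m <- length(layer)
  stopifnot(is.matrix(G), nrow(G) == m, ncol(G) == m)
  in_range <- all(G >= 0 & G <= 1)
  # Level may only move to a strictly later layer: g = 0 if i >= k.
  backward <- outer(layer, layer, ">=")
  forward_only <- all(G[backward] == 0)
  # The coefficients leaving one family must sum to at most 1.
  rows_ok <- all(rowSums(G) <= 1 + tol)
  in_range && forward_only && rows_ok
}

# Two layers with two families each: (F11, F12) and (F21, F22).
layer <- c(1, 1, 2, 2)
G <- matrix(0, 4, 4)
G[1, 3:4] <- c(0.5, 0.5)   # F11 splits its leftover level evenly
G[2, 4]   <- 1             # F12 passes everything to F22
ok <- is_valid_G(G, layer)

G_bad <- G
G_bad[3, 1] <- 0.2         # illegal: from layer 2 back to layer 1
bad <- is_valid_G(G_bad, layer)
```

Here `ok` is `TRUE` and `bad` is `FALSE`, since the second matrix sends level from a later layer back to an earlier one.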

Algorithm

Algorithm 1 - Two layers with two families of hypotheses within each layer

Consider $m=4$ families of hypotheses divided into two layers, $L_1=\{F_{11},F_{12}\}$ and $L_2=\{F_{21},F_{22}\}$, based on their hierarchical relationships, with two families of hypotheses within each layer.

Graph for the two-layer family-based procedure with $m=4$.

Step 1

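Test family $F_{1j}$, $j=1,2$, using any FWER controlling procedure at critical value $\alpha_{1j}$ and calculate $e_{1j}(A_{1j})$.

Update the graph with $L_1 \to L_1 \setminus \{F_{1j}\}$; for $k=1,2$ let $\alpha_{2k} \to \alpha_{2k}+(\alpha_{1j}-e_{1j}(A_{1j}))\,g_{1j2k}$; and $g_{1l2k} \to \begin{cases} g_{1l2k}, & l \ne j \\ 0, & \text{otherwise.} \end{cases}$ If $L_1 \ne \emptyset$, go back to step 1; otherwise, go to the next step.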

Step 2

Test $F_{2k}$, $k=1,2$, using any FWER controlling procedure at level $\alpha_{2k}$ and update the graph: $L_2 \to L_2 \setminus \{F_{2k}\}$. If $L_2 \ne \emptyset$, go back to step 2; otherwise stop.

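Summary of the steps:

The algorithm starts with the families $F_{1j}$, $j=1,2$, in the first layer $L_1$.

Once $F_{1j}$ has been tested, the significance level of $F_{2k}$ is updated based on the transition matrix $G$ and the error rate function $e_{1j}(A_{1j})$, and the update of $G$ removes all elements associated with $F_{1j}$.

The proof of this algorithm is given in Appendix A.1 of the original paper.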
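A single update of Step 1 can be sketched in base R as follows. This is only an illustration under stated assumptions: the function name `update_after_test`, the 2 x 2 matrix layout `G1[j, k]` for $g_{1j2k}$, and all numeric inputs are hypothetical and not taken from the paper.

```r
# One update of Step 1 after family F_1j has been tested.
# alpha2  : current critical values of F21 and F22
# G1      : 2 x 2 matrix with G1[j, k] = g_{1j2k}
# alpha1j : critical value used for F_1j
# e1j     : value of the error rate function e_1j(A_1j)
update_after_test <- function(alpha2, G1, j, alpha1j, e1j) {
  leftover <- alpha1j - e1j             # level that F_1j did not use up
  alpha2 <- alpha2 + leftover * G1[j, ] # distribute it to layer 2
  G1[j, ] <- 0                          # drop the edges leaving F_1j
  list(alpha2 = alpha2, G1 = G1)
}

# Hypothetical inputs, for illustration only.
alpha2 <- c(0, 0)
G1 <- rbind(c(0.5, 0.5),
            c(0.0, 1.0))
step <- update_after_test(alpha2, G1, j = 1, alpha1j = 0.025, e1j = 0)
step$alpha2   # 0.0125 0.0125
```

Calling the function once for each tested family of the first layer, in any order, yields the critical values at which the second layer is tested.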

Algorithm 2 - General multi-layer family-based graphical approach

The aforementioned two-layer four-family case demonstrates the inherent nature of sequential testing of the family-based graphical approach.

Now we generalize the graphical approach from two layers with two families of hypotheses in each layer to any $n$ layers with an arbitrary number of families of hypotheses within each layer.

The general multi-layer family-based graphical approach is defined through the following algorithm:

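Step $i$ ($1 \le i \le n-1$)

Test family $F_{ij}$, $j=1,2,\dots,l_i$, using any FWER controlling procedure at critical value $\alpha_{ij}$ and calculate $e_{ij}(A_{ij})$.

Update the graph with $L_i \to L_i \setminus \{F_{ij}\}$; for $k=i+1,\dots,n$; $l=1,\dots,l_k$ let $\alpha_{kl} \to \alpha_{kl}+(\alpha_{ij}-e_{ij}(A_{ij})) \times g_{ijkl}$; and $g_{iskl} \to \begin{cases} g_{iskl}, & s \ne j \\ 0, & \text{otherwise.} \end{cases}$ If $L_i \ne \emptyset$, go back to step $i$; otherwise, go to the next step.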

Step n

Test $L_n=\{F_{n1},\dots,F_{nl_n}\}$, using any FWER controlling procedure at level $\alpha_{nj}$ to test $F_{nj}$, and update the graph: $L_n \to L_n \setminus \{F_{nj}\}$. If $L_n \ne \emptyset$, go back to step $n$; otherwise stop.

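A simple case:

Consider a multiple testing problem with $n$ layers $L_1,\dots,L_n$, where each layer $L_i$ contains only one family of hypotheses $F_{i1}$.

Under the multi-layer family-based graphical approach, the initial significance level of $F_{i1}$ is $\alpha$ if $i=1$; all other families are assigned level 0. The transition coefficients are $g_{i1k1}=1$ if $1 \le i \le n-1$ and $k=i+1$, and $g_{i1k1}=0$ otherwise.

The proof of this algorithm is given in Appendix A.2 of the original paper.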
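The general algorithm and the single-family-per-layer case above can be sketched in base R as follows. This is not the authors' implementation: the names `fixed_sequence` and `family_graph_test`, the flattened family index, and the p-values are hypothetical. The local procedure shown is a fixed sequence test whose error rate function is assumed to be 0 when nothing is accepted and the full local level otherwise.

```r
# Fixed sequence local procedure: test in the given order and stop at the
# first non-rejection. Assumed error rate function: 0 if nothing is accepted,
# otherwise the full local level.
fixed_sequence <- function(p, alpha) {
  passed <- cumsum(p > alpha) == 0
  list(rejected = passed, e = if (all(passed)) 0 else alpha)
}

# families   : list of p-value vectors, one per family (flattened index 1..m)
# layer      : layer index of each family
# alpha0     : initial critical values of the families
# G          : G[a, b] = transition coefficient from family a to family b
# local_proc : function(p, alpha) returning list(rejected, e)
family_graph_test <- function(families, layer, alpha0, G,
                              local_proc = fixed_sequence) {
  alpha <- alpha0
  rejected <- vector("list", length(families))
  for (a in order(layer)) {            # layer 1 first, then layer 2, ...
    res <- local_proc(families[[a]], alpha[a])
    rejected[[a]] <- res$rejected
    # Pass the unused part of the local level on to later layers.
    alpha <- alpha + (alpha[a] - res$e) * G[a, ]
    G[a, ] <- 0                        # remove the tested family's edges
  }
  list(rejected = rejected, alpha_used = alpha)
}

# Simple case: three layers, one single-hypothesis family per layer.
# The p-values are hypothetical.
families <- list(0.01, 0.03, 0.20)
layer <- 1:3
alpha0 <- c(0.05, 0, 0)
G <- matrix(0, 3, 3)
G[1, 2] <- 1
G[2, 3] <- 1
out <- family_graph_test(families, layer, alpha0, G)
unlist(out$rejected)   # TRUE TRUE FALSE
```

With one hypothesis per family the sketch behaves like a fixed sequence procedure: the full level 0.05 is handed down the chain until the first non-rejection. Because coefficients are zero except toward later layers, each entry of `alpha_used` equals the level at which that family was actually tested.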

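For the algorithm above, there are a few additional considerations:

  1. If every family uses a local procedure that controls the FWER, and these procedures all satisfy the separability condition (i.e. the error rate function of each local procedure is strictly less than $\alpha$), then this algorithm is equivalent to the general multistage gatekeeping procedure in parallel gatekeeping (Dmitrienko et al. (2008)). This is very important: it means that conventional truncated stepwise testing procedures can be visualized graphically.
  2. If every family uses a local procedure that controls the FWER, and the supremum of the error rate function of these procedures is $e(I)=\alpha$ for any $I$, then this algorithm is equivalent to particular serial gatekeeping procedures, such as the conventional Holm procedure, the fixed sequence procedure, etc.
  3. If every family contains only one hypothesis, the algorithm is equivalent to the fixed sequence procedure.
  4. If the dependence among the null p-values within a family is known, we have more choices of local procedures, such as the conventional or truncated Hochberg procedure.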

Example 1

Consider the Type II diabetes clinical trial example in Dmitrienko et al. (2007).

The trial compares three doses of an experimental drug (Doses L, M and H) versus placebo (Plac) with respect to one primary endpoint (P: Haemoglobin A1c), and two secondary endpoints (S1: Fasting serum glucose; S2: HDL cholesterol).

The three endpoints will be examined at each of the three doses, so a total of nine null hypotheses will be formulated and grouped into three families, F1, F2 and F3.

Family $F_1$ consists of three dose-placebo comparisons corresponding to the primary endpoint:

  • H vs Plac (H11)
  • M vs Plac (H12)
  • L vs Plac (H13)

Similarly, families $F_2$ and $F_3$ each consist of three dose-placebo comparisons, corresponding to the secondary endpoints S1 and S2, respectively.

The overall Type I error rate is pre-specified at α=0.05 and the raw p-values for the nine null hypotheses are given in the following table.

In this example, we assume that the primary endpoint P is more important than the secondary endpoints S1 and S2, thus F1 is always tested before testing F2 and F3.

Suppose that the secondary endpoints S1 and S2 are equally important, thus F2 and F3 are grouped into the same layer; the dose-placebo comparisons within each family are ordered a priori (H vs. Plac through L vs. Plac).

We choose the conventional fixed sequence procedure as the local procedure for each family, and the initial critical values allocated to $F_1$, $F_2$ and $F_3$ are 0.04, 0.005, and 0.005, respectively. Once $F_1$ is tested and all of its hypotheses are rejected, its critical value is allocated equally to $F_2$ and $F_3$.

The above figure visualizes this gatekeeping strategy. We start by testing $F_1$ at level 0.04; all three hypotheses in $F_1$ are rejected using the conventional fixed sequence procedure.

Then the whole local critical value 0.04 is split equally between $F_2$ and $F_3$, and the updated critical values for $F_2$ and $F_3$ each become $0.005+0.02=0.025$. We continue by testing $F_2$ and $F_3$ at level 0.025, in any order, using the conventional fixed sequence procedure; the resulting rejected hypotheses are $H_{21}$, $H_{31}$ and $H_{32}$.
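Written with the updating rule of the algorithm, this step reads as follows. The shorthand $\alpha_1,\alpha_2,\alpha_3$ for the critical values of $F_1,F_2,F_3$ and $g_{12}=g_{13}=1/2$ for the transition coefficients is introduced here for illustration only, and $e(A_1)=0$ is assumed because no hypothesis in $F_1$ is accepted.

```latex
\alpha_2 \to \alpha_2 + (\alpha_1 - e(A_1))\, g_{12}
         = 0.005 + (0.04 - 0)\times\tfrac{1}{2} = 0.025,
\qquad
\alpha_3 \to \alpha_3 + (\alpha_1 - e(A_1))\, g_{13}
         = 0.005 + (0.04 - 0)\times\tfrac{1}{2} = 0.025 .
```

The two updated levels add up to 0.05, so the whole pre-specified $\alpha$ is available for the second layer.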

Finally, the testing results of the procedure are summarized in the above table. In addition, the figure below visualizes the procedure using the hypothesis-based graphical approach.

Example 2

Consider the example below.

In the hypothesis-based graphical approach, the initial levels for hypotheses $H_1,H_2,H_3,H_4$ are each $\frac{1}{4}\alpha=\frac{1}{4}\times 0.025=0.00625$. The symbol $\epsilon$ denotes the infinitesimally small weight used for shifting significance between families.

Suppose the raw p-values for the 6 hypotheses $H_1,\dots,H_6$ are 0.005, 0.007, 0.004, 0.004, 0.00626 and 0.002, respectively.

In R, $\epsilon$ is approximated by 0.00001. The code for generating the gMCP graph is

library(gMCP)

# Transition matrix; "\\epsilon" marks infinitesimally small edge weights
m <- rbind(H1=c(0, 0.5, 0.5, 0, 0, 0),
           H2=c(0.5, 0, 0, "0.5-\\epsilon", "\\epsilon/2", "\\epsilon/2"),
           H3=c(0.5, 0, 0, 0.5, 0, 0),
           H4=c(0, "0.5-\\epsilon", 0.5, 0, "\\epsilon/2", "\\epsilon/2"),
           H5=c(0, 0, 0, 0, 0, 1),
           H6=c(0, 0, 0, 0, 1, 0))
# Initial weights: alpha is split equally among H1 to H4
weights <- c(0.25, 0.25, 0.25, 0.25, 0, 0)
graph1 <- new("graphMCP", m=m, weights=weights)
pvalues <- c(0.005, 0.007, 0.004, 0.004, 0.00626, 0.002)
# eps is the numeric value substituted for epsilon
gMCP(graph1, pvalues, test="Bonferroni", alpha=0.025, verbose=TRUE, eps=0.00001)

# Import to GUI
graphGUI(graph1, pvalues=pvalues)

Note: gMCP() crashes very easily in RStudio and should be run in native R.

We can break the MTP down into the following steps:

  1. Reject H1 and pass its level to H2 and H3

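  2. Reject $H_3$. Note that the edge weight from $H_4$ to $H_5$ becomes $\frac{3}{4}\epsilon$ (about $7.5\times 10^{-6}$ for $\epsilon=0.00001$) when Algorithm 1 in Bretz et al. (2009) is used to update the weights.

    In this graph, $j=3$, $l=4$, $k=5$, and the graph update is

$$g_{lk} \to \frac{g_{lk}+g_{lj}g_{jk}}{1-g_{lj}g_{jl}}=\frac{\epsilon/2+1/2\times 0}{1-1/2\times 2/3}=\frac{3\epsilon}{4}.$$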

The remaining updated weights can be computed from the same formula.
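The update rule can also be checked numerically. The following base R sketch does not use gMCP; the helper `remove_node` is a hypothetical name, and $\epsilon$ is replaced by the numeric stand-in 0.00001, which is an approximation of the infinitesimal weight rather than an exact treatment.

```r
# Graph update after rejecting H_j, following the rule quoted above:
# g_lk -> (g_lk + g_lj * g_jk) / (1 - g_lj * g_jl) for l != k, both != j,
# and w_l -> w_l + w_j * g_jl for the weights.
remove_node <- function(G, w, j) {
  keep <- setdiff(seq_along(w), j)
  w_new <- w + w[j] * G[j, ]
  G_new <- G
  for (l in keep) for (k in keep) {
    denom <- 1 - G[l, j] * G[j, l]
    G_new[l, k] <- if (l != k && denom > 0) {
      (G[l, k] + G[l, j] * G[j, k]) / denom
    } else 0
  }
  w_new[j] <- 0
  G_new[j, ] <- 0
  G_new[, j] <- 0
  list(G = G_new, w = w_new)
}

eps <- 1e-5   # numeric stand-in for the infinitesimal epsilon
G <- rbind(H1 = c(0, 0.5, 0.5, 0, 0, 0),
           H2 = c(0.5, 0, 0, 0.5 - eps, eps / 2, eps / 2),
           H3 = c(0.5, 0, 0, 0.5, 0, 0),
           H4 = c(0, 0.5 - eps, 0.5, 0, eps / 2, eps / 2),
           H5 = c(0, 0, 0, 0, 0, 1),
           H6 = c(0, 0, 0, 0, 1, 0))
w <- c(0.25, 0.25, 0.25, 0.25, 0, 0)

s1 <- remove_node(G, w, 1)        # reject H1
s2 <- remove_node(s1$G, s1$w, 3)  # then reject H3
s2$G[4, 5]                        # edge H4 -> H5, equal to 3 * eps / 4
s2$w                              # H2 and H4 now each hold weight 0.5
```

After the two rejections, $H_2$ and $H_4$ each carry weight 0.5, i.e. level $0.5\times 0.025=0.0125$, which is why $H_2$ (p-value 0.007) can be rejected next.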

That is

  3. The graph is updated accordingly, and we can go on to reject $H_2$

  4. Reject $H_4$

  5. Finally, both $H_5$ and $H_6$ can be rejected.

Could it be rewritten using the family-based graphical approach?

Suppose $F_1=\{H_1,H_3,H_2,H_4\} \in L_1$ and $F_2=\{H_5,H_6\} \in L_2$.

It is obvious that $F_1$ is more important than $F_2$. Consider a case in which not all the hypotheses in $F_1$ are significant, e.g.

# Suppose H1 and H3 are not significant!
pvalues2 <- c(0.08, 0.007, 0.08, 0.004, 0.00626, 0.002)
graphGUI(graph1, pvalues=pvalues2)

After rejecting H2,H4 in F1, the graph turns into the following one

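$F_2$ cannot be tested unless the hypotheses in $F_1$ are significant.

From this we can see the hierarchical structure of the test: $F_1$ and $F_2$ are equally important, while the testing of $F_3$ depends on the test results of $F_1$ and $F_2$.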

The family-based graph looks like

Example 3 - Truncated Holm

Consider the truncated Holm MTP (parallel gatekeeping) for 4 hypotheses in the hypothesis-based graphical approach, with prespecified significance level $\alpha=0.05$. The detailed algorithm of the truncated Holm procedure is described in Dmitrienko et al. (2008).

Let F1={H1,H2} and F2={H3,H4}.

If $H_2$ is rejected, then $j=2$, $l=1$, $k=3$, and the graph update based on the removal of $H_2$ is

$$g_{13} \to \frac{g_{13}+g_{12}g_{23}}{1-g_{12}g_{21}}=\frac{1-\gamma+\gamma-\gamma^2}{2(1-\gamma^2)}=1/2.$$

Similarly, $g_{14}=1/2$:

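In the family-based graphical approach,

where the weight $w$ in the graph is calculated from the error rate function:

$$w=\begin{cases} 1-\left[\gamma+\dfrac{1-\gamma}{n}|A_1|\right], & A_1 \ne \emptyset \\ 1, & A_1=\emptyset \end{cases}$$

Note that $|A_1|$ is the number of hypotheses accepted in $F_1$ and $n$ is the number of hypotheses in $F_1$.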

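First, test the hypotheses in $F_1$ using the truncated Holm procedure:

  • If all hypotheses in $F_1$ are tested and rejected using truncated Holm at the initial $\alpha_1=\alpha=0.05$, then $w=1$ since $A_1=\emptyset$. Therefore, $F_2$ will be tested at level $\alpha_2=\alpha_1-0=\alpha$;
  • If only one hypothesis in $F_1$ is tested and rejected at the initial $\alpha_1=\alpha=0.05$, then $w=\frac{1}{2}-\frac{1}{2}\gamma$ since $A_1 \ne \emptyset$. That is, $F_2$ will be tested at level $\alpha_2=\alpha_1-e(A_1)=\alpha-\left[\gamma+\frac{1-\gamma}{2}\right]\alpha=\left[\frac{1}{2}-\frac{1}{2}\gamma\right]\alpha$;
  • If all hypotheses in $F_1$ are tested and accepted using truncated Holm at the initial $\alpha_1=\alpha=0.05$, then $w=0$. Therefore, $F_2$ will not be tested.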
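The three cases can be tabulated with a few lines of base R. This is a sketch only: the function name `trunc_holm_fraction` is hypothetical, and the truncation parameter $\gamma=0.5$ is chosen purely for illustration.

```r
# Error rate function of the truncated Holm procedure, expressed as a
# fraction of the local level: 0 if nothing is accepted,
# gamma + (1 - gamma) * |A| / n otherwise.
trunc_holm_fraction <- function(n_accepted, n, gamma) {
  if (n_accepted == 0) 0 else gamma + (1 - gamma) * n_accepted / n
}

alpha <- 0.05
gamma <- 0.5   # hypothetical truncation parameter
n <- 2         # number of hypotheses in F1

# Weight w for 0, 1 and 2 accepted hypotheses in F1.
w <- 1 - sapply(0:2, trunc_holm_fraction, n = n, gamma = gamma)
alpha2 <- w * alpha   # level at which F2 is tested in each case

w        # 1.00 0.25 0.00
alpha2   # 0.0500 0.0125 0.0000
```

With $\gamma=0.5$ the middle case gives $w=\frac{1}{2}-\frac{1}{2}\gamma=0.25$, matching the second bullet, and the last case leaves nothing for $F_2$.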

References

Bretz, F., Maurer, W., Brannath, W., & Posch, M. (2009). A graphical approach to sequentially rejective multiple test procedures. Statistics in Medicine, 28, 586-604. doi:10.1002/sim.3495.

Dmitrienko, A., Tamhane, A., & Wiens, B. (2008). General multistage gatekeeping procedures. Biometrical Journal, 50, 667-677. doi:10.1002/bimj.200710464.

Qiu, Z., Li, Y., & Guo, W. (2018). A family-based graphical approach for testing hierarchically ordered families of hypotheses. doi:10.13140/RG.2.2.23109.29929.