Standard Conventions for Time-to-Event (TTE) Analyses
The following are standard conventions for time-to-event (TTE) analyses:
The median, percentiles and probabilities at particular points in time are estimated using the method of Kaplan & Meier, 1958.
Confidence intervals for the median and percentiles are generated by the method of Brookmeyer & Crowley, 1982.
Confidence intervals for the estimated probability of an event by time \(t\) may be generated using the log(-log) method, with back-transformation to a confidence interval on the untransformed scale.
Log-rank or stratified log-rank tests typically are used for comparing treatment arms. Other tests may be specified for sensitivity analyses.
Cox proportional hazards regression is used to estimate hazard ratios and their confidence intervals. It is also used for modeling covariate effects.
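As an illustration of the first and third conventions above, the following is a minimal sketch of the Kaplan-Meier estimate with a log(-log) confidence interval, using Greenwood's variance formula. The function names `kaplan_meier` and `loglog_ci`, the toy data, and the fixed 95% normal quantile are assumptions made for this example only; validated statistical software should be used for actual analyses.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct event time.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    Returns a list of (time, survival, greenwood_sum) tuples.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0  # running sum of d / (n * (n - d))
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            else:
                greenwood_sum = math.inf  # survival has dropped to 0
            result.append((t, surv, greenwood_sum))
        n_at_risk -= d + c
    return result

def loglog_ci(surv, greenwood_sum, z=1.959964):
    """Back-transformed log(-log) interval for S(t); z is assumed to be
    the 97.5% normal quantile (two-sided 95% interval).
    Returns None when S(t) is 0 or 1, where the transform is undefined."""
    if not 0.0 < surv < 1.0 or math.isinf(greenwood_sum):
        return None
    se = math.sqrt(greenwood_sum) / abs(math.log(surv))
    return surv ** math.exp(z * se), surv ** math.exp(-z * se)

# Hypothetical toy data: five subjects, two of them censored.
km = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for t, s, g in km:
    print(t, round(s, 4), loglog_ci(s, g))
```

The interval for the probability of an event by time \(t\) follows as one minus the limits for \(S(t)\), in reverse order.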
Restricted Mean Survival Time (RMST)
In a longitudinal trial comparing 2 treatment arms, the primary endpoint is often the time to a specific event (e.g., disease progression, death). The hazard ratio estimate from the Cox proportional hazards (PH) model is routinely used to quantify the between-arm difference under the assumption that the ratio of the 2 hazard functions is approximately constant over time. When this assumption is plausible, the hazard ratio estimate may capture the relative difference between the 2 survival curves. However, the clinical meaning of the estimate is difficult, if not impossible, to interpret when the underlying PH assumption is violated (i.e., when the hazard ratio is not constant over time).
The restricted mean survival time (RMST) is a robust and clinically interpretable summary measure of the survival time distribution. It is estimated by the area under the Kaplan-Meier curve from the start date through a specific cutoff point. Unlike median survival time, it is estimable even under heavy censoring. There is a considerable body of methodological research (including Uno et al, 2014; Zhang, 2013) on the use of RMST to estimate treatment effects as an alternative to the hazard ratio approach. Zhang, 2013 provides a good overview of the methodology and associated considerations.
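The area-under-the-curve definition can be sketched as follows: the Kaplan-Meier curve is a step function, so the RMST up to a cutoff is a sum of rectangle areas. The function name `rmst`, its argument names, and the toy data are assumptions for this illustration; the sketch returns a point estimate only and does not compute a variance or a between-arm test.

```python
def rmst(times, events, cutoff):
    """Area under the Kaplan-Meier step function from 0 to `cutoff`.

    times  : follow-up times measured from the start date
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # current height of the Kaplan-Meier curve
    area = 0.0
    prev_t = 0.0
    i = 0
    while i < len(data) and data[i][0] <= cutoff:
        t = data[i][0]
        d = c = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        # Rectangle between the previous time point and t.
        area += surv * (t - prev_t)
        prev_t = t
        if d > 0:
            surv *= 1.0 - d / n_at_risk
        n_at_risk -= d + c
    # Final rectangle from the last time point up to the cutoff.
    area += surv * (cutoff - prev_t)
    return area

# Hypothetical toy data: five subjects, two of them censored.
print(rmst([1, 2, 3, 4, 5], [1, 0, 1, 1, 0], cutoff=5))
```

With no censoring, this reduces to the sample mean of each subject's time truncated at the cutoff, which is a convenient sanity check.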
The RMST methodology is applicable regardless of whether the PH assumption holds and can be used, at a minimum, as a sensitivity analysis to explore the robustness of the primary analysis results. Moreover, when large departures from the PH assumption are observed, the log-rank test is underpowered to detect differences between the survival distributions of the treatment arms. In that case, a test of the difference in RMST between the experimental arm and the control arm may be more appropriate for determining superiority of the experimental arm with respect to the TTE endpoint.
Two considerations apply when choosing the cutoff point \(\tau\) at which the RMST is evaluated. First, \(\tau\) should not exceed the minimum, across treatment arms, of the largest observed time, so that the RMST can be adequately estimated in every arm and the arms can be compared. Second, \(\tau\) should be clinically meaningful and close to the end of study follow-up, so that the time interval covers the majority of survival outcomes. The RMST up to time \(\tau\) can then be interpreted as the expected survival time restricted to the common follow-up time \(\tau\) among all patients. To avoid arbitrary selection of the common cutoff \(\tau\) for both treatment arms, three sets of analyses will be performed:
\(\tau_1\) = minimum of (largest observed survival time for experimental arm, largest observed survival time for control arm). This cutoff defines the primary RMST analysis because it retains the maximum information, as recommended in the literature (Huang & Kuan, 2018; Tian et al, 2020).
\(\tau_2\) = minimum of (largest survival event time for experimental arm, largest survival event time for control arm).
\(\tau_3\) = midpoint between \(\tau_1\) and \(\tau_2\).
Here, ‘survival’ denotes progression-free survival (PFS), overall survival (OS), or any other TTE endpoint, as applicable.
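The three cutoffs defined above can be derived directly from the per-arm data, as in the following minimal sketch. The function name `rmst_cutoffs`, its argument names, and the toy data are assumptions for this illustration; each arm is assumed to have at least one observed event.

```python
def rmst_cutoffs(times_exp, events_exp, times_ctl, events_ctl):
    """Return (tau_1, tau_2, tau_3) for the three RMST analyses.

    times_*  : follow-up times per arm (event or censoring)
    events_* : 1 if the event was observed, 0 if censored
    Raises ValueError if an arm has no observed events.
    """
    # tau_1: smaller of the two largest observed times (event or censored).
    tau_1 = min(max(times_exp), max(times_ctl))

    # tau_2: smaller of the two largest event times.
    last_event_exp = max(t for t, e in zip(times_exp, events_exp) if e)
    last_event_ctl = max(t for t, e in zip(times_ctl, events_ctl) if e)
    tau_2 = min(last_event_exp, last_event_ctl)

    # tau_3: midpoint between tau_1 and tau_2.
    tau_3 = (tau_1 + tau_2) / 2
    return tau_1, tau_2, tau_3

# Hypothetical toy data for the experimental and control arms.
taus = rmst_cutoffs(
    times_exp=[3, 8, 12, 20], events_exp=[1, 1, 0, 0],
    times_ctl=[2, 6, 15, 18], events_ctl=[1, 1, 1, 0],
)
print(taus)  # (18, 8, 13.0)
```

Because the largest event time in an arm cannot exceed its largest observed time, \(\tau_2 \le \tau_3 \le \tau_1\) always holds.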
Assessment of proportional hazards
Schoenfeld residuals from the (stratified) Cox proportional hazards regression model may be plotted against time to investigate violations of the PH assumption graphically; a non-zero slope is evidence of departure from PH. The PH assumption may be formally tested using the Schoenfeld residual test (Schoenfeld, 1980; Therneau & Grambsch, 2000). Large departures from PH could be evidenced by a p-value \(< 0.05\); note, however, that the test has little sensitivity to non-linear deviations from PH.
In addition, the PH assumption can be checked visually by plotting \(\log(-\log(S(t)))\) versus \(\log(t)\), within each randomization stratum, where \(S(t)\) is the estimated survival function at time \(t\).
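A brief justification for this plot, as a sketch: let \(\theta\) denote an assumed constant hazard ratio between two groups being compared, with hazard functions \(h_0(t)\), \(h_1(t)\) and survival functions \(S_0(t)\), \(S_1(t)\) (notation introduced here for illustration only). Then

\[
h_1(t) = \theta\, h_0(t)
\;\Longrightarrow\;
S_1(t) = S_0(t)^{\theta}
\;\Longrightarrow\;
\log(-\log S_1(t)) = \log\theta + \log(-\log S_0(t)).
\]

Under PH, the two curves are therefore approximately parallel, separated vertically by \(\log\theta\); curves that cross, converge, or diverge suggest a departure from PH.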