A recent case study highlights the importance of applying statistical rigor throughout the development and validation processes of biomarkers.
Drug development in many diseases is now shifting toward molecularly targeted treatments that often rely on prognostic and predictive biomarkers for their application. With such major breakthroughs in the evolution toward personalized or precision medicine, the analytical and clinical validation of biomarkers and their eventual registration as in vitro diagnostic devices (IVDs) have recently received more attention. This article discusses the need to integrate biomarker validation into clinical development, with a focus on oncology. It highlights how flawed analytical and/or clinical validation can jeopardize a biomarker’s and drug’s clinical utility and shows why biomarkers deserve statistical rigor throughout the development and validation process.
Precision medicine relies on validated biomarkers that allow classification of patients by their probable disease risk, prognosis, or response to treatment.1 Diagnostic biomarkers identify patients who are eligible for a given treatment due to the presence of a molecular alteration, such as mutation or gene rearrangement. A prognostic biomarker is used to identify the likelihood of a clinical event, such as disease recurrence or death. More relevant to precision medicine, predictive biomarkers distinguish patients who are likely to benefit from a given treatment, or conversely, to benefit from alternative treatments. A topical example in oncology is the presence of activating mutations of the epidermal growth factor receptor (EGFR); such mutations are associated with benefit from treatment with tyrosine-kinase inhibitors in non-small-cell lung cancer (NSCLC).2 Importantly, establishing that a biomarker is predictive generally requires a comparison against control treatment in individuals with and without the biomarker, usually in randomized trials.1
The development and qualification of biomarkers are key to the future of drug development and precision medicine, particularly in oncology. A validated predictive biomarker may become a companion diagnostic (CDx), defined by FDA as a medical device, often an in vitro diagnostic device (IVD), which provides information that is essential for the safe and effective use of a corresponding drug or biological product.3 Moreover, integrating biomarkers into the therapeutic development process may allow less promising projects to be stopped earlier (especially before they enter costly Phase III trials), thus optimizing the total cost of drug development.
A biomarker is only as good as the procedure used to measure it. Validation is the process of assessing the biomarker and its measurement performance characteristics and determining the range of conditions under which the biomarker will give reproducible and accurate data. Validating a biomarker is necessary to ensure the high-quality data required for its effective clinical use. Biomarkers’ performance is evaluated at three different levels:
The prerequisites for analytical validation are the unambiguous identification of the biomarker, the specification of the components of the IVD (including the definition of a cut-off value when test outcomes are reported as positive/negative), and the definition of the context of use. With regard to the latter, the results obtained with the IVD should be “reliably correct”, and the validation process should be “fit-for-purpose” (i.e., the analytical performance of the CDx is suitable for the intended clinical use). Typically, companion diagnostics distinguish between biomarker-positive and biomarker-negative individuals (so-called “qualitative tests”) based on a cut-off applied to quantitative measurements; this cut-off is often determined on retrospectively analyzed samples. The cut-off is an inherent part of qualitative IVDs and should be available before Phase III trials, for example, based on samples prospectively collected in well-designed Phase II trials. Sample size may be a critical issue, because the cut-off is unreliable if too few samples are used; formulae are available to optimize the precision of the cut-off value and, hence, the clinical sensitivity and/or specificity of the test.
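To illustrate how a cut-off turns a quantitative measurement into a qualitative call, and how the choice of cut-off trades clinical sensitivity against specificity, here is a minimal Python sketch. The function names, the convention that values at or above the cut-off are called positive, and the toy data are all assumptions made for illustration; they are not taken from any assay discussed in this article.

```python
from typing import List, Sequence, Tuple


def classify(values: Sequence[float], cutoff: float) -> List[bool]:
    """Call a sample biomarker-positive when its value is at or above the cut-off.

    The 'at or above' convention is an assumption; a real IVD defines its own rule.
    """
    return [v >= cutoff for v in values]


def sensitivity_specificity(
    values: Sequence[float], truth: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the dichotomized test against the true status."""
    calls = classify(values, cutoff)
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one truly positive and one truly negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical measurements and true clinical status (illustrative only).
values = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.6]
truth = [False, False, False, True, False, True, True, True]

for cutoff in (0.6, 1.0):
    sens, spec = sensitivity_specificity(values, truth, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With so few samples, moving the cut-off by a small amount changes both estimates markedly, which is the sample-size concern raised above.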
Two of the most important analytical performance characteristics to be validated are the test’s precision and accuracy. Precision indicates the closeness of agreement between replicate measurements under specified testing conditions. Such conditions determine the type of precision evaluated, ranging from repeatability to reproducibility. For qualitative tests such as companion diagnostics, the closeness of agreement is evaluated against the majority call for a given sample. For qualitative tests, analytical accuracy refers to a pair of agreement measures, positive and negative percent agreement (PPA, NPA), with a reference standard. In analytical validation, it is essential to assess agreement between a candidate CDx and a reference standard. When only a single measurement of the test under validation and of the reference is available in the validation study, it becomes impossible to disentangle bias from random error, as well as accuracy from precision.
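As a small illustration of the agreement measures named above, the following Python sketch computes PPA and NPA from the four cells of a 2×2 table of candidate-test calls against reference-standard calls. The Wilson score interval is added as one common way to express uncertainty around a proportion; its use here, the function names, and the counts are assumptions for illustration rather than a prescribed method.

```python
import math
from typing import Tuple


def ppa_npa(
    both_pos: int, ref_pos_cand_neg: int, ref_neg_cand_pos: int, both_neg: int
) -> Tuple[float, float]:
    """Positive and negative percent agreement of a candidate test with a reference."""
    ref_pos = both_pos + ref_pos_cand_neg
    ref_neg = both_neg + ref_neg_cand_pos
    if ref_pos == 0 or ref_neg == 0:
        raise ValueError("reference must have both positive and negative samples")
    return both_pos / ref_pos, both_neg / ref_neg


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical counts: 100 reference-positive and 100 reference-negative samples.
ppa, npa = ppa_npa(both_pos=90, ref_pos_cand_neg=10, ref_neg_cand_pos=5, both_neg=95)
low, high = wilson_interval(90, 100)
print(f"PPA {ppa:.2f} (interval {low:.3f} to {high:.3f}), NPA {npa:.2f}")
```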
With regard to clinical validation, the ideal pathway for co-development of the drug and its CDx has been outlined by FDA as in Figure 1 below, which displays the parallel activities that should ideally take place for the drug and the biomarker.4
Therefore, clinical validation spans Phase II and III trials, the latter including both pivotal and confirmatory trials, when needed. Moreover, in the ideal scenario, pivotal and confirmatory studies will be used to assess the clinical utility of the drug and the CDx simultaneously. Therefore, both in Phase II and III clinical trials, biomarker-driven trial designs assume fundamental importance in the era of precision medicine.
There are several clinical-trial designs which can be useful in the co-development of the drug-CDx pair.5,6 The choice among these designs depends on several issues, such as the clinical development phase, the degree of validation of the biomarker, and the need to assess both biomarker-positive and biomarker-negative patients. Enrichment designs, for example, will exclude biomarker-negative patients, and will often randomize biomarker-positive patients to the experimental drug or control. There are numerous examples in oncology of designs of this type; one recent case is the Phase III trial of osimertinib in the first-line therapy of advanced NSCLC with an EGFR mutation.7
Another design that only includes biomarker-positive patients is the basket trial, which is often used in Phase II and allows enrollment of different cancer types harbouring the same molecular alteration. In this case, all patients receive the experimental drug, but nothing prevents researchers from also randomizing their patients when this is considered feasible. With regard to the trial mechanics and statistical underpinnings, at least four types of basket trials have been proposed:
As a recent example of a biomarker-driven clinical trial with informal borrowing across different tumors, we were faced with the development of a novel drug for patients with one of five tumor types (NSCLC, melanoma, renal-cell cancer, triple-negative breast cancer, and castration-resistant prostate cancer), all unmet medical needs. For each of these tumors, monotherapy and combination were of interest to the sponsor. For safety and dose-finding, we proposed a randomized 1:1:1 design between low-dose monotherapy, high-dose monotherapy, and dose-escalating combination.
Once the best doses for monotherapy and combination were known, we implemented a randomized stratified Phase II design for each indication, allocating patients to the single-agent arm or the experimentally tested combination therapy.
This resulted in 10 individual single-arm trials, two for each of the five tumor types, and one each for monotherapy or combination. In each trial, the design allowed for two interim analyses, the first using Gehan’s rule, and the second consisting of a Sargent’s two-stage design (see Figure 2 below).10
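Gehan’s rule is not detailed in this article. As it is commonly described, the first stage enrolls the smallest number of patients for which seeing no responses at all would be unlikely if the drug truly had some target response rate, and the arm is stopped if no response is observed. The Python sketch below illustrates that calculation; the target rates and the 5% threshold are hypothetical and are not the parameters of the trials described here.

```python
def prob_zero_responses(n: int, response_rate: float) -> float:
    """Probability of observing no responses among n patients (binomial model)."""
    return (1 - response_rate) ** n


def gehan_first_stage_size(target_rate: float, max_miss_prob: float = 0.05) -> int:
    """Smallest n such that zero responses occur with probability <= max_miss_prob
    when the true response rate equals target_rate."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    if not 0 < max_miss_prob < 1:
        raise ValueError("max_miss_prob must be strictly between 0 and 1")
    n = 1
    while prob_zero_responses(n, target_rate) > max_miss_prob:
        n += 1
    return n


# Hypothetical target response rates (illustrative only).
for rate in (0.2, 0.3):
    n = gehan_first_stage_size(rate)
    print(f"target rate {rate}: first stage of {n} patients, "
          f"P(no response) = {prob_zero_responses(n, rate):.3f}")
```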
Unlike conventional Phase II designs, which allow for a “positive” and a “negative” outcome based on observed responses, Sargent’s design allows for a third, “inconclusive” outcome, in which the response rate is neither so low as to stop development nor so high as to warrant a Phase III trial. When the outcome is inconclusive, the sponsor can decide to continue or stop the development based on other considerations, including safety and informal borrowing of information across strata. One potential advantage of this design is the typically smaller sample size, when compared with conventional Phase II designs.
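The three-outcome logic can be sketched in a few lines of Python. The two boundaries below are hypothetical placeholders; in an actual design they would be derived from the chosen error rates and response-rate hypotheses, which this article does not report.

```python
from enum import Enum


class Outcome(Enum):
    NEGATIVE = "negative"          # response rate too low: stop development
    INCONCLUSIVE = "inconclusive"  # decision rests on other considerations
    POSITIVE = "positive"          # response rate high enough to proceed


def three_outcome_decision(
    responses: int, stop_at_or_below: int, go_at_or_above: int
) -> Outcome:
    """Map an observed number of responses to one of three outcomes.

    Both boundaries are inputs here; this sketch does not compute them.
    """
    if stop_at_or_below >= go_at_or_above:
        raise ValueError("the stop boundary must lie below the go boundary")
    if responses <= stop_at_or_below:
        return Outcome.NEGATIVE
    if responses >= go_at_or_above:
        return Outcome.POSITIVE
    return Outcome.INCONCLUSIVE


# Hypothetical boundaries for illustration: stop at 5 or fewer, go at 9 or more.
for observed in (4, 7, 10):
    print(observed, three_outcome_decision(observed, 5, 9).value)
```

Responses falling strictly between the two boundaries land in the inconclusive zone, where safety data and results from the other strata can inform the decision.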
Nevertheless, the trial consisting of 10 individual single-arm designs still has a considerable sample size, with a maximal enrollment of 480 patients. This maximum sample size would only be reached if all trials passed both futility bars. Interestingly, this sample size is driven by the analytical performance of the test, since the observed treatment effect will be diluted for a biomarker with poor diagnostic accuracy. In a biomarker-enrichment study, the dilution factor implicitly applied to the true treatment effect depends on the test’s positive predictive value (PPV) and on “R”, the ratio of treatment effects in true biomarker-positive over biomarker-negative patients.
More specifically, this dilution factor is (1 − PPV) × R, and this relationship is depicted in Figure 3 below.11 Even for biomarkers with a reasonably high accuracy of 90% and a relatively low ratio of effects in biomarker-positive vs. -negative patients of 2, the dilution effect is already 10%. With a more modest treatment effect, a larger sample size is needed to show with statistical significance that the experimental drug or regimen achieves better clinical outcomes than the current standard of care.
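The link between a diluted effect and a larger trial can be made concrete with a rough Python sketch. It assumes a standard normal-approximation sample-size formula, in which the required number of patients is inversely proportional to the square of the treatment effect to be detected; the function name and the dilution values are illustrative and do not reproduce the calculations behind the trial described here.

```python
def sample_size_inflation(dilution: float) -> float:
    """Factor by which the sample size grows when the treatment effect shrinks
    by the fraction `dilution`, assuming n is proportional to 1 / effect**2."""
    if not 0 <= dilution < 1:
        raise ValueError("dilution must be in [0, 1)")
    return 1 / (1 - dilution) ** 2


# Illustrative dilution values.
for dilution in (0.05, 0.10, 0.20):
    factor = sample_size_inflation(dilution)
    print(f"dilution {dilution:.0%}: sample size multiplied by {factor:.2f}")
```

Under this assumption, a 10% dilution of the effect already calls for roughly 23% more patients to keep the same power.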
A popular strategy for co-development entails the implementation of a bridging study, applied when the pivotal clinical trial was conducted with an assay other than the CDx under validation. A bridging study consists of assessing the concordance between the clinical-trial assay (CTA) and a new assay that is intended to be a CDx. According to the FDA, the problem to be addressed in bridging studies is to estimate the drug efficacy in the CDx-defined subpopulation; in other words, to confirm that the clinical benefit of the drug would have been maintained had the CDx IVD kit been used instead of the CTA. The statistical approach in this case depends on the availability of samples from the pivotal trial, as well as the design of that trial (e.g., all-comers vs. enrichment designs). When no or insufficient samples from the pivotal trial are available, an external concordance study should be considered.
This is also the case for the validation of a follow-on CDx. A follow-on CDx is an in vitro companion diagnostic device that seeks the same therapeutic indication in its intended use as in the intended use of an approved CDx.11 In this case, the design and estimation of performance depend on the availability of a reference test and on the sampling distribution of results both for the follow-on CDx and the CDx.
Our experience suggests that analytical performance drives clinical performance. Flawed analytical and/or clinical validation can jeopardize a biomarker’s and drug’s clinical utility. Therefore, statistical rigor is needed throughout the development and validation processes of biomarkers.
Elisabeth Coart, PhD, is Director of Consulting Services, IDDI