Integration of Biomarker Validation in Clinical Development in Oncology

May 28, 2020
Elisabeth Coart, PhD

Applied Clinical Trials

The development and qualification of biomarkers are keys to the future of drug development and precision medicine, particularly in oncology.

Precision medicine relies on validated biomarkers that allow classification of patients by their probable disease risk, prognosis, or response to treatment. Diagnostic biomarkers identify patients who are eligible for a given treatment due to the presence of a molecular alteration, such as a mutation or gene rearrangement. A prognostic biomarker is used to identify the likelihood of a clinical event, such as disease recurrence or death. More relevant to precision medicine, predictive biomarkers distinguish patients who are likely to benefit from a given treatment or, conversely, to benefit from alternative treatments. A topical example in oncology is the presence of activating mutations of the epidermal growth factor receptor (EGFR); such mutations are associated with benefit from treatment with tyrosine-kinase inhibitors in non-small-cell lung cancer (NSCLC).2 Importantly, establishing that a biomarker is predictive generally requires a comparison against control treatment in individuals with and without the biomarker, usually in randomized trials.1

A validated predictive biomarker may become a companion diagnostic (CDx), defined by the Food and Drug Administration (FDA) as a medical device, often an in-vitro diagnostic device (IVD), which provides information that is essential for the safe and effective use of a corresponding drug or biological product.3 Moreover, integrating biomarkers into the therapeutic development process may allow less promising projects to be stopped earlier (especially before they enter costly Phase III trials), thus reducing the total cost of drug development.

Analytical validation

A biomarker is only as good as the procedure used to measure it. Validation is the process of assessing the biomarker and its measurement performance characteristics and determining the range of conditions under which the biomarker will give reproducible and accurate data. Validation is necessary to ensure the high-quality data that effective clinical use of biomarkers requires. Biomarker performance is evaluated at three different levels:

  • Analytical performance is the ability of a biomarker assay to measure the underlying biomarker quantity under a variety of conditions;

  • Clinical performance is the ability of the assay to inform about a clinical condition of interest;

  • Clinical utility is an assay’s ultimate ability to improve clinical outcomes.

The prerequisites for analytical validation are the unambiguous identification of the biomarker, the specification of the components of the IVD (including the definition of a cut-off value when test outcomes are reported as positive/negative), and the definition of the context of use. With regard to the latter, the results obtained with the IVD should be “reliably correct”, and the validation process should be “fit-for-purpose” (i.e., the analytical performance of the CDx is suitable for the intended clinical use). Typically, companion diagnostics distinguish between biomarker-positive and biomarker-negative individuals (so-called ‘qualitative tests’) based on a cut-off applied to quantitative measurements, which is often determined on retrospectively analyzed samples. The cut-off is an inherent part of qualitative IVDs and should be available before Phase III trials, for example, based on samples prospectively collected in well-designed Phase II trials. Sample size may be a critical issue, because a cut-off derived from too few samples is unreliable; formulae are available to optimize the precision of the cut-off value and hence the clinical sensitivity and/or specificity of the test.

Two of the most important analytical performance characteristics to be validated are the test’s precision and accuracy. Precision indicates the closeness of agreement between replicate measurements under specified testing conditions. Such conditions determine the type of precision evaluated, ranging from repeatability to reproducibility. For qualitative tests such as companion diagnostics, the closeness of agreement is evaluated against the majority call for a given sample. For qualitative tests, analytical accuracy refers to a pair of agreement measures, positive and negative percent agreement (PPA, NPA), with a reference standard. In analytical validation, it is essential to assess agreement between a candidate CDx and a reference standard. When only a single measurement each of the test under validation and of the reference is available in the validation study, it becomes impossible to disentangle bias from random error, as well as accuracy from precision.
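As a minimal sketch of the agreement measures named above, PPA and NPA can be computed from the 2×2 table of candidate-test calls against the reference standard. The counts below are hypothetical, the function names are illustrative, and the Wilson score interval is one common choice of confidence interval for a proportion, not a method prescribed by the source.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def percent_agreement(both_pos, cdx_neg_ref_pos, cdx_pos_ref_neg, both_neg):
    """PPA and NPA of a candidate test against a reference standard."""
    ref_pos = both_pos + cdx_neg_ref_pos  # all reference-positive samples
    ref_neg = cdx_pos_ref_neg + both_neg  # all reference-negative samples
    return {
        "PPA": both_pos / ref_pos,
        "PPA_CI": wilson_interval(both_pos, ref_pos),
        "NPA": both_neg / ref_neg,
        "NPA_CI": wilson_interval(both_neg, ref_neg),
    }


# Hypothetical counts, for illustration only.
result = percent_agreement(both_pos=90, cdx_neg_ref_pos=10,
                           cdx_pos_ref_neg=5, both_neg=95)
print(result)
```

The width of the intervals, rather than the point estimates alone, shows whether the number of reference-positive and reference-negative samples is large enough to support a claim of agreement.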

Clinical validation and utility

With regard to clinical validation, the ideal pathway for co-development of the drug (Rx below) and its CDx has been outlined by FDA as in the following figure, which displays the parallel activities that should ideally take place for the drug and the biomarker4:

Therefore, clinical validation spans Phase II and III trials, the latter including both pivotal and confirmatory trials, when needed. Moreover, in the ideal scenario, pivotal and confirmatory studies will be used to assess the clinical utility of the drug and the CDx simultaneously. Both in Phase II and in Phase III clinical trials, biomarker-driven trial designs therefore assume fundamental importance in the era of precision medicine.

Biomarker-driven clinical trials

There are several clinical-trial designs that can be useful in the co-development of the drug-CDx pair.5,6 The choice among these designs depends on several issues, such as the clinical development phase, the degree of validation of the biomarker, and the need to assess both biomarker-positive and biomarker-negative patients. Enrichment designs, for example, will exclude biomarker-negative patients, and will often randomize biomarker-positive patients to the experimental drug or control. There are numerous examples in oncology of designs of this type; one recent case is the Phase III trial of osimertinib in the first-line therapy of advanced NSCLC with an EGFR mutation.7

Another design that only includes biomarker-positive patients is the basket trial, which is often used in Phase II and allows enrollment of different cancer types harboring the same molecular alteration. In this case, all patients receive the experimental drug, but nothing prevents researchers from also randomizing their patients when this is considered feasible. With regard to the trial mechanics and statistical underpinnings, at least four types of basket trials have been proposed:

  • Exploratory designs, not powered, which may be used in one or more cohorts of patients, defined, for example, on the basis of tumor types8;

  • Classical Simon’s two-stage design, which may be used in one or more cohorts of patients, defined, for example, on the basis of tumor types8;

  • Bayesian basket design, which allows statistically formal borrowing of information across tumor types9;

  • Sargent’s design, which allows informal borrowing across tumor types (see below).10
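To make the two-stage mechanics concrete, the sketch below computes the operating characteristics of a Simon-type two-stage design from exact binomial probabilities: the trial stops after stage 1 if at most r1 responses are seen in n1 patients, and the drug is declared promising only if more than r responses are seen in all n patients. The function names are illustrative, and the design values (r1/n1 = 1/10, r/n = 5/29) are a commonly tabulated example for a 10% versus 30% response rate; they are not taken from any trial described in this article.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); 0 when k is negative."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def simon_two_stage(r1, n1, r, n, p):
    """Operating characteristics at true response rate p."""
    n2 = n - n1
    # Probability of early termination: at most r1 responses in stage 1.
    pet = binom_cdf(r1, n1, p)
    # Continue past stage 1 and end with more than r responses overall.
    prob_promising = sum(
        binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    return {
        "prob_early_stop": pet,
        "prob_promising": prob_promising,
        "expected_n": n1 + (1 - pet) * n2,
    }


# Illustrative design: uninteresting rate 10%, target rate 30%.
under_null = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.10)
under_alt = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.30)
print(under_null)
print(under_alt)
```

Here `prob_promising` evaluated at the uninteresting rate is the type I error, and evaluated at the target rate it is the power; `expected_n` shows how much smaller the trial is on average when the drug is inactive.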


As a recent example of a biomarker-driven clinical trial with informal borrowing across different tumors, we were faced with the development of a novel drug for patients with one of five tumor types (NSCLC, melanoma, renal-cell cancer, triple-negative breast cancer, and castration-resistant prostate cancer), all areas of unmet medical need. For each of these tumors, monotherapy and combination were of interest to the sponsor. For safety and dose-finding, we proposed a randomized 1:1:1 design between low-dose monotherapy, high-dose monotherapy, and dose-escalating combination. Once the best doses for monotherapy and combination were known, we implemented a randomized stratified Phase II design for each indication, allocating patients to the single-agent arm or the experimentally tested combination therapy. This resulted in 10 individual single-arm trials, two for each of the five tumor types, one for monotherapy and one for combination. In each trial, the design allowed for two interim analyses, the first using Gehan’s rule, and the second consisting of Sargent’s two-stage design.10
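The first futility check can be illustrated with the usual statement of Gehan's rule: enroll enough patients in the first stage that seeing zero responses would be unlikely if the drug truly achieved the target response rate, and stop if no response is observed. The sketch below is a minimal illustration under that reading; the target rates and the 5% threshold are hypothetical and are not the values used in the trial described above.

```python
def gehan_first_stage_size(target_rate, max_miss_prob=0.05):
    """Smallest n with P(0 responses in n patients | target_rate) <= max_miss_prob."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    if not 0 < max_miss_prob < 1:
        raise ValueError("max_miss_prob must be strictly between 0 and 1")
    n = 1
    # (1 - target_rate) ** n is the chance of no responder among n patients.
    while (1 - target_rate) ** n > max_miss_prob:
        n += 1
    return n


# Hypothetical target response rates, for illustration only.
for rate in (0.20, 0.30):
    print(rate, gehan_first_stage_size(rate))
```

With a hypothetical 20% target rate, 14 first-stage patients are needed, because the chance of observing no response among 14 patients is then about 4.4%.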

Unlike conventional Phase II designs, which allow for a “positive” and a “negative” outcome based on observed responses, Sargent’s design allows for a third, “inconclusive” outcome, in which the response rate is neither so low as to stop development nor so high as to warrant a Phase III trial. When the outcome is inconclusive, the sponsor can decide to continue or stop the development based on other considerations, including safety and informal borrowing of information across strata. One potential advantage of this design is the typically smaller sample size, when compared with conventional Phase II designs.
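A three-outcome decision rule of this kind can be sketched for a single analysis: with n patients, the result is negative if responses are at or below a lower threshold, positive if at or above an upper threshold, and inconclusive in between. The thresholds, sample size, and response rates below are hypothetical, and the sketch shows only the decision logic and its exact binomial probabilities, not the calibration procedure of the published design.

```python
from math import comb


def classify(responses, stop_max, go_min):
    """Map an observed response count to one of the three outcomes."""
    if responses <= stop_max:
        return "negative"
    if responses >= go_min:
        return "positive"
    return "inconclusive"


def three_outcome_probabilities(n, stop_max, go_min, p):
    """Probability of each outcome when the true response rate is p."""
    if not 0 <= stop_max < go_min <= n:
        raise ValueError("need 0 <= stop_max < go_min <= n")
    probs = {"negative": 0.0, "inconclusive": 0.0, "positive": 0.0}
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1 - p) ** (n - k)
        probs[classify(k, stop_max, go_min)] += pmf
    return probs


# Hypothetical design: 25 patients, stop at <= 3 responses, go at >= 6.
for rate in (0.10, 0.30):
    print(rate, three_outcome_probabilities(n=25, stop_max=3, go_min=6, p=rate))
```

Widening the gap between the two thresholds enlarges the inconclusive region, which is what allows a smaller sample size at the price of deferring some decisions to other considerations.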

Nevertheless, the trial consisting of 10 individual single-arm designs still has a considerable sample size, with a maximal enrollment of 480 patients. This maximum sample size would only be reached if both futility bars were passed in all trials. Interestingly, this sample size is driven by the analytical performance of the test, since the observed treatment effect will be diluted for a biomarker with poor diagnostic accuracy. In a biomarker-enrichment study, the dilution factor implicitly applied to the true treatment effect depends on the test’s positive predictive value (PPV) and on “R”, the ratio of treatment effects in true biomarker-positive over biomarker-negative patients. More specifically, this dilution factor is (1-PPV) × R, and this relationship is depicted in the Figure below.11 Even for biomarkers with a reasonably high accuracy of 90% and relatively low ratio of effects in biomarker-positive vs -negative patients of 2, the dilution effect will be already 10%. With a more modest treatment effect, a larger sample size is needed to show with statistical significance that the experimental drug or regimen achieves better clinical outcomes than the current standard of care.
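The sample-size consequence of a given dilution can be sketched under a standard simplification: in normal-approximation sample-size formulas for comparing two groups, the required number of patients is inversely proportional to the square of the effect size. The sketch below takes the dilution as an input and assumes the outcome variance, type I error, and power are unchanged; these are assumptions of the illustration, not statements from the source.

```python
def sample_size_inflation(dilution):
    """Factor by which n grows when the effect shrinks by the fraction `dilution`.

    Assumes n is proportional to 1 / effect**2 with everything else fixed.
    """
    if not 0 <= dilution < 1:
        raise ValueError("dilution must be in [0, 1)")
    return 1 / (1 - dilution) ** 2


# Hypothetical dilution levels, for illustration only.
for d in (0.05, 0.10, 0.20):
    print(f"dilution {d:.0%}: sample size x {sample_size_inflation(d):.2f}")
```

Under these assumptions, a 10% dilution of the treatment effect translates into roughly 23% more patients for the same power.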

Bridging studies

A popular strategy for co-development entails the implementation of a bridging study, applied when the pivotal clinical trial was conducted with an assay other than the CDx under validation. A bridging study consists of assessing the concordance between the clinical-trial assay (CTA) and a new assay that is intended to be a CDx. According to the FDA, the problem to be addressed in bridging studies is to estimate the drug efficacy in the CDx-defined subpopulation; in other words, to confirm that the clinical benefit of the drug would have been maintained had the CDx IVD kit been used instead of the CTA. The statistical approach in this case depends on the availability of samples from the pivotal trial, as well as the design of that trial (e.g., all-comers vs enrichment designs). When no or insufficient samples from the pivotal trial are available, an external concordance study should be considered. This is also the case for the validation of a follow-on CDx. A follow-on CDx is an in-vitro companion diagnostic device that seeks the same therapeutic indication in its intended use as in the intended use of an approved CDx.11 In this case, the design and estimation of performance depend on the availability of a reference test and on the sampling distribution of results both for the follow-on CDx and the CDx.


Our experience suggests that analytical performance drives clinical performance. Flawed analytical and/or clinical validation can jeopardize the clinical utility of both the biomarker and the drug. Therefore, statistical rigor is needed throughout the development and validation processes of biomarkers.


Elisabeth Coart, PhD, Director of Consulting Services at IDDI



  1. FDA-NIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and other Tools) Resource [Internet]. Silver Spring (MD): Food and Drug Administration (US); 2016-. Co-published by National Institutes of Health (US), Bethesda (MD).
  2. Mok TS, Wu YL, Thongprasert S, et al. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. N Engl J Med. 2009;361:947-57.
  3. Food and Drug Administration. In Vitro Companion Diagnostic Devices Guidance for Industry and Food and Drug Administration Staff. Available at
  4. Hu Y-F. Development of Companion Diagnostics – An FDA Perspective. Available at
  5. Freidlin B, McShane LM, Korn EL. Randomized clinical trials with biomarkers: design issues. J Natl Cancer Inst. 2010;102:152-60.
  6. Buyse M, Michiels S, Sargent DJ, et al. Integrating biomarkers in clinical trials. Expert Rev Mol Diagn. 2011;11:171-82.
  7. Ramalingam SS, Vansteenkiste J, Planchard D, et al. Overall Survival with Osimertinib in Untreated, EGFR-Mutated Advanced NSCLC. N Engl J Med. 2020;382:41-50.
  8. Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials. 1989;10:1-10.
  9. Simon R, Geyer S, Subramanian J, Roychowdhury S. The Bayesian basket design for genomic variant-driven phase II trials. Semin Oncol. 2016;43:13-18.
  10. Sargent DJ, Chan V, Goldberg RM. A three-outcome design for phase II clinical trials. Control Clin Trials. 2001;22:117-25.
  11. Li M. Statistical methods for clinical validation of follow-on companion diagnostic devices via an external concordance study. Stat Biopharm Res. 2016;8:355-363.
