Clinical trial failure rates are estimated to be around 90%, with rates varying by trial phase, therapeutic area, drug class, and target.1-3 Trial failures can be attributed to many factors, most commonly the drug optimization process and the pharmaceutical company’s trial management strategy.2,3 Some often overlooked but critical factors are the rates of inaccuracy, error, and variability in data from clinical outcome assessments (COAs) used to determine efficacy in a clinical trial. COAs are intended to measure pertinent aspects of a trial participant’s health relating to the trial endpoints. If COAs support the primary outcomes in a study, they effectively determine the success or failure of the trial.4 COA measures can be completed or rated by clinical professionals via a clinician-reported outcome (ClinRO), by trial participants using a patient-reported outcome (PRO), by an observer (e.g., a caregiver) through an observer-reported outcome, or based on performance outcome measures. COAs are subjective and can be influenced by judgment, training, and motivation.5,6 For example:
- Raters may have clinical experience with a therapeutic area, but not with specific assessments.
- Even with knowledge of the assessments, raters may not perform each one in exactly the same way, using the same criteria every time. Regulators such as the European Medicines Agency and FDA recommend that clinical study personnel, including raters, be trained.7,8
- Participants (subjects and/or study partners/caregivers) may not understand the terminology used or how it is applied in their disorder.9
- Many don’t understand the nuances of severity ratings (e.g., mild, moderate, severe).10
- There can be inconsistency among multiple study partners/caregivers.
As ClinROs are susceptible to variations in rating and lack of precision,11 it is critical to conduct training to enhance rater reliability and precision by standardizing scale administration and scoring.
Rater training is a well-recognized mechanism for improving the signal-to-noise ratio by minimizing inconsistency, error, variability (i.e., improving inter- and intra-rater reliability), and bias in trials.6,12-14
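To make inter-rater reliability concrete, the sketch below computes Cohen’s kappa, a standard chance-corrected agreement statistic for two raters scoring the same participants on a categorical scale. The ratings and category labels are purely illustrative, not drawn from any real trial or scale.

```python
# Minimal sketch: quantifying inter-rater agreement with Cohen's kappa.
# All data below are illustrative, not from any real trial.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Proportion of items on which the two raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters scoring severity for the same 10 hypothetical participants.
a = ["mild", "mild", "moderate", "severe", "mild",
     "moderate", "moderate", "severe", "mild", "moderate"]
b = ["mild", "moderate", "moderate", "severe", "mild",
     "mild", "moderate", "severe", "moderate", "moderate"]
print(round(cohens_kappa(a, b), 3))  # → 0.531
```

Here the raters agree on 7 of 10 cases, yet kappa is only about 0.53 once chance agreement is removed, illustrating how raw percent agreement can overstate rater consistency.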
Rater training is employed to varying degrees across therapeutic areas. In central nervous system clinical trials, training assumes a pivotal role, ensuring the reliability of measurements and the sensitivity to detect changes throughout the course of a trial.15 In many CNS trials, ClinROs and PROs are used in the absence of objective outcomes.
Other therapeutic areas, such as dermatology and rheumatology, use COAs that are prone to significant variability and subjectivity. Complex scales, such as the British Isles Lupus Assessment Group (BILAG) disease activity index used in trials for lupus, may require the rater to consider current and historic symptoms and laboratory results to distinguish manifestations associated with active disease,16 and training raters significantly improves inter- and intra-rater agreement.17 In dermatology, body surface area measurement has been shown to be highly variable, even among trained individuals,18 speaking to the need for within-study training to improve consistency among raters.
With participant or observer/caregiver assessments, individuals may not understand concepts, terminology, or the measurement scale, resulting in spurious or highly variable data.19 Implementing educational materials for participants that explain the terminology used (e.g., what fatigue means) and/or set expectations for specific tasks and complex diaries (e.g., Hauser et al.’s Parkinson’s disease diary20 and epilepsy or migraine diaries) can improve data quality.
The success of a clinical trial depends on the ability to determine if a treatment is safe and effective. When data demonstrate high variability due to inconsistent administration within and among raters, subjective scoring criteria, erroneous rating, biases, or any other behavior adversely impacting data quality, the potential for a trial to fail is increased. Training raters, be they clinical professionals, trial participants, or study partners, can improve the ability to detect true treatment effects by improving data quality, contributing to a successful clinical trial.
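The cost of rater variability can be made concrete with a classical statistical relationship: unreliable ratings attenuate the observed effect size by roughly the square root of the reliability, so the required sample size scales inversely with reliability (the mechanism behind the power effects reported by Müller and Szegedi14). The sketch below assumes a two-arm trial powered at 80% with two-sided alpha of 0.05; the effect size and reliability values are illustrative.

```python
# Minimal sketch: how imperfect rater reliability inflates sample size.
# Classical attenuation: observed effect ≈ true effect × sqrt(reliability),
# so n per arm scales with 1 / reliability. All numbers are illustrative.

import math

def n_per_arm(true_d, reliability, z_alpha=1.96, z_power=0.84):
    """Approximate n per arm for a two-sample comparison of means
    (80% power, two-sided alpha = 0.05 by default)."""
    observed_d = true_d * math.sqrt(reliability)  # attenuated effect size
    return math.ceil(2 * (z_alpha + z_power) ** 2 / observed_d ** 2)

for icc in (1.0, 0.8, 0.6):
    print(f"reliability {icc}: {n_per_arm(true_d=0.5, reliability=icc)} per arm")
# reliability 1.0: 63 per arm
# reliability 0.8: 79 per arm
# reliability 0.6: 105 per arm
```

Dropping reliability from perfect to 0.6 raises the required sample size by two-thirds for the same true effect, which is exactly the noise that rater training aims to remove.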
Authored on behalf of Critical Path Institute’s eCOA Consortium by Rinah Yamamoto, principal scientist, Clinical ink; Jennifer Olt, senior clinical scientist, Signant Health; Sayaka Machizawa, associate director, clinical science, Signant Health; and Martina Micaletto, senior clinical scientist, Signant Health.
- FDA, "Step 3: Clinical Research," 2018. [Online]. Available: https://www.fda.gov/patients/drug-development-process/step-3-clinical-research. [Accessed 2023].
- D. Sun, W. Gao, H. Hu and S. Zhou, "Why 90% of clinical drug development fails and how to improve it?," Acta Pharmaceutica Sinica B, vol. 12, no. 7, pp. 3049-3062, 2022.
- E. Kim, Y. Jaehoon, S. Park and K. Shin, "Factors Affecting Success of New Drug Clinical Trials," Therapeutic Innovation & Regulatory Science, vol. 57, no. 4, pp. 737-750, 2023.
- J. B. Williams and K. A. Kobak, "Development and reliability of a structured interview guide for the Montgomery Åsberg Depression Rating Scale (SIGMA)," The British Journal of Psychiatry, vol. 192, pp. 52-58, 2008.
- K. A. Kobak, J. M. Kane, M. E. Thase and A. A. Nierenberg, "Why Do Clinical Trials Fail? The Problem of Measurement Error in Clinical Trials: Time to Test New Paradigms?," Journal of Clinical Psychopharmacology, vol. 27, no. 1, pp. 1-5, 2007.
- M. K. Walton, J. H. Powers, J. Hobart, D. L. Patrick, P. Marquis, S. Vamvakas and M. Isaac, "Clinical Outcome Assessments: Conceptual Foundation—Report of the ISPOR Clinical Outcomes Assessment – Emerging Good Practices for Outcomes Research Task Force," Value in Health, vol. 18, no. 6, pp. 741-752, 2015.
- EMA: International Council for Harmonisation, ICH guideline E6 on good clinical practice: Draft ICH E6 principles, 2021.
- FDA, Patient-Focused Drug Development: Incorporating Clinical Outcome Assessments Into Endpoints For Regulatory Decision-Making: Guidance for Industry, Food and Drug Administration Staff, and Other Stakeholders, 2023.
- R. T. Yamamoto, E. Durand, S. T. Gary, J. M. Tuller and S. M. Dallabrida, "Patient reported outcomes (pros) are subject to interpretation errors: PSY98 Patients' understanding of how to report pain severity over a period of time," in Value in Health, vol. 20, p. S226, 2017.
- R. Yamamoto, "Common Symptom Terminology is Frequently Misunderstood," Poster W-14 at Drug Information Association (DIA) Global Annual Meeting, San Diego, 2019.
- J. H. Powers, D. L. Patrick, M. K. Walton, P. Marquis, S. Cano, J. Hobart, M. Isaac, S. Vamvakas, A. Slagle, E. Molsen and L. B. Burke, "Clinician-Reported Outcome Assessments of Treatment Benefit: Report of the ISPOR Clinical Outcome Assessment Emerging Good Practices Task Force," Value in Health, vol. 20, no. 1, pp. 2-14, 2017.
- K. A. Kobak, A. D. Feiger and J. D. Lipsitz, "Interview Quality and Signal Detection in Clinical Trials," American Journal of Psychiatry, vol. 162, no. 3, p. 628, 2005.
- K. A. Kobak, N. Engelhardt, J. B. W. Williams and J. D. Lipsitz, "Rater Training in Multicenter Clinical Trials: Issues and Recommendations," Journal of Clinical Psychopharmacology, vol. 24, no. 2, pp. 113-117, 2004.
- M. J. Muller and A. Szegedi, "Effects of Interrater Reliability of Psychopathologic Assessment on Power and Sample Size Calculations in Clinical Trials," Journal of Clinical Psychopharmacology, vol. 22, no. 3, pp. 318-325, 2002.
- M. G. Opler, C. Yavorsky and D. G. Daniel, "Positive and Negative Syndrome Scale (PANSS) Training: Challenges, Solutions, and Future Directions," Innovations in Clinical Neuroscience, vol. 14, no. 11-12, pp. 77-81, 2017.
- D. Isenberg, A. Rahman, E. Allen, V. Farewell, M. Akil, I. N. Bruce, D. D'Cruz, B. Griffiths, M. Khamashta, P. Maddison, N. McHugh, M. Snaith, L. Teh, C. Yee, A. Zoma and C. Gordon, "BILAG 2004. Development and initial validation of an updated version of the British Isles Lupus Assessment Group’s disease activity index for patients with systemic lupus erythematosus," Rheumatology, vol. 44, no. 7, pp. 902-906, 2005.
- C.-S. Yee, V. Farewell, D. A. Isenberg, A. Prabu, K. Sokoll, L.-S. Teh, A. Rahman, I. N. Bruce, B. Griffiths, M. Akil, N. McHugh, D. D'Cruz, M. A. Khamashta, S. Bowman, P. Maddison, A. Zoma, E. Allen and C. Gordon, "Revised British Isles Lupus Assessment Group 2004 index: a reliable tool for assessment of systemic lupus erythematosus activity," Arthritis & Rheumatism, vol. 54, no. 10, pp. 3300-3305, 2006.
- J. Scarisbrick and S. Morris, "How big is your hand and should you use it to score skin?," British Journal of Dermatology, vol. 169, no. 2, pp. 260-265, 2013.
- T. Poepsel, A. Nolde, C. Hadjidemetriou, R. Israel, R. Browning and S. McKown, "PCR248 Global Problems in PGI Measures: The Patients' Perspective on and Solutions to Poor PRO Word Choice," Value in Health, vol. 26, no. 6, p. S359, 2023.
- R. A. Hauser, J. Friedlander, T. A. Zesiewicz, C. H. Adler, L. C. Seeberger, C. F. O'Brien, E. S. Molho and S. A. Factor, "A Home Diary to Assess Functional Status in Patients with Parkinson’s Disease with Motor Fluctuations and Dyskinesia," Clinical Neuropharmacology, vol. 23, no. 2, pp. 75-81, 2000.