Enrollment: More Than Numbers

February 1, 2012
John F. Tomera

Richard C. Walovitch

Vincent J. Girardi III

Applied Clinical Trials

Volume 21, Issue 2

Partnering with CROs and using a blinded independent central review can increase trial success.

When it comes to gaining regulatory approval, controlling variability is essential, especially in pivotal Phase III trials. The patient population being studied not only determines the potential market size for a drug, but also affects whether the drug meets its primary efficacy endpoint.1 An independent analysis of the data, such as a patient's diagnostic status, can significantly reduce variability, which is why patient enrollment plays such a key role in determining a trial's outcome.

Identifying the upfront problem

Too often patient enrollment is treated as a numbers game in which aggressive enrollment targets are met by enrolling patients who do not satisfy all of the inclusion or exclusion criteria.1 This is a common systemic issue, driven in part by sites trying to meet aggressive enrollment goals or by clinical operations staff trying to meet corporate mandates. It usually happens in one of two ways: either the investigator enrolls a patient who admittedly does not meet the inclusion/exclusion criteria and requests a protocol exemption/waiver, or the investigator's reading of subjective enrollment data is biased by the desire to push the enrollment rate up. The first is easy to track, since detailed records of exemptions granted by the sponsor are monitored. The second is more problematic and insidious; it requires active, detailed monitoring and in many cases would benefit greatly from a blinded independent central review (BICR). Regardless of the cause, the result is the same: a decrease in the homogeneity of the population. The problem appears to be growing and, although it can be tracked in most Phase III trials by comparing the size of the intent-to-treat (ITT) population to the per-protocol population (PPP), it remains an area of minimal focus.

Since most registration trials require that the ITT population be used in the primary analysis, drugs are being evaluated in trials in which as few as roughly half of the ITT patients make up the per-protocol population (e.g., a total pooled ITT population of 937 patients, of which only 493 comprised the per-protocol population).2 This added heterogeneity usually increases the variability of response and can put the trial endpoints at risk. In addition, GCP violations arising from multiple enrollment exemptions can cause serious regulatory compliance issues.1

Pre-enrollment BICR of subjective enrollment criteria can play a significant role in controlling study population heterogeneity and has the potential to improve trial outcomes. Using a BICR to evaluate patient inclusion/exclusion criteria brings better standardization to the patients enrolled and mitigates variability. BICRs are advocated by the FDA and are commonly performed in oncology registration trials when the trial's primary endpoint is based on tumor imaging.3 BICRs are usually conducted by contract research organizations (CROs), which manage the process and define the procedures for acquiring the data, training the independent reviewers (IRs), and monitoring IR performance during the conduct of the study.4, 5, 6, 7

In most trials there are two opportunities for a BICR to provide an objective review of potential trial patients: first at the inclusion/exclusion criteria assessment, and again after enrollment but before randomization, when multiple screening tests are performed. These assessments can be either a simple review of subjective criteria (e.g., cardiac echo quality) or a complex review requiring multiple clinical experts and multiple imaging studies (e.g., early Alzheimer's disease).

The inclusion/exclusion criteria should be based both on the sponsor's knowledge of the action or effect of the investigational material and on information from prior studies. Exclusion criteria should not be so restrictive that the available population is too limited to meet the enrollment goal, but inclusion criteria should not be so lax that the accepted population becomes too broad to prove the effectiveness of the study article. Once it has been determined that the patient meets the inclusion/exclusion criteria, the patient is enrolled in the study. After enrollment, but prior to receiving on-study treatment, subjects are screened to establish a baseline for their conditions and to determine the degree of disease burden, so that the patient may be stratified for randomization.

Imaging studies and pathological assessments are important subjective parameters that play a role in all parts of a clinical trial (enrollment, on-study assessment, and patient outcome). This is most evident in imaging studies, where the FDA has commented on the variability of on-site radiological assessments and the need to control it. Unfortunately, companies are often in a rush to enroll and randomize patients and thus depend on on-site assessment of disease burden and/or disease staging. This can create problems, particularly when the subjective assessment is difficult to make and requires a high degree of precision.

Precision is obtained by making multiple independent assessments of a parameter. These assessments are regularly performed by a BICR following a multi-read adjudication process, which is seldom used in a clinical setting. Site interpretational bias due to partial or complete unblinding of patient treatment arms or pre-existing conditions can also confound interpretation. This can be a major issue in open-label trials, which require a BICR to determine on-study response to treatment. A recent example is a hepatocellular carcinoma (HCC) trial in which a site enrolled a patient based on a site read of a liver CT, but the CRO's subspecialty radiologists repeatedly indicated that the lesion was just a hemangioma.8 In this case, the patient's baseline status cannot change, and since the patient does not have HCC, he or she is assured not to be a responder. One way of handling this issue is to reclassify the patient, typically by removing the patient from the PPP while retaining him or her in the ITT population (see description below). The issue can often be avoided when independent experts are used for both screening and on-study reads.



PPP and ITT populations

When the analysis is restricted to the participants who fulfilled the protocol in terms of eligibility, interventions, and outcome assessment, it is known as an "on-treatment," "on-study," or "per-protocol" analysis. The per-protocol analysis restricts the comparison of treatments to the ideal patients; that is, those who adhered perfectly to the clinical trial instructions as stipulated in the protocol. This population is classically called the per-protocol population, and the analysis is called the per-protocol analysis.

In the ITT population, no patients are excluded, and patients are analyzed according to the randomization scheme. Medical investigators have often expressed difficulty in accepting the ITT analysis, even though it is the pivotal analysis for the FDA and EMA. It is generally favored because it avoids the bias associated with the non-random loss of participants. Nevertheless, the ITT analysis by definition includes patients who do not meet the inclusion and exclusion criteria and may not have the disease the sponsor is targeting (as was the case in the hemangioma example from the HCC trial discussed above). This can have a significant impact on a drug's chance of success. To illustrate, the following example shows how an ITT population that is more than 15% larger than the PPP can result in a drug missing its primary endpoint.

Assume an open-label oncology trial in which participants are randomized equally into an active control arm and an investigational drug arm. There is an aggressive push for enrollment from the sponsor, and the investigator thinks the test treatment has a much better chance of helping the patient than the control treatment. He or she therefore enrolls more "borderline" eligible participants into the treatment arm than the control arm, with the intent of using the new drug to help a wider range of patients.

The primary endpoint of the trial is to determine whether the investigational drug shows superiority for progression-free survival (PFS). Based on previous studies, the response rates for the treatment and control arms are estimated at 0.5 and 0.4, respectively. Accordingly, 390 participants are enrolled in each study arm to obtain 80% statistical power for the primary endpoint analysis.

Assume that a certain percentage of the participants are retrospectively found to be either tumor-free or non-responsive to treatment because they did not have the target disease, and that a slightly higher percentage of these deviations occurs in the treatment arm due to investigator bias (i.e., 60% vs. 40% for control). These participants cannot show a full response, yet they are still included in the ITT analysis. PFS rates in both arms drop, diluting the treatment effect and making it more difficult to show superiority.

We are testing the hypothesis that the test arm is superior to the control arm, with an alpha of 0.05, and we assume that the response rates for the eligible patients in each arm are consistent with past studies. Figure 1 demonstrates how the eligibility status of the patient population can affect the ability of a trial to show a significant treatment effect. The horizontal line represents the 0.05 alpha level; p-values to the right of the line are not statistically significant. If 15% of the participants are retrospectively found to have been ineligible at enrollment, the expected significance of the treatment effect is borderline at the 0.05 level (p-value = 0.048). If the percentage rises above 15%, the effect is not significant.
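The dilution effect behind Figure 1 can be sketched numerically. The short Python function below is an illustrative reconstruction, not the trial's actual analysis plan: it assumes a one-sided two-proportion z-test, treats ineligible patients as guaranteed non-responders, and uses the article's assumed rates (0.5 vs. 0.4, 390 per arm, a 60/40 split of deviations).

```python
from math import sqrt
from statistics import NormalDist

def expected_p_value(frac_ineligible, n=390, rate_trt=0.5, rate_ctl=0.4,
                     split_trt=0.6):
    """Expected one-sided two-proportion z-test p-value when a fraction
    of enrolled patients lack the target disease and cannot respond."""
    total_bad = frac_ineligible * 2 * n
    bad_trt = split_trt * total_bad          # 60% of deviations in treatment arm
    bad_ctl = (1 - split_trt) * total_bad    # 40% in the control arm
    # Only eligible patients can respond; ineligibles dilute each arm's rate.
    p1 = (n - bad_trt) * rate_trt / n
    p2 = (n - bad_ctl) * rate_ctl / n
    p_pool = (p1 + p2) / 2                   # pooled rate, equal arm sizes
    se = sqrt(p_pool * (1 - p_pool) * 2 / n)
    z = (p1 - p2) / se
    return 1 - NormalDist().cdf(z)

print(round(expected_p_value(0.15), 3))  # 0.048: borderline at alpha = 0.05
print(expected_p_value(0.20) > 0.05)     # True: above 15%, no longer significant
```

Under these assumptions the computation reproduces the article's borderline p-value of 0.048 at 15% ineligibility, and shows significance lost once ineligibility exceeds that level.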



Subspecialty assessment

Differences between institutional assessments (i.e., on-site reads) and BICRs have been observed in studies and have been a topic of recent debate, particularly as they relate to on-study progression of disease determined by radiological assessment.9, 10, 11, 12 That debate centers on informative censoring, an issue that does not apply to enrollment assessments.

Although much attention has been paid to the need to standardize the performance and acquisition of imaging in oncology trials, the on-site image evaluation process remains far less controlled. These evaluations are usually performed by personnel with variable medical imaging experience who have not gone through detailed training and whose performance is not tested regularly. Typically there is a pool of readers attached to each site, and up to a few hundred readers in total make measurements and complete case report forms. All too frequently, radiologists are not co-investigators and have not signed FDA Form 1572, so there is minimal oversight and review of image acquisition. In addition, there is often no dedicated training on performing image evaluations that conform to the requirements of a well-controlled multi-center trial (e.g., Response Evaluation Criteria in Solid Tumors, Cheson criteria, etc.). While the standard of care for clinical interpretation of images by a site radiologist may be sufficient for patient management, a radiologist's clinical report is generally insufficient for selecting and measuring the lesions required in a clinical trial.

In contrast, in independent blinded evaluations, images on which efficacy is based are almost always read by radiologists who are trained in the related specific indication. Prior to the readers beginning evaluation of the images, they are trained and tested in a standardized, validated and documented fashion. These central reviews are conducted by a small number of reviewers with specific expertise, thus lessening measurement variability. Also, prospective trial-specific training and testing of readers further reduce measurement variability. Most importantly, the central review complies with the regulatory training requirement to "train the qualified readers prospectively."13 More than 15 years ago, it was recognized by industry, FDA, and other regulatory bodies that if image evaluations were to be used as a measure of efficacy, a controlled evaluation process was required. Not much has changed in the quality of site reads since that time.

The problem with trying to understand the added value of independent assessment is complicated by differing degrees of independence (i.e., the lack of complete independence of a site assessment compared to the fully blinded assessment performed by an independent core laboratory). The integrity of the blinding-to-treatment process can sometimes decrease the accuracy of an assessment, and this trade-off must be weighed against the importance of obtaining the data without bias. Since the objective of a pivotal trial is regulatory approval, the goal of the study is not the practice of clinical medicine but lies in the domain of regulatory science, in which removal of bias is an essential component. This concept does not resonate with everyone, particularly healthcare providers, but the effects of bias can have a major impact on results.

For example, when site readers were compared to an independent blinded read in a dementia trial using SPECT perfusion imaging, the data were similar and the variability between site reads and independent reads was fairly uniform across centers, with one noted exception: a site reader with an almost perfect positive predictive value for diagnosing Alzheimer's disease. After investigation, it was determined that more than 90% of his patients came from a longitudinal Alzheimer's study being conducted at the hospital. This example demonstrates how bias can have a profound effect on reader performance, underscoring the need to measure variability. The statistical measurement of variability between readers (inter-reader variability) and within readers (intra-reader variability) is often not well understood and is seldom performed when assessments are made in clinical practice or when on-site assessments are used in clinical trials. This is unfortunate, because sites are trained only at the beginning of a trial, when terms are defined and criteria are set for making an assessment. It is well known that if those criteria differ from the criteria used in clinical practice, a process called definitional drift can occur over time.

Two examples that emphasize the need for subspecialty-trained experts to preclude misinterpretation of subjective data include the following:

  • Hemangioma vs. hematoma. In coverage of the European Congress of Radiology 2010 in Vienna, Gerd Schueller, MD, a radiologist at the Medical University of Vienna, was reported as stating that in oncologic imaging, and for that matter emergency imaging, a hemangioma of the liver does not look like a hematoma of the liver, and that "we have to learn the differences."8 The need for subspecialty expertise is real and cannot be ignored.

  • Breast cancer diagnosis variability. A disfiguring lumpectomy has occurred only for the patient to learn that no cancer was present.14 Though rare, such events are not unheard of. They can occur when the variability of pathological or radiological assessments is not controlled. Uncontrolled variability in pathological15, 16, 17 and radiological18, 19, 20, 21, 22, 23 assessments compounds, degrading maximum diagnostic performance and prognostic ability. To put it another way, when the false positive rate is 10% for both the radiological and the pathological assessment, maximum diagnostic performance can only be approximately 81%. The inherent lack of control of variability within each subspecialty discipline (i.e., radiology and pathology) thus lessens the integrity of both the diagnosis and the prognosis. A pathologic diagnosis is the foundation on which all other treatment decisions are made, and in breast cancer the pathology dictates the use of potentially curative therapy. Incorrect pathologic diagnoses may lead to outcomes as serious as failure to treat a missed case of breast cancer or the provision of unnecessary surgery, chemotherapy, and radiation. In 2010, a review19 of 2,564 cases from the Sloane Project reported that in 30% of patients undergoing breast-conserving surgery for ductal carcinoma in situ, preoperative imaging underestimated the extent of disease, resulting in a requirement for further surgery.
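The 81% figure follows from simple multiplication, under the assumption (ours, for illustration) that the two disciplines' error rates act independently:

```python
# Illustrative sketch: if radiological and pathological reads err
# independently, their correct-call probabilities multiply.
fp_radiology = 0.10   # assumed 10% false positive rate in radiology
fp_pathology = 0.10   # assumed 10% false positive rate in pathology

# A diagnosis holds up only if both disciplines call it correctly.
max_diagnostic_performance = (1 - fp_radiology) * (1 - fp_pathology)
print(f"{max_diagnostic_performance:.0%}")  # 81%
```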



What role does the CRO play?

Not every subjective assessment needs to undergo a BICR process, but assessments that are difficult to make because they require extensive experience, and that are part of important trial endpoints, can realize tremendous value from working with a CRO whose expert independent reviewers are subjected to rigorous training and variability testing. These factors, combined with a multi-read adjudication process, maximize the potential for obtaining unbiased, accurate data with maximal precision. Since Phase III trials are large (in some cases 100 sites and lasting several years), it would be a logistical nightmare to perform intra- and inter-reader variability testing with 100 local reviewers who may require retraining throughout the course of the trial. The BICR read team is much smaller, which makes it feasible to measure variability and to retrain the team, or individual readers, as needed.

Identifying which subjective assessments should be independently reviewed is not always straightforward; although it is often clear which assessments will affect trial outcome, the variability of making those assessments is seldom well understood. Understanding variability requires statistically accounting for chance agreement. This is where experienced CROs that have performed these diagnostic assessments can provide value. CROs have the experience to collect variability data and to understand the reproducibility of the measurement, since they regularly work with a limited pool of independent reviewers (i.e., radiologists and pathologists) and maintain criteria for testing and retraining. If needed, they can determine whether definitional drift is occurring by performing intra-reader variability testing.
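Accounting for chance agreement is commonly done with a chance-corrected statistic such as Cohen's kappa. A minimal sketch for two readers' categorical calls (the reader labels below are hypothetical):

```python
def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa: agreement between two readers' categorical calls,
    corrected for the agreement expected by chance alone."""
    assert len(reader_a) == len(reader_b) and reader_a
    n = len(reader_a)
    categories = set(reader_a) | set(reader_b)
    # Observed proportion of cases where the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal rates.
    expected = sum(
        (reader_a.count(c) / n) * (reader_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical paired reads: 8/10 raw agreement, but kappa is lower
# because half of that agreement is expected by chance.
a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "neg"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```

The same computation applied to one reader's repeat reads of the same cases gives an intra-reader measure, which is one way definitional drift can be detected over the course of a trial.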

CROs also possess the knowledge and experience both to determine whether a subjective assessment needs independent review and to decide what type of training and retesting of independent reviewers must be performed during the trial so that precision and accuracy are maintained. However, since eligibility criteria often require experts from multiple disciplines with experience in diagnosing uncommon diseases, the CRO must be able to leverage the subspecialty experts needed to make the independent assessments. This requires both strategic relationships with leading academic and tertiary care institutions and advanced digital portals so that images and other data can be reviewed at remote locations.


An experienced CRO can play an important role in strictly enforcing the per-protocol criteria that determine which patients enter the trial. Through enforcement of strict inclusion and exclusion criteria, and through their ability to manage and facilitate subspecialty screening, CROs can make trial populations more homogeneous, trial results more robust, and regulatory compliance stronger, with bias minimized.

Richard Walovitch,* PhD, is President, e-mail: rwalovitch@wcclinical.com, Vincent J. Girardi III, MS, is Associate Director of Biostatistics and Data Management, and John Tomera, PhD, is Director of Regulatory Affairs and Associate Medical Director at WorldCare Clinical, 7 Bulfinch Place, P.O. Box 8908, Boston, MA.

*To whom all correspondence should be addressed.




1. G. Nauyok, "Waive Inclusion and Exclusion Criteria?" Applied Clinical Trials, October 2010, 60-65.

2. F. G. Freitag, et al., "Analysis of Pooled Data From Two Pivotal Controlled Trials on the Efficacy of Topiramate in the Prevention of Migraine," JAOA, 107 (7) 251-258 (2007).

3. K. Borradaile et al., "Discordance Between BICR Readers," Applied Clinical Trials, November 2011, 40-46.

4. R. Walovitch, and J. Tomera, "A Team Approach: How Independent Endpoint Assessment Committees can Overcome Imaging Limitations," Applied Clinical Trials Online, May 1, 2010, http://appliedclinicaltrialsonline.findpharma.com/appliedclinicaltrials/Therapeutics/A-Team-Approach/ArticleStandard/Article/detail/668740/.

5. Food and Drug Administration, Guidance for Industry: Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics, (FDA, Rockville, MD, 2007).

6. European Medicines Agency, Committee for Medicinal Products for Human Use, Guideline on the Evaluation of Anticancer Medicinal Products in Man, (EMA, London, UK, 2006).

7. Food and Drug Administration, Guidance for Industry: Standards for Clinical Trial Imaging Endpoints, Draft Guidance, (FDA, Rockville, MD, 2011).

8. P. Gould, "Europe Lacks Consistent Emergency Radiology Subspecialty Training," Diagnostic Imaging Europe, (4), (2010), http://www.diagnosticimaging.com/display/article/113619/1584930/.

9. K. Shamsi, and R. H. Patt, "Onsite Image Evaluations and Independent Image Blinded Reads: Close Cousins or Distant Relatives?" To the Editor, Journal of Clinical Oncology, 27 (12) 1-2 (2009).

10. K. Shamsi, R. Patt, "On and Offsite Image Reads: Is Basing Drug Efficacy on the Site Read Risky Business?" Applied Clinical Trials Online, January 1, 2010, http://www.appliedclinicaltrialsonline.com/appliedclinicaltrials/article/articleDetail.jsp?id=652247/.

11. L. E. Dodd, E. L. Korn, B. Freidlin, C. C. Jaffe, L. V. Rubenstein, J. Dancey, and M. M. Mooney, "Blinded Independent Central Review of Progression-Free Survival in Phase III Clinical Trials: Important Design Element or Unnecessary Expense?" J. Clin. Oncol., 26 (22) 3791-3796 (2008).

12. D. L. Raunig, "Are Onsite Image Evaluations the Solution or Are We Trading One Problem for Another?" To the Editor, J. Clin. Oncol., 27 (35), e263 (2009).

13. R. Patt, "The Role of RECIST & Other Imaging Efficacy Guidelines: An Update," Presentation at PhRMA Meeting, October 2008.

14. E. Tousimis, "Breast Cancer: The Importance of a Second Opinion," Women's Voices for Change, July 22, 2010, http://womensvoicesforchange.org/tag/dr-eleni-tousimis/.

15. L. Pederson et al., "Inter- and Intraobserver Variability in the Histopathological Diagnosis of Medullary Carcinoma of the Breast, and its Prognostic Implications," Breast Cancer Res Treat, 14 (1) 91-99 (1998).

16. C. Elston and I. Ellis, "Pathological Prognostic Factors in Breast Cancer: I. The Value of Histological Grade in Breast Cancer: Experience From a Large Study with Long-Term Follow-Up," Histopathology, 19 (5) 403-410 (1991).

17. S. Apple, "Variability in Gross and Microscopic Pathology Reporting in Excisional Biopsies of Breast Cancer Tissue," Breast J., 12 (2) 145-149 (2006).

18. D. Newell et al., "Selection of Diagnostic Features on Breast MRI to Differentiate Between Malignant and Benign Lesions Using Computer-Aided Diagnosis: Differences in Lesions Presenting as Mass and Non-Mass-Like Enhancement," Eur. Radiol., 20 (4) 771-781 (2010).

19. T. J. Evans et al., "Radiological and Pathological Size Estimations of Pure Ductal Carcinoma in Situ of the Breast, Specimen Handling and the Influence on the Success of Breast Conservation Surgery: A Review of 2564 Cases from the Sloane Project," Br. J. Cancer, 102 (2) 285-293 (2010).

20. J. G. Elmore et al., "Variability in Interpretive Performance at Screening Mammography and Radiologists' Characteristics Associated with Accuracy," Radiology, 253 (3) 641-651 (2009).

21. S. G. Komen, "Why Current Breast Pathology Practices Must Be Evaluated," White Paper, June 2006. http://ww5.komen.org/uploadedFiles/Content_Binaries/PathologyWhitePaperB2.pdf.

22. N. Houssami et al., "Accuracy and Outcomes of Screening Mammography in Women with a Personal History of Early-Stage Breast Cancer," JAMA, 305 (8) 790-799 (2011).

23. Z. Chustecka, "Mammography Less Accurate After Breast Cancer," Medscape Medical News, February 23, 2011, http://www.medscape.com/viewarticle/737848.
