Two Decades of Rising Protocol Complexity: A Longitudinal Analysis of Clinical Trial Design Evolution, 2004–2025
Key Takeaways
- Trial registrations doubled from 2004 to 2025, with investigator-initiated/non-industry growth outpacing Industry, and a transient 2020–2021 surge attributable to COVID-19 studies.
- Endpoint burden showed the strongest longitudinal rise (median 3→7; +0.22/year), almost entirely from secondary endpoints, while primary endpoints remained stable at one.
Abstract
- Background. Protocol complexity in clinical trials has risen for two decades, but the trend has not been comprehensively quantified by phase and sponsor across the full 2004–2025 window on ClinicalTrials.gov.
- Methods. We analyzed 190,265 phased interventional trials (Phase I, I/II, II, II/III, III, IV) registered on ClinicalTrials.gov with start dates 2004–2025, extracted from the Aggregate Analysis of ClinicalTrials.gov (AACT) database (April 2026 release). Trials were stratified by phase and by lead-sponsor agency class (Industry vs IIS, where IIS = investigator-initiated and includes academic, NIH, and other non-industry lead sponsors). All overall-status values were included; failure-rate calculations used only matured cohorts (start ≤ 2022) to avoid right-censoring. Annual medians were computed for the number of eligibility criteria (text-line proxy from the eligibilities.criteria field), endpoint count (sum of design_outcomes rows per NCT ID), planned enrollment, site count, and masking level; Data Monitoring Committee (DMC) presence was summarized as an annual proportion. Failure was defined as overall_status in TERMINATED, WITHDRAWN, or SUSPENDED. Ordinary-least-squares regression quantified temporal trends.
- Results. Total phased interventional registrations doubled from 5,023 in 2004 to 10,261 in 2025, with IIS growing faster than Industry. Median endpoint count more than doubled, from 3 to 7 (slope +0.22/year, r = 0.97). Planned enrollment remained largely flat (Industry Phase III ≈ 300–350; IIS Phase II ≈ 45–50). Industry DMC adoption rose from 9.2% to 32.0% (+0.94 pp/year, r = 0.88). Aggregate failure rates rose from 10.2% (2004) to 13.2% (2022 matured cohort), slope +0.13 pp/year (r = 0.48). The escalation slope was steepest in Phase I (+0.23 pp/year) and Phase III (+0.13 pp/year); Phase II rose more slowly (+0.10 pp/year) but remained the highest-failure phase throughout the period (12.9% → 15.2%); Phase IV was essentially flat. Oncology's share of phased interventional activity grew from 29.0% to 34.3% (+5.3 pp), while cardiometabolic trials declined from 10.1% to 7.2%.
- Conclusions. Protocol complexity has risen steadily over 20 years, most clearly for endpoint counts. Failure rates have drifted upward at a modest pace (roughly 1.3 percentage points per decade), not the dramatic escalation that earlier commentary has sometimes implied. Concurrent therapeutic-area drift toward oncology is a plausible driver of both trends and must be treated as a confounder rather than residual noise. The data support continued attention to protocol parsimony, particularly in Phase II, while cautioning against strong causal claims linking rising counts to rising failure.
1. Introduction
Clinical-trial protocols have grown steadily denser over the past two decades. The Tufts Center for the Study of Drug Development has reported that the average number of eligibility criteria per protocol increased by over 60% between 2001 and 2020, and that the number of distinct procedures per trial visit tripled over the same period.
These headline figures, however, aggregate across phases, sponsor types, and therapeutic areas—each of which may be moving in different directions at different speeds. Granular analysis matters for three reasons.
- First, the rising number of design elements in a protocol is often cited as a cause of rising trial failure, but the strength of that association depends heavily on the segment. A companion analysis suggests that the count-based correlation between eligibility criteria and failure is concentrated in Phase II investigator-initiated trials (cross-sectional r ≈ +0.33, with a 4.9× failure-rate gradient across criteria-count quintiles), while Industry Phase III shows essentially no such correlation (r ≈ 0.006). If rising counts and rising failures are indeed linked, the link is not uniform.
- Second, the COVID-19 pandemic in 2020–2021 disrupted trial operations globally. Whether it permanently shifted design patterns or produced only a transient perturbation is a question with strategic implications for sponsors planning long pipelines.
- Third, operational and regulatory practices—DMC adoption, masking conventions, site proliferation, and decentralized components—have co-evolved with protocol complexity, potentially as adaptive responses rather than independent trends.
This study provides a phase-stratified, sponsor-stratified longitudinal description of 190,265 phased interventional trials on ClinicalTrials.gov over 22 years (2004–2025). The analysis is structured to distinguish the multiple concurrent drivers of the observed trends (protocol-design change, therapeutic-area drift, regulatory evolution, and global redistribution of research activity) and to frame the resulting correlations in a manner appropriate to registry-level evidence.
2. Methods
2.1 Data source and cohort
We queried the April 2026 release of the AACT database, hosted in a local PostgreSQL 17 instance. At the time of extraction, the full database contained 579,828 studies.
We applied the following inclusion filters:
- study_type = 'INTERVENTIONAL';
- phase in {PHASE1, PHASE1/PHASE2, PHASE2, PHASE2/PHASE3, PHASE3, PHASE4};
- start_date between 2004-01-01 and 2025-12-31.
No filter was applied to overall_status: all trials matching the inclusion criteria were retained regardless of whether they were completed, ongoing, recruiting, terminated, withdrawn, or suspended. This yielded 190,265 trials.
Failure-rate calculations (Section 3.4) are additionally restricted to start_date ≤ 2022-12-31 to avoid under-counting failures in cohorts that had not yet had time to fail.
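A minimal Python sketch of the inclusion logic above follows. It assumes each study is a dict carrying the AACT studies columns named in the filters (study_type, phase, start_date), with start_date already parsed to a datetime.date; the phase spellings are copied from the filter list and should be checked against the release actually queried. The records are hypothetical toy rows, not AACT data, and in practice the same predicates would sit in a SQL WHERE clause.

```python
from datetime import date

# Phase labels as listed in the inclusion filters; verify the stored spelling per release.
INCLUDED_PHASES = {
    "PHASE1", "PHASE1/PHASE2", "PHASE2",
    "PHASE2/PHASE3", "PHASE3", "PHASE4",
}
COHORT_START = date(2004, 1, 1)
COHORT_END = date(2025, 12, 31)
MATURITY_CUTOFF = date(2022, 12, 31)  # applied only to failure-rate analyses


def in_cohort(study: dict) -> bool:
    """Apply the three inclusion filters; overall_status is deliberately ignored."""
    start = study.get("start_date")
    return (
        study.get("study_type") == "INTERVENTIONAL"
        and study.get("phase") in INCLUDED_PHASES
        and start is not None
        and COHORT_START <= start <= COHORT_END
    )


def in_failure_cohort(study: dict) -> bool:
    """Cohort members old enough to have had time to fail (start_date <= 2022-12-31)."""
    return in_cohort(study) and study["start_date"] <= MATURITY_CUTOFF


# Hypothetical toy records.
studies = [
    {"nct_id": "NCT_A", "study_type": "INTERVENTIONAL", "phase": "PHASE2",
     "start_date": date(2010, 5, 1), "overall_status": "COMPLETED"},
    {"nct_id": "NCT_B", "study_type": "INTERVENTIONAL", "phase": "PHASE3",
     "start_date": date(2024, 2, 1), "overall_status": "RECRUITING"},
    {"nct_id": "NCT_C", "study_type": "OBSERVATIONAL", "phase": None,
     "start_date": date(2015, 1, 1), "overall_status": "COMPLETED"},
    {"nct_id": "NCT_D", "study_type": "INTERVENTIONAL", "phase": "EARLY_PHASE1",
     "start_date": date(2012, 1, 1), "overall_status": "WITHDRAWN"},
]
cohort = [s["nct_id"] for s in studies if in_cohort(s)]
matured = [s["nct_id"] for s in studies if in_failure_cohort(s)]
print(cohort, matured)
```

NCT_B stays in the descriptive cohort but drops out of the failure cohort, which is the right-censoring guard in action.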
2.2 Metric definitions
- Eligibility criteria (proxy). ClinicalTrials.gov stores eligibility criteria as free text in the eligibilities.criteria field. We used the newline count of this field as a proxy for the number of criteria items. This proxy correlates with but does not equal true itemised criteria counts; we therefore describe trends in criteria complexity rather than in absolute criteria counts.
- Endpoint burden. Sum of rows in the design_outcomes table per NCT ID, which includes primary, secondary, and other pre-specified outcomes as declared in the registered protocol. This captures design intent, not achieved reporting.
- Enrollment. We used the enrollment field as registered. On ClinicalTrials.gov this field records the anticipated (target) sample size and may later be updated to actual enrollment, so it is not a reliable achieved-enrollment measure; ClinicalTrials.gov is also not a strong source for operational metrics such as screen-failure rate. We interpret the enrollment field as a design-choice indicator.
- Failure. overall_status in {TERMINATED, WITHDRAWN, SUSPENDED}. Completed, ongoing, recruiting, not-yet-recruiting, enrolling-by-invitation, active-not-recruiting, available, and approved-for-marketing statuses are not counted as failures.
- Sponsor class. Lead-sponsor agency_class = INDUSTRY is labeled Industry; all other values (NIH, FED, OTHER_GOV, OTHER, NETWORK, INDIV, AMBIG, UNKNOWN, or null) are labeled IIS (investigator-initiated, used here as a deliberately broad catch-all for non-industry-led trials).
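The four count and classification definitions above can be sketched as small Python helpers. This is an illustration under stated assumptions, not the pipeline used for this paper: records are plain dicts, outcome_type values are upper-cased defensively because their exact casing in the April 2026 release is assumed rather than confirmed, and the criteria text and outcome rows are hypothetical.

```python
from collections import Counter
from typing import Optional

FAILURE_STATUSES = {"TERMINATED", "WITHDRAWN", "SUSPENDED"}


def criteria_line_count(criteria_text: Optional[str]) -> int:
    """Eligibility-criteria proxy: number of newline characters in the free text."""
    return 0 if criteria_text is None else criteria_text.count("\n")


def endpoint_counts(outcome_rows: list) -> dict:
    """Endpoint burden: design_outcomes rows per nct_id, split by outcome_type."""
    counts = {}
    for row in outcome_rows:
        per_trial = counts.setdefault(row["nct_id"], Counter())
        per_trial[row["outcome_type"].upper()] += 1
    return counts


def is_failure(overall_status: str) -> bool:
    """Failure: TERMINATED, WITHDRAWN or SUSPENDED; any other status is a non-failure."""
    return overall_status.upper() in FAILURE_STATUSES


def sponsor_class(agency_class: Optional[str]) -> str:
    """Lead-sponsor agency_class INDUSTRY -> 'Industry'; all other values, incl. None -> 'IIS'."""
    return "Industry" if agency_class == "INDUSTRY" else "IIS"


# Usage with hypothetical inputs.
criteria = (
    "Inclusion Criteria:\n\n* Age >= 18\n* ECOG 0-1\n\n"
    "Exclusion Criteria:\n\n* Prior therapy"
)
rows = [
    {"nct_id": "NCT_A", "outcome_type": "PRIMARY"},
    {"nct_id": "NCT_A", "outcome_type": "SECONDARY"},
    {"nct_id": "NCT_A", "outcome_type": "secondary"},
]
n_lines = criteria_line_count(criteria)
burden = endpoint_counts(rows)["NCT_A"]
print(n_lines, sum(burden.values()), burden["SECONDARY"])
print(is_failure("WITHDRAWN"), sponsor_class("NIH"), sponsor_class(None))
```

The toy criteria text yields 7 newlines although it lists only three eligibility items: blank lines and the Inclusion/Exclusion headers are counted too. This is why the newline count is read as a proxy for trends in criteria complexity rather than as an absolute item count.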
2.3 Trend analysis
Temporal trends were quantified by ordinary-least-squares linear regression of the annual metric against year: Metric(year) = α + β × year + ε. We report the slope β, Pearson r, and p-value. Throughout, "rising" should be read as rising in the registered metric; we make no claim of causal direction among count-based metrics and outcome metrics in this work.
3. Results
3.1 Trial volume
Annual registrations of phased interventional trials approximately doubled over the 22-year window, from 5,023 in 2004 to 10,261 in 2025 (Figure 1). IIS registrations grew faster than Industry registrations in both absolute and relative terms: IIS rose from 2,721 to 5,975 (×2.2) while Industry rose from 2,302 to 4,286 (×1.9). A visible surge in 2020–2021 corresponds to COVID-related trials (Section 3.5).
3.2 Endpoint burden
Endpoint count is the single cleanest longitudinal signal in the dataset (Figure 2). Median total endpoints per phased interventional trial rose from 3 in 2004 to 7 in 2025, more than doubling over the period (slope +0.219 endpoints/year, r = 0.97, p < 10⁻¹³). The growth was carried almost entirely by secondary endpoints (median 2 → 4–5); primary endpoints have remained stable at 1 throughout the period.
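Because these results are annual medians, it helps to see how they are formed. The sketch below assumes one hypothetical (start_year, primary, secondary, other) tuple per trial and computes per-year medians of the total and of each component. The toy data also show a property worth keeping in mind: medians are not additive, so the median primary count plus the median secondary count need not equal the median total.

```python
from collections import defaultdict
from statistics import median


def annual_endpoint_medians(trials):
    """trials: iterable of (start_year, n_primary, n_secondary, n_other) tuples."""
    by_year = defaultdict(lambda: {"total": [], "primary": [], "secondary": []})
    for year, n_primary, n_secondary, n_other in trials:
        bucket = by_year[year]
        bucket["total"].append(n_primary + n_secondary + n_other)
        bucket["primary"].append(n_primary)
        bucket["secondary"].append(n_secondary)
    return {
        year: {metric: median(values) for metric, values in bucket.items()}
        for year, bucket in sorted(by_year.items())
    }


# Three hypothetical trials starting in 2025.
toy = [(2025, 1, 2, 0), (2025, 1, 5, 3), (2025, 2, 6, 1)]
print(annual_endpoint_medians(toy))
# {2025: {'total': 9, 'primary': 1, 'secondary': 5}}
```

Here the component medians sum to 6 while the median total is 9, so a stable primary median of 1 and a secondary median of 4–5 need not add up exactly to a median total of 7.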
3.3 Eligibility-criteria complexity
Eligibility-criteria complexity (measured as the newline count in the criteria text, a proxy for the number of bulleted items) also rose across all strata; Industry Phase III trials, for example, rose from roughly 21 criteria-lines in 2004 to 25 in 2025. The overall ordinary-least-squares (OLS) slope was +0.38 criteria-lines/year (r = 0.98).
We emphasize that this is a count-based measure. It does not capture qualitative complexity such as biomarker-dependent eligibility, centrally adjudicated enrollment, or genetic screening, all of which have become more common but are not visible as additional bullet points in the criteria text.
3.4 Failure rates by phase
Restricting to matured cohorts (start_date ≤ 2022), the aggregate failure rate—defined as overall_status in TERMINATED, WITHDRAWN, or SUSPENDED—rose from 10.2% in 2004 to 13.2% in 2022, a slope of +0.13 percentage points per year (r = 0.48). The trend is statistically detectable, but its rate—roughly 1.3 percentage points per decade—is far more modest than the headline figures sometimes cited in the trade press, which have suggested failure-rate escalations of several percentage points per year.
Our analysis of the AACT data does not support those higher figures (Table 1).
The per-phase picture (Figure 3 and Table 1) decomposes the aggregate trend. Phase I shows the steepest rise (+0.23 pp/year), consistent with the interpretation that growing Phase I activity reflects more molecules being tested, including riskier, earlier-in-development candidates, rather than a systemic degradation.
Phase III rises modestly (+0.13 pp/year). Phase II rises more slowly (+0.10 pp/year) but, starting from the highest baseline, remains the highest-failure phase throughout (12.9% in 2004, 15.2% in 2022) and accounts for the largest absolute failure count. Phase IV is essentially flat.
Stratifying by phase reveals heterogeneity that aggregate trend reporting obscures: each phase has its own trajectory and policy implications, and any headline figure that combines them masks more than it reveals.
3.5 Therapeutic-area drift
A natural question is whether the cohort itself has shifted over time—for example, toward oncology and rare/specialty indications that carry structurally more eligibility criteria and more endpoints. Our data support this hypothesis (Figure 4).
Oncology's share of phased interventional activity rose from 29.0% in 2004 to 34.3% in 2025 (+5.3 pp), while cardiometabolic trials (diabetes, cardiovascular, hypertension, obesity combined) declined from 10.1% to 7.2% (−2.9 pp). CNS/neurology trials and non-COVID infectious-disease trials also declined modestly.
COVID-19 trials spiked to 14.4% of the cohort in 2020, falling to under 1% by 2025.
This drift is a material confounder for any interpretation of the rising-complexity/rising-failure trends. Oncology trials typically carry more eligibility criteria (molecular subtyping, prior-therapy histories, performance-status thresholds), more secondary endpoints (response rate, progression-free survival, overall survival, quality of life), and—in rare-tumor sub-populations—more sites per trial to achieve enrollment.
A five-percentage-point shift in cohort composition toward such trials will mechanically raise all three counts without any underlying change in sponsor behavior. Disentangling therapeutic-area composition from a true trend in protocol design requires within-indication longitudinal analysis, which is outside the scope of this paper but is the natural next step.
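The mechanical effect of cohort composition can be made concrete with a standard Kitagawa-style decomposition, shown here only to illustrate what a within-indication follow-up would quantify, not as an analysis performed in this paper. It splits the change in a pooled mean into a composition part (shifting shares) and a within-group part (changing group means); it works on means because medians do not decompose additively. The usage lines take the oncology shares reported above (29.0% and 34.3%) in a two-group simplification and pair them with hypothetical endpoint means (9 for oncology, 6 for all other trials) held fixed across years, so any change is pure composition.

```python
def decompose_change(shares_0, means_0, shares_1, means_1):
    """Kitagawa-style split of the change in a pooled (share-weighted) mean.

    total change = composition + within, exactly, where
      composition = sum_g (s1_g - s0_g) * (m0_g + m1_g) / 2
      within      = sum_g (m1_g - m0_g) * (s0_g + s1_g) / 2
    """
    composition = sum(
        (shares_1[g] - shares_0[g]) * (means_0[g] + means_1[g]) / 2 for g in shares_0
    )
    within = sum(
        (means_1[g] - means_0[g]) * (shares_0[g] + shares_1[g]) / 2 for g in shares_0
    )
    return composition, within


def pooled_mean(shares, means):
    """Share-weighted mean across groups."""
    return sum(shares[g] * means[g] for g in shares)


shares_2004 = {"oncology": 0.290, "other": 0.710}
shares_2025 = {"oncology": 0.343, "other": 0.657}
means = {"oncology": 9.0, "other": 6.0}  # hypothetical, identical in both years

composition, within = decompose_change(shares_2004, means, shares_2025, means)
total = pooled_mean(shares_2025, means) - pooled_mean(shares_2004, means)
print(round(composition, 3), round(within, 3), round(total, 3))
```

With these illustrative means the mix shift alone moves the pooled mean by about 0.16 endpoints with no change inside either group. The real size of the effect depends on the actual gap between oncology and non-oncology means, which are not reported here.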
3.6 Planned enrollment
Planned enrollment has been largely flat for 20 years (Figure 5). Industry Phase III remained near 300–350 participants (slope +0.49/year, r = 0.14); IIS Phase II moved from 45 to 50 (slope +0.37/year, r = 0.77).
Phase I enrollments are small and stable across sponsor classes. Flat planned enrollment is not surprising on its own, as it is driven primarily by statistical-power calculations and by the available disease population, neither of which changes mechanically with protocol complexity.
The notable observation is therefore not that enrollment is flat, but that it has remained flat while eligibility-criteria lists have lengthened: protocols appear to be narrowing the eligible pool without any corresponding change in enrollment target.
The expected operational consequence, a higher screen-failure rate, cannot be measured directly from ClinicalTrials.gov, which does not record screen failures, but it is a parsimonious explanation for the rising trial-duration trends reported elsewhere in the literature.
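The screening arithmetic behind this argument can be sketched as follows. Assuming, hypothetically, that each screened candidate independently passes eligibility with probability p, enrolling a fixed target of N participants requires N / p screens on average, and the screen-failure rate is 1 − p. The pass rates below are invented for illustration, since ClinicalTrials.gov does not record them; the target of 300 is taken from the Industry Phase III range in this section.

```python
import math


def expected_screens(enrollment_target: int, pass_rate: float) -> int:
    """Expected screens to reach the target when each candidate passes with pass_rate."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    # Round before ceil so a quotient like 750.0000000000001 from floating-point
    # noise is not rounded up to an extra screen.
    return math.ceil(round(enrollment_target / pass_rate, 9))


target = 300  # flat Industry Phase III planned enrollment
for pass_rate in (0.5, 0.4):  # hypothetical: stricter criteria lower the pass rate
    print(pass_rate, expected_screens(target, pass_rate), f"screen-failure {1 - pass_rate:.0%}")
```

This is an expectation, not a guarantee: under the independence assumption the number of screens needed is itself random (negative binomial with mean N / p), so individual trials can need noticeably more.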
3.7 DMC adoption
Data Monitoring Committee (DMC) adoption is the single most dramatic regulatory trend in the dataset (Figure 6). Industry DMC adoption rose from 9.2% in 2004 to 32.0% in 2025 (+0.94 pp/year, r = 0.88).
IIS adoption was higher at baseline (~32%) but grew more slowly, finishing at 46%, so the gap between IIS and Industry narrowed from about 23 to 14 percentage points. This convergence plausibly reflects evolving regulatory guidance on independent safety monitoring and the maturation of industry safety practices.
3.8 COVID-19 impact
The pandemic produced a transient but striking perturbation. COVID trials reached 14.4% of the cohort in 2020, an ad-hoc reorientation of global research capacity.
Failure rates rose modestly in 2020 across phases, most visibly in Phase II, before largely returning to trend by 2022. Protocol-complexity metrics (endpoint counts, eligibility criteria-line counts) were insensitive to the pandemic, which is methodologically reassuring: the long-run complexity trend is not an artifact of 2020–2021.
4. Discussion
4.1 Temporal co-escalation, not established causation
The principal longitudinal finding is a co-escalation: counts of eligibility criteria and endpoints have risen while failure rates have drifted upward over the same 20-year window. Co-escalation is not causation, and this paper does not claim it is.
The cross-sectional association between eligibility-criteria counts and failure within Phase II IIS (r = +0.33) is suggestive but compatible with three non-exclusive explanations:
- Denser protocols are operationally harder to execute.
- Sponsors intentionally make higher-risk programs more restrictive, so stricter eligibility correlates with harder science.
- Therapeutic-area drift (Section 3.5) simultaneously raises both counts and intrinsic failure risk.
Untangling these mechanisms requires within-indication analysis and is outside the scope of a registry-level longitudinal description.
4.2 Exclusion-criteria growth and internal validity
Protocol authors have a legitimate scientific interest in restrictive eligibility: narrowing the enrolled population strengthens the statistical signal, reduces heterogeneity in treatment effect, and protects internal validity. Restrictive criteria can therefore produce a more interpretable and reliable result. The policy question is not whether exclusion criteria are good or bad in principle, but whether each specific exclusion carries a contemporary justification or whether protocols accumulate historical exclusions indefinitely.
A sunset-review discipline on exclusion lists is a reasonable recommendation independent of any causal claim about failure.
4.3 End-of-Phase-2 risk management
A useful framing for interpreting the Phase II trends is the end-of-Phase-2 go/no-go meeting, which creates a strong sponsor incentive to front-load design thoroughness in Phase II so that the Phase III commitment rests on robust evidence. The observed rise in Phase II Industry eligibility-criteria complexity and endpoint counts is consistent with this risk-management strategy and need not indicate drift or protocol bloat.
Read this way, the Phase II trend is intentional investment in go/no-go robustness, not a sign of deteriorating Phase II discipline.
4.4 Industry Phase III: infrastructure scaling, partly driven by therapeutic-area shift
Industry Phase III median site counts have risen, Industry DMC adoption has more than tripled, and open-label prevalence has fallen; this pattern is naturally read as compensatory scaling in response to rising protocol complexity. The interpretation is defensible, but it must accommodate the therapeutic-area-drift finding: more oncology trials, and in particular more rare-tumor and molecularly stratified oncology trials, mechanically require more sites to recruit.
Rising site counts therefore likely have several causes: protocol complexity, globalization (in a companion analysis, the US share of global trial activity fell from ~56% in 2005 to ~23% in 2025), and therapeutic-area composition.
4.5 IIS trials: single-site as a structural characteristic
Persistent single-site execution in investigator-initiated trials is best understood as a structural feature of the IIS operating model rather than as an operational divergence from Industry practice. An investigator with an idea, access to a cohort, and institutional backing runs the trial at that institution, not as a cost-minimizing fallback but as the default.
The implications of this structural reality—limited site redundancy, dependence on a single recruitment pool, and high sensitivity of trial outcomes to local operational disruptions (e.g., the COVID-19 perturbation in Phase II IIS failure rates in 2020)—are properties of the IIS model itself, not symptoms of deficient management.
4.6 Implications and recommendations
- Protocol design. Exclusion-criteria lists should carry contemporary justifications and a periodic sunset-review discipline. This is independent of any causal claim about failure.
- Industry strategy. The Phase II complexity trend is consistent with intentional end-of-Phase-2 risk management and should be evaluated as such rather than characterized as drift.
- Regulatory science. Protocol-complexity metrics are a useful pipeline-health signal, but count-based metrics alone under-represent qualitative complexity (biomarker dependencies, adjudicated endpoints). Registry enhancement to capture these directly would improve the transparency of the evidence base.
- IIS support. If the structural single-site model of IIS is to continue, operational support must be proportionate to the complexity IIS investigators are choosing, or complexity must be pared back to match the model.
- Within-indication follow-up analysis. The therapeutic-area-drift confounder motivates a within-indication longitudinal decomposition—the single most useful next analytic step.
5. Conclusion
Over 2004–2025, clinical-trial protocol complexity has risen across the main design metrics we examined: eligibility-criteria line counts, endpoint counts, site counts for Industry Phase III, and DMC adoption for Industry. The rise in endpoint burden (median 3 → 7) is the cleanest signal.
Aggregate failure rates have drifted upward at a modest pace of approximately 1.3 percentage points per decade, a change that is detectable but far milder than earlier narratives have sometimes implied. The steepest failure escalation is in Phase I, consistent with expansion of the early-development testing pipeline.
Concurrent drift of the cohort toward oncology and away from cardiometabolic disease is a material confounder for all of these trends and should be factored explicitly into any sponsor or regulatory policy response. The case for protocol parsimony, particularly in Phase II, is supported by these data, but as a prudential recommendation grounded in temporal co-escalation, not as a causally established mandate.
References
1. Getz KA, Stergiopoulos S, Short M, et al. The Impact of Protocol Amendments on Clinical Trial Performance and Cost. Ther Innov Regul Sci. 2016;50(4):436–441.
2. Tufts Center for the Study of Drug Development. Impact Report: Rising Protocol Complexity. 2023;25(1).
3. Fogel DB. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review. Contemp Clin Trials Commun. 2018;11:156–164.
4. DiMasi JA, Grabowski HG, Hansen RW. Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ. 2016;47:20–33.
5. Clinical Trials Transformation Initiative. AACT Database. https://aact.ctti-clinicaltrials.org/
6. Getz KA, Campo RA. Trends in clinical trial design complexity. Nat Rev Drug Discov. 2017;16(5):307.
7. Gresham G, Meinert JL, Gresham AG, et al. Assessment of trends in the design, accrual, and completion of trials registered in ClinicalTrials.gov by sponsor type, 2000–2019. JAMA Netw Open. 2020;3(8):e2014682.
8. Anderson ML, Chiswell K, Peterson ED, Tasneem A, Topping J, Califf RM. Compliance with results reporting at ClinicalTrials.gov. N Engl J Med. 2015;372(11):1031–1039.
9. Califf RM, Zarin DA, Kramer JM, et al. Characteristics of clinical trials registered in ClinicalTrials.gov, 2007–2010. JAMA. 2012;307(17):1838–1847.
10. Hwang TJ, Carpenter D, Lauffenburger JC, Wang B, Franklin JM, Kesselheim AS. Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA Intern Med. 2016;176(12):1826–1833.