Improving Clinical Trial Enrollment Forecasts Using SORM


Applied Clinical Trials

Applied Clinical Trials, 05-01-2013
Volume 22
Issue 5

A useful tool for clinical research professionals to estimate potential recruitment duration.

An important challenge for any clinical research professional is to create reasonable initial forecasts of clinical trial recruitment. The importance of this task should not be minimized. Recruitment forecasts form the basis for trial budget timing and clinical project management activities. If you doubt the importance of forecasting well, ask your company's executive management whether missing a forecasted trial milestone by a few months has any implications. Unfortunately, accurate initial projections of study recruitment appear to be dismally absent in reality, and some estimates show that more than two-thirds of publicly funded trials fail to enroll according to plan.1, 2 This state of affairs is humorously captured in the clinical research literature as Muench's Third Law and Lasagna's Law.3 These state: "In order to be realistic, the number of cases promised in any clinical study must be divided by a factor of at least 10" and "the number of patients available for entering a trial falls markedly at study initiation and rises markedly after study completion," respectively.

While site factors certainly contribute to forecast inaccuracies, inadequate models also play a part. Simple unconditional linear estimation of enrollment by total site number and overall recruitment rate is the primary method for timeline estimation. This approach works well for single sites or situations where enrollment begins at all participating sites simultaneously. However, many studies start with one operating site, while additional clinical sites are added later, especially in trials sponsored by small companies. This article will explore a novel modification of the linear approach, easily implemented on paper or a spreadsheet, to estimate trial enrollment when different sites start enrolling at different time points during the beginning of the study.

The business as usual approach

Let's start with an illustration of the situation with a typical example that many clinical research professionals will recognize. Assuming constant linear enrollment and site numbers makes this an unconditional deterministic model, as described by Katharine Barnard (Barnard et al. BMC Medical Research Methodology, 2010). With this approach, we forecast recruitment as a simple function of aggregate estimated site number, randomization rate, and time. Thus, this is a "first order polynomial" or linear equation, which will be referred to as the "first order recruitment model" or FORM using the following symbols and assumptions:

  • Sm = Total number of planned sites

  • R = Recruitment Rate (number of patients/site/month)

  • te = Enrollment duration (months from FPI to LPI)

  • Nm = Total patients needed

  • N0 = Number of patients initially enrolled, if any
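The equation itself does not appear in this copy of the article. Judging from the rearranged form and the worked numbers that follow, FORM is presumably the linear relation below; it is offered as a reconstruction from the symbol list, not as the author's exact typesetting.

```latex
N_m = S_m \, R \, t_e + N_0
```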

This simple equation should be familiar to many readers. It is useful to work out an example to illustrate the utility of this common approach. Assume that the study from the opening scene of this article includes 350 total subjects and 40 sites. From prior trial experience and clinical operations data, assume a recruitment rate of approximately one patient per site per month calculated using PERT methodology.5, 6 With these initial conditions, how long will it take to enroll the total study?

Using simple algebra, we can rearrange the FORM to solve for the enrollment duration as follows:

$$t_e = \frac{N_m - N_0}{S_m R}$$

$$t_e = \frac{350.0\ \text{pts} - 0.0\ \text{pts}}{40.0\ \text{sites} \times 1.0\ \text{pts/site/mo}} \approx 8.8\ \text{mo}$$

This gives the result of approximately nine months for te, from the first patient in (FPI) to the last patient in (LPI). Note that the answer is rounded to a single decimal place of precision. In most cases this is sufficient for initial "back of the envelope" estimation purposes. However, beware of over-reliance on this and accumulation of errors when greater precision is required.
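For readers who prefer a script to a hand calculator, the FORM calculation above can be sketched in a few lines of Python. The function and parameter names are illustrative choices, not part of the original article.

```python
def form_duration(n_needed, n_sites, rate, n_initial=0.0):
    """Months from FPI to LPI under FORM: te = (Nm - N0) / (Sm * R)."""
    if n_sites <= 0 or rate <= 0:
        raise ValueError("n_sites and rate must be positive")
    return (n_needed - n_initial) / (n_sites * rate)


# The article's example: 350 subjects, 40 sites, 1 patient/site/month.
te = form_duration(n_needed=350, n_sites=40, rate=1.0)
print(f"FORM enrollment duration: {te:.2f} months")  # 8.75, reported as 8.8 in the text
```

Keeping the unrounded value (8.75) until the final step avoids the accumulation of rounding error that the text warns about.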

This simple linear approach is quite useful. For example, by rearranging terms in the model we can also solve for recruitment rate, number of sites, etc. However, it is important to note the situations in which this model is inappropriate. A useful example is when a study begins with fewer than the full complement of enrolling sites. Using FORM to extrapolate enrollment in this situation can lead to overly optimistic results because the model assumes that all sites are enrolling. One way to work around this is to build an explicit spreadsheet model, manually enter the expected site numbers, and extrapolate the increased enrollment from that. However, that still requires assumptions about time and involves additional work if one simply wants a quick initial estimate of the total time to enroll. So how can it be done differently?

Another method to consider

In contrast to FORM, we can estimate recruitment with a deterministic "conditional" model where the overall recruitment depends on underlying factors, such as the number of available sites.2 This approach will be referred to as the second order recruitment model or SORM, which is a second-order polynomial (i.e., quadratic, with highest power equal to two) equation. SORM differs from the FORM approach by assuming that sites start up at an average initiation rate until all sites are enrolling. During the start-up phase, the total enrollment is conditional upon the cumulative sites available to enroll. Enrollment finally becomes linear and unconditional once all sites are enrolling. Figures 1 and 2 illustrate this graphically using the results of Monte-Carlo trial simulations with site start-up modeled as a beta probability distribution based on estimates of the "average" (4.7 months), "maximum" (9.0 months), and "minimum" (2.5 months) PERT values for site initiation or "start-up" time.5, 6 Here we see 25 simulated scenarios with 30 sites starting up randomly during the site initiation period at a rate of approximately 4.6 sites per month (i.e., 30 sites/(max start-up time – min start-up time)) and a standard deviation of approximately one site per month. Figure 1 shows the scenarios modeled using a beta function process, along with the superimposed median and 95% confidence levels.

Over many simulations, to a first approximation, site start-up appears as a sigmoid (i.e., "S") shape with linear initiation up to the plateau where all sites are finally active. This is an example of logistic growth and is a common shape of cumulative probability distributions for many different underlying processes. Unfortunately, we need the integral of the sigmoid logistic function, which results in an equation for site enrollment without a convenient closed form solution for time. For this reason, we will model the logistic sigmoid shape as an approximate piecewise linear function, illustrated in Figure 2.

By extrapolating the cumulative enrollment as the simple product of the number of active sites at time "t" illustrated in Figure 1 and the overall study enrollment rate, the pattern in Figure 3 emerges. Note that enrollment starts with a shallow slope and accelerates to a maximum rate as more sites become active.
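The accumulation described above can be checked numerically. The sketch below assumes the piecewise linear site ramp (a straight line from the initial to the full site count over the start-up period, then flat) and sums active sites times the enrollment rate in small time steps. The function names and the step size are illustrative assumptions; the inputs are the 40-site, 5.5-month start-up example worked later in the article.

```python
def active_sites(t, s0, sm, ts):
    """Piecewise linear site ramp: linear from s0 to sm over ts months, then flat."""
    if ts <= 0 or t >= ts:
        return float(sm)
    return s0 + (sm - s0) * t / ts


def cumulative_enrollment(t_end, s0, sm, ts, rate, n0=0.0, dt=0.01):
    """Midpoint-rule accumulation of active_sites(t) * rate from 0 to t_end."""
    steps = int(round(t_end / dt))
    total = n0
    for i in range(steps):
        t_mid = (i + 0.5) * dt  # evaluate the ramp at the middle of each step
        total += active_sites(t_mid, s0, sm, ts) * rate * dt
    return total


# Shallow start, then a constant slope once all 40 sites are active.
for month in (1, 3, 5.5, 8, 11.5):
    n = cumulative_enrollment(month, s0=0, sm=40, ts=5.5, rate=1.0)
    print(f"month {month:>4}: about {n:.0f} patients")
```

Because the ramp is linear within each step, the midpoint rule reproduces the curve essentially exactly; a logistic ramp could be swapped into `active_sites` to compare the two shapes.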

We could also model this situation using different Monte Carlo techniques or other elegant mathematical approaches (e.g., Bayesian estimation, etc.). Indeed, it should be noted that there is currently a range of very sophisticated approaches and tools for prediction and modeling that can incorporate waiting time, adjustments, etc.7, 8, 9 However, for the practicing clinical researcher there is value in having a simple set of equations that allows for a quick initial calculation, such as during a planning session.

So let us solve the same problem as before using this new approach. Earlier we solved for the total enrollment time for Nm = 350 total subjects, Sm = 40 sites, R = 1 patient/site/month, and no initial patients enrolled yet (i.e., N0 = 0). We now add a parameter indicating that no initial sites are ready to enroll at the beginning of the trial (i.e., S0 = 0), and the average site start-up time, ts = 5.5 months. Now we can ask: how long will it take to enroll the study (i.e., FPI to LPI) using SORM? We begin by estimating the following variables, starting with the site initiation rate (Ir):
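The equation for the site initiation rate is missing from this copy. Based on the worked calculation further on, which divides the number of sites still to be started by the average start-up time, it presumably reads as follows (a reconstruction, labeled here as Equation 2 to match the later references):

```latex
I_r = \frac{S_m - S_0}{t_s}
```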

The following equations (derived using integral calculus not shown here) are introduced, which produce the "quadratic" or second-order enrollment during the start-up phase and linear enrollment after that. The second-order nature of Equation 3 is what gives SORM its name. Equations 3 and 4 are a piecewise continuous solution giving similar results to the integrated logistic sigmoid function.
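Equations 3 and 4 are also absent from this copy. The form of Equation 3 can be recovered from the worked calculation of Nq that follows; the linear phase written below is an assumption chosen to be continuous with it at the end of start-up, and the author's exact Equation 4 may be arranged differently.

```latex
N(t) =
\begin{cases}
\left(\tfrac{1}{2} I_r t^2 + S_0 t\right) R + N_0, & 0 \le t \le t_s \\[4pt]
N_q + S_m R \,(t - t_s), & t > t_s
\end{cases}
\qquad
N_q = N(t_s) = \left(\tfrac{1}{2} I_r t_s^2 + S_0 t_s\right) R + N_0
```

The first branch is the integral of the linear site ramp, (S0 + Ir t) R, which is where the quadratic term comes from. At t = ts both branches give Nq and both have slope Sm R, so the curve joins smoothly.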

First, the site start-up or initiation rate is determined from the average site start-up time. Thus, Ir = (40.0 sites – 0.0 sites)/5.5 months, which is approximately 7.3 sites/month, to one decimal place. Next, we estimate the total accrual during the site start-up phase using Equation 3, which shows Nq = (0.5 x 7.3 sites/month x (5.5 months)^2 + 0 sites x 5.5 months) x 1.0 pt/site/month + 0.0 pt = 110.4, or ≈ 110 subjects enrolled during this time. Since Nq is less than the total number needed, Nm (350 patients), we know that additional accrual must take place during the linear recruitment phase. Adding the initial quadratic accrual (Nq) and the site start-up time (ts), we can now solve for the total study enrollment period (te) as follows:
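The equation referred to here is not reproduced in this copy. From the numerical substitution in the next paragraph, it presumably takes the following form (a reconstruction):

```latex
t_e = \frac{N_m - N_q}{S_m R} + t_s
```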

Here the total enrollment time is (350.0 pts – 110 pts)/(40 sites x 1 pt/site/month) + 5.5 months = 11.5 months (based on the SORM forecast method), compared to the original FORM enrollment estimate of 8.8 months, or an approximate 31% increase in enrollment time. This illustrates the more conservative enrollment time based on the SORM approach compared to FORM, due to the additional time required for site start-up.
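The two-step SORM calculation above (start-up accrual first, then the linear phase) can be sketched in Python as follows. Function names are illustrative, and the sketch covers only the case worked here, where enrollment finishes after all sites are active.

```python
def sorm_startup_accrual(s0, sm, ts, rate, n0=0.0):
    """Patients accrued by the end of site start-up (Nq in the text)."""
    ir = (sm - s0) / ts  # site initiation rate, sites/month
    return (0.5 * ir * ts**2 + s0 * ts) * rate + n0


def sorm_duration_two_phase(n_needed, s0, sm, ts, rate, n0=0.0):
    """Months from FPI to LPI when enrollment finishes in the linear phase."""
    nq = sorm_startup_accrual(s0, sm, ts, rate, n0)
    if nq >= n_needed:
        raise ValueError("enrollment finishes during start-up; solve the quadratic instead")
    return (n_needed - nq) / (sm * rate) + ts


nq = sorm_startup_accrual(s0=0, sm=40, ts=5.5, rate=1.0)
te = sorm_duration_two_phase(n_needed=350, s0=0, sm=40, ts=5.5, rate=1.0)
print(f"Nq is about {nq:.0f} patients, te is about {te:.1f} months")
```

Carrying the unrounded initiation rate (40/5.5 rather than 7.3) gives Nq of 110 directly, which is why the text rounds 110.4 down to 110.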

Figure 4 illustrates the cumulative enrollment over time, for this example using both FORM (forecast enrollment time = 8.8 months) and SORM (forecast enrollment time = 11.5 months). Here the initial non-linear nature of the SORM enrollment is clear, as well as the enrollment time difference between the two models.

It is important to note how SORM would behave under the special condition when total accrual completes prior to all sites completing initiation. This would occur when Nq ≥ Nm. Here the study enrollment time would be found using the quadratic formula to solve Equation 3 for enrollment time, te. As an example, consider a situation where 50 patients (Nm) are needed for a study with 10 sites (Sm), with two initial sites (S0), with an average randomization rate of three patients/site/month (R) and an average site start-up time of four months (ts). By solving for Ir using Equation 2, and rearranging terms from Equation 3, we can make the following substitutions: a = (IrR)/2; b = S0R; and c = N0-Nm. These values can be substituted into the quadratic formula (Equation 6) below, where the positive root becomes the solution for the enrollment time.4
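The quadratic formula referred to as Equation 6 is not reproduced in this copy. With the substitutions just given, setting the start-up accrual equal to Nm yields a t_e^2 + b t_e + c = 0, whose positive root is the standard result:

```latex
t_e = \frac{-b + \sqrt{b^2 - 4ac}}{2a},
\qquad a = \frac{I_r R}{2}, \quad b = S_0 R, \quad c = N_0 - N_m
```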

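$$I_r = \frac{10.0\ \text{sites} - 2.0\ \text{sites}}{4.0\ \text{months}} = 2.0\ \text{sites/month}$$

$$a = \frac{2.0\ \text{sites/month} \times 3.0\ \text{pts/site/month}}{2} = 3.0\ \text{pts/month}^2$$

$$b = 2.0\ \text{sites} \times 3.0\ \text{pts/site/month} = 6.0\ \text{pts/month}, \qquad c = 0 - 50.0 = -50.0\ \text{pts}$$

$$t_e = \frac{-6.0 + \sqrt{6.0^2 - 4 (3.0)(-50.0)}}{2 \times 3.0} = \frac{-6.0 + 25.2}{6.0} = 3.2\ \text{months}$$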

Here, the total enrollment time is approximately three months, which is less than the time to start up the remaining sites. Too bad more studies could not be this quick.
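The special case can be sketched in Python as well. The function name is an illustrative choice, and the guard clauses are additions for robustness rather than anything described in the article.

```python
import math


def sorm_duration_during_startup(n_needed, s0, sm, ts, rate, n0=0.0):
    """Enrollment time when accrual completes before all sites are active."""
    if n_needed <= n0:
        return 0.0  # nothing left to enroll
    ir = (sm - s0) / ts          # site initiation rate, sites/month
    a = ir * rate / 2.0
    b = s0 * rate
    c = n0 - n_needed
    if a == 0:                   # no new sites being added: plain linear case
        te = -c / b
    else:
        te = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # positive root
    if te > ts:
        raise ValueError("enrollment does not finish during start-up")
    return te


te = sorm_duration_during_startup(n_needed=50, s0=2, sm=10, ts=4, rate=3.0)
print(f"Enrollment completes after about {te:.1f} months")
```

The check against `ts` matters: the quadratic only describes the start-up phase, so a root larger than the start-up time means the linear-phase calculation should be used instead.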

It is useful to get an impression of the differences obtained using SORM versus FORM for various clinical trial examples. Table 1 illustrates the enrollment time estimates for both models over five trial scenarios.

Why is this useful?

This is a novel, but simple, conditional deterministic model for estimating study recruitment time in the setting of changing initial site numbers. A reasonable question to ask is whether this mathematical tool has any value over typical unconditional linear forecasting. So what are the advantages?

First, it allows a relatively quick estimate of recruitment, using only a hand calculator or spreadsheet, that accounts for the fact that not all sites are enrolling at the beginning of a study. While clinical research professionals can always use more elegant computer-based models, it is useful to be able to quickly estimate solutions and understand how the answer was obtained.

Second, it allows researchers to "bound" their estimates by providing rational (i.e., not based on intuition) upper and lower limits for potential clinical trial duration. One way to accomplish this is by combining SORM estimates with FORM forecasts to provide a sensitivity analysis of how long enrollment may take with sites starting at different time points.
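One way to picture this bounding is to run the same inputs through both models and vary the start-up time. The sketch below uses the article's 350-patient, 40-site example; the 8-month start-up scenario is a hypothetical value added purely for illustration, and the single combined function is a convenience of this sketch.

```python
def enrollment_duration(n_needed, sm, rate, ts=0.0, s0=0.0, n0=0.0):
    """Months from FPI to LPI; ts = 0 gives FORM, ts > 0 gives SORM (linear-phase finish)."""
    # Start-up accrual written as (Sm - S0) * ts / 2 so that ts = 0 needs no division.
    nq = (0.5 * (sm - s0) * ts + s0 * ts) * rate + n0
    if nq >= n_needed:
        raise ValueError("enrollment would finish during start-up; use the quadratic case")
    return (n_needed - nq) / (sm * rate) + ts


lower = enrollment_duration(350, 40, 1.0)           # all sites open at FPI (FORM)
upper = enrollment_duration(350, 40, 1.0, ts=5.5)   # sites ramp up over 5.5 months (SORM)
slow = enrollment_duration(350, 40, 1.0, ts=8.0)    # hypothetical slower start-up
print(f"Enrollment window: {lower:.2f} to {upper:.2f} months")
print(f"With an 8-month start-up (hypothetical): {slow:.2f} months")
```

With no initial sites, each extra month of start-up adds half a month to the forecast, which gives a quick feel for how sensitive the timeline is to site activation delays.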

Finally, in the special case where all expected sites start enrolling at the same time at the start of the trial, the SORM model automatically reduces to the standard unconditional linear FORM model, because the start-up accrual in Equation 3 goes to zero when the start-up time is zero (Nq reduces to N0), leaving only Equation 4 to give the result. Thus, SORM is actually a more general model of recruitment in which the standard linear FORM method is a special case.
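This reduction can be written out explicitly. The lines below assume Equation 3 has the form used in the worked example, Nq = (0.5 Ir ts^2 + S0 ts) R + N0 with Ir = (Sm - S0)/ts, so that the product Ir ts^2 stays finite as the start-up time shrinks:

```latex
I_r t_s^2 = (S_m - S_0)\, t_s \;\longrightarrow\; 0
\quad\text{as } t_s \to 0,
\qquad\text{so}\qquad
N_q \to N_0
\quad\text{and}\quad
t_e = \frac{N_m - N_q}{S_m R} + t_s \;\longrightarrow\; \frac{N_m - N_0}{S_m R}
```

The limit on the right is the FORM expression for the enrollment duration.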

What are the model limitations?

Any tool must be used for the right problem to get reasonable results. A skillful model user should always be cognizant of the assumptions and limitations of their approach. One important assumption of both SORM and FORM is the use of aggregate estimates for site recruitment, total number of sites required, and site enrollment rate, with no allowance for variability. The variability limitation can be addressed by using SORM iteratively with randomly varying inputs from a plausible probability distribution function (e.g., beta function, log-normal, or exponential) to provide Monte Carlo estimates of study enrollment time. Figure 5 illustrates this using a fitted beta distribution to model the example problem from this article, where the randomization rate and start-up time are calculated in detail using PERT in the Program Evaluation sidebar.
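A minimal sketch of this Monte Carlo use of SORM is shown below. It is not the author's simulation: the beta-PERT parameterization (shape parameter 4) is a common convention assumed here, the three-point estimates for the recruitment rate are hypothetical, and the start-up values borrow the 2.5, 4.7, and 9.0 months quoted earlier, treated as minimum, most likely, and maximum. Rate and start-up time are sampled independently, so no correlation term is included.

```python
import random
import statistics


def pert_sample(rng, low, mode, high):
    """Draw from a beta-PERT distribution (shape parameter 4) on [low, high]."""
    alpha = 1.0 + 4.0 * (mode - low) / (high - low)
    beta = 1.0 + 4.0 * (high - mode) / (high - low)
    return low + (high - low) * rng.betavariate(alpha, beta)


def sorm_duration(n_needed, sm, rate, ts, s0=0.0, n0=0.0):
    """SORM enrollment duration; assumes ts > 0 and sm > s0."""
    nq = (0.5 * (sm - s0) * ts + s0 * ts) * rate + n0   # accrual at end of start-up
    if nq < n_needed:                                    # finish in the linear phase
        return (n_needed - nq) / (sm * rate) + ts
    a = (sm - s0) / ts * rate / 2.0                      # finish during start-up
    b = s0 * rate
    c = n0 - n_needed
    return (-b + (b * b - 4.0 * a * c) ** 0.5) / (2.0 * a)


def simulate(n_runs=10_000, seed=1):
    """Return (median, 2.5th percentile, 97.5th percentile) of enrollment time."""
    rng = random.Random(seed)
    durations = []
    for _ in range(n_runs):
        rate = pert_sample(rng, 0.5, 1.0, 2.0)   # hypothetical pts/site/month estimates
        ts = pert_sample(rng, 2.5, 4.7, 9.0)     # start-up months, values from the text
        durations.append(sorm_duration(350, 40, rate, ts))
    cuts = statistics.quantiles(durations, n=40)  # cut points every 2.5%
    return statistics.median(durations), cuts[0], cuts[-1]


median, low, high = simulate()
print(f"Median {median:.1f} months (95% interval {low:.1f} to {high:.1f})")
```

Fixing the seed makes a run reproducible; in practice the three-point estimates would come from the team's own PERT exercise rather than the placeholder values used here.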

A second assumption is that SORM (as well as FORM) uses simple multiplication of the overall site recruitment rate and the required number of sites to yield the time-varying enrollment. If SORM is adapted to accept randomly varying inputs (i.e., as a Monte Carlo process), an additional term may be added to reflect any expected correlation between aggregate site start-up time and subsequent enrollment. Note that the SORM model results shown in Figure 5 included a term to address this correlation issue.


SORM has utility in high-level planning for trial enrollment. Like any tool, it is only as useful as the assumptions and inputs you give it. It is essentially a "first approximation" of enrollment based on the logistic growth function that allows for a relatively straightforward calculation of the recruitment time during the site start-up phase, compared to the standard unconditional linear approach (i.e., FORM) that is commonly used in industry. The hope here is to provide clinical research professionals with a useful tool to create conservative estimates of potential recruitment duration quickly, when time or resources do not permit more sophisticated forecast techniques.

Shaun Comfort, MD, MBA, is Chief Medical Officer at Adaptix Clinical Solutions, LLC, 7940 Floyd Curl Drive, Suite 1020, San Antonio, TX.


1. J. F. Collins, et al., "Planning Patient Recruitment: Fantasy and Reality," Statistics in Medicine, 3 (4) 435-443 (1984).

2. K. D. Barnard, et al., "A Systematic Review of Models to Predict Recruitment to Multicentre Clinical Trials," BMC Medical Research Methodology 10 (63) (2010).

3. B. Spilker and J. S. Cramer, Patient Recruitment in Clinical Trials, (Raven Press, New York, 1992).

4. C. E. Swartz, Used Math, 2nd Ed. (American Association of Physics Teachers, College Park, MD, 1993).

5. J. Taylor, A Survival Guide for Project Managers, 2nd Ed. (AMACOM, New York, 2006).

6. W. Fazar, "Program Evaluation and Review Technique," The American Statistician, 13 (2) (1959).

7. V. V. Anisimov, "Predictive Event Modelling in Multicenter Clinical Trials with Waiting Time to Response," Pharmaceutical Statistics, 10 (6) 517-522 (2011).

8. V. V. Anisimov and V. V. Fedorov, "Modelling, Prediction and Adaptive Adjustment of Recruitment in Multicentre Trials," Statistics in Medicine, 26 (27) 4958-4975 (2007).

9. V. V. Anisimov, "Using Mixed Poisson Models in Patient Recruitment in Multicentre Clinical Trials," Proceedings of the World Congress on Engineering 2008.
