Portfolio Approach to Optimize Site Selection

February 5, 2019
Vadim Paluy, MD

Vladimir Shnaydman, PhD

Applied Clinical Trials

Site selection is one of the most important and at the same time challenging problems in clinical trials planning. Poor site selection may cause enrollment delays, resource waste on low or zero enrollment, and even potentially compromise trial results.

Site selection is a complex process that includes site identification, site assessment, site validation, and selection of a set of sites aligned with study goals and limited resources.

Many factors affect the site selection process. Some of them are generic, others are trial-specific (Figure 1). Some factors are qualitative and subjective, such as “Experience and qualifications of Principal Investigator” or “Staff turnover”; their value can be scored on a scale (e.g., excellent – 10, the worst – 1). Others are quantitative, such as “Planned patients’ enrollment” or “Projected enrollment rate.” The factors can be extracted from historical databases, questionnaires, etc. Some companies use data mining techniques for site identification. Examples of clinical trial-related factors are “Trial budget,” “Enrollment target,” etc. A variety of business rules also apply to the entire trial, e.g., “If a country is selected, then a minimum of two sites per country should be selected,” or “Priority site” (forced site prioritization).

Traditionally, site selection efforts are focused on a selection of “best” clinical sites based on:

  • Predicted enrollment rates.

  • Predicted/planned site capacity.

  • Scoring of multiple site attributes (e.g., experience of principal investigator (PI) and staff, historical patient enrollment rates, overall facility quality, etc.) and ranking index calculation as a sum of weighted scores.

  • Handpicked approach to sites selection based on prior relations with sites and sites’ qualitative evaluation.  

Ranking sites by a single criterion has limitations. For example, site selection based on sorting predicted enrollment rates does not take into account parameters like cost per patient, site capacity, or “soft” attributes such as experience of the PI, facility quality, etc. The value of site selection based on extrapolation of historical performance may be limited by high site turnover [Getz, 2018] or by limited or missing historical data for new sites. Site selection based on sorting a ranking index likewise does not take into account budget and other constraints, study goals, or business rules.

Often, the site selection process is mistakenly equated with site identification and feasibility assessment alone. After the “best” sites are identified and evaluated, it is assumed that the final selection will somehow emerge from the feasible set of “best” sites.

Often, a feasible set of sites is identified through an informal process, which may include, but is not limited to:

  • Contact familiar sites

  • Call on referrals

  • Literature and database search

  • Phone interviews and site visits

  • Contract negotiations

More advanced techniques (AI, data mining) can be applied to large site databases, but they are not transferable to all sites.

Usually, all feasible sites are divided into three major tiers. Their advantages and disadvantages are presented in Table 1.

Sites could “migrate” from tier to tier depending on most recent performance assessments.

A high-level overview of the traditional site selection process is presented in Figure 2 and described below.

  • From sites database (A) (it could be proprietary or commercial), a feasible portfolio of sites is defined based on certain attributes (e.g. “A list of doctors treating multiple sclerosis in USA”).

  • Then, all feasible sites are assessed based on scoring[1] of each site attribute and its weight (B). A list of attributes may include: (1) Experience of PI and site employees; (2) geographic location; (3) transportation to the site; (4) enrollment risk; and many others. Each attribute is scored (e.g., from 1 to 10) and weighted (in %). A list of attributes can be trial-specific. Scores and weights could be obtained from questionnaires, previous experience, historical data, etc. Small companies usually rely on questionnaires. Then, site value is calculated as the sum of the weighted attribute scores (formula (1)).
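The formula itself is not reproduced in this copy of the article. Based on the description in the text (each attribute is scored and weighted, and the ranking index is a sum of weighted scores), formula (1) presumably has the following form. The symbols V_i (value of site i), w_j (weight of attribute j), s_ij (score of site i on attribute j), and m (number of attributes) are notation assumed here, as is the convention that the weights sum to 100%.

```latex
V_i = \sum_{j=1}^{m} w_j \, s_{ij}, \qquad \sum_{j=1}^{m} w_j = 1 \tag{1}
```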

After site evaluation, all feasible sites are ranked according to their value[2] or other criteria.

  • As mentioned, ranked sites are divided into three tiers. Sites are selected first from Tier 1, then from Tier 2, and, if there are not enough patients or other goals have not been met, from Tier 3 (C). The preselected portfolio of sites then needs to be aligned manually with the available budget, enrollment target, site capacity, number of patients, and other goals and rules. If the selection meets study goals, site selection is finalized (F). If not, but other feasible sites are available (D), they need to be included in the selection process (E), with subsequent alignment to study goals and available resources (C), etc.

  • If the number of feasible sites is not sufficient (D), a new search of site databases needs to be performed (A).

  • It is assumed that after multiple iterations, site selection is finalized (F).

Let’s consider an example illustrating the site selection approach based on sites’ rankings.


Case study: Contingency planning for Phase III CNS global clinical trial

For a Phase III trial, an initial set of sites had already been selected. However, some sites canceled their participation for various reasons. Therefore, in order to meet study goals (enrollment rate, number of patients, and limited budget), additional sites had to be selected. Twenty-five new feasible sites were identified.

Site selection needed to meet the following goals:

  • Number of patients ≥ 85

  • Enrollment target ≥ 5 patients/month

  • Budget ≤ $1.26 million

The technique included several steps:

  • Sites evaluation. Multiple site attributes were identified, as presented in Table 2. Some of them may be unique for a trial. Total site value is calculated according to the formula (1).

  • Sites are ranked according to their value (or other attributes such as predicted enrollment rate could be used as well). Data used in the ranking and the ranking results are presented in Table 3.


Ranking algorithm for site selection

Site parameters, such as number of randomized patients, projected enrollment rate, and patient-related costs, are accumulated in ranking order until their totals meet or exceed study goals. For example, the cumulative number of patients, cumulative enrollment rate, and cumulative site costs are obtained by adding site #2 data to site #6 data (Table 3). The cumulative number of patients is 10 (site #6) + 10 (site #2) = 20, the cumulative projected enrollment rate for sites #2 and #6 is 0.52 (site #6) + 0.24 (site #2) = 0.76, and so on.
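The accumulation step can be sketched in a few lines of Python. This is only an illustration of the ranking logic described above, not the authors' implementation: the patient counts and enrollment rates for sites #6 and #2 come from the text, while the value scores, the costs, the third site, and the goals passed to the function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    site_id: int
    value: float    # weighted score used for ranking (hypothetical numbers)
    patients: int   # planned number of randomized patients
    rate: float     # projected enrollment rate, patients/month
    cost: float     # total site cost in dollars (hypothetical numbers)


def select_by_ranking(sites, min_patients, min_rate):
    """Add sites in descending value order until both enrollment goals are met."""
    chosen, patients, rate, cost = [], 0, 0.0, 0.0
    for s in sorted(sites, key=lambda s: s.value, reverse=True):
        if patients >= min_patients and rate >= min_rate:
            break
        chosen.append(s.site_id)
        patients += s.patients
        rate += s.rate
        cost += s.cost
    return chosen, patients, round(rate, 2), cost


sites = [
    Site(6, 9.1, 10, 0.52, 150_000),
    Site(2, 8.7, 10, 0.24, 180_000),
    Site(4, 7.9, 5, 0.30, 70_000),
]
chosen, patients, rate, cost = select_by_ranking(sites, min_patients=20, min_rate=0.7)
print(chosen, patients, rate, cost)
```

Note that the budget plays no role while sites are being added; the cumulative cost can only be compared with the budget afterwards, which is how a ranking-based selection can overshoot it.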

Unfortunately, study goals often cannot be met simultaneously. For example, the site-related budget limit was reached when site #4 was added, but the enrollment target was not yet met. Therefore, according to the ranking algorithm, sites #10 and #11 had to be added to meet the enrollment target, which pushed costs over the budget by $147,500, or about 12%.

If sites are ranked according to their enrollment rate, the budget has to be increased by 8%.[3]

This process is time- and labor-consuming and does not guarantee optimal site selection. Adding more sites does not solve the problem, because it is associated with increased cost of a clinical trial beyond the budget and inclusion of riskier sites into a pool of feasible sites.

Is there a better site selection solution aligned with study goals and within the budget?

This paper presents a portfolio approach to site selection similar to selection of financial portfolios or portfolio of projects. For site selection, this approach was formulated in [1].


Portfolio approach to site selection

A portfolio approach to site selection means that instead of selecting individual sites, clinical trial planners select a portfolio of sites based on advanced analytical models, where the goal is to maximize the overall value of the portfolio and to align it with clinical trial goals and limited resources. As shown in [2], the most effective approach to portfolio selection is based on a mathematical optimization model. The model replaces the loop that includes steps B (except site evaluation), C, D, and E with advanced modeling algorithms, automating site selection aligned with study goals and resources.

Optimal site selection

In the context of decision-making, optimization means determining the most favorable solution, outcome, or course of action from a set of alternatives that satisfies all constraints and dependencies based on the mathematical optimization model.

In order to optimize site selection, an optimization model was developed and formulated as a mixed integer programming (MIP) model. MIP models extend linear programming (LP) [3] models by requiring some variables to take integer values; here, the decision variables are binary (0 or 1). LP models find the globally optimal value (e.g., total value of selected sites) of a linear function of a certain number of variables, given a set of linear constraints on these variables (equalities or inequalities).

The model for optimal site selection includes four components:

A. Decision variables
Xi = 0 or 1 for each site i. Their values are determined by the model: if Xi = 0, the i-th site is not selected; if Xi = 1, the i-th site is selected.

B. Parameters
Estimates for each site, such as cost/patient, projected enrollment rate, and others.

C. Constraints

  • Business rules (forced site prioritization, minimum number of sites per country (if a country is selected), requirements for patients’ allocation across regions, and others).

  • Resources: budget, manpower, etc.

  • Study goals: number of patients to be enrolled (trial power), enrollment target, etc.

  • Other rules relevant for a particular clinical trial.

D. Criteria
A single criterion (e.g., maximum value of the site portfolio) or multiple criteria (maximum enrollment target, minimum budget, etc.) can be used.
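Putting the four components together, a single-criterion version of the model might be written as follows. This is a sketch under stated assumptions rather than the authors' exact formulation: x_i stands for the decision variable Xi; v_i, c_i, p_i, and r_i (value, total cost, number of patients, and enrollment rate of site i), the limits B, P, and R, the country sets S_k, and the auxiliary country variables y_k are notation introduced here. For the case study, the stated goals correspond to P = 85 patients, R = 5 patients/month, and B = $1.26 million.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{i=1}^{n} v_i x_i \\
\text{s.t.}\quad
 & \sum_{i=1}^{n} c_i x_i \le B && \text{(budget)} \\
 & \sum_{i=1}^{n} p_i x_i \ge P && \text{(number of patients)} \\
 & \sum_{i=1}^{n} r_i x_i \ge R && \text{(enrollment target)} \\
 & x_i \le y_k \;\; \forall i \in S_k, \qquad \sum_{i \in S_k} x_i \ge 2\, y_k \;\; \forall k && \text{(at least two sites per selected country)} \\
 & x_i \in \{0,1\}, \quad y_k \in \{0,1\}
\end{aligned}
```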


Modeling experiments

The model (Site Selection Optimizer) uses the same data as presented in Tables 2 and 3. The optimization algorithm found a better solution than the one based on the ranking algorithm. It automatically selects a portfolio of sites aligned with study goals and budget (Table 4). By contrast, in order to reach study goals, the ranking algorithm requires a ~12% larger budget and more sites (17 for ranking vs. 16 for optimization).


Selected portfolios of sites using ranking vs. optimization (baseline scenario) are presented in Table 5.


Optimization results may not look intuitive. For example, site #2, despite its high score/value, was not selected due to its high cost per patient, high number of patients, and high total cost per site.[4] Site #2 was also not selected because of a “knapsack effect”: it is harder “to pack” one large item (site capacity = 10 patients) than several small “items” (most sites have a capacity of 5 patients). Sites #9 and #16 were selected in the optimization model despite their relatively low scores, because their cost per patient is low and their enrollment is high enough.
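The knapsack effect can be reproduced on a toy example. The sketch below uses plain exhaustive search rather than a MIP solver, and every number in it (five sites labeled A to E, their values, capacities, costs, the budget, and the patient goal) is made up for illustration; none of it is taken from Tables 2 to 6.

```python
from itertools import combinations

# Hypothetical sites: id -> (value score, patients, total cost in $ thousands)
SITES = {
    "A": (9, 10, 200),  # highest value, but large and expensive
    "B": (6, 5, 60),
    "C": (6, 5, 60),
    "D": (5, 5, 55),
    "E": (5, 5, 70),
}
BUDGET = 260        # $ thousands (assumed)
MIN_PATIENTS = 15   # assumed enrollment goal


def best_portfolio(sites, budget, min_patients):
    """Enumerate every subset and keep the feasible one with the highest total value."""
    best, best_value = (), -1
    for k in range(1, len(sites) + 1):
        for combo in combinations(sorted(sites), k):
            value = sum(sites[s][0] for s in combo)
            patients = sum(sites[s][1] for s in combo)
            cost = sum(sites[s][2] for s in combo)
            if cost <= budget and patients >= min_patients and value > best_value:
                best, best_value = combo, value
    return best, best_value


portfolio, total_value = best_portfolio(SITES, BUDGET, MIN_PATIENTS)
print(portfolio, total_value)
```

With these numbers the best portfolio consists of the four small sites, and site A is left out even though it has the highest individual score: once A is in the portfolio, the remaining budget covers only one more site.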

The model validation table is presented in Table 6.

Could the solution presented in Table 6 be obtained without the model? Potentially, yes. However, more than two million portfolios would have to be analyzed, and the probability of picking an optimal portfolio of sites at random, ~1/(2 × 10⁶), is similar to that of winning the lottery. Therefore, in a reasonable timeframe, only suboptimal portfolios could be generated manually. By contrast, for the case study, the model generated an optimal portfolio of sites in two seconds.
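The article does not say how the “more than two million” figure was derived. One reading that matches it, offered here as an assumption, is the number of ways to choose 16 sites out of the 25 feasible ones:

```python
from math import comb

n_sites = 25         # feasible sites in the case study
portfolio_size = 16  # sites in the optimized portfolio

n_portfolios = comb(n_sites, portfolio_size)
print(n_portfolios)   # 2042975
print(2 ** n_sites)   # 33554432 subsets if the portfolio size is not fixed in advance
```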

Model advantages:

  • Flexibility (ability to modify the model according to customer needs).

  • Uncovering best solutions.

  • Fast and comprehensive analysis of multiple “what-if” scenarios.[5]

The model allows the calculation of multiple metrics describing how parameters are allocated across countries, such as clinical trial costs, clinical sites, patients, and enrollment rate, as presented in Figure 4.


‘What–if’ scenarios: Exploring the capabilities of this model

Scenario #1. Forced selection of site #2.


Forcing the selection of site #2 (high value but low enrollment rate; see Table 6) changes the baseline solution, because the rest of the portfolio must still meet study goals. For example, sites #6, #9, #14, and #25 were not selected, while sites #10, #19, and #21 were selected. This scenario also requires a higher budget ($1.34 million vs. $1.26 million in the baseline scenario) and a larger number of sites (17 in scenario #1 vs. 16 in the baseline scenario).

Scenario #2. Forced removal of sites #6 and #15.

At the last minute, sites #6 and #15 decided not to participate in the clinical trial. The model recalculated the site portfolio. In this case, the number of sites increased by two, from 16 (baseline scenario) to 18 (scenario #2), and the budget increased from $1.26 million to $1.30 million.
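In a 0/1 formulation, both scenarios can be expressed by fixing individual decision variables before re-solving. The article does not state how the Site Selection Optimizer implements them, so the following is only a sketch, with x_i denoting the selection variable Xi of site i:

```latex
\text{Scenario \#1:}\quad x_{2} = 1
\qquad\qquad
\text{Scenario \#2:}\quad x_{6} = 0,\;\; x_{15} = 0
```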

Model enhancements

  • Multi-criteria model

Very often, it is hard to meet all requirements with a single-criterion optimization model. When there are multiple conflicting goals, multi-criteria optimization can be more effective. This modification requires introducing multiple criteria (in our case, “Maximum Value,” “Minimum Budget,” and “Maximum Enrollment”). Each criterion has a weight in %, and the weights sum to 100%.

Five scenarios were compared against the baseline optimal site selection scenario (Table 9). The first three scenarios (S1, S2, and S3) are each equivalent to single-criterion optimization (weight of one criterion = 100%). Scenario S4 places the highest weight (50%) on criterion #2, “Minimum Budget,” with 30% on criterion #1, “Maximum Value,” and 20% on criterion #3, “Maximum Enrollment.” Scenario S5 gives equal weight (33.33%) to all three criteria. The model generates a different portfolio of sites for each scenario, as presented in Table 10.

Some sites were selected in all scenarios (e.g., sites #5, #7, #11), and some were not selected in any scenario (e.g., sites #17, #18).
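One common way to combine weighted criteria is a weighted sum of normalized terms, with cost entering with a minus sign because it is to be minimized. The sketch below assumes that form; the article does not specify how the criteria are combined or normalized. The S4 and S5 weights are taken from the text, whereas the two candidate portfolios and the normalization constants are hypothetical.

```python
def weighted_score(value, cost, rate, weights, scales):
    """Weighted-sum scalarization: higher is better; cost enters with a minus sign."""
    w_value, w_budget, w_enroll = weights
    v_max, c_max, r_max = scales
    return (w_value * value / v_max
            - w_budget * cost / c_max
            + w_enroll * rate / r_max)


# Weights (value, budget, enrollment) for scenarios S4 and S5 as described in the text.
S4 = (0.30, 0.50, 0.20)
S5 = (1 / 3, 1 / 3, 1 / 3)

# Hypothetical portfolio totals: (total value, cost in $ millions, patients/month).
portfolio_a = (120.0, 1.20, 5.2)
portfolio_b = (130.0, 1.34, 5.6)
scales = (130.0, 1.34, 5.6)  # normalize each criterion by its largest observed value


def preferred(weights):
    """Return the label of the portfolio with the higher weighted score."""
    a = weighted_score(*portfolio_a, weights, scales)
    b = weighted_score(*portfolio_b, weights, scales)
    return "A" if a > b else "B"


print("S4 prefers", preferred(S4))
print("S5 prefers", preferred(S5))
```

With these made-up numbers, the budget-heavy weighting S4 favors the cheaper portfolio A, while the equal weighting S5 favors portfolio B, which illustrates why different weight sets can lead to different portfolios.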

  • Stochastic enrollment

One of the challenging aspects of site selection is uncertainty in enrollment predictions. Nevertheless, a single, definite site selection still has to be made.

In order to address this issue, the model was modified. Three enrollment scenarios (favorable, realistic, and conservative) and corresponding subjective probabilities for each site were considered instead of the deterministic enrollment rate in the baseline model (Figure 5).  

The stochastic model may generate different solutions than the deterministic one. For example, site #5 in Table 11 was selected in the deterministic model, and not selected in the stochastic one. Inclusion of a site into a portfolio depends on other parameters involved in the optimization.
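The article does not describe how the three enrollment scenarios enter the optimization. One simple possibility, assumed here purely for illustration, is to replace each site's deterministic enrollment rate with its probability-weighted expectation; the rates and probabilities below are hypothetical.

```python
def expected_rate(scenarios):
    """Probability-weighted enrollment rate over (rate, probability) scenarios."""
    total_p = sum(p for _, p in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(rate * p for rate, p in scenarios)


# Hypothetical site: favorable, realistic, conservative (patients/month, probability)
site_scenarios = [(0.60, 0.2), (0.40, 0.5), (0.20, 0.3)]
rate = expected_rate(site_scenarios)
print(round(rate, 2))
```

A site whose realistic rate looks attractive can have a noticeably lower expected rate once a conservative scenario is given weight, which is one way the stochastic and deterministic portfolios could diverge.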

Key points

  • The site selection model generates an optimal set of sites of highest value, aligned with the clinical trial budget, target enrollment, site capacity, site enrollment, clinical trial power, and a variety of business rules.

  • The model could save significant time and money by automatically generating an optimal set of sites, minimizing decision-making delays, and enhancing decision quality.

  • The model was validated for a clinical trial. It produced a better site portfolio than the ranking model did: to reach the same study goals, the ranking approach required a budget about 12% larger than the optimized portfolio. Across multiple clinical trials, savings could be substantial.

  • The tool could be integrated with any site database, utilizing both the methodology of successful study start-up and specific tools (e.g., enrollment forecasting, financial planning, benchmarking).


Vadim Paluy, MD, is Clinical Research Medical Director, Novartis; Vladimir Shnaydman, PhD, is President, ORBee Consulting




  1. IQVIA. https://www.iqvia.com/solutions/research-and-development/precision-site-targeting
  2. Getz K. (2018). The Need and Opportunity for a New Paradigm in Clinical Trial Execution. Applied Clinical Trials, 27(6). http://www.appliedclinicaltrialsonline.com/need-and-opportunity-new-paradigm-clinical-trial-execution
  3. Getz K., Fisher S., Brookman S. (1995). Managing research centers as a portfolio of strategic resources. Drug Information Journal, 29: 551-562.
  4. Shnaydman V. (2018). Practical portfolio optimization. https://www.projectmanagement.com/articles/463371/Practical-Portfolio-Optimization
  5. Dorfman R., Samuelson P. A., Solow R. M. (1986). Linear programming and economic analysis. Revised edition. Mineola, NY: Dover Books on Computer Science.


[1] More advanced techniques could be used if information is available.

[2] Some companies rank sites based on their forecasted enrollment rate. Enrollment rate is very important, but this approach does not take into account other aspects of site selection, like cost per patient, budget, etc.

[3] Please contact Vladimir Shnaydman (vladimir.shnaydman@orbeeconsulting.com) for details.

[4] Data was modified in order to highlight model capabilities.

[5] Especially for Phase III studies where a large number of sites has to be selected.

