April 20, 2010

Trial Recruitment Without Trial and Error

Validating protocols and identifying investigators with secondary data sources.

For the past 10 years, pharmaceutical marketers have been using comprehensive, secondary databases and sophisticated analytics to reach the right physicians with the right educational messages. That same information and analytical approach can be used just as effectively to help clinical research organizations (CROs) select investigators and reach the right potential subjects for clinical studies.

The challenge reads much like a mathematical word problem from your school days: of all the practicing physicians, what is the smallest number needed to study 3,000 patients diagnosed with Alzheimer's disease? Implicit in that question is another: which specific clinicians treat the most patients meeting the study criteria?
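The word problem can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not in the article: the physician names and patient counts are invented, no patient is shared between physicians, and every eligible patient enrolls. Under those assumptions, taking the highest-volume physicians first gives the smallest group.

```python
from typing import Dict, List


def fewest_physicians(patients_by_physician: Dict[str, int], target: int) -> List[str]:
    """Pick the highest-volume physicians first until the target is covered."""
    chosen: List[str] = []
    covered = 0
    ranked = sorted(patients_by_physician.items(), key=lambda kv: kv[1], reverse=True)
    for physician, count in ranked:
        if covered >= target:
            break
        chosen.append(physician)
        covered += count
    if covered < target:
        raise ValueError("not enough eligible patients across all physicians")
    return chosen


# Invented counts of eligible patients per physician (hypothetical data).
counts = {"dr_a": 1200, "dr_b": 300, "dr_c": 900, "dr_d": 1100, "dr_e": 50}
selected = fewest_physicians(counts, target=3000)
print(selected)
```

In practice enrollment rates are far below 100%, so a recruiter would scale each count by an expected enrollment rate before running a selection like this.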

CROs, faced with just such challenging questions for every study they undertake, have long reasoned that the most expedient way to meet a study’s quota is to “return to the same well” they’ve used in the past. The usual suspects, those clinical investigators who’ve successfully completed prior trials, are their top candidates when recruiting for similar new studies.

To identify other prospective investigators, CROs purchase physician lists, access data from the Food & Drug Administration on other studies, and consult publicly available population data. These sources, however, reveal only areas of high concentration for a condition. They offer no visibility into patients’ prior drug use, which can be an exclusion criterion, nor do they link drug use with disease states and patient demographics.

Now, CROs can tap into secondary data resources that are familiar to the commercial side of the pharmaceutical business, and improve their recruiting productivity, by identifying relevant patient populations in all possible sites. Using some of the tools and methodologies of the pharmaceutical market researcher, CROs can base their recruitment decisions on evidence of physicians’ patient populations and treatment practices, rather than on physicians’ own estimations of how many patients they can enroll. In doing so, they can:

• Speed the investigator recruitment process, thus lowering the opportunity cost for the drug sponsor
• Improve the recruitment success rate, widening the CRO’s wafer-thin margins
• Reduce the risk that investigators will miss their targets, causing costly “do-overs” and delays

Borrowing a Page from the Market Researcher’s Book

The methodology relies on different types of databases that market researchers have used for nearly a decade to understand physician practices and patient populations. The first is a longitudinal database of integrated medical and pharmacy claims data on millions of anonymized patients, which is available in the United States. Gathered from health plans, the database includes anonymized inpatient and outpatient treatment claims, diagnoses, procedures, prescriptions, and various demographics such as patient age and gender. The database thus tracks the movement of anonymized patients through the health care system, and the findings are projected to the total insured population of the country.

The second, available in Europe, is a longitudinal data set made of electronic medical records (EMR) gathered in medical practices.

The third is a longitudinal database of prescription transactions (LRx) gathered from pharmacies (again with the patient’s identity anonymized) and traced back to the prescriber. In the United States, this database captures details on 65% of all retail prescriptions filled in the country. The data elements include the physician’s specialty, patient year of birth and gender, the product form and strength, the quantity dispensed, the days of treatment supplied, and the method of payment. A HIPAA-compliant recoding of variables also makes it possible to track anonymized individuals over time. The diagnosis, however, is not captured.

By following a two-stage process that uses these databases to complement one another, it is possible to:

• Determine if the study protocol will yield enough patients to meet the required end points
• Identify the physicians who are most likely to have a sufficient patient pool for study

Assessing the Viability of a Protocol

Using either of the first two database types described above (the health plan claims or the EMR data sets), it is possible to quantify the number of patients with a specific profile, as defined by: age, gender, diagnosis, disease severity, treatment pattern, co-morbidities, drugs prescribed, adherence, prior hospital stays, associated procedures, laboratory tests, economic burden, and the treating physician’s specialty.

For example, a large global pharmaceutical manufacturer needed to know how many patients diagnosed with multiple sclerosis would fit its study criteria in Germany. By systematically applying exclusion criteria to the 291,256 patients initially identified through the database, the company learned that there were 12,740 German patients suitable for its trial.
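Applying exclusion criteria one after another produces a patient funnel. The sketch below shows the idea in Python. The article does not list the criteria used in the German study, so the fields, the criteria, and the five sample patients are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Patient:
    age: int
    has_ms_diagnosis: bool
    on_excluded_drug: bool
    recent_hospital_stay: bool


# Each criterion is a label plus a test a patient must pass to stay in the cohort.
Criterion = Tuple[str, Callable[[Patient], bool]]


def apply_criteria(patients: List[Patient], criteria: List[Criterion]) -> List[Tuple[str, int]]:
    """Return the cohort size after each criterion is applied in order."""
    funnel = [("initial cohort", len(patients))]
    remaining = list(patients)
    for label, keep in criteria:
        remaining = [p for p in remaining if keep(p)]
        funnel.append((label, len(remaining)))
    return funnel


# Hypothetical protocol criteria, for illustration only.
criteria: List[Criterion] = [
    ("confirmed MS diagnosis", lambda p: p.has_ms_diagnosis),
    ("aged 18 to 65", lambda p: 18 <= p.age <= 65),
    ("no excluded prior drug", lambda p: not p.on_excluded_drug),
    ("no recent hospital stay", lambda p: not p.recent_hospital_stay),
]

# Invented sample records.
patients = [
    Patient(45, True, False, False),
    Patient(70, True, False, False),
    Patient(30, True, True, False),
    Patient(50, False, False, False),
    Patient(38, True, False, True),
]

funnel = apply_criteria(patients, criteria)
for label, size in funnel:
    print(f"{label}: {size}")
```

Reporting the count after every step, rather than only the final number, shows which criterion removes the most patients and therefore where a protocol might be relaxed.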

Identifying the Best Investigators

In an ideal world, the longitudinal prescription data (LRx) collected from pharmacies would include the diagnosis for which the prescription was filled. If that were the case, it would be an easy matter to determine which physicians had the best patient populations for a study. However, since pharmacy prescription data do not include the diagnosis, it must be inferred from what is known. In some situations, this is a rather straightforward exercise.

If a disease is treated almost exclusively by one physician specialty (such as HIV, which is treated by infectious disease specialists) or if a drug is used for a single indication (such as statins for hyperlipidemia or TZDs for diabetes) then the physicians in the LRx database can simply be sorted into deciles to find those with the highest volume of:

• Prescriptions written
• Therapy initiation—i.e., physicians who see the most patients naïve to therapy for the condition
• Treatments in a certain drug class for first or second line therapy
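A minimal sketch of the decile sort follows. It assumes deciles are formed by rank, with an equal number of physicians in each group and decile 10 holding the highest volumes. The article does not say how its deciles are built, and an alternative definition would group physicians by their share of total volume. The physician identifiers and volumes are invented.

```python
from typing import Dict


def assign_deciles(volume_by_physician: Dict[str, int]) -> Dict[str, int]:
    """Map each physician to a decile from 1 (lowest volume) to 10 (highest)."""
    ranked = sorted(volume_by_physician, key=lambda p: volume_by_physician[p])
    n = len(ranked)
    # Position i in the ascending ranking falls into decile (i * 10) // n + 1.
    return {p: (i * 10) // n + 1 for i, p in enumerate(ranked)}


# Invented prescription volumes for 20 physicians.
volumes = {f"dr_{i:02d}": i * 5 for i in range(1, 21)}
deciles = assign_deciles(volumes)
top_decile = sorted(p for p, d in deciles.items() if d == 10)
print(top_decile)
```

The same function works for any of the three measures in the list above: pass in prescriptions written, therapy initiations, or first-line treatments as the volume.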

If, on the other hand, a drug has multiple approved indications, the solution is less straightforward, although there are other clues in the anonymized prescription data that can suggest the diagnosis. In some situations, the diagnosis can be inferred from the physician specialty and the presence of concomitant markers. For instance, immunosuppressants are used for organ transplant patients as well as to treat autoimmune diseases. If the treating physician is a dermatologist, rheumatologist, or neurologist, one can assume that the diagnosis is an autoimmune disease. If the patient has been prescribed other immunosuppressants in the current or preceding four months, it is highly likely that the patient underwent an organ transplant in that time.
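The immunosuppressant example amounts to a small set of rules, sketched here in Python. The function name, the labels it returns, and the order in which the two rules are checked are assumptions made for illustration; the article states the two rules but not how they are combined.

```python
AUTOIMMUNE_SPECIALTIES = {"dermatology", "rheumatology", "neurology"}


def infer_indication(prescriber_specialty: str, other_immunosuppressants_4_months: int) -> str:
    """Guess why an immunosuppressant was prescribed, using two simple rules.

    other_immunosuppressants_4_months counts other immunosuppressants the
    patient received in the current or preceding four months.
    """
    # Assumption: the concomitant-drug marker is checked before the specialty.
    if other_immunosuppressants_4_months > 0:
        return "organ transplant (likely)"
    if prescriber_specialty.lower() in AUTOIMMUNE_SPECIALTIES:
        return "autoimmune disease (assumed)"
    return "undetermined"


print(infer_indication("Rheumatology", 0))
print(infer_indication("Nephrology", 2))
print(infer_indication("Cardiology", 0))
```

Leaving an explicit "undetermined" outcome keeps ambiguous prescriptions out of the physician counts instead of forcing them into one diagnosis.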

There are times, though, when it is necessary to create a predictive model to estimate the number of patients whom individual prescribers are seeing for a particular condition. In such situations, the clue to the diagnosis can be found in the physician specialty, combined with other attributes such as the patient age and the average daily dose of a medication.

For instance, consider that the same therapy (dopamine agonists) is prescribed to treat both Parkinson’s disease and restless legs syndrome (RLS), although the average daily dose is very different. The dosing, together with the patient age and the presence or absence of concomitant medications such as carbidopa/levodopa or muscle relaxants, produces a reliable indicator of which disease is being treated. In one study, this methodology, when applied to a holdout sample of patients with a known diagnosis of either Parkinson’s or RLS, achieved a validation score of 93%.
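The sketch below illustrates the general idea of scoring a patient from prescription attributes and then checking the result against a holdout sample with known diagnoses. It is not the model used in the study cited above. The dose threshold, age cut-off, weights, and sample records are invented, the direction of each signal (higher dose, older age, and carbidopa/levodopa use all pointing toward Parkinson’s) is an assumption, and the "validation score" is interpreted here as simple accuracy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RxPatient:
    avg_daily_dose_mg: float
    age: int
    on_carbidopa_levodopa: bool
    known_diagnosis: str  # "parkinsons" or "rls", known only for the holdout sample


def predict(p: RxPatient) -> str:
    """Score a patient with invented weights; 2 or more points means Parkinson's."""
    score = 0
    if p.avg_daily_dose_mg >= 1.0:  # hypothetical dose threshold
        score += 1
    if p.age >= 65:  # hypothetical age cut-off
        score += 1
    if p.on_carbidopa_levodopa:
        score += 2
    return "parkinsons" if score >= 2 else "rls"


def holdout_accuracy(sample: List[RxPatient]) -> float:
    """Share of holdout patients whose predicted diagnosis matches the known one."""
    correct = sum(predict(p) == p.known_diagnosis for p in sample)
    return correct / len(sample)


# Invented holdout records.
holdout = [
    RxPatient(3.0, 72, True, "parkinsons"),
    RxPatient(0.25, 50, False, "rls"),
    RxPatient(0.5, 70, False, "rls"),
    RxPatient(1.5, 60, False, "parkinsons"),
    RxPatient(2.0, 68, False, "parkinsons"),
]

accuracy = holdout_accuracy(holdout)
print(f"holdout accuracy: {accuracy:.0%}")
```

A production model would fit its weights to data rather than hard-code them, but the validation step has the same shape: predict on patients whose diagnosis is already known and count the matches.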

Another predictive modeling approach involves using health plan data to isolate anonymized patients with a clean diagnosis. Those patients and the physicians who treat them are then profiled using an array of attributes, from patient age and gender, to physician specialty and geography, to medication history, daily dose, and payment method. At this point it is possible to identify the factors that are strong positive or strong negative predictors of the diagnosis. In the case of fibromyalgia, for instance, the presence of a muscle relaxant is a strong positive predictor of the condition, whereas the presence of a diabetes drug is a negative predictor.
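One simple way to screen attributes for predictive value is to compare how often a marker appears among diagnosed patients with how often it appears among everyone else. The Python sketch below computes that ratio. The article does not say which statistic was used, so this ratio is an assumed stand-in, and the records are invented.

```python
from typing import List, Set, Tuple

# (has a clean diagnosis, drug classes seen in the patient's history)
Record = Tuple[bool, Set[str]]


def marker_lift(records: List[Record], marker: str) -> float:
    """Ratio of marker prevalence among diagnosed vs. undiagnosed patients.

    Values above 1 suggest a positive predictor, values below 1 a negative one.
    """
    diagnosed = [drugs for has_dx, drugs in records if has_dx]
    others = [drugs for has_dx, drugs in records if not has_dx]
    if not diagnosed or not others:
        raise ValueError("need both diagnosed and undiagnosed patients")
    rate_dx = sum(marker in drugs for drugs in diagnosed) / len(diagnosed)
    rate_other = sum(marker in drugs for drugs in others) / len(others)
    if rate_other == 0:
        raise ValueError("marker never appears among undiagnosed patients")
    return rate_dx / rate_other


# Invented records: four diagnosed patients, four without the diagnosis.
records: List[Record] = [
    (True, {"muscle_relaxant"}),
    (True, {"muscle_relaxant"}),
    (True, {"muscle_relaxant", "diabetes_drug"}),
    (True, set()),
    (False, {"muscle_relaxant"}),
    (False, {"diabetes_drug"}),
    (False, {"diabetes_drug"}),
    (False, set()),
]

muscle_lift = marker_lift(records, "muscle_relaxant")
diabetes_lift = marker_lift(records, "diabetes_drug")
print(muscle_lift, diabetes_lift)
```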

The next step is to create a statistical model capable of predicting the diagnosis in question when applied to the longitudinal prescription data. The goal is to be able to place individual physicians listed in the LRx database into deciles based on their estimated volume of patients having the particular diagnosis and desired patient demographics.

In the fibromyalgia example, there are an estimated 5.3 million likely fibromyalgia patients in the United States being treated by 582,000 physicians. In decile 10, there are 4,547 physicians who between them are treating 944,000 fibromyalgia patients. Clearly, these physicians should become the primary target for trial recruiters.
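The concentration in the top decile is easier to see with a little arithmetic on the four figures quoted above. The variable names are illustrative; the numbers come from the text.

```python
total_patients = 5_300_000
total_physicians = 582_000
top_physicians = 4_547
top_patients = 944_000

# Share of physicians and of patients accounted for by decile 10.
physician_share = top_physicians / total_physicians
patient_share = top_patients / total_patients

# Average fibromyalgia patients per physician, in decile 10 and overall.
avg_top = top_patients / top_physicians
avg_all = total_patients / total_physicians

print(f"decile 10 holds {physician_share:.1%} of physicians and {patient_share:.1%} of patients")
print(f"patients per physician: {avg_top:.0f} in decile 10 vs {avg_all:.1f} overall")
```

Fewer than 1% of the physicians treat roughly 18% of the likely patients, at about 208 patients each against an overall average of about 9.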

To test how well the model performs, it is applied to the subset of the LRx database for which claims data also exists, comparing what the model predicted with the diagnoses actually contained in the claims database. In this example, the model yielded 22% false positives. That is obviously not a perfect model, but it is a very good screening tool in preparation for investigator recruitment.
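The validation check can be sketched as a comparison over the patients present in both sources. The article does not define how its false positive figure is calculated, so this example assumes one plausible reading: the share of patients flagged by the model whose claims record does not contain the diagnosis. The patient identifiers and flags are invented.

```python
from typing import Dict


def false_positive_share(predicted: Dict[str, bool], confirmed: Dict[str, bool]) -> float:
    """Share of model-flagged patients whose claims record lacks the diagnosis.

    Only patients present in both the prescription data and the claims data
    are compared.
    """
    overlap = predicted.keys() & confirmed.keys()
    flagged = [pid for pid in overlap if predicted[pid]]
    if not flagged:
        raise ValueError("model flagged no patients in the overlap")
    false_positives = sum(1 for pid in flagged if not confirmed[pid])
    return false_positives / len(flagged)


# Invented model output on prescription data (True = diagnosis predicted).
predicted = {"p1": True, "p2": True, "p3": True, "p4": True, "p5": False, "p6": True}
# Invented claims data; p6 has no claims record, so it is left out of the check.
confirmed = {"p1": True, "p2": False, "p3": True, "p4": True, "p5": True}

share = false_positive_share(predicted, confirmed)
print(f"false positives among flagged patients: {share:.0%}")
```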

Benefiting from Up-Front Analysis

The first step in the process, assessing the viability of the protocol using the integrated medical and pharmacy claims database, takes just a few weeks. Because the different types of secondary databases yield sample sizes much more robust than those acquired through primary research, the results should be far more accurate. Companies can understand relatively quickly and easily if there are enough of the right types of patients to support their clinical trial needs and if their investment level will be sufficient.

The second step, developing a target list of clinical investigators, takes some additional time up front, but pays big dividends. Rather than using unqualified physician lists or simply relying on known investigators from previous studies, this methodology allows recruiters to select physicians who have demonstrated the necessary patient load to meet the protocol requirements. It greatly reduces the chances that a CRO will select an investigator who cannot deliver in the end.

Today, when drug sponsors and CROs alike are faced with rising study costs and mounting time pressures, such a well-established method for speeding and perfecting investigator recruitment is a welcome solution. Since market researchers have already proved the methodology, investigator recruiters have nothing to lose and everything to gain.

Michel Denarié is Leader for IMS Health’s Customer Insights Center of Excellence. He can be reached at 610-832-5483 or at MDenarie@us.imshealth.com.
