Feature|Articles|February 28, 2023

Calculating Diversity in Clinical Research Studies

Measuring different dimensions of diversity beforehand can lead to more thoughtful planning and execution.

Over the past few years, there has been a groundswell of support for diversity in clinical studies. However, it is unclear what we mean by diversity or how to quantitatively measure it in a clinical study. This article addresses these fundamental questions and provides a spreadsheet tool to calculate a study’s diversity in a multidimensional manner.

FDA’s April 2022 draft guidance, “Diversity Plans to Improve Enrollment of Participants from Underrepresented Racial and Ethnic Populations in Clinical Trials,” provides “recommendations to sponsors developing medical products on the approach for developing a Race and Ethnicity Diversity Plan … to enroll representative numbers of participants from underrepresented racial and ethnic populations in the United States…” FDA also “advises sponsors to seek diversity in … other underrepresented populations defined by demographics such as sex, gender identity, age, socioeconomic status, disability, pregnancy status, lactation status, and co-morbidity.” In addition to scientific rationales for diverse study populations, the guidance discusses issues of equity for “all clinically relevant populations,” e.g., “health disparities and differential access to health care in certain racial and ethnic populations, many of whom are part of underserved communities.”

According to the IQVIA report, “Advancing Diversity in Clinical Development through Cross-Stakeholder Commitment and Action,” as of June 1, 2022, only 32% of the 13,060 Phase I, II, and III studies on ClinicalTrials.gov (with industry involvement and at least one US site) reported any race and ethnicity diversity data at all. The report found that IQVIA’s Inclusivity Quotient (IQ) of race and ethnicity diversity varied widely across eight therapeutic indications.

What do we mean by “diversity”?

Diversity (along with equity and inclusion) is finally getting the attention it deserves in clinical research. However, if we are going to make real, measurable progress, we need a quantitative way to measure diversity. Without a standard methodology for calculating the diversity score for a study, we cannot answer the following fundamental questions:

How diverse is a study population?
How does the diversity of a study population compare to that of other studies?
How is the diversity of study populations changing over time?

Behind these questions lie two others: What do we mean by diversity, and how do we measure it? These are easy to ask but difficult to answer until we address the following, even more fundamental, questions:

Why do we want diversity?
What population should we consider?
Which dimensions of diversity should we measure?
Which, if any, dimensions of diversity should we prioritize?
How do we calculate a study diversity score?

Why do we want diversity?

There are at least three good reasons to enroll a diverse study population:

Justice. A diverse study population ensures that, as per the Belmont Report, the burdens and benefits of clinical studies are shared fairly.
Science. A diverse study population is more likely to identify the risks and benefits of a medical treatment.
Marketing. A diverse study population can support a broader label and generate favorable public opinion.

On the other hand, there are good reasons not to enroll a diverse study population. A diverse study population is often more costly and time-consuming to enroll than a homogeneous one. It requires a larger sample size than a homogeneous study population. It may require the participation of groups, e.g., the elderly, with higher rates of comorbidity. And it is generally not a good idea to test a medical treatment on children until it has been proven safe and efficacious in adults.

It may thus be faster, more economical, and less risky to conduct a series of smaller studies with different homogeneous populations, which would also allow the medical treatment to reach the market sooner for the initial population. In this case, diversity should be measured across a clinical development program.

What population should we consider?

A study population is diverse in the sense that it reflects the diversity of a larger population. There are several options for determining the population from which to enroll study participants:

The entire population
The clinically-significant population, i.e., people who can benefit from the eventual medical treatment. For example, gynecological treatments are clinically significant to women.
The ethical population, i.e., people whose participation would not raise questions of exploitation or undue risk, often specified by regulations and guidances. For example, regulations place special restrictions on enrolling prisoners.
The accessible population, i.e., people who, realistically, can enroll in a study (with reasonable accommodations). For example, people in remote areas of Alaska are not accessible.
The market population, e.g., people who are likely to have the economic capacity or insurance to afford the eventual medical treatment.

It is fair to say that a diverse study population should reflect the diversity of the clinically-significant population, subject to issues of ethics and accessibility. With respect to ethics, it would, for example, violate the principle of justice to exclude participants from populations that are unlikely to become paying customers for the eventual medical treatment since supportive financial arrangements can be made.

Accessibility is a trickier issue. To start with, multiple factors, e.g., geographical location, language, and mobility, contribute to accessibility. To further complicate matters, by excluding from the diversity calculation a population that is difficult to enroll because of accessibility issues, the study sponsor can probably increase its diversity score while saving the cost of translation, home visits, low-enrolling sites, and other measures to increase accessibility. The goal of diversity thus demands that reasonable measures be taken to increase accessibility, despite the cost and possible negative impact on the diversity score. However, reasonability is a matter of judgment, so the justification for excluding any population from the calculation because of its inaccessibility must be clearly documented.

Which dimensions of diversity should we measure?

Most discussions about diversity in clinical research address race and ethnicity. However, based on our reasons for wanting diversity, we can consider diversity across the following dimensions (and perhaps other biological, medical, social, or other characteristics):

Race, ethnicity, and ancestry
Sex and gender identity
Age
Socioeconomic status
Educational attainment
Health (comorbidities, e.g., obesity, cognition)
Disability
Income
Geography (e.g., urban vs. rural)
Language (e.g., English-speaking)
Legal status (e.g., immigration status)
Genetic variation (e.g., liver enzymes, blood types)
Disease severity

Measurement of diversity requires knowing the distribution of a population across its subpopulations. The decennial US census collects accurate data on certain measures of diversity, e.g., age, sex, ethnicity and race, income, and geography. The Census Bureau also conducts numerous other surveys. For example, the American Community Survey collects data on disability status, educational attainment, marital status, ancestry, language spoken at home, and other demographic elements. Data can also be collected from commercial databases, e.g., of prescriptions. Studies and surveys can also provide estimates.

The measurement of diversity works best when subgroups are clearly delineated and data is accurately reported. Unfortunately, one of the most important types of diversity is ethnicity and race, which meets neither requirement. Combining ethnicity and race into a single measure confounds the measure. Lumping Mongolians and Filipinos into the Asian category does not promote diversity. And, of course, classifying people by self-reported race is problematic when the meaning of race and ethnicity is unclear, largely a social construct, and of limited scientific relevance.

Which, if any, dimensions of diversity should we prioritize?

There are at least three reasons to consider prioritizing (i.e., statistically overweighting) a study subpopulation:

Scientific evidence may suggest that a subpopulation will be more amenable to the study treatment.
Enrolling a rare population may require extra effort.
It may be desirable to address historical wrongs against a disfavored population.

Any overweighting must, of course, be justified in the statistical analysis.

How do we calculate a diversity score?

To measure diversity, we can take the following steps:

Select the diversity dimensions to measure.
Establish the categories within each dimension (e.g., age ranges).
Establish any exclusions or limits (e.g., ages 18 to 65).
Obtain population data by category.
Enter prevalence percentages to capture clinical significance.
Classify and count study participants by diversity dimensions.
Compare study population percentages vs. available population percentages.
Combine comparison data into a single study-diversity score (up to 100%).

The procedure for measuring diversity should be specified in a diversity plan prior to launching a study.

Once a subpopulation is fairly represented in the study population, additional enrollments from that subpopulation do not increase diversity.
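One way to make these steps concrete is an overlap score: for each category, credit the smaller of the study share and the reference-population share, then add up the credits. The sketch below is an illustration under stated assumptions, not necessarily the formula used in the author's spreadsheet; the function names, the age categories, and all of the numbers are hypothetical. It does reproduce two properties described above: the score tops out at 100%, and over-enrolling a category that is already fairly represented earns no extra credit.

```python
def reference_shares(population, prevalence=None):
    """Return each category's share of the reference population.

    population: category -> count (or percentage) in the available population.
    prevalence: optional category -> fraction of that category with the
                condition, used here to approximate the clinically
                significant population (an assumption of this sketch).
    """
    prevalence = prevalence or {}
    weighted = {c: n * prevalence.get(c, 1.0) for c, n in population.items()}
    total = sum(weighted.values())
    return {c: v / total for c, v in weighted.items()}


def overlap_score(study_counts, reference):
    """Sum of min(study share, reference share) over categories (0 to 1)."""
    enrolled = sum(study_counts.values())
    if enrolled == 0:
        return 0.0
    return sum(
        min(study_counts.get(c, 0) / enrolled, share)
        for c, share in reference.items()
    )


# Hypothetical age dimension; illustrative numbers, not census data.
population = {"18-39": 40, "40-64": 40, "65+": 20}
prevalence = {"18-39": 0.05, "40-64": 0.10, "65+": 0.20}
reference = reference_shares(population, prevalence)  # shares 0.2, 0.4, 0.4

study = {"18-39": 50, "40-64": 40, "65+": 10}
score = overlap_score(study, reference)
print(f"{score:.0%}")  # prints 70%
```

In this example the youngest group is over-enrolled (50% vs. a 20% fair share) and the oldest is under-enrolled (10% vs. 40%), so the score falls 30 points short of 100%.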

It may make sense to weight different dimensions differently. For example, educational attainment and income are correlated, so if both dimensions are included, they should be underweighted.

It may make sense to weight different categories within a dimension. For example, only 0.2% of the U.S. population is in the Native Hawaiian & Other Pacific Islander category, so enrolling people from that subpopulation will barely affect the race & ethnicity diversity score. In such cases, it may make sense to overweight such categories.
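Both kinds of weighting can be sketched in a few lines. The assumptions here are illustrative and are not taken from the author's spreadsheet: a category's credit is the fraction of its fair share that was enrolled (capped at 1), category weights default to the reference shares, and dimension scores are combined by a weighted average. All names and numbers are hypothetical.

```python
def dimension_score(study_shares, reference_shares, category_weights=None):
    """Weighted average of each category's attainment, capped at 1.

    Attainment = study share / reference share, capped so over-enrollment
    earns no extra credit. Reference shares are assumed to be positive.
    With the default weights (the reference shares themselves), this equals
    the sum of min(study share, reference share).
    """
    weights = category_weights or reference_shares
    total = sum(weights[c] for c in reference_shares)
    return sum(
        weights[c] * min(study_shares.get(c, 0.0) / ref, 1.0)
        for c, ref in reference_shares.items()
    ) / total


def study_score(dimension_scores, dimension_weights):
    """Weighted average of per-dimension scores."""
    total = sum(dimension_weights[d] for d in dimension_scores)
    return sum(
        dimension_weights[d] * s for d, s in dimension_scores.items()
    ) / total


# Hypothetical dimension with one rare category that the study missed entirely.
reference = {"rare": 0.002, "common": 0.998}
study = {"rare": 0.0, "common": 1.0}

unweighted = dimension_score(study, reference)  # about 0.998
overweighted = dimension_score(study, reference, {"rare": 0.2, "common": 0.8})  # about 0.8

# Correlated dimensions (education, income) each get half the weight of age.
combined = study_score(
    {"education": 0.9, "income": 0.7, "age": 0.8},
    {"education": 0.5, "income": 0.5, "age": 1.0},
)
```

Without overweighting, missing the rare category costs only 0.2 points; with the hypothetical 20% category weight, the same omission costs 20 points.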

A spreadsheet that can be used to calculate diversity in a study is hosted on the author's website. For demonstration purposes, the spreadsheet employs data from the 2020 US Census for race and ethnicity, sex, age, and household income. The author invites volunteers to advise on upgrading this spreadsheet to a more powerful, flexible, and intuitive website-based tool.

How do we compare diversity scores across studies?

Comparing diversity scores across studies and tracking progress over time requires a consistent, practical, and objective measurement system. Diversity dimensions, categories, and weightings should be transparent and applied as consistently as possible. Transparency is also important because different, equally diverse configurations of the study population may serve different reasons for wanting diversity. Publishing raw data will allow independent analysts to make their own comparisons.

The complexity of diversity invites obscurity in its measurement. For example, a study sponsor may design a measurement procedure that generates a high diversity score but that does not support the principle of justice. Such inclinations should be discouraged.

A universal diversity score is probably beyond reach. However, we should be able to compare studies within a therapeutic indication for the clinically significant population.

Final observations

With the benefit of experience, the cost and time to enroll a diverse study population can be projected and managed. Tracking diversity during the patient recruitment process enables resources to be reallocated to underrepresented subpopulations.
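Tracking during recruitment could be as simple as comparing enrollment to date against each category's fair share of the target sample size. The sketch below is one illustration, not a prescribed method; the function name, the categories, and the numbers are hypothetical.

```python
def enrollment_gaps(enrolled, reference_shares, target_n):
    """Participants still needed per category to reach its fair share.

    The fair share is reference share * target_n, rounded to the nearest
    participant; categories already at or above it show a gap of 0.
    """
    gaps = {}
    for category, share in reference_shares.items():
        needed = round(share * target_n)
        gaps[category] = max(needed - enrolled.get(category, 0), 0)
    return gaps


# Hypothetical geography dimension, 140 of 200 participants enrolled so far.
reference = {"urban": 0.8, "rural": 0.2}
enrolled = {"urban": 130, "rural": 10}

gaps = enrollment_gaps(enrolled, reference, target_n=200)
print(gaps)  # {'urban': 30, 'rural': 30}
```

Here the remaining 60 slots would need to be split evenly, even though rural participants make up only 20% of the reference population, which signals where recruitment resources might be redirected.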

The current impetus for diversity is the principle of justice: including, but not exploiting, disfavored populations. However, scientific and marketing considerations cannot be ignored.

Leaving aside clinical relevance, are certain dimensions of diversity more important than others? For example, is race and ethnicity diversity more important than gender, age, or income diversity, not to mention dimensions of diversity that are totally ignored? Taking the default position that all dimensions of diversity are equally important is a decision itself.

There is probably a practical limit to the measurement of diversity, so some simplification probably makes sense. A consistent and universally accepted method of calculation will not solve the problem of diversity. However, it will clarify the issues, challenges, and tradeoffs; elevate the discussion to a more objective level; clarify the work that must be done; and recognize those who achieve it.

Norman M. Goldfarb is managing director of Elimar Systems, executive director of the Site Council, and executive director of the Clinical Research Interoperability Standards Initiative (CRISI).
