Measuring different dimensions of diversity beforehand can lead to more thoughtful planning and execution.
Over the past few years, there has been a groundswell of support for diversity in clinical studies. However, it is unclear what we mean by diversity or how to quantitatively measure it in a clinical study. This article addresses these fundamental questions and provides a spreadsheet tool to calculate a study’s diversity in a multidimensional manner.
FDA’s April 2022 draft guidance, “Diversity Plans to Improve Enrollment of Participants from Underrepresented Racial and Ethnic Populations in Clinical Trials,” provides “recommendations to sponsors developing medical products on the approach for developing a Race and Ethnicity Diversity Plan … to enroll representative numbers of participants from underrepresented racial and ethnic populations in the United States…” FDA also “advises sponsors to seek diversity in … other underrepresented populations defined by demographics such as sex, gender identity, age, socioeconomic status, disability, pregnancy status, lactation status, and co-morbidity.” In addition to scientific rationales for diverse study populations, the guidance discusses issues of equity for “all clinically relevant populations,” e.g., “health disparities and differential access to health care in certain racial and ethnic populations, many of whom are part of underserved communities.”
According to the IQVIA report, “Advancing Diversity in Clinical Development through Cross-Stakeholder Commitment and Action,” as of June 1, 2022, only 32% of the 13,060 Phase I, II, and III studies on ClinicalTrials.gov (with industry involvement and at least one US site) reported any race and ethnicity diversity data at all. The report found that IQVIA’s Inclusivity Quotient (IQ) of race and ethnicity diversity varied widely across eight therapeutic indications.
Diversity (along with equity and inclusion) is finally getting the attention it deserves in clinical research. However, if we are going to make real, measurable progress, we need a quantitative way to measure diversity. Without a standard methodology for calculating the diversity score for a study, we cannot answer the following fundamental questions:
Two questions are even more fundamental: What do we mean by diversity, and how do we measure it? These questions are easy to ask but difficult to answer until we address the following, still more basic, questions:
There are at least three good reasons to enroll a diverse study population:
On the other hand, there are good reasons not to enroll a diverse study population. A diverse study population is often more costly and time-consuming to enroll than a homogeneous one. A diverse study population requires a larger sample size than a homogeneous study population. It may require the participation of groups, e.g., the elderly, with higher rates of comorbidity. It is generally not a good idea to test a medical treatment on children until it has been proven safe and efficacious in adults.
It may thus be faster, more economical, and less risky to conduct a series of smaller studies with different homogeneous populations, which would also allow the medical treatment to reach the market sooner for the initial population. In this case, diversity should be measured across a clinical development program.
A study population is diverse in the sense that it reflects the diversity of a larger population. There are several options for determining the population from which to enroll study participants:
It is fair to say that a diverse study population should reflect the diversity of the clinically significant population, subject to issues of ethics and accessibility. With respect to ethics, it would, for example, violate the principle of justice to exclude participants from populations that are unlikely to become paying customers for the eventual medical treatment, since supportive financial arrangements can be made.
Accessibility is a trickier issue. To start with, multiple factors, e.g., geographical location, language, and mobility, contribute to accessibility. To further complicate matters, by excluding a population from the diversity calculation that is difficult to enroll because of accessibility issues, the study sponsor can probably increase its diversity score, while saving the cost of translation, home visits, low-enrolling sites, and other measures to increase accessibility. The goal of diversity thus demands that reasonable measures be taken to increase accessibility, despite the cost and possible negative impact on the diversity score. However, reasonability is a matter of judgment, so the justification for excluding from the calculation any population because of its inaccessibility must be clearly documented.
Most discussions about diversity in clinical research address race and ethnicity. However, based on our reasons for wanting diversity, we can consider diversity across the following dimensions (and perhaps other biological, medical, social, or other characteristics):
Measurement of diversity requires knowing the distribution of a population across its subpopulations. The decennial US census collects accurate data on certain measures of diversity, e.g., age, sex, ethnicity and race, income, and geography. The Census Bureau also conducts numerous other surveys. For example, the American Community Survey collects data on disability status, educational attainment, marital status, ancestry, language spoken at home, and other demographic elements. Data can also be collected from commercial databases, e.g., of prescriptions. Studies and surveys can also provide estimates.
The measurement of diversity works best when subgroups are clearly delineated and data is accurately reported. Unfortunately, one of the most important types of diversity is ethnicity and race, which meets neither requirement. Combining ethnicity and race into a single measure confounds the measure. Lumping Mongolians and Filipinos into the Asian category does not promote diversity. And, of course, classifying people by self-reported race is problematic when the meaning of race and ethnicity is unclear, largely a social construct, and of limited scientific relevance.
There are at least three reasons to consider prioritizing (i.e., statistically overweighting) a study subpopulation:
Any overweighting must, of course, be justified in the statistical analysis.
To measure diversity, we can take the following steps:
The procedure for measuring diversity should be specified in a diversity plan prior to launching a study.
Once a subpopulation is fairly represented in the study population, additional enrollments from that subpopulation do not increase diversity.
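One way to express this capping rule is sketched below. It is an illustration only, not the formula used in the author's spreadsheet: the function name `representation_score`, the categories A, B, and C, and all numbers are hypothetical. The sketch assumes that a category earns credit equal to its share of the study population, capped at its share of the reference population, so that over-enrolling a category adds nothing.

```python
def representation_score(study_counts, population_shares):
    """Return the overlap between study and reference distributions (0 to 1).

    study_counts: enrolled participants per category (hypothetical data).
    population_shares: reference-population share per category, summing to 1.
    """
    total = sum(study_counts.values())
    if total <= 0:
        raise ValueError("study has no enrolled participants")
    score = 0.0
    for category, pop_share in population_shares.items():
        study_share = study_counts.get(category, 0) / total
        # Credit is capped at the population share: once a category is
        # fairly represented, extra enrollment in it earns no more credit.
        score += min(study_share, pop_share)
    return score


# Hypothetical reference shares and enrollment counts.
population = {"A": 0.60, "B": 0.30, "C": 0.10}
balanced = {"A": 60, "B": 30, "C": 10}
skewed = {"A": 80, "B": 15, "C": 5}

print(round(representation_score(balanced, population), 3))
print(round(representation_score(skewed, population), 3))
```

Under this assumption, a study that mirrors the reference population scores 1, and enrolling more participants from an already fairly represented category can only hold the score steady or lower it, because the shares of the other categories shrink.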
It may make sense to weight different dimensions differently. For example, educational attainment and income are correlated, so if both dimensions are included, each should be underweighted to avoid counting the same underlying variation twice.
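Weighting dimensions can be illustrated with a simple weighted average. This is a sketch under stated assumptions, not a prescribed method: the per-dimension scores, the weights, and the function name `combined_score` are all hypothetical, and each dimension is assumed to have already been scored on a 0-to-1 scale.

```python
def combined_score(dimension_scores, dimension_weights):
    """Weighted average of per-dimension diversity scores (each in 0 to 1)."""
    total_weight = sum(dimension_weights[d] for d in dimension_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        score * dimension_weights[d] for d, score in dimension_scores.items()
    )
    return weighted_sum / total_weight


# Hypothetical per-dimension scores.
scores = {"race_ethnicity": 0.8, "sex": 0.9, "education": 0.7, "income": 0.6}

# Education and income are correlated, so each gets half weight here
# (an illustrative choice, not a recommendation).
weights = {"race_ethnicity": 1.0, "sex": 1.0, "education": 0.5, "income": 0.5}

print(round(combined_score(scores, weights), 3))
```

Whatever weights are chosen, they would need to be stated in the diversity plan so that the combined score can be reproduced.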
It may make sense to weight different categories within a dimension differently. For example, only 0.2% of the U.S. population is in the Native Hawaiian & Other Pacific Islander category, so enrolling people from that subpopulation will barely affect the race & ethnicity diversity score. In such cases, it may make sense to overweight such categories.
A spreadsheet that can be used to calculate diversity in a study is available on the author's website. For demonstration purposes, the spreadsheet employs data from the US 2020 Census for race and ethnicity, sex, age, and household income. The author invites volunteers to advise on upgrading this spreadsheet to a more powerful, flexible, and intuitive website-based tool.
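The effect of category weights can be seen in a small sketch. Everything here is an assumption made for illustration: the function name `weighted_category_score`, the three generic categories, and the study shares are hypothetical, with only the 0.2% share for the small category echoing the example above. Each category is scored by how fully it is represented (capped at 1), and the categories are then averaged using either population shares or equal weights.

```python
def weighted_category_score(study_shares, population_shares, category_weights=None):
    """Average per-category representation (capped at 1) using category weights.

    If no weights are given, categories are weighted by population share.
    """
    if category_weights is None:
        category_weights = population_shares
    total_weight = sum(category_weights[c] for c in population_shares)
    score = 0.0
    for category, pop_share in population_shares.items():
        if pop_share <= 0:
            raise ValueError("population shares must be positive")
        # Fraction of fair representation achieved, capped at 1.
        ratio = min(study_shares.get(category, 0.0) / pop_share, 1.0)
        score += category_weights[category] * ratio
    return score / total_weight


# Hypothetical shares; "small" mirrors a 0.2% category.
population = {"large": 0.798, "medium": 0.200, "small": 0.002}
without_small = {"large": 0.850, "medium": 0.150, "small": 0.000}
with_small = {"large": 0.848, "medium": 0.150, "small": 0.002}
equal_weights = {"large": 1.0, "medium": 1.0, "small": 1.0}

for label, shares in (("without", without_small), ("with", with_small)):
    by_population = weighted_category_score(shares, population)
    by_equal = weighted_category_score(shares, population, equal_weights)
    print(label, round(by_population, 3), round(by_equal, 3))
```

With population-share weights, fully representing the small category moves the score by only about 0.002. With equal weights, the same enrollment moves it by about a third, which shows how strongly the choice of weights shapes the result.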
Comparing diversity scores across studies and tracking progress over time requires a consistent, practical, and objective measurement system. Diversity dimensions, categories and weightings should be transparent and applied as consistently as possible. Transparency is also important because different, equally diverse, configurations of the study population may serve different reasons for wanting diversity. Publishing raw data will allow independent analysts to make their own comparisons.
The complexity of diversity invites obscurity in its measurement. For example, a study sponsor may design a measurement procedure that generates a high diversity score but that does not support the principle of justice. Such inclinations should be discouraged.
A universal diversity score is probably beyond reach. However, we should be able to compare studies within a therapeutic indication for the clinically significant population.
With the benefit of experience, the cost and time to enroll a diverse study population can be projected and managed. Tracking diversity during the patient recruitment process enables resources to be reallocated to underrepresented subpopulations.
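Tracking during recruitment might look like the following sketch, which compares current enrollment against targets derived from reference-population shares. The function name `enrollment_shortfalls`, the categories, and the numbers are hypothetical, and the sketch assumes targets are simply each category's population share multiplied by planned enrollment.

```python
def enrollment_shortfalls(enrolled_counts, population_shares, planned_enrollment):
    """Return how many more participants each category needs to reach its target."""
    shortfalls = {}
    for category, share in population_shares.items():
        # Target count, assuming proportional representation is the goal.
        target = round(share * planned_enrollment)
        enrolled = enrolled_counts.get(category, 0)
        # Categories at or above target have a shortfall of zero.
        shortfalls[category] = max(0, target - enrolled)
    return shortfalls


# Hypothetical mid-recruitment snapshot for a 200-participant study.
population = {"A": 0.60, "B": 0.30, "C": 0.10}
enrolled_so_far = {"A": 130, "B": 20, "C": 2}

print(enrollment_shortfalls(enrolled_so_far, population, 200))
```

Categories with the largest shortfalls are candidates for additional recruitment resources, while categories already at target need none.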
The current impetus for diversity is the principle of justice: including, but not exploiting, disfavored populations. However, scientific and marketing considerations cannot be ignored.
Leaving aside clinical relevance, are certain dimensions of diversity more important than others? For example, is race and ethnicity diversity more important than gender, age, or income diversity, not to mention dimensions of diversity that are totally ignored? Taking the default position that all dimensions of diversity are equally important is a decision itself.
There is probably a practical limit to the measurement of diversity, so some simplification probably makes sense. A consistent and universally accepted method of calculation will not solve the problem of diversity. However, it will clarify the issues, challenges, and tradeoffs; elevate the discussion to a more objective level; clarify the work that must be done; and recognize those who achieve it.
Norman M. Goldfarb is managing director of Elimar Systems, executive director of the Site Council, and executive director of the Clinical Research Interoperability Standards Initiative (CRISI).