Diversity in Clinical Trial Data Transparency: A New Horizon for Data Sharing with Truthful Statistics


The pursuit of balance between data utility, privacy protection, and equitable representation.

Image Credit: © Egor - stock.adobe.com

The need to preserve data quality, uphold transparency, and mitigate potential biases is underlined by new challenges of managing individual participant data (IPD) for larger minority groups and multiple minority groups. Transitioning from raw clinical data sharing to new technologies in statistical sharing emerges as a promising opportunity in the landscape of IPD anonymization.

The shift towards producing accurate statistics presents a viable solution to the complexities of diverse anonymized data, while ensuring privacy and data utility. Within this context, the attributes of accuracy, reliability, and impartiality take center stage, forming the foundation of this approach.

The challenges associated with managing diverse data in clinical trial transparency efforts cannot be overstated. As participant diversity continues to expand, the intricacies of data anonymization become even more complex, requiring new approaches to ensure accurate statistical sharing while maintaining privacy and data utility.

From addressing challenges of data diversity and anonymization to seizing opportunities for statistical sharing, the aim remains steadfast—to foster a robust research ecosystem that simultaneously safeguards privacy, amplifies data utility, and advances the understanding of medical science.

Leveraging statistical sharing for enhanced data utility

Strategies ranging from synthetic datasets and aggregated counts to advanced analytics and artificial intelligence and machine learning (AIML) collectively chart a path between data utility and data protection. The endeavor calls for a balance between insightful research and ethical data handling.

Addressing the challenge of ensuring truthful and diverse anonymized data within the context of clinical trials calls for a shift from sharing raw data to sharing truthful statistics. This approach hinges on the generation of accurate, reliable, and unbiased statistics. These attributes are needed to achieve valid inferences while adhering to the tenets of data anonymization, as shown in Figure 1.

Figure 1. Truthful statistics are accurate, reliable, and unbiased.

The fundamental premise of statistical sharing centers on the assumption that statistical outcomes follow a discernible probability distribution. By providing accurate, unbiased statistics accompanied by appropriate confidence limits, the end users' confidence in their inferences is bolstered. This remains true even in scenarios where variations in the data might arise.
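As a minimal sketch of what sharing a statistic together with its confidence limits might look like, the example below computes a Wilson score interval for a proportion. The function name, the choice of the Wilson method, and the responder counts are illustrative assumptions and are not drawn from any particular trial or platform.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical shared statistic: 30 responders among 120 participants.
low, high = wilson_interval(30, 120)
print(f"estimate={30 / 120:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

An end user who receives only the estimate and its interval can still judge how much weight an inference will bear, without seeing any participant-level record.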

In essence, the shift toward statistical sharing represents a pragmatic approach to circumvent the challenges of producing anonymized IPD that accurately reflects the diversity inherent in clinical trials. By relying on robust statistical analyses, researchers can ensure that their findings are both reliable and sound, enabling accurate insights to emerge from trial data without compromising privacy or data integrity.

Transforming protected trial data into reliable and practical statistics

In the realm of IPD anonymization, a noteworthy opportunity presents itself in converting safeguarded clinical trial data into secure and practical statistics. This approach can be thought of as passing the raw data through a privacy-protective layer, ensuring both data security and utility.

Within this framework, distinct use cases come to light, such as those shown in Figure 2, each characterized by the transformation of protected trial data into valuable statistics. These statistics provide insights without exposing the raw data. The distinct use cases are divided into three classes for the purposes of this article: synthetic datasets, aggregated counts, and analytics to AIML.

Figure 2. Various options for producing useful statistics, including synthetic datasets, aggregated data, and specialized Analytics or AIML platforms.

The options for producing useful statistics without exposing the raw data all have their pros and cons. The choice among them will largely depend on the use case, including the environment in which the option is deployed, as this will inform the risks.1 In Table 1, we summarize the pros and cons of each technique introduced.

Table 1. Summary of strengths and weaknesses of different techniques.

The suggested approaches above can also be combined in novel ways to meet various design objectives. Overall, the opportunity to transform protected clinical trial data into secure statistics provides a pragmatic approach to navigating the challenges posed by data anonymization with diversity in mind.

By ensuring the privacy of the raw data while deriving meaningful insights, researchers can harness the potential of their data while adhering to greater standards of data security. This could pave the way for the safe utilization of data across diverse applications.
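To make the aggregated-counts option from this section concrete, the sketch below tabulates hypothetical participant records and masks any cell that falls below a minimum size. The function name, the field names, and the threshold of five are assumptions chosen for illustration; they are not a recommended standard.

```python
from collections import Counter

def aggregate_with_suppression(records, keys, min_cell_size=5):
    """Count records per combination of `keys`; mask counts below the threshold."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {
        cell: (n if n >= min_cell_size else None)  # None marks a suppressed cell
        for cell, n in counts.items()
    }

# Hypothetical participant-level records (these are never released themselves).
records = (
    [{"arm": "treatment", "group": "A"}] * 12
    + [{"arm": "treatment", "group": "B"}] * 3
    + [{"arm": "placebo", "group": "A"}] * 11
    + [{"arm": "placebo", "group": "B"}] * 6
)

table = aggregate_with_suppression(records, keys=("arm", "group"))
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

Masking small cells on its own is only a first step: if row or column totals are also published, a masked count can sometimes be recovered by subtraction, which is why tabular disclosure control typically pairs it with complementary suppression or similar safeguards.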

Balancing diversity and privacy when navigating complexities in IPD anonymization

The endeavor of anonymizing IPD meets with challenges that are heightened by the implications of diversity. The goal is to strike a balance between safeguarding participant privacy and preserving data diversity.

With diversity's amplifying impact on the data, both within larger minority groups and with the introduction of multiple minority groups, several implications for anonymization strategies unfold. The challenge is to maintain data quality, combat potential biases, and uphold transparency while navigating the diversity of anonymized data.

Impact of increasing diversity on IPD anonymization

The drive to enhance diversity within clinical trials introduces a nuanced dimension to IPD anonymization. The impact of this endeavor on data transparency through anonymization hinges on a complex interplay of factors, contingent upon the specific circumstances and data distribution.

Expanding the representation of a single minority group by incorporating a larger number of patients can yield contrasting outcomes. Specifically, the increased number of patients within the minority group can potentially diminish the feasibility of singling out any individual participant, thereby enhancing the efficacy of anonymization measures. The very act of becoming part of a larger cohort within the minority group serves to obscure the identity of participants, facilitating anonymization efforts.

On the other hand, the introduction of multiple minority groups into the clinical trial framework introduces an alternative dynamic. This expansion can lead to an averaging effect, potentially diminishing the effectiveness of anonymization measures. By dividing the participants into a greater number of minority groups, individuals who could be singled out may become further dispersed, making them potentially easier to identify, and thus complicating the anonymization process.
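The two dynamics described above can be illustrated with a deliberately simplified sketch that uses the size of the smallest group as a rough proxy for how easily a participant could be singled out. The group labels and counts are hypothetical, and real risk assessments consider combinations of identifying attributes rather than a single one.

```python
from collections import Counter

def smallest_group(labels):
    """Return the size of the smallest group, a rough proxy for singling-out risk."""
    return min(Counter(labels).values())

baseline = ["majority"] * 90 + ["minority_1"] * 10
# One minority group enlarged from 10 to 20 participants.
larger_single = ["majority"] * 80 + ["minority_1"] * 20
# The same 20 minority participants spread over four groups of five.
multiple = ["majority"] * 80 + [f"minority_{i}" for i in range(1, 5) for _ in range(5)]

scenarios = [
    ("baseline", baseline),
    ("larger single group", larger_single),
    ("multiple groups", multiple),
]
for name, labels in scenarios:
    k = smallest_group(labels)
    print(f"{name}: smallest group = {k}, worst-case match probability = {1 / k:.2f}")
```

Under these assumptions, enlarging a single minority group raises the smallest group size, while spreading the same number of participants across several groups lowers it, which mirrors the contrast drawn in the text.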

Navigating these challenges mandates a delicate balance. The concern is maintaining accurate marginal counts and producing truthful research outcomes. Understanding the utility impact on outcomes, such as changes in false negatives or false positives, becomes critical.

Addressing this challenge can require the implementation of dedicated algorithms that account for the effects of data anonymization. These algorithms would aim to ensure the preservation of absolute and relative measures that can effectively detect and quantify disparity while safeguarding data transparency.
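One way to check whether absolute and relative measures of disparity survive anonymization is to compute them before and after the transformation and compare. The sketch below does this for a risk difference and a risk ratio; the function name and all counts are hypothetical, and it is not the dedicated algorithm the text refers to.

```python
def disparity_measures(events_a, n_a, events_b, n_b):
    """Return (risk difference, risk ratio) of group A relative to group B.

    Assumes both groups are non-empty and group B has at least one event.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    return risk_a - risk_b, risk_a / risk_b

# Hypothetical counts: original data versus data after anonymization.
original = disparity_measures(events_a=30, n_a=100, events_b=20, n_b=200)
anonymized = disparity_measures(events_a=27, n_a=95, events_b=20, n_b=198)

print(f"original:   RD={original[0]:.3f}, RR={original[1]:.2f}")
print(f"anonymized: RD={anonymized[0]:.3f}, RR={anonymized[1]:.2f}")
print(f"shift in RD={anonymized[0] - original[0]:+.3f}, "
      f"shift in RR={anonymized[1] - original[1]:+.2f}")
```

A large shift in either measure would suggest that the anonymization step has distorted the evidence of disparity and that the transformation should be revisited.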

Implications for diversity in IPD anonymization

The anonymization of IPD stands as an important step in safeguarding patient privacy while enabling data sharing. This process involves transforming data in a manner that precludes the identification of specific individuals, thwarting any potential linkage, inference, or disclosure of their identities.

Several key definitions underpin IPD anonymization strategies, and we need to understand them in order to describe their potential impact on diversity. They are described in Table 2.

Table 2. Selected terms and definitions for IPD anonymization.

Small population subgroups are more susceptible to disproportionate transformations during anonymization. By virtue of their limited size, these subgroups are inherently easier to single out, potentially compromising their privacy in the process.

Generalization, which reduces the granularity of data, can increase the variation in outcomes. This can lead to diminished separation between outcomes and introduce false negatives, potentially influenced by biases. Conversely, suppression (redaction) of information can decrease outcome variation, thereby accentuating the separation between outcomes and introducing false positives, possibly biased in nature. Noise addition, often used to protect privacy, augments outcome variation, yet it can also reduce the separation between outcomes and introduce false negatives, usually without biases.8
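The effect of noise addition described above can be seen in a small simulation. The sketch below draws two hypothetical outcome groups, adds zero-mean noise to every value, and compares the standardized difference between the groups before and after. The sample sizes, the true difference, and the noise scale are all assumptions made for illustration.

```python
import random
import statistics

def standardized_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(b) - statistics.mean(a)) / pooled_var ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
group_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
group_b = [rng.gauss(0.5, 1.0) for _ in range(2000)]

# Zero-mean noise added independently to every value (noise scale is an assumption).
noisy_a = [x + rng.gauss(0.0, 2.0) for x in group_a]
noisy_b = [x + rng.gauss(0.0, 2.0) for x in group_b]

raw = standardized_difference(group_a, group_b)
noisy = standardized_difference(noisy_a, noisy_b)
print(f"separation before noise: {raw:.2f}, after noise: {noisy:.2f}")
```

Because the noise has zero mean, the difference in group means is not systematically shifted, but the added variance shrinks the standardized separation, which is how a real difference can be missed and reported as a false negative.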

In addition to these complexities, the application of imputation or data augmentation can result in misleading counts, presenting challenges for transparency in reporting. Emerging techniques that promise "comparable utility" while providing heightened privacy guarantees for minority groups are gaining attention.9 However, these methods come at a substantial implementation cost, thereby raising practical considerations.

Addressing disparities for equitable research

Clinical trials are often marked by a lack of diversity and by barriers to participation, which in turn have prompted efforts to improve representation. The underrepresentation of minority groups in trials raises concerns about public trust and health equity. When these groups are excluded, a knowledge gap emerges, hindering comprehensive insights into diseases and their treatments. This inadvertently perpetuates disparities in health outcomes and access to effective interventions.

Rooted in factors like historical apprehensions and practical constraints, barriers to diversity persist, as summarized in Figure 3. However, recent developments underscore a notable shift. Regulatory initiatives, particularly the impending FDA requirement for diversity inclusion plans, signal a pivotal turning point.10

Figure 3. Reasons for lack of diversity in clinical trials.

Pharmaceutical companies are aligning with these mandates, initiating proactive commitments to enhance inclusivity. Strategies ranging from early recruitment to overcoming socio-economic barriers collectively shape an evolving landscape. While challenges persist, the momentum towards equitable representation within clinical trials is gaining traction, reflecting a broader recognition of the ethical and scientific imperatives at play.11

Underlying factors creating barriers to diversity in clinical trials

While interest in clinical trial participation is evident among underrepresented groups, persistent low participation rates underscore the existence of various impediments that deter potential participants from engaging in such studies.12 Core reasons behind the lack of diversity in clinical trials highlight possible factors that influence the decision-making process of underrepresented populations.13

One notable factor contributing to this underrepresentation is the apprehension stemming from the medical field itself. Concerns about experiencing discrimination, whether overt or subtle, from medical professionals and researchers can lead to hesitation among potential participants.

Historical contexts also play a role in shaping the reluctance of underrepresented individuals to participate in clinical trials. A history marked by instances of unethical medical testing, such as the Tuskegee syphilis study in the United States, has engendered a profound fear of exploitation.

Limited access to trial centers restricts opportunities for underrepresented populations to learn about and participate in relevant trials. Pragmatic constraints such as time and financial resources are also significant barriers that hinder participation.

The commitment required to engage in a clinical trial, including attending appointments, adhering to study protocols, and potential travel, can be demanding. Coupled with financial considerations, individuals from underrepresented backgrounds may find it challenging to allocate resources, both time and money, to participate in a clinical trial, further contributing to low enrollment rates.

Regulatory impetus and industry response to enhancing diversity in clinical trials

With the imminent implementation of new FDA regulations, researchers and pharmaceutical companies seeking approval for late-stage clinical trials will be required to incorporate diversity inclusion plans.14 In response, pharmaceutical companies are proactively publicizing their commitments to address this issue, signaling a broader industry awareness of the need for improved inclusivity. We summarize in Figure 4 recommendations to increase diversity in clinical trials.

Figure 4. Recommendations to increase diversity in clinical trials.

Central to these efforts is the adoption of vigorous recruitment and retention strategies that commence early in the trial planning process. Diversity and inclusion are integrated into trial design from the outset, ensuring that the patient population’s diversity is reflected not only in the trial personnel but also in the trial design and site selection.

To foster an environment of trust and engagement, a focus on community involvement at trial sites is integral. Strategies involve employing patient-oriented staff and establishing safe and welcoming settings that resonate with diverse populations. These measures aim to mitigate apprehensions related to participation and bridge the gap between underrepresented communities and the clinical research process.15

The drive for inclusivity also extends beyond the trial duration. Emphasizing the delivery of patient value before, during, and after a trial underscores the commitment to the community. This approach encompasses not only the trial's immediate impact on patients' health but also the broader value it brings to the community by advancing medical knowledge and improving treatment options.

Fostering accountability and accessibility through data transparency

The field of clinical trial data transparency encompasses a varied landscape, underpinned by essential standards and end-user expectations. Vital to this endeavor is the promotion of comprehensive and timely access to trial data, documents, and results.16

Essential elements and standards for promoting data transparency

Ensuring transparency in clinical trial data is important in upholding scientific rigor, accountability, and informed decision-making. The expectations and key components of clinical trial data transparency underscore the various aspects that collectively contribute to a more open and comprehensive sharing of trial-related information.

At the core of clinical trial data transparency lies the desire to make trial data accessible to the public and the scientific community. This practice not only promotes transparency and accountability but also facilitates independent scrutiny and analysis, essential for ensuring the robustness and validity of trial outcomes.

Another aspect of clinical trial data transparency is the sharing of anonymized IPD and its associated clinical documents (eg, trial protocol and statistical analysis plan) with qualified researchers upon request.1 While ensuring privacy protection and adhering to ethical considerations, this practice promotes collaborative research and facilitates secondary analyses that can further enrich the understanding of trial outcomes.

Data quality and integrity are of course necessary in clinical trial data transparency. High standards must be maintained throughout the data collection, analysis, and reporting processes to ensure that the information shared is accurate, reliable, and representative of the trial's conduct and findings.

End-user expectations and needs from data transparency

Ensuring the availability of IPD from clinical trials for responsible and transparent utilization fosters accountability, promotes independent analysis, and advances scientific integrity. In this context, understanding the preferences and requirements of end-user researchers is paramount.

Findings from a recent survey shed light on the perspective of end-user researchers regarding the availability of various elements of clinical trial data.17

  • The survey revealed that approximately 93% of respondents considered a variable-level transformation report to be mandatory, important, or useful.
  • Similarly, about 90% of respondents emphasized the significance of data redaction on transparency, particularly for adverse events, demographics, and laboratory values.
  • Over 50% of the respondents indicated the importance of transparency in redactions before accessing study data, underscoring the value placed on unaltered information.

End-user researchers prioritize certain fundamental needs when engaging with clinical trial data. Foremost among these needs is access to truthful data—information that is accurate, reliable, and unbiased. The requirement for accurate marginal counts exemplifies the demand for unadulterated data that faithfully represents factual information without distortion or misrepresentation.

The need for complete and comprehensive data is also of great importance. Minimal transformations from the original dataset are desired to ensure the replicability of scientific findings. Researchers value unaltered data as it aids in verifying and building upon existing knowledge, thereby advancing the reliability and credibility of research outcomes.

Equally important is the concept of safe data and outputs. Researchers seek data that can be shared and reused for beneficial purposes without compromising ethical standards or individual privacy. This concept is critical for accelerating research progress and fostering scientific advancement, ultimately contributing to a broader scientific benefit.


The disparities introduced by demographic variations necessitate careful consideration to strike a balance between preserving data integrity and minimizing the risk of biases. Health and medicines regulators and pharmaceutical companies are seeking methodologies that champion inclusivity without compromising the reliability of research findings.

Addressing the intricacies of anonymizing IPD reveals an opportunity to shift the current approach of data sharing towards statistical sharing and truthful statistics, by producing accurate statistics without access to the underlying clinical trial data. This approach reconciles the tension between data sharing and privacy protection, allowing researchers to glean insights while safeguarding participant identities.

Enhancing inclusivity within trials, coupled with the adoption of sophisticated strategies for data transformation, aims to foster trust and bolster the credibility of research findings. These efforts collectively advance the mission of transparency, accountability, and informed decision-making. The intricate interplay of diversity, anonymization, transparency, and utility underscores the complexity inherent in managing clinical trial data while ensuring both ethical considerations and scientific advancement.

As clinical research advances, the considerations and methodologies described herein hold significance for both the research community and the broader societal sphere. The balance between data utility, privacy protection, and equitable representation remains an ongoing pursuit—one that underscores the continuous evolution of ethical, practical, and scientific approaches in the realm of clinical research and its data management.

Stephen Bamford, Head of Clinical Data Standards & Transparency, The Janssen Pharmaceutical Companies of Johnson & Johnson, Luk Arbuckle, Chief Methodologist, Privacy Analytics, and Pierre Chetelat, Research Associate, Privacy Analytics


  1. Bamford S, Lyons S, Arbuckle L, Chetelat P. Sharing Anonymized and Functionally Effective (SAFE) Data Standard for Safely Sharing Rich Clinical Trial Data. Appl Clin Trials. 2022;31(7/8):30–43.
  2. Gonzales A, Guruswamy G, Smith SR. Synthetic Data in Health Care: A Narrative Review. Johnson A, editor. PLOS Digit Health. 2023 Jan 6;2(1):e0000082.
  3. Giuffrè M, Shung DL. Harnessing the Power of Synthetic Data in Healthcare: Innovation, Application, and Privacy. Npj Digit Med. 2023 Oct 9;6(1):1–8.
  4. Castro J. Statistical Disclosure Control in Tabular Data. In: Nin J, Herranz J, editors. Privacy and Anonymity in Information Management Systems: New Techniques for New Practical Problems [Internet]. London: Springer; 2010 [cited 2023 Dec 9]. p. 113–31. (Advanced Information and Knowledge Processing). Available from: https://doi.org/10.1007/978-1-84996-238-4_6
  5. Matthews GJ, Harel O, Aseltine RH. Privacy Protection and Aggregate Health Data: A Review of Tabular Cell Suppression Methods (not) Employed in Public Health Data Systems. Health Serv Outcomes Res Methodol. 2016 Dec 1;16(4):258–70.
  6. O’Keefe CM, Chipperfield JO. A Summary of Attack Methods and Confidentiality Protection Measures for Fully Automated Remote Analysis Systems. Int Stat Rev. 2013;81(3):426–55.
  7. Mansouri-Benssassi E, Rogers S, Reel S, Malone M, Smith J, Ritchie F, et al. Disclosure Control of Machine Learning Models from Trusted Research Environments (TRE): New Challenges and Opportunities. Heliyon [Internet]. 2023 Apr 1 [cited 2023 Dec 9];9(4). Available from: https://www.cell.com/heliyon/abstract/S2405-8440(23)02350-2
  8. Xu H, Zhang N. Implications of Data Anonymization on the Statistical Evidence of Disparity. Manag Sci. 2022 Apr;68(4):2600–18.
  9. Christ M, Radway S, Bellovin SM. Differential Privacy and Swapping: Examining De-Identification’s Impact on Minority Representation and Privacy Preservation in the U.S. Census. In: 2022 IEEE Symposium on Security and Privacy (SP) [Internet]. San Francisco, CA: IEEE; 2022 [cited 2023 Dec 9]. p. 457–72. Available from: https://ieeexplore.ieee.org/document/9833668
  10. Office of the Commissioner. Diversity Plans to Improve Enrollment of Participants From Underrepresented Racial and Ethnic Populations in Clinical Trials; Draft Guidance for Industry; Availability [Internet]. Rockville, MD: U.S. Food & Drug Administration; 2022 Apr [cited 2023 Dec 9] p. 9. Report No.: FDA-2021-D-0789. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/diversity-plans-improve-enrollment-participants-underrepresented-racial-and-ethnic-populations
  11. National Institutes of Health. National Institute on Minority Health and Health Disparities. 2023 [cited 2023 Dec 9]. Diversity and Inclusion in Clinical Trials. Available from: https://nimhd.nih.gov/resources/understanding-health-disparities/diversity-and-inclusion-in-clinical-trials.html
  12. Messer D. How Clinical Trial Design Impacts Enrollment of Diverse Populations [Internet]. IQVIA Applied Data Science Center; 2023 Feb [cited 2023 Dec 10] p. 11. Available from: https://www.iqvia.com/library/white-papers/how-clinical-trial-design-impacts-enrollment-of-diverse-populations
  13. The Editors. Clinical Trials Need More Diversity. Scientific American [Internet]. 2022 Jun [cited 2023 Dec 9];31(3). Available from: https://www.scientificamerican.com/article/clinical-trials-need-more-diversity/
  14. Office of the Commissioner. FDA Takes Important Steps to Increase Racial and Ethnic Diversity in Clinical Trials. US Food & Drug Administration [Internet]. 2022 Apr 13 [cited 2023 Dec 9]; Available from: https://www.fda.gov/news-events/press-announcements/fda-takes-important-steps-increase-racial-and-ethnic-diversity-clinical-trials
  15. Florez MI, Botto E, Foster Z, Seltzer E, Valastro B, Ashmore L, et al. Improving Diversity in Clinical Trial Volunteer Participation by Addressing Racial and Ethnic Representation Among the Clinical Research Workforce. Applied Clinical Trials [Internet]. 2022 Jun 13 [cited 2023 Dec 10]; Available from: https://www.appliedclinicaltrialsonline.com/view/improving-diversity-in-clinical-trial-volunteer-participation-by-addressing-racial-and-ethnic-representation-among-the-clinical-research-workforce
  16. PHUSE Data Transparency Working Group. A Global View of the Clinical Transparency Landscape—Best Practices Guide. PHUSE; 2020 p. 60. Report No.: WP-35.
  17. Odame E, Burgess T, Arbuckle L, Belcin A, Jones G, Mesenbrink P, et al. Establishing a Basis for Secondary Use Standards for Clinical Trials. Applied Clinical Trials [Internet]. 2023 Mar 8 [cited 2023 Dec 9]; Available from: https://www.appliedclinicaltrialsonline.com/view/establishing-a-basis-for-secondary-use-standards-for-clinical-trials
© 2024 MJH Life Sciences

All rights reserved.