Past decades have shown that gender-based differences in clinical trial results are often overlooked when safety and effectiveness are assessed.
It is increasingly apparent that many physiological and pathological functions, as well as patterns of gene expression, differ between women and men, and gender differences have also manifested in the outcomes of treatments.1 Over the last decades, alarming evidence of gender-based differences in the safety profiles of treatments has accumulated. A study including 513,608 patients estimated that women have a 1.5- to 1.7-fold greater risk of developing adverse drug reactions than men.2 In 2001, a report from the U.S. General Accounting Office (GAO) observed that 70% of drugs withdrawn from the market between 1997 and 2000 presented greater health risks for women.3 In addition, data suggest that women are overdosed: the dose of zolpidem, for example, was reduced by 50% for women after studies showed that women have five times the risk of driving impairment that men have after taking 10 mg of zolpidem.4 Gender differences have also been observed in the effectiveness of treatments: 20 years after the Physicians’ Health Study (a clinical trial that excluded women), a large placebo-controlled primary-prevention trial involving 39,876 women demonstrated that daily low-dose aspirin has no significant effect on the risk of myocardial infarction or death from cardiovascular causes in women, in contrast to the results in men.5
These differences in treatment outcomes can be explained by well-studied gender differences in all phases of pharmacokinetics6 and by the growing pharmacodynamics literature describing gender differences in drug-receptor affinity, receptor density, and signal transduction pathways.7,8,9,10,11,12 Despite this evidence, gender is still one of the most underappreciated variables in the clinical development of drugs. To address this shortcoming, it is important to clearly define the role of clinical trials in identifying gender differences.
Since the thalidomide disaster of the 1960s, it has become common practice to protect fetuses from clinical trials rather than to protect them through well-controlled clinical studies. In 1977, the FDA excluded “women of child-bearing potential” from Phase I and early Phase II clinical research.13 This 1977 FDA guideline was, however, applied to all pharmaceutical research, and the expression “woman of childbearing potential” was defined too broadly as any woman capable of becoming pregnant, regardless of her sexual activity, her use of contraceptives, her sexual orientation, the possible sterility of her partner, or even her desire to have a child. Additionally, several publications indicated that teratogenicity can also be transmitted via sperm.14,15,16,17 This dangerous ban remained in force until 1993, and countless drugs were brought to market through clinical research that underrepresented women, exposing women (to this day) to drugs that were tested less on them or not at all. The 1993 FDA guidance reversed the ban and recommended that clinical studies include men and women “in numbers appropriate to allow the detection of clinically significant gender differences in drug responses.”14,18
This raises several questions. First, what is an appropriate enrollment of women? Gender bias is a complex problem and will not necessarily be solved simply by achieving a 50:50 distribution of participants by gender. Rather than rates of enrollment alone, a participation-to-prevalence ratio (PPR) greater than 0.8 can serve as an indicator of appropriate enrollment of women, because the PPR (the proportion of women among trial participants divided by the proportion of women among patients with the disease) accounts for gender differences in disease prevalence. At the beginning of the new millennium, the GAO suggested that women are “sufficiently represented in clinical trials”19 but several authors disagree.20,21,22 Regarding late-phase clinical trials, Pool et al. estimated in 2009 that the participation of women between 2007 and 2009 was 43.3% and that 3.5% of the clinical trials still did not specify sex.22 In 2020, Jin et al. revealed that among 740 cardiovascular trials conducted between 2010 and 2017, only 38.2% of participants were women. Furthermore, a PPR below 0.8 was observed for arrhythmia, coronary heart disease, acute coronary syndrome, and heart failure trials.23
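The PPR criterion discussed above can be made concrete with a short calculation. The sketch below assumes the usual definition of the PPR (share of women among trial participants divided by share of women in the disease population) and uses the 0.8 threshold cited in the text. The function names and the trial figures are hypothetical and serve only as an illustration.

```python
def participation_to_prevalence_ratio(women_enrolled, total_enrolled,
                                      women_share_of_disease_population):
    """Return the PPR for women.

    women_share_of_disease_population is a fraction in (0, 1], e.g. 0.5
    when half of the patients living with the disease are women.
    """
    if total_enrolled <= 0:
        raise ValueError("total_enrolled must be positive")
    if not 0 < women_share_of_disease_population <= 1:
        raise ValueError("prevalence share must be in (0, 1]")
    participation = women_enrolled / total_enrolled
    return participation / women_share_of_disease_population


def enrollment_is_appropriate(ppr, threshold=0.8):
    """Apply the 'PPR greater than 0.8' criterion cited in the text."""
    return ppr > threshold


# Hypothetical trial: 382 women among 1,000 participants, in a disease
# where women are assumed to make up 50% of patients.
ppr = participation_to_prevalence_ratio(382, 1000, 0.50)
print(round(ppr, 3), enrollment_is_appropriate(ppr))
```

In this made-up case, a trial with 38.2% women falls short of the threshold when women make up half of the patient population, whereas the same enrollment would pass if women represented only 40% of patients.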
Second, how early in drug development should enrollment of women be a concern? Pinnow et al. estimate that only 30.6% of Phase I trial participants are women and that 34.1% of early trials enroll exclusively men.24 These findings indicate that all the data emanating from early clinical trials, including dose tolerability, dosing and use of a drug, metabolic and pharmacologic data, side effects associated with increasing doses, and even the choice of the investigational drug tested in large-scale trials, are based mainly on data from men. Still, is Phase I early enough to worry about avoiding gender bias? Most of our fundamental knowledge of drugs comes from non-human models, including animals, tissues, and cell lines, but about 80% of non-clinical studies use only male animals.25 The differences in drug response between female and male animals could forecast the differences in treatment outcomes between female and male patients. In cardiotoxicity studies of dofetilide in rabbits, 100% of females vs. 30% of males developed severe cardiac arrhythmias, and female rabbits developed cardiac arrhythmias at doses 50% lower than those given to males.26 This can be explained by the protective action of testosterone, which shortens the baseline QT interval.27 Since women are more sensitive to QT interval prolongation than men,28 testing drugs only on male animals could lead to substantial underestimation of cardiotoxicity.
Even if the rate of enrollment of women is appropriate, pooling data from both genders may still yield inaccurate results and problems of reproducibility. Several theoretical models and clinical trials have demonstrated that pooling data from women and men can mask important differences between the sexes in baseline data, in treatment response, and in sex-by-treatment interactions, leading to biased results that are adapted neither to women nor, rigorously, to men.29
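The masking effect of pooling can be illustrated with a deliberately extreme toy example. The counts below are invented for illustration and do not come from any trial. They describe a treatment that lowers the event rate in men and raises it in women by the same amount, so that the pooled estimate shows no effect at all.

```python
def risk_difference(events_treated, n_treated, events_control, n_control):
    """Event rate in the treated arm minus event rate in the control arm."""
    return events_treated / n_treated - events_control / n_control


# Hypothetical counts: (events_treated, n_treated, events_control, n_control)
men = (20, 100, 30, 100)    # treatment appears beneficial in men
women = (30, 100, 20, 100)  # treatment appears harmful in women

# Pooling simply adds the two strata together, cell by cell.
pooled = tuple(m + w for m, w in zip(men, women))

rd_men = risk_difference(*men)
rd_women = risk_difference(*women)
rd_pooled = risk_difference(*pooled)

print(f"men: {rd_men:+.2f}  women: {rd_women:+.2f}  pooled: {rd_pooled:+.2f}")
```

Real data are rarely this symmetric, but the same mechanism can shrink or distort a pooled estimate whenever the treatment effect differs by sex.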
The Digitalis Investigation Group clinical trial is a prime example of the complexity of subgroup analysis. In 1997, the results of the digoxin trial were published, and the outcome was positive. In 2002, Rathore et al. obtained a public-use copy of the Digitalis Investigation Group database and repeated the analysis, this time by gender, and the results were alarming. The results for men were identical to the original results, but in women, digoxin significantly increased mortality, and the digoxin-associated reduction in the rate of hospitalization for heart failure was smaller.30 The results of this post hoc analysis were not reproducible in large observational studies.31,32 When gender-based analyses of randomized controlled trials (RCTs) and observational studies are inconsistent, confusion may result. On the one hand, post hoc analyses of RCTs have less statistical power to detect an interaction between gender and the effectiveness of treatment, have even less power to detect an interaction between gender and uncommon adverse events, and are more prone to type I errors (false positives) when testing for interactions. On the other hand, observational studies are not randomized and are biased by unmeasured confounders that affect the interaction analysis. To mitigate this weakness of the observational design, propensity-matched analysis can reduce the bias that confounding variables introduce into an estimate of the treatment effect. To avoid overinterpretation, gender-based analysis must therefore be performed, but only under very strict conditions.
In 2007, it was reported that among RCTs performing a gender-based subgroup analysis, only 35% performed a proper subgroup analysis.33
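One commonly recommended ingredient of a proper subgroup analysis is a formal test of the gender-by-treatment interaction, rather than a comparison of separate p-values in women and men. The sketch below is a minimal illustration, assuming a binary outcome and a simple z-test on the difference between the two subgroup log odds ratios. The function names and the counts are hypothetical, and a real analysis would normally use a prespecified regression model.

```python
import math


def log_odds_ratio(events_treated, n_treated, events_control, n_control):
    """Log odds ratio and its standard error from a 2x2 table."""
    a, b = events_treated, n_treated - events_treated
    c, d = events_control, n_control - events_control
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def interaction_z_test(women, men):
    """z statistic and two-sided p-value for the difference in log odds ratios."""
    log_or_w, se_w = log_odds_ratio(*women)
    log_or_m, se_m = log_odds_ratio(*men)
    z = (log_or_w - log_or_m) / math.sqrt(se_w ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical counts: (events_treated, n_treated, events_control, n_control)
z, p = interaction_z_test(women=(60, 200, 40, 200), men=(40, 200, 40, 200))
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because subgroup interaction tests have low power and every additional subgroup raises the chance of a false positive, a result of this kind is best treated as hypothesis-generating unless the analysis was prespecified.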
Regarding sample size, analyzing women and men separately does not necessarily require doubling the number of patients to maintain statistical power. More efficient designs, including factorial designs (in which two experimental factors with multiple levels are tested, and data are collected across all possible combinations of factors and levels), make it possible to maintain power at low cost, by increasing the sample size by only 14% to 33%.34
As of 2020, it is estimated that more than 300,000 clinical trials start every year. This represents therapeutic opportunities but also potential harm for subgroups that are not well represented in clinical trials. To be reliable and useful, clinical trials need to be both internally and externally valid. The lack of external validity has prompted calls for more pragmatic “real life data” trials and for data that are applicable to all categories of patients. In this era of personalized medicine, “one-size-fits-all” is no longer acceptable. Patients need treatments that are well suited to different subgroups, and women and men are different. Sex is a fundamental variable that should be used to disaggregate data and explain differences in treatment outcomes. To avoid gender bias, two main criteria are decisive: inclusion of women in appropriate numbers and correct gender-based analysis. In late-phase clinical trials, there has been improvement, but men still predominate. In 1993, when the exclusion of women from clinical trials was lifted, one of the main concerns was the underrepresentation of women in cardiovascular trials. In February 2020, the same concern was still being raised about drug trials for cardiovascular diseases, the leading cause of death. In early-phase clinical trials, women are still largely underrepresented or even excluded, fostering a gender bias in data on dose tolerability, appropriate dosing, metabolism, and clinical pharmacology. Moreover, when research lacks or excludes female subjects, the guidelines should clearly state that the evidence has been obtained mainly from men. Once a drug is marketed, the scientific or patient leaflet should mention the ratio and total number of women and men who participated in the clinical trials. Regarding subgroup analyses, it is estimated that only half of clinical trials perform gender-based analysis and only 35% conduct proper subgroup analyses.
Investigators should follow guidelines to ensure the proper conduct of subgroup analysis and to prevent misleading conclusions from being adopted by clinicians. Overall, inclusion of women and gender-based analysis must continue to improve to ensure external validity, reduce gender bias, avoid distrust in controlled clinical trials, and protect women’s health from drugs that are less well suited to them.
Younes Benjeaa obtained a Master’s degree in Biomedical Sciences from the University of Namur, Belgium. Yves Geysels is a Professor of Clinical Trials at the University of Namur, Faculty of Medicine, Department of Biomedical Sciences, Belgium, and the Honorary President of the Belgian Association of Clinical Research Professionals (BeACRP).