Exploring statistical tests on continuous variables that can identify deliberate data manipulation in clinical trials.
“Identifying ... potential data manipulation and data integrity problems” is an International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) E6(R3) requirement and is foundational to ensuring research integrity.
In this article, we examine statistical tests for continuous variables that have identified deliberate data manipulation in clinical trials and evaluate their usefulness in monitoring.
Both sloppiness and fraud can create data integrity challenges. Fraud involves deliberate falsification of data, where the perpetrator is likely to make an effort to avoid outliers (keeping values within the expected range), avoid propagation across visits, and produce plausible means. Accurately replicating the variability between patients at a site and across visits, however, remains difficult. Statistical tests that focus on these aspects can therefore be highly effective in detecting fraud.
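To make this concrete, the following is a minimal sketch (in Python, using illustrative names of our own; it is not CluePoints' implementation) of one such variability test: an F-test asking whether a site's between-patient variance on a continuous variable is significantly lower than the pooled variance of all other sites.

import numpy as np
from scipy import stats

def flag_low_variability(values_by_site, alpha=0.01):
    # values_by_site: dict mapping site ID -> 1-D array of measurements
    flags = {}
    for site, vals in values_by_site.items():
        others = np.concatenate(
            [v for s, v in values_by_site.items() if s != site])
        # Variance ratio of this site versus the rest of the study
        f = np.var(vals, ddof=1) / np.var(others, ddof=1)
        # One-sided F-test: is this site's variance significantly LOWER?
        p = stats.f.cdf(f, len(vals) - 1, len(others) - 1)
        flags[site] = (f, p, p < alpha)
    return flags

Because the F-test assumes roughly normal data, a robust alternative such as Levene's or the Brown-Forsythe test may be preferable in practice.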
At CluePoints, across 786 studies using statistical data monitoring, nearly 30,000 risk signals have been identified on continuous variables (roughly 18 per study), of which 32% were assessed to be issues. These may reflect both sloppiness and fraud. We also observe a general tendency not to pursue variability-related signals when the data appear to be within range and not propagated.
In Figure 1, we share an example from a cardiovascular disease study in which blood pressure was critical data. The comparison highlights a stark contrast in behavior between the two sites, with the blue site exhibiting significantly lower variability in systolic blood pressure than the grey site.
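As an illustration of the kind of comparison behind Figure 1 (using synthetic numbers, not the study's data), Levene's test, which is robust to non-normality, can check whether two sites' systolic blood pressure variances differ:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
grey_site = rng.normal(loc=135, scale=14, size=60)  # typical spread
blue_site = rng.normal(loc=135, scale=4, size=60)   # suspiciously tight

stat, p = stats.levene(blue_site, grey_site)
print(f"Levene W = {stat:.2f}, p = {p:.3g}")
# A small p-value combined with a much lower SD at one site generates a
# risk signal for follow-up; it does not by itself prove fraud.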
The identified site’s other variables were evaluated, and further monitoring confirmed that the site had committed fraud.
In our experience, such analysis of non-critical variables can point toward systemic issues in attitudes toward data integrity that lead to sloppy or fraudulent behavior. The ability of data analytics to highlight integrity issues in non-critical data allows site monitors to focus their checks on samples of critical data, again in line with E6(R3).
Our evidence suggests that statistical data monitoring is an integral part of data quality assurance, made all the more important by the specific requirements of ICH E6(R3). In particular, it is a way to spot deliberate attempts to fabricate data, where there are likely to be systematic efforts to evade routine human checks (e.g., of range, mean, or propagation). Any central monitoring initiative should therefore consider not only quality tolerance limit (QTL) and key risk indicator (KRI) surveillance but also statistical evaluations with more sophisticated tests.
Sylviane de Viron, Data and Knowledge Manager; Sas Maheswaran, Vice President, Strategic Consulting; and Ken McFarlane, Vice President, Strategic Consulting; all with CluePoints