Examining the practicality of implementing CM techniques to drive trial oversight efficiency while saving on-site monitoring resources and costs.
In the context of multicenter clinical research, centralized monitoring (CM) is the most efficient way to ensure patient safety, trial integrity, and data quality.
Because it lets the study team proactively detect anomalous data trends, CM improves the quality of regulatory submissions, which directly affects the time to marketing approval.
Since the publication of regulatory guidance on risk-based monitoring (RBM) five years ago,5-7 the concept of CM has developed alongside technological enablers that make clinical research more data-driven than ever. Today, regulators encourage the use of CM in conjunction with on-site monitoring to oversee clinical trials.8,9 Despite its unique potential for improving the quality of clinical trials, CM can appear so technical that sponsors often forgo it in favor of costly and less efficient traditional monitoring methods.10
In reality, only a few concepts are required to properly implement CM, and they are relatively easy to master and already familiar to most life sciences professionals.11 To plan a CM strategy, one should be familiar with risk management, which involves identifying risks, estimating their potential impact, and devising effective mitigation strategies. To perform CM, one needs to understand how simple statistics based on the mean and the standard deviation can be used to detect outliers. Additional CM skills include the ability to detect scientific misconduct using the chi-square distribution, which is closely related to the normal distribution.
The objective of this article is to show that CM is relatively easy to perform. It is accessible to any research professional who aims to oversee trials with optimal efficiency while saving on-site monitoring resources. The CM techniques presented in this article can be implemented with readily available tools such as Microsoft Excel.
Risk assessment and management
Because CM is a tool within a risk management process, central monitors must first understand how to identify and mitigate risks. A risk assessment, which is an integral part of a risk management process, allows one to identify a protocol’s inherent scientific and operational risk factors, rate their respective potential impacts, and either eliminate them or develop risk mitigation strategies to control them efficiently. In the context of clinical trials, the risk assessment should focus on risks relevant to a subject’s safety, the trial integrity, and the data quality. A proper risk assessment is especially important as regulators require that sponsors document the rationale for a chosen central monitoring strategy.12
Key risk indicators
Key risk indicator (KRI) metrics are risk-factor correlates that can be calculated from the available data, and they are identified during the risk assessment process. While KRIs provide quantitative information, they may lack context. Qualitative information obtained from on-site monitors and study coordinators therefore represents key risk information. It should be used together with KRIs to analyze risks properly and to choose mitigation actions. The purpose of central monitoring is not only to measure and reduce risks but also to provide perspective on the processes under review, so that the most effective control strategy can be adopted.
Each KRI metric has associated limits, also known as tolerance thresholds, which are determined during the risk assessment process. When a site-specific metric falls beyond its set limits, central monitors should analyze the root cause. As necessary, they should then implement the mitigation actions devised during the risk assessment process. Figure 1 illustrates how a typical site-specific KRI metric (e.g., error rate) may differ from the rest of the population and fall beyond set limits, thereby triggering site-specific mitigation actions. Figure 1 also illustrates how limits may be adjusted according to the observed values as a study progresses.
Using a risk matrix requires judging the probability of occurrence, the potential impact, and the detectability of each risk factor. These judgments generate a score that ranks the KRIs by importance. This method of assessing risks is used in the risk assessment categorization tool (RACT) published by TransCelerate BioPharma Inc.13 The relative importance of a KRI does not determine how closely it is overseen. Rather, it sets the intensity of the mitigation actions put in place. For example, a KRI of low importance that falls outside its normal limit may require nothing more than emails and phone calls to the site. By contrast, a KRI of high importance that falls outside its critical limit may require more aggressive and resource-intensive approaches, such as dispatching on-site monitors or initiating corrective and preventive action (CAPA) processes.
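The risk-matrix scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the RACT's own scoring rules. The 1-to-3 rating scale, the multiplicative score, and the example KRIs and ratings are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RiskFactor:
    """One KRI rated on a hypothetical 1 (low) to 3 (high) scale."""
    kri: str
    probability: int    # likelihood that the risk materializes
    impact: int         # consequence for safety, integrity, or data quality
    detectability: int  # 3 = hardest to detect, so it raises the score

    @property
    def score(self) -> int:
        # Assumed multiplicative score: a higher score means a more important KRI
        return self.probability * self.impact * self.detectability


def rank_kris(factors: list[RiskFactor]) -> list[RiskFactor]:
    """Order KRIs from most to least important by score."""
    return sorted(factors, key=lambda f: f.score, reverse=True)


# Hypothetical ratings for three common KRIs
factors = [
    RiskFactor("query rate", probability=3, impact=2, detectability=1),
    RiskFactor("AE reporting rate", probability=2, impact=3, detectability=3),
    RiskFactor("enrollment rate", probability=2, impact=2, detectability=1),
]
for f in rank_kris(factors):
    print(f"{f.kri}: {f.score}")
# AE reporting rate: 18
# query rate: 6
# enrollment rate: 4
```

Consistent with the text, the score only sets how forcefully to respond when a KRI breaches its limits. Every KRI is still monitored.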
The relative importance of risks changes as the study progresses. For example, the enrollment rate at the beginning of a study is an important indicator of trial viability. Once enrollment is closed, however, it only identifies high enrollers, which does not directly affect trial integrity. By contrast, a high query rate at the beginning of a study might be addressed by retraining research coordinators without significant consequences. At the end of the study, the same high query rate may directly affect study quality and the time to database lock. Accordingly, the risk assessment should evaluate a study at its different phases, and the focus of risk management should shift over time.
Table 1 lists the most common clinical trial KRIs and the typical output of a phase-relative risk assessment. Additional protocol-specific KRIs identified through the risk assessment process should also be considered for each trial.
Central monitoring reports
Communication between stakeholders is instrumental to the traceability of the CM process. The periodic central monitoring report should include:

• the site-specific risk factors that are outside tolerance thresholds at the time of review;
• their metric values;
• the variation of those values since the last review.

To make the process traceable, periodic reports should also give a concise analysis of why the outlying values are observed and of the mitigation actions implemented in response. CM reports thereby show how each situation evolves in response to mitigation actions. Reporting frequency may vary from weekly to monthly depending on the data acquisition rate, which is typically higher during study start-up and lower during close-out.
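The quantitative part of such a report can be assembled mechanically. The minimal sketch below assumes KRI values are stored per site for each review. It lists only the sites outside tolerance, with each site's current value and its change since the previous review. The site IDs, query rates, and limits are hypothetical. The root-cause analysis and the mitigation actions are still written by the central monitor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportRow:
    site: str
    value: float
    change: Optional[float]  # None when the site had no value at the last review


def out_of_tolerance_rows(previous: dict[str, float],
                          current: dict[str, float],
                          lower: float, upper: float) -> list[ReportRow]:
    """Rows for sites whose current KRI value lies outside [lower, upper]."""
    rows = []
    for site, value in sorted(current.items()):
        if lower <= value <= upper:
            continue  # within tolerance: not part of the periodic report
        prior = previous.get(site)
        change = None if prior is None else value - prior
        rows.append(ReportRow(site, value, change))
    return rows


# Hypothetical query rates (queries per data point) at two consecutive reviews
previous = {"S01": 0.05, "S02": 0.12}
current = {"S01": 0.06, "S02": 0.15, "S03": 0.14}
for row in out_of_tolerance_rows(previous, current, lower=0.0, upper=0.10):
    print(row)
```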
Centralized monitoring: Statistical background
The FDA defines CM (aka “central statistical monitoring” or “statistical data surveillance”) as “a remote evaluation carried out by sponsor personnel or representatives (e.g., clinical monitors, data management personnel, or statisticians) at a location other than the sites at which the clinical investigation is being conducted.”14 Essentially, CM is used to perform the risk analysis part of a risk management process and involves performing calculations on an ongoing basis to discriminate normal from abnormal data. Today’s technological enablers allow for the calculation of statistics from the accumulating data and it is thus essential for central monitors to be able to interpret the results correctly. The following sections cover the statistical notions that central monitors should be familiar with.
The normal distribution
The normal distribution is the most important concept in statistics. To evaluate the normality of calculated metrics, central monitors must understand its parameters, namely the mean and the standard deviation. The mean is the anchor of normality, and the standard deviation measures the typical spread of observations around the mean. Accordingly, multiples of the standard deviation can be used to set tolerance thresholds, also known as outlier limits, beyond which an observation may be considered abnormal. Judging what is normal and what is abnormal is a subjective endeavor, but the mean and the standard deviation remain the best parameters on which to base that judgment. As described in the next sections, the mean and the standard deviation can be used to standardize observations as z-scores and to obtain the probabilities (p-values) of observing such z-scores.
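A minimal sketch of standard-deviation-based tolerance thresholds follows. Several choices here are assumptions, not prescriptions from the text:

• the multipliers: 2 SD for a normal limit and 3 SD for a critical limit;
• comparing each site with the mean and SD of the *other* sites, so that an extreme site does not inflate the SD it is judged against;
• the error rates, which are hypothetical.

```python
from statistics import mean, stdev


def classify_sites(site_values: dict[str, float],
                   normal_k: float = 2.0,
                   critical_k: float = 3.0) -> dict[str, str]:
    """Label each site by how many SDs it lies from the other sites' mean.

    Requires at least three sites so that the leave-one-out SD is defined.
    """
    labels = {}
    for site, value in site_values.items():
        others = [v for s, v in site_values.items() if s != site]
        m, s = mean(others), stdev(others)  # leave-one-out mean and SD
        if abs(value - m) > critical_k * s:
            labels[site] = "outside critical limit"
        elif abs(value - m) > normal_k * s:
            labels[site] = "outside normal limit"
        else:
            labels[site] = "within limits"
    return labels


# Hypothetical site error rates; S09 stands apart from the rest
error_rates = {"S01": 0.10, "S02": 0.12, "S03": 0.11, "S04": 0.09, "S05": 0.10,
               "S06": 0.11, "S07": 0.12, "S08": 0.10, "S09": 0.30}
for site, label in classify_sites(error_rates).items():
    print(site, label)
```

The leave-one-out comparison matters when sites are few. If the outlier is included, it widens the SD and can hide itself.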
Central monitors should take a moment to review Figure 2 and see how the mean and the standard deviation relate to z-scores and their p-values. The following sections clarify what “normal” means, statistically speaking, in the context of multicenter clinical research.
The z-score and p-value
The z-score is the number of standard deviations an observation lies from the population mean. When the observation is a mean of several measurements, the unit is instead the standard error. It is calculated as follows:

Z = (χ – μ) / (σ / √n)

where:

• χ is the value for which the z-score is calculated;
• μ is the population mean;
• σ is the population standard deviation;
• n is the sample size, that is, the number of observations used to compute χ when χ is itself a sample mean (e.g., a subject’s mean blood pressure).

When calculating a z-score for a single observation (e.g., a site-specific KRI), n = 1 and, therefore, Z = (χ – μ) / σ.
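The formula translates directly into code. This is a minimal sketch, and the blood-pressure figures are hypothetical.

```python
import math


def z_score(x: float, mu: float, sigma: float, n: int = 1) -> float:
    """Standardize x against a population with mean mu and SD sigma.

    When x is the mean of n observations, sigma / sqrt(n) is the standard
    error of that mean; for a single observation (n = 1) it is sigma itself.
    """
    return (x - mu) / (sigma / math.sqrt(n))


# Hypothetical figures: population systolic BP mean 120 mmHg, SD 10 mmHg
print(z_score(130, 120, 10))       # single reading -> 1.0
print(z_score(130, 120, 10, n=4))  # subject's mean of four readings -> 2.0
```

The same 10 mmHg deviation counts as twice as many standard errors when it is the average of four readings. This is why n must match the number of observations behind χ.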
Once standardized as z-scores, observations can easily be compared with the mean and with each other. Z-scores also have associated p-values that can be used to judge the normality of the observed values. The z-score p-value is the probability of observing, by chance alone, a value equal to or more extreme than the value actually observed (χ). It assumes that the population from which the value was obtained is normally distributed. The smaller the p-value, the less likely it is to observe a value as extreme as the one observed.
The z-score and p-value are very useful for identifying outliers in clinical research datasets. Statistical software, including Excel, can be used to obtain p-values that correspond to the probability that an observation would be larger (right-tailed p-value) or smaller (left-tailed p-value) than an observed value. Figure 3 illustrates common scenarios.
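The three kinds of p-values for a given z-score can be computed as follows. This sketch uses Python's `statistics.NormalDist` (Python 3.8+) in the role that Excel's `NORM.S.DIST(z, TRUE)` plays for the left-tailed value. The z-score of 1.96 is only an example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def tail_p_values(z: float) -> tuple[float, float, float]:
    """Return (left-tailed, right-tailed, two-tailed) p-values for a z-score."""
    left = STANDARD_NORMAL.cdf(z)         # P(Z <= z): a value this small or smaller
    right = 1.0 - left                    # P(Z >= z): a value this large or larger
    two_tailed = 2.0 * min(left, right)   # this extreme in either direction
    return left, right, two_tailed


left, right, two = tail_p_values(1.96)
print(f"left={left:.3f} right={right:.3f} two-tailed={two:.3f}")
# left=0.975 right=0.025 two-tailed=0.050
```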
The cumulative probability
The cumulative probability is the left-tailed probability of observing a value as small as, or smaller than, the observed value. The smaller the cumulative probability, the further to the left of the population an observation lies; the larger it is, the further to the right. For example, if a site-specific KRI metric value corresponds exactly to the population KRI mean, its cumulative probability will be 50%. If another site-specific metric has a cumulative probability close to 100% (e.g., 99.9%), nearly all other sites’ values are expected to fall below that site’s value, and it is reasonable to call it an outlier. In central monitoring, the cumulative probability is particularly useful because outliers are often observations that are greater than the rest of the population.
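The following sketch computes each site's cumulative probability under a normal distribution fitted to all site values. It is the equivalent of Excel's `NORM.DIST(x, mean, sd, TRUE)`. The function names, the 0.99 flagging threshold, and the KRI values are assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev


def cumulative_probabilities(site_values: dict[str, float]) -> dict[str, float]:
    """Left-tailed (cumulative) probability of each site's KRI value,
    assuming the site values follow a normal distribution."""
    values = list(site_values.values())
    dist = NormalDist(mean(values), stdev(values))
    return {site: dist.cdf(v) for site, v in site_values.items()}


def high_outliers(site_values: dict[str, float], threshold: float = 0.99) -> list[str]:
    """Sites whose cumulative probability is at or above the (assumed) threshold."""
    cps = cumulative_probabilities(site_values)
    return sorted(site for site, cp in cps.items() if cp >= threshold)


# Hypothetical KRI: nine sites at 0.10 and one at 0.50
kri = {f"S{i:02d}": 0.10 for i in range(1, 10)}
kri["S10"] = 0.50
print(high_outliers(kri))  # ['S10']
```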
CM should not be limited to monitoring KRIs. Other methods, such as those presented below, should be used to detect scientific misconduct. Deliberate data fraud is rare but can have a significant impact on trial integrity. One straightforward way to fabricate data is to copy existing data within or across study subjects. Such data propagation makes certain values occur more often than others, so a simple way to detect it is to calculate the frequency of each observation. Frequency analysis can thus detect whether vital signs taken from a single subject were copied into two subjects’ charts. It can also detect whether a single blood sample was split in two before being sent to the laboratory. In most cases, however, fraudsters are unlikely to be careless enough to copy data without modifying some of the values. Fortunately, other ways of detecting fraud are harder to evade, given the predictability of human nature.15,16
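A minimal frequency-analysis sketch follows. It counts complete vital-sign sets rather than single values, because individual values such as a heart rate of 72 repeat legitimately, whereas an identical five-variable set rarely does. That choice, the function name, and the visit data are assumptions.

```python
from collections import defaultdict


def duplicated_vital_sets(visits):
    """Map each complete vital-sign set recorded more than once to the
    subjects it was recorded for (the same subject may appear twice)."""
    subjects_by_set = defaultdict(list)
    for subject_id, vitals in visits:
        subjects_by_set[vitals].append(subject_id)
    return {vitals: ids for vitals, ids in subjects_by_set.items() if len(ids) > 1}


# Hypothetical visits: (subject, (heart rate, resp. rate, SBP, DBP, temperature))
visits = [
    ("101", (72, 16, 120, 80, 36.8)),
    ("102", (88, 18, 135, 85, 37.0)),
    ("103", (72, 16, 120, 80, 36.8)),
]
print(duplicated_vital_sets(visits))
# {(72, 16, 120, 80, 36.8): ['101', '103']}
```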
The chi-square distribution
The chi-square distribution looks different from the normal distribution, but it can be used in the same manner to assess the normality of values that follow its distribution pattern. Importantly, the sum of the squared differences between observed and expected values, each divided by the corresponding expected value, approximately follows a chi-square distribution.17 The chi-square statistic (χ²) can thus be used to evaluate the difference between what is observed and what is expected as normal. It is calculated as follows:

χ² = Σ (O – E)² / E

where O is an observed value (e.g., the observed count in a category), E is the corresponding expected value, and the sum runs over all categories.
Like the z-score, the χ² statistic has associated p-values. Figure 4 shows how the right-tailed p-value of the χ² statistic corresponds to the probability of observing a difference between observed and expected values as large as, or larger than, the difference actually observed. The larger the χ² p-value, the closer the observed values lie to the expected ones; the smaller the χ² p-value, the farther they lie from them. The degrees of freedom (DF) of the chi-square distribution correspond to the number of independent squared terms summed to obtain the χ² value. For a goodness-of-fit test, this is the number of categories minus one. Figure 5 shows how the shape of the χ² distribution changes with the degrees of freedom. The degrees of freedom also equal the mean χ² value of the corresponding distribution.
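The following pure-Python sketch shows how the χ² statistic and its right-tailed p-value can be computed without a statistics package. It gives the same value as Excel's `CHISQ.DIST.RT(x, df)` or SciPy's `scipy.stats.chi2.sf`. The helper `chi2_sf` uses the closed forms of the regularized upper incomplete gamma function, which hold for integer degrees of freedom only. The observed and expected counts are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tailed p-value P(X >= x) for a chi-square variable with integer df.

    Evaluates Q(df/2, x/2), the regularized upper incomplete gamma function,
    using its closed forms for integer and half-integer arguments.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:  # even df: Q(k, y) = e^-y * sum_{i<k} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # odd df: Q(k + 1/2, y) = erfc(sqrt(y)) + e^-y * sum_{i=1..k} y^(i-1/2) / Gamma(i+1/2)
    k = (df - 1) // 2
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def chi_square(observed: list[float], expected: list[float]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in two categories, 25 expected in each (df = 2 - 1 = 1)
stat = chi_square([30, 20], [25, 25])
print(stat, round(chi2_sf(stat, df=1), 4))  # 2.0 0.1573
```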
Expected values are based on calculated averages, and real data rarely lie either very close to or very far from them. A χ² p-value that is either too small or too large may therefore indicate fraud. Indeed, counterfeiters are bad at mimicking the randomness of nature,18-22 and when data lie either too far from or too close to expectations, it is reasonable to suspect fabrication. As described below, terminal digit analysis and inlier analysis both use the χ² statistic. They evaluate whether data lie too far from or too close to the expected values, respectively.
Too far from expectation: Terminal digit analysis
For measurements that have two or more digits, the digits 0 to 9 are expected to appear in the rightmost position with approximately equal frequency. Terminal digit analysis uses the chi-square “goodness of fit” test and the χ² formula above:

• the observed values are the number of times each digit, from 0 to 9, appears in the terminal position in the dataset of interest;
• the expected value for each digit is the number of measurements divided by 10, since each digit is expected to appear with equal frequency;
• the analysis has 9 degrees of freedom: the number of categories (the 10 digits) minus 1.

The calculated χ² p-value indicates the “goodness of fit” between the observed last-digit distribution and a perfectly uniform one.
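A minimal sketch of terminal digit analysis for integer-valued measurements (e.g., heart rates) follows. The `chi2_sf` helper is repeated so that the block is self-contained; it implements the closed-form right-tailed χ² p-value for integer degrees of freedom. The measurement lists are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Right-tailed chi-square p-value P(X >= x) for integer df (closed form)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    k = (df - 1) // 2
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def terminal_digit_test(measurements: list[int]) -> tuple[float, float]:
    """Goodness of fit of last digits to a uniform distribution (9 DF).

    Returns (chi-square statistic, right-tailed p-value).
    """
    last_digits = Counter(abs(m) % 10 for m in measurements)
    expected = len(measurements) / 10  # each digit equally likely
    stat = sum((last_digits[d] - expected) ** 2 / expected for d in range(10))
    return stat, chi2_sf(stat, df=9)


# Hypothetical heart rates: one evenly spread set, one ending only in 0 or 5
print(terminal_digit_test(list(range(60, 110))))  # (0.0, 1.0)
print(terminal_digit_test([70, 75, 80, 85, 90] * 10))
```

A common rule of thumb for the χ² approximation is an expected count of at least 5 per digit, which means at least 50 measurements per site. As the text notes, a poor fit can also reflect legitimate rounding practices.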
Figure 6 illustrates how the χ² p-value relates to the last-digit distribution. If certain digits appear in the terminal position more often than others, the cause might be the measuring instruments or standard practices that require rounding.23 In other cases, the cause might be fraud, as humans tend to favor certain digits when fabricating numbers.24,25 Fraud constitutes reprehensible behavior and has a significant impact on the trustworthiness of all data provided by the site of interest.
Too close to expectations: Inlier analysis
Real subject-specific data are expected to vary to a certain extent from one physician’s office visit to the next, and an inlier analysis can be used to check whether this is the case. A single fabricated variable, say heart rate, may remain plausible on its own. Considered together with other fabricated variables such as respiration rate, systolic BP, diastolic BP, and temperature, however, the combined data are likely to exhibit an abnormal multivariate pattern that can be detected statistically.26,27 Inlier analysis evaluates how close a set of multivariate observations lies to the respective means. It suggests fabrication if those observations, taken together, lie abnormally close to their means. Specifically, if a subject’s data have been chosen to mimic real data, its measures will consistently lie close to an anchor value,28 such as the population mean. The sum of the squared standardized differences between its measures and the corresponding population means will then be smaller than the sums calculated for the rest of the population. When the variables are independent and normally distributed, the sum of k squared z-scores follows a chi-square distribution with k degrees of freedom. This sum can therefore be used in place of the χ² statistic to obtain the probability of observing a given sum of squared z-scores.29 The following steps describe how to perform a multivariate inlier analysis:
• Step 1. Choose the variables to evaluate. In the context of clinical research, variables that can be easily fabricated include physical examination and vital signs data.
• Step 2. Calculate subject-specific z-scores for each variable using Z = (χ – μ) / (σ / √n). Here χ is the subject-specific mean, μ is the population mean, σ is the population standard deviation, and n is the number of subject-specific observations used to calculate χ.
• Step 3. Square those z-scores and add them up to obtain a subject-specific summed Z² value. Subject-specific summed Z² values should follow a χ² distribution whose degrees of freedom equal the number of variables considered in the calculation.30-33 An inlier can be identified as a subject with an unusually small summed Z² value. An associated p-value can be obtained by treating the subject-specific summed Z² value as a χ² value and taking the left-tailed probability, since inliers lie in the lower tail.
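Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration that assumes the population means and standard deviations are already known. The variable names, population parameters, and subject values are hypothetical. The `chi2_sf` helper is repeated so the block is self-contained, and the inlier p-value is its complement, the left-tailed probability.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tailed chi-square p-value P(X >= x) for integer df (closed form)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    k = (df - 1) // 2
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def summed_z2(subject_means, n_obs, pop_means, pop_sds):
    """Steps 2-3: sum over variables of the squared z-scores of a subject's means."""
    total = 0.0
    for var, x in subject_means.items():
        z = (x - pop_means[var]) / (pop_sds[var] / math.sqrt(n_obs[var]))
        total += z ** 2
    return total


def inlier_p_value(sz2: float, df: int) -> float:
    """Left-tailed p-value: probability of a summed Z² this small or smaller."""
    return 1.0 - chi2_sf(sz2, df)


# Hypothetical population parameters for five vital signs
pop_means = {"HR": 75.0, "RR": 16.0, "SBP": 125.0, "DBP": 80.0, "TEMP": 36.8}
pop_sds = {"HR": 10.0, "RR": 2.0, "SBP": 15.0, "DBP": 10.0, "TEMP": 0.4}
n_obs = dict.fromkeys(pop_means, 4)  # four visits per variable
# A subject whose means sit suspiciously close to the population means
subject = {"HR": 75.5, "RR": 16.1, "SBP": 124.0, "DBP": 80.5, "TEMP": 36.82}

sz2 = summed_z2(subject, n_obs, pop_means, pop_sds)
p = inlier_p_value(sz2, df=len(subject))
print(f"summed Z² = {sz2:.3f}, ln = {math.log(sz2):.2f} vs ln(DF) = {math.log(5):.2f}, p = {p:.1e}")
```

Printing the natural logs alongside ln(DF) mirrors the log-scale view of the inlier graph.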
To visualize the inlier analysis, one can plot all subject-specific summed Z² values as a cluster of points. Alongside them goes an anchor point equal to the number of variables used in the calculation (the degrees of freedom). As stated above, the degrees of freedom equal the mean summed Z² value, which represents the normal distance of a subject’s measures from their respective means. Inliers appear on such graphs as points that lie unusually far to the left of the mean summed Z² value, relative to the other subject-specific values. For better visualization, all summed Z² values can be transformed with the natural log function.
In Figure 7, the natural logs of subject-specific summed squared z-scores were calculated using five variables: heart rate, respiration rate, systolic BP, diastolic BP, and temperature. The normal distance from the multivariate mean is indicated by the red dotted line, which corresponds to the natural log of the degrees of freedom. The graph indicates that two subjects, at sites 17 and 18, have measures consistently close to their respective means. In such a scenario, central monitors have good reason to ask the site staff why those subjects’ measures are so close to their respective means.
Limitations of CM
There is no single, universally applicable outlier detection approach,34-36 and statistical evidence alone seldom directly confirms a discrepancy or proves fraud. Abnormal data analysis only supports further investigation. In addition, statistical power depends on sample size, so many false-positive signals may be observed when samples are small. This is the case at the beginning of trials and in small trials.37 Site-specific metrics should therefore always be interpreted in light of sample size. One may elect to wait until sufficient data have accumulated at a site before analyzing that site. Given these limitations, CM should not rely solely on statistical algorithms. Simple checks, such as whether examination dates fall on weekends or holidays, can serve to flag suspicious sites.38,39 Also, a site’s failure to generate enough data to be included in the analysis may constitute a signal in itself.
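The weekend-and-holiday check needs no statistics at all, as the short sketch below shows. The holiday calendar is an assumed, site-specific input, and the dates are hypothetical. Some sites legitimately see subjects on weekends, so a hit is a prompt for inquiry, not a finding.

```python
from datetime import date


def off_day_visits(visit_dates: list[date], holidays: set[date]) -> list[date]:
    """Visit dates falling on a Saturday, Sunday, or listed holiday."""
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return [d for d in visit_dates if d.weekday() >= 5 or d in holidays]


# Hypothetical site calendar and visit dates
holidays = {date(2024, 1, 1)}
visits = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 6)]
print(off_day_visits(visits, holidays))
# [datetime.date(2024, 1, 1), datetime.date(2024, 1, 6)]
```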
The core objective of CM is to support a risk management process that aims to ensure subjects’ safety, trial integrity, and data quality in the most efficient way. CM can play an important role in increasing the quality of clinical trials while allowing sponsors to intelligently reduce the amount of costly on-site monitoring. This matters because monitoring can represent up to one-fourth of overall trial costs.40-42 The efficiency of monitoring efforts, both on-site and central, therefore directly affects the cost of clinical trials and the price of the treatments that ensue. Abnormal data patterns can be readily detected even by simple statistical methods, and performing CM requires neither extensive training nor the most advanced technology. In fact, most clinical research professionals have already been exposed to the statistical notions covered in this article and can carry out CM with readily available tools such as Excel. Appendix 1 lists functions and formulas that can be used to perform the analyses discussed in this article in MS Excel.
Adam Beauregard is Clinical Data Manager at EndoCeutics Inc. and Consultant at XLSMetrics Inc.; Vadim Tantsyura is Senior Director of Data Management at Target Health Inc. and adjunct faculty at New York Medical School; and Fernand Labrie is Founder and CEO at EndoCeutics Inc.