The Next Generation of Clinical Trial Performance Measurement

Applied Clinical Trials

Organizations involved in clinical trials have embraced the idea that they should collect performance measures to assess how well they conduct clinical trials. Both sponsors and CROs have invested considerable time and money to develop performance measures, but there is still uncertainty (and some frustration) about what should be assessed and how it should be measured.

Performance measurement is an important issue for clinical trials. Sponsors and vendors are under cost pressures to deliver trials more efficiently. Regulators are increasingly demanding that sponsors demonstrate appropriate governance and oversight over their clinical trials. Clinical trials managers operate in a complex environment where an oversight today can come back to be a major problem months or years from now. Clinical trials providers want to know how they are performing relative to competitors and how they can invest-to-improve.

In this article, I review the current state of clinical trial performance measurement and suggest the next generation of clinical trial performance assessment. My perspective is that of an academic trained in the measurement of performance, service quality, and customer satisfaction who has been conducting clinical trial performance research for the past four years.

A Brief History of Clinical Trials Performance Measures

The various approaches to measuring clinical trial performance can be organized into three categories: in-house assessments, customer evaluation surveys, and operational and financial metrics. In-house assessments were developed by CRO executives to track customer satisfaction. Every firm used different surveys, and the questions tended to change to address the important issues in each trial. These surveys were very useful because they were customized, but they frequently could not be used to track performance over time because the metrics kept changing.

Customer evaluation surveys are developed and administered by research organizations like Avoca or ISR.1 They typically measure the customer’s psychological state (e.g. satisfaction) that results from the clinical trial performance. These internal states are important because they drive repurchase decisions and word-of-mouth. I find that executives sometimes (and mistakenly) believe these surveys are ‘qualitative.’ But these ratings of internal states are quantified by the respondent and assigned a number, so they are actually quantitative. A more accurate critique of customer evaluation surveys is that they have low reliability. That is, different subjects often have different responses to an identical service experience. Offsetting this reliability problem, customer evaluation surveys have high validity. That is, these surveys are highly accurate in capturing the customer’s response to a service.

Much of the recent activity in clinical trials performance has emphasized operational metrics, often drawn from clinical trial management systems (CTMS). This approach to performance measurement captures process indicators such as the number of protocol amendments or deviations (to assess quality), the elapsed time between site activation and first patient entered (to capture cost and efficiency), or the number of days between protocol approval and site activation (to evaluate timeliness). The advantage of operational metrics is that they are highly reliable. If you ask how many days it took to recruit patients in a trial, you would get a fairly consistent response across multiple subjects. The disadvantage of operational metrics is that they have weak validity. Counting the number of protocol deviations does not fully and accurately capture the construct of clinical trial quality.

Figure 1 illustrates the trade-offs involved between customer evaluations and metrics.

Critique of Performance Measurement Practices

Looking across all of the approaches to measuring performance in clinical trials, I believe that the industry is far ahead of other sectors in healthcare. Improvement is needed, however, in order to achieve scientific performance measurement that fundamentally improves clinical research. I see four limitations to current clinical trial performance measurement.

First, the industry does a great job of identifying potential items to measure, but there is little validation of the items. Identifying items is a necessary but not sufficient step. There must be statistical validation to understand fully what you are measuring. Validation can answer questions like:

  • Are the items significantly related to the construct you are trying to measure?

  • Are the items repetitive or are they contributing additional measurement of the construct?

  • Does an item really belong with this construct or does it belong with another?

Without statistical validation, in other words, it is very difficult to know exactly what your items are measuring.
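To make the three validation questions concrete, here is a minimal sketch of two common screening statistics: the item-rest correlation (does an item move with the rest of the scale?) and Cronbach's alpha (how consistent is the set of items?). This is only an illustration, not the path modeling described later in this article. The ratings are invented for the example, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of ratings."""
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def item_rest_correlation(rows, j):
    """Correlation of item j with the sum of all the other items."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


# Invented 1-5 ratings: five respondents (rows) by four items (columns).
# Items 0-2 are built to agree with each other; item 3 is unrelated.
ratings = [
    [5, 5, 4, 2],
    [4, 4, 4, 5],
    [3, 3, 2, 1],
    [2, 2, 1, 4],
    [1, 2, 1, 3],
]

alpha_all_items = cronbach_alpha(ratings)
alpha_without_item_3 = cronbach_alpha([row[:3] for row in ratings])
weak_item_r = item_rest_correlation(ratings, 3)

print(f"alpha with all four items: {alpha_all_items:.2f}")
print(f"alpha without item 3:      {alpha_without_item_3:.2f}")
print(f"item-rest r for item 3:    {weak_item_r:.2f}")
```

In this made-up data, the fourth item barely correlates with the rest of the scale, and dropping it raises alpha from roughly 0.76 to roughly 0.98. That pattern suggests the item either belongs with a different construct or adds only noise.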

Secondly, I see an artificial tension between validity and reliability (i.e. customer evaluations vs. operational metrics). There is no need to choose one over the other. Statistical models today can assess the validity and reliability of a measurement model and managers can then select models that balance validity and reliability. For organizations that emphasize financial or operational metrics, these items can easily be included in performance measurement models.

Thirdly, it is common practice in the industry to benchmark on averages. In this approach, organizations assess their performance based on how they compare to a metric’s average score. This can lead to the metrics trap, which occurs when the industry as a whole is under- or overperforming (Figure 2). Imagine that we are managing a study team that recruits subjects in 70 days. We obtain data showing an average recruitment time of 85 days. Since we performed better than the average, we claim that this was high-performing recruiting. Imagine, however, that high-quality recruiting only happens at 55 days. Now the 70 days to recruit subjects is a low-quality recruiting performance, even though it beats the average. In this age of big data and predictive models, it is simply below the standard of practice to make decisions based on simple descriptive statistics. Predictive models that link to performance quality are needed in order to avoid the metrics trap.
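The recruiting example can be written as a small check. This is a sketch that uses the numbers from the example (70, 85, and 55 days). It assumes the quality standard means 55 days or fewer. In practice that threshold would come from a predictive model and would not be hard-coded. The function and field names are hypothetical.

```python
def benchmark_recruiting(days, industry_average, quality_threshold):
    """Compare one recruiting time against two different yardsticks."""
    beats_average = days < industry_average
    # Assumption: "high quality" means finishing at or under the threshold.
    meets_quality_standard = days <= quality_threshold
    return {
        "beats_average": beats_average,
        "meets_quality_standard": meets_quality_standard,
        # The trap: looking good against the average while missing the standard.
        "metrics_trap": beats_average and not meets_quality_standard,
    }


result = benchmark_recruiting(days=70, industry_average=85, quality_threshold=55)
print(result)
```

The team beats the industry average by 15 days and still misses the quality standard by 15 days, so a benchmark on averages alone would report success.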

The fourth performance measurement limitation is the problem of ‘measurement-in-isolation.’ I see a tendency to assess performance on only a single variable and not account for the other drivers of trial performance. Clinical trial activities do not occur in isolation. A whole string of activities is necessary for a clinical trial to happen. The number of days that it takes to recruit subjects, for example, is dependent on other factors like the project manager, investigator meetings, or the recruiting materials. If these factors are not integrated into the measurement, the metrics will be biased. Since clinical trials include multiple activities, our models should also include all of the performance activities.

Again, I emphasize that I believe that the clinical trials sector does an above-average job in measuring performance. But we now have the tools available for the clinical trials industry to move to the next generation of clinical trial performance measurement.

The Next Generation of Clinical Trial Performance Measurement

You do not need to take a psychometrics course or register for advanced statistics to develop your own performance metrics. The key ingredient in any metrics project is the ability to think clearly about what you want to measure and the discipline to maintain that focus. In this section, I will explain the current scientific processes in enough detail that you could use them to manage a metrics team effectively. I will follow the steps illustrated in Figure 3 and use examples from our research on developing clinical trials performance measures.

Step 1: Define what you are measuring. Before you start collecting data, it is important to clearly identify what (exactly) you are measuring. This stage can be surprisingly difficult because disagreement commonly erupts on what you should measure. You will need to exhibit leadership at this point to achieve buy-in on a manageable scope for the project.

In our case, we initially focused on full clinical trial performance but quickly found it was an enormous construct with many complex component parts. In discussing with experienced clinical trials managers, we found that they thought about clinical trials in stages (e.g. study startup, conduct, and closeout) so we were able to focus on each stage individually.

Step 2: Gather items to measure your constructs. Once you have clearly defined what you want to measure, you need to identify the key drivers and collect items (e.g. questions to assess performance or financial/operational metrics) to assess the presence of the drivers.

Don’t rely on your own experience. Get out and interview people with recognized expertise about both the key drivers and how they should be measured. This can get complex, because there are commonly multiple sequential steps involved in performing a complex service like a clinical trial. It often helps to create a process map (i.e. box-and-arrow path model) to organize your thinking. Remember that, for the validation model, you also need to collect items that assess the performance quality and satisfaction that result from the service.

How many interviews do you need to know whether you have a complete set of potential items? The general rule of thumb is that you have done enough interviews when you are no longer learning anything new from your subjects. For us, it was about three-dozen interviews.
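The rule of thumb can be turned into a simple stopping check: keep a running set of the items heard so far and stop once several interviews in a row add nothing new. This is a sketch. The item labels, the function name, and the choice of three quiet interviews as the stopping point are all assumptions for illustration.

```python
def interviews_until_saturation(interviews, quiet_run=3):
    """Return (interview count at saturation, items seen).

    `interviews` is a sequence of sets of candidate items, one per interview.
    Saturation is declared after `quiet_run` consecutive interviews that
    contribute no new items. The count is None if that never happens.
    """
    seen = set()
    quiet = 0
    for count, items in enumerate(interviews, start=1):
        new_items = set(items) - seen
        seen |= new_items
        quiet = 0 if new_items else quiet + 1
        if quiet >= quiet_run:
            return count, seen
    return None, seen


# Hypothetical item labels raised in seven successive interviews.
interviews = [
    {"site activation time", "query turnaround"},
    {"query turnaround", "monitor responsiveness"},
    {"site activation time"},
    {"document collection", "query turnaround"},
    {"monitor responsiveness"},
    {"document collection"},
    {"site activation time", "query turnaround"},
]

stopped_at, items_found = interviews_until_saturation(interviews)
print(stopped_at, sorted(items_found))
```

A stricter or looser `quiet_run` moves the stopping point. The right value is a judgment call that depends on how varied your experts are.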

Step 3: Statistical Validation. Once you have collected the measurement items, you are ready for validation. You will need to organize the items and create a measurement instrument using scaled items or fill-in-the-blank response boxes. This instrument can then be loaded on to an online form for data collection.

The statistical analysis usually involves path modeling, e.g. structural equation modeling or PLS path analysis. Again, you don’t need to do the statistics yourself, but you will need to talk to statisticians. It will help to have a basic overview of their procedures.

Here is the basic idea: we are trying to measure abstract concepts (e.g. investigator recruitment, project manager performance, or data management). We use items (e.g. a survey question or operational/financial metric) to assess the degree to which the concept is present. The large circle on the left side of Figure 4, for example, represents a concept. On the right side of the figure, we see that each additional item contributes a bit more to our measurement of the concept. The statistical output can tell us how much measurement each item contributes and whether that contribution is significant. The statistics also give us the reliability of the items and provide evidence of validity. Based on these assessments, we can then make scientific decisions about the efficiency, validity, and reliability of the instrument.

Let me illustrate validation with an example from our research on study closeout. In our initial interviews, the executives described two independent drivers related to data in closeout performance: processing the data and resolving queries. When we validated the data, however, we found that all the data management issues combined into a single driver that included both data processing and query resolution. The new ‘data management’ driver had high reliability (.97) and demonstrated improved validity. Recognizing these validation issues leads to unbiased metrics, greater efficiency in measurement, and opportunities to improve performance.

Step 4: Link to Performance Quality. The final step in the next generation of performance measurement is to include all of the validated measures in a predictive model. This is the necessary step that helps you to avoid the metrics trap. In predictive modeling, we focus on the relationships between the key drivers and the quality of the performance. In statistical terms, we assess the structural relationships. For managers, this step is typically the most intuitive and easiest to grasp.

Let me illustrate the measurement-in-isolation problem with an example from our research. In study closeout, we included items on the timeliness of closeout activities like drug reconciliation, drug returns, and the closeout visits. On its own, this timeliness variable (b = .54, t = 6.49, p < .001) was positively and significantly associated with closeout performance. When we included the additional closeout drivers (closeout visit activities, query resolution, data management, and collection of documents), timeliness switched to be negative and significant.2 A CRO that looked only at a single variable could be investing solely in closeout timeliness and degrading its performance!
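A sign reversal of this kind is easy to reproduce with ordinary least squares on a toy data set. The sketch below is not our closeout data or our path model. The scores are synthetic and were built so that timeliness overlaps heavily with a second driver, here labeled data management. All variable and function names are hypothetical.

```python
from statistics import mean


def centered(values):
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def simple_slope(x, y):
    """Slope from regressing y on x alone."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def two_predictor_slopes(x1, x2, y):
    """Partial slopes from regressing y on x1 and x2 together.

    Solves the 2x2 normal equations on mean-centered data.
    """
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 * s12
    slope_x1 = (s1y * s22 - s12 * s2y) / det
    slope_x2 = (s11 * s2y - s12 * s1y) / det
    return slope_x1, slope_x2


# Synthetic scores for six hypothetical trials.
timeliness = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
data_management = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Constructed so that performance = 2 * data_management - timeliness.
performance = [2 * d - t for t, d in zip(timeliness, data_management)]

alone = simple_slope(timeliness, performance)
with_driver, driver_slope = two_predictor_slopes(
    timeliness, data_management, performance
)

print(f"timeliness alone:            {alone:+.2f}")
print(f"timeliness, driver included: {with_driver:+.2f}")
print(f"data management:             {driver_slope:+.2f}")
```

Measured alone, timeliness borrows credit from the driver it overlaps with and appears helpful. Once that driver is in the model, the coefficient for timeliness reflects only what timeliness adds beyond it, which in this constructed example is negative.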

You can see why clinical trials managers should be involved in this process to help guide the development of the measurement model. In developing models, it is important to explore multiple models in order to understand how key drivers interact to lead to improved performance quality. The statisticians may be able to understand the output, but it is important that the measures make sense to the subject matter experts in the field.

Discussion

The clinical trials industry has recognized the importance of measuring performance and enthusiastically embraced various measurement approaches. Present-day measurement practices, however, are not grounded in scientific performance measurement techniques. In this article, I have offered a critique and suggested steps to move to the next generation of performance measurement.

These measurement approaches will not create additional work for you. In fact, this approach should be more efficient both for those who manage the performance measurement process and for the respondents who must provide the data. Without statistical validation, managers must resolve conflicts about what to measure through long discussions and compromise. They inevitably end up simply adding items, so the instrument becomes inflated and burdensome. With statistical validation, items can be tested to see if they contribute to the predictive model. If so, they can be included in additional data collection. If they don’t load, there is no point in collecting the additional data. This also leads to a much more efficient data collection process for respondents. In our research, it typically takes 3 to 5 minutes to complete an instrument.

This next generation of performance measurement also provides confidence. Clinical trials managers must track innumerable details and worry about what they are missing. Unless they are recognized and managed, these blind spots have the potential to corrupt a clinical trial. If the trial is outsourced to a CRO, then these performance measures can provide a scientific basis for contract oversight. Regulators are increasingly emphasizing the importance of governance in clinical trials. The scientific measurement described in this article provides a sound basis on which to demonstrate proper governance of a clinical trial.

Finally, CROs are interested in this approach to performance measurement because it provides a scientific basis to guide investments in performance improvement. These predictive models can identify the key drivers that have the greatest impact on the quality of clinical trials performance. CROs can then maximize their ROI on quality.

Footnotes

1 For any of the companies I use as exemplars, it is important to note that they do a variety of research, but emphasize these approaches in clinical trials performance measurement.

2 The technical explanation is that these are partialled regression coefficients. That is, the coefficients are estimated with redundant information removed.