Measuring clinically relevant response requires routine and regular collection of outcomes data.
Life is defined by our achievements and failures. Setting goals and striving to reach them is how we structure our day-to-day lives. This is true in our careers, our finances, our relationships, and also our health. Particularly for those living with a chronic condition, health goals must be agreed, set, and rigorously maintained. In some cases, the goals are defined by clinical parameters; in diabetes, for example, control of blood glucose to within recommended targets is the goal for most people. Achieving this goal helps to reduce the likelihood of micro- and macrovascular complications. In other cases, the goals are defined by our behavior; a month of no alcohol consumption among people taking blood-thinning therapy, for example. In line with the principles of behavioral and developmental psychology, it is important to reward successful outcomes where goals have been met. These rewards can be intrinsic or external. However, if we treat the outcomes in a simple binary fashion, where they are either met or not, we miss important and rich information on those goals.
To better understand goal attainment, outcomes are commonly measured on continuous interval scales, so that a researcher can assess an individual’s achievement in terms of change in status (and the magnitude thereof), rather than simply whether a predefined goal was achieved. This measurement strategy is the cornerstone of in-vivo clinical research for the development of pharmacotherapy, and is the basis on which medications are developed, licensed, regulated, and made available within the healthcare system. Specifically, pivotal trials aim to answer two questions: (1) “did the experimental therapy achieve clinical benefit in patients who enrolled in the study?”, and if so, (2) “is the benefit greater with the experimental therapy than with standard of care?”
The answers to these questions can, at their simplest, be obtained by analyzing data at two timepoints: baseline (prior to initiation of therapy) and one additional timepoint, some days, weeks, or months later, when the pharmacokinetics and pharmacodynamics of the drug indicate it should have reached peak efficacy. In reality, longer-term trials are usually carried out to ensure maintenance of an effect over time, with data collection visits occurring at regular intervals (normally one to six months apart). The graph in Figure 1 shows the effect of a hypothetical experimental medication (medication A) compared to two active comparator medications (medications B and C) on physical functioning in rheumatoid arthritis. Assessment of physical functioning is made using a well-established patient-reported outcome (PRO) questionnaire, the SF-36.
Items on this questionnaire are measured and scored using interval data, and lower scores are indicative of less impairment. In this trial, all three medications are efficacious, in that they have improved physical functioning over the three months of the study. They have done so to a “similar” (non-inferior) degree. In response to the two questions above, one must conclude, based on the data presented in Figure 1 and using traditional inferential statistical analysis, that (1) yes, there was a significant improvement in physical functioning from baseline to trial endpoint (3 months) with the experimental medication, but (2) no, the benefit was not greater than with standard of care.
However, the apparent similarities in the impact of these medications on physical functioning may be a function of the simplicity of the study design; more specifically, the infrequent measurement of a concept which may change frequently may be masking important differences between the medications. Imagine that, in fact, the medications behaved as per Figure 2, captured in the same trial using twice-weekly diary data rather than just once at the end of the trial. This shows clear differences between the medications. Although our answer to the first question (“did the experimental therapy achieve clinical benefit in patients who enrolled in the study?”) remains yes, the answer to the second question (“is the benefit greater than with standard of care?”) is now not so clear. Utilizing area-under-the-curve analyses (in addition to change-from-baseline analyses) would suggest that medication A differs significantly from medication B, but not from medication C. Moreover, the data now provides a basis for hypothesizing that these three drugs are fundamentally different. The differences are in the patient experience.
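To illustrate the distinction between change-from-baseline and area-under-the-curve summaries, the following is a minimal sketch in Python. The function names and the score values are invented for illustration and are not taken from the trial data in the figures; the trapezoidal rule is assumed as the AUC method, and the sketch is descriptive only (it does not perform the inferential comparison between medications).

```python
def change_from_baseline(scores):
    """Difference between the final and the baseline (first) score."""
    return scores[-1] - scores[0]


def trapezoid_auc(times, scores):
    """Area under the score-by-time curve using the trapezoidal rule."""
    if len(times) != len(scores) or len(times) < 2:
        raise ValueError("need at least two matched time/score pairs")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (scores[i] + scores[i - 1]) / 2
        area += width * mean_height
    return area


# Invented illustrative values: impairment scores (lower = less impairment)
# recorded at weeks 0, 4, 8, and 12 for two hypothetical response patterns.
weeks = [0, 4, 8, 12]
early_improver = [60, 40, 40, 40]  # improves quickly, then plateaus
late_improver = [60, 60, 50, 40]   # improves only toward the end

for label, scores in [("early", early_improver), ("late", late_improver)]:
    print(label, change_from_baseline(scores), trapezoid_auc(weeks, scores))
```

Both hypothetical patterns end with the same change from baseline, yet the early improver accumulates less impairment over the period, which is the kind of difference that an endpoint-only comparison cannot show.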
It is too simplistic to suggest that more data points are more informative. Rather, the frequency of data capture must relate to the dependent variable (outcome) of interest, and be captured in a manner that reduces measurement error. For example, daily collection of “quality-of-life” data may decrease the variability in the measurement owing to participant burden and generalizations by participants when completing the scales. This would reduce the sensitivity of the assessment to identify a meaningful change from baseline within and between cohorts.
On the other hand, researchers have shown reduced error variance by collecting episodic data in trials of overactive bladder, largely as a function of reduced recall bias and heterogeneity within participants over the course of a trial. The timing of assessment is also key; while random momentary assessment may be appropriate and beneficial for capturing “in-the-moment” fatigue in multiple sclerosis (which is known to follow a diurnal pattern), it is more appropriate to standardize collection of pain and stiffness data to once during a 24-hour period among rheumatoid arthritis patients.
Data collected on an appropriate assessment schedule gives power not only to answer the two questions regularly posed in pivotal trials, but also to explore individual responders to therapy. Figure 3 below provides population-level data from a trial of a hypothetical medication for insomnia.
Although there is no comparator in this trial, we are able to conclude that there was a significant improvement in sleep time from baseline to trial endpoint (36 weeks) with the medication. Figure 4 shows data from four participants in the trial with very different experiences. The frequency of data capture allows us to define these as responders (for whom the medication worked) and non-responders. Participants 1, 2, and 4 could all be classified as responders, but for different reasons:
The temptation in clinical research is to collect some data on a more frequent basis than others. A good example of this is treatment satisfaction, an endpoint utilized in clinical trials as a partial proxy for adherence (anticipated medication-taking behavior outside of the trial environment), but generally collected less frequently than clinical data. Satisfaction is a latent variable likely to be a function of expectations and experience of efficacy (positive clinical benefit) and adverse effects of medication (negative side-effects). Some combination of these will contribute to a person’s stated satisfaction with treatment, which can change day-to-day, in line with the perceived benefits and detriments of treatment on that day. Regular collection of all three types of data (efficacy, treatment satisfaction, side-effects) allows researchers to overlay the observed data and understand the main factors contributing to whether high levels of satisfaction are achieved, both within and between patients.
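One simple way to explore such an overlay for a single patient is to correlate the daily satisfaction series with the daily efficacy and side-effect series. The sketch below assumes Pearson correlation as the summary and uses invented daily ratings; the function name and values are hypothetical, and repeated measures from one person are not independent, so the result should be read as descriptive rather than inferential.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Invented daily ratings for one hypothetical patient over one week.
satisfaction = [3, 4, 4, 5, 2, 3, 5]
efficacy = [2, 3, 4, 5, 1, 3, 5]      # higher = more perceived benefit
side_effects = [1, 1, 2, 0, 3, 2, 0]  # higher = more bothersome

r_efficacy = pearson(satisfaction, efficacy)
r_side_effects = pearson(satisfaction, side_effects)
print(f"satisfaction vs efficacy:     {r_efficacy:.2f}")
print(f"satisfaction vs side-effects: {r_side_effects:.2f}")
```

Repeating the calculation per patient and comparing the coefficients gives a first, rough view of whether satisfaction tracks benefit or tolerability more closely for different individuals.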
As the medical community moves toward individualized and tailored therapeutics, and pharmaceutical companies are increasingly opting into risk-sharing agreements with payers, it is important to understand the patient journey with a medication, as well as the goal achievement of that medication. Collecting this data in pharmacotherapy development programs allows an understanding of the true impact of the medication in that context. This allows for more targeted prescribing, but also for appropriate expectation setting by treating physicians in clinical practice; if the patient journey does not match expectations, then adherence is likely to be poor. In addition to the use of frequent data collection to understand and prescribe medication optimally, there may be other hypotheses that can be generated from the data which can prompt future research for all stakeholders. For example:
Answering these questions, and other clinically relevant questions, requires routine and regular collection of outcome data according to a schedule which allows for maximum understanding of the pharmacotherapy under investigation. In an era of “real-world evidence”, this data can support healthcare tools to enhance medication-taking behavior and clinical communication at appropriate and relevant timepoints, potentially increasing both the quality of life of patients and quality of care they receive.
Matthew Reaney, M.Sc., was Sr. Scientific Consultant, ERT, at time of research; Stephen Raymond, PhD, is Chief Scientific & Quality Officer, ERT