AI and Machine Learning in Diagnosing Clinical Trial Patients

Applied Clinical Trials

John Rigg, senior principal, Predictive Analytics, Real World Solutions at IQVIA, discusses how patient disease modeling and diagnostic prediction are made possible with artificial intelligence and machine learning.

We know that early detection of disease is critical and that delays can result in adverse outcomes. Routinely collected historical data, such as medical and prescription claims, can reveal patterns that can be used to estimate the risk of disease among potentially undiagnosed patients. In this article, Rigg explains how artificial intelligence and machine learning make this kind of disease modeling and diagnostic prediction possible.

Moe Alsumidaie: How prevalent is the under-diagnosis of patients, and why is it so challenging to achieve an accurate diagnosis?

John Rigg: The truth is that we don't really understand the full extent of under-diagnosis for many diseases. We have lots of estimates for different diseases, but often these involve a lot of guesswork. There are many diseases where we think more than 80% of people are undiagnosed. This is a massive problem for both common and rare diseases. For rare diseases, we know that it takes, on average, almost five years between the first presentation of a symptom and a correct diagnosis. That is a very long time and a huge problem. The problem is pervasive; it’s expensive, and, in many ways, it’s unquantifiable and unquantified, but we know it's a big problem.

MA: How does proper diagnosis assist with research? How can identifying the right diagnosis assist with a clinical trial’s inclusion and exclusion criteria, properly qualifying the subjects, and pre-qualifying them and identifying them for recruitment?

JR: Let's begin with rare diseases. There are many undiagnosed patients with rare diseases. We need to reach out to these undiagnosed populations to make sure that we have enough people in trials. How do you find these people? Typically, we tend to use fairly broad data assets with extensive population coverage. We build algorithms. We try to identify where these patients are and where they might be undergoing treatment. This is important in terms of trial site identification and optimization.

We also want to consider all the people who are seen within a practice: can we identify those specific types of patients who are likely to be undiagnosed? Using screening algorithms and screening tools helps to recruit patients more quickly and ensures that the patients who are assessed for the trial are more likely to be eligible. All of these efforts are aimed at trying to improve recruitment rates and targeting.

MA: How are AI and ML helping to lead to earlier diagnosis of these diseases?

JR: One of the significant diagnostic challenges is that it's often difficult to understand which attributes might be indicative of a patient who is undiagnosed. In some instances, it's not rocket science. There are specific symptoms and, if you have these, it's easy to recognize that this might be someone with a particular condition, and then an evaluation can be done to confirm. But often we just don't know what those pre-diagnosis risk factors and attributes look like, and typically what happens is people complain about specific symptoms and they get bounced around the health system from pillar to post. They see multiple specialists and undergo multiple tests, and it's a real puzzle for any one individual, either primary care or specialist physician, to figure out. The way that AI/ML can help is by pulling together these often disparate, heterogeneous pieces of information on symptoms, procedures, and lab values, to try and determine whether this patient may be at risk of a particular condition. AI/ML supplements clinical decision-making by pulling together these different threads and patterns of activity to try to assign probabilities of particular diseases to specific patterns.
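As a rough illustration of what "pulling together" disparate signals into a probability can look like, here is a minimal sketch of a logistic scoring function. It is not the model described in the interview: the feature names, the weights, and the intercept are all hypothetical values chosen for illustration, and a real model would learn them from historical claims and prescription data.

```python
import math

# Hypothetical weights, for illustration only. A real model would learn
# these from historical claims and prescription data.
WEIGHTS = {
    "specialist_visits_last_year": 0.30,
    "distinct_diagnostic_tests": 0.25,
    "abnormal_lab_flag": 1.10,
    "relevant_symptom_codes": 0.60,
}
BIAS = -5.0  # hypothetical intercept standing in for a low baseline prevalence


def disease_risk(features: dict[str, float]) -> float:
    """Combine heterogeneous patient features into one probability."""
    # Weighted sum of whatever signals are present; missing ones count as 0.
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    # The logistic function maps the sum to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up patient histories: one quiet, one "bounced around" the system.
quiet_history = {"specialist_visits_last_year": 1}
busy_history = {
    "specialist_visits_last_year": 6,
    "distinct_diagnostic_tests": 5,
    "abnormal_lab_flag": 1,
    "relevant_symptom_codes": 3,
}

print(f"quiet history risk: {disease_risk(quiet_history):.3f}")
print(f"busy history risk:  {disease_risk(busy_history):.3f}")
```

The point of the sketch is only that no single signal decides the score; each one nudges the estimate, and the pattern as a whole produces the probability that a clinician could then act on.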

MA: When it comes to adoption across study sites, if you’ve just launched a clinical trial and you've selected a hospital and a site, how can you deploy something like this to assist with patient identification in clinical trials?

JR: Currently, the most common way to identify patients is through patient-level data derived from insurance claims and prescriptions, which are integrated. These do not contain any patient identifying information, so patients remain anonymous. This information is used to identify potentially undiagnosed patients at sites, and then the sites are targeted. Now, what you're talking about is going one step further. When algorithms are deployed in different ways, for example integrated into electronic health records, alerts can be generated if a patient presents with a high likelihood of a particular condition, flagging patients who may be at risk.
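The two uses described here, counting likely undiagnosed patients per site and raising an alert for an individual patient, can be sketched in a few lines. This is an assumption-laden toy: the records, the site names, and the 0.5 alert threshold are invented for illustration, and the risk values stand in for the output of some upstream predictive model.

```python
from collections import Counter

ALERT_THRESHOLD = 0.5  # hypothetical cut-off; a real one would be tuned clinically

# (anonymous patient token, site, predicted risk): made-up illustrative records
scored_patients = [
    ("p01", "site_A", 0.72),
    ("p02", "site_A", 0.08),
    ("p03", "site_B", 0.61),
    ("p04", "site_A", 0.55),
    ("p05", "site_C", 0.12),
]


def flag_patients(records, threshold=ALERT_THRESHOLD):
    """Return the records whose predicted risk meets or exceeds the threshold."""
    return [record for record in records if record[2] >= threshold]


def rank_sites(records, threshold=ALERT_THRESHOLD):
    """Count flagged patients per site, sites with the most flags first."""
    counts = Counter(site for _, site, _ in flag_patients(records, threshold))
    return counts.most_common()


print(rank_sites(scored_patients))  # [('site_A', 2), ('site_B', 1)]
```

On anonymized data, only `rank_sites` is usable, which corresponds to site identification. The per-patient output of `flag_patients` is the "one step further" case, where the algorithm runs against identifiable records inside the health system.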

There are examples where this has been done by different companies, but it's not easy. I think it's probably fair to say that the industry is still learning the best way to scale these models. There are many challenges around understanding how this fits in with clinical workflow. With the use of identifiable patient records in huge populations, there are also challenges around governance structures and interoperability. One data asset we use has over 200 million active anonymous patients.


The use of such data assets for site ID and to quantify a potential number of patients is now reasonably well established. There are a lot of efforts to deploy these algorithms so we can identify specific patients. There are some examples of good practice, but it's an emerging capability. This is innovation, and I think it’s undoubtedly the way forward and it's exciting. However, we haven’t yet witnessed many examples of large-scale implementation, and this is not just around clinical trials. Clinical trials are one application and one use case, but the same underlying infrastructure, technology, and challenges also apply to supporting early diagnosis of patients in routine clinical practice, which has nothing to do with identifying patients for clinical trials. For sure, there are challenges, and the industry as a whole (not just the pharma sector and the vendors supplying it, but also the technology companies around this, the EMR vendors, and so on) is working hard to solve these challenges.

MA: Let's say I'm a prospective subject for a clinical trial, and I'm part of a health system. How would my journey change and what differences will the patient go through, now that we have technology like this? Let's assume it's 2025, and this technology is fully deployed at a study site. How would my journey change as a subject?

JR: It's all about bringing forward the point of diagnosis. On average, it could take five years for someone like you to be diagnosed; it could take a lot longer. It's all about shortening that journey. To answer that question, you have to think about the typical journey for an undiagnosed patient; generally, it's increasing symptomatology. It's increasing visits to different specialists, more types of tests, and so on, as the disease progresses before the correct diagnosis is found. The whole point of this is to try to bring the diagnosis forward by a few months or sometimes a few years so that, as a result, patients are less ill because the disease is less severe at the point of diagnosis. It all depends on whether there is an existing treatment. It might not be relevant for clinical trials but, of course, if you get someone treated earlier, there's a lot of evidence for many diseases that early identification can lead to much more effective treatment. It can increase the quality of life and extend life expectancy.

In clinical trials, it’s relevant in terms of identifying a larger pool of patients with the disease who are potential recruits for trials. Often, depending on the particular inclusion/exclusion criteria of a trial, people ideally need to be captured relatively early on in their journey. Currently, patients are diagnosed far, far too late for that to be meaningful. So, it's about identifying people earlier, in a timelier fashion, either as potential recipients of intervention today or to get them into a trial at a more appropriate point in their disease trajectory.

MA: What future applications do you see for using AI/ML in clinical research?

JR: It’s about deploying these algorithms we’ve talked about. We already see isolated pockets of best practices here. If we fast forward to a world in which we have a whole slew of different AI algorithms for different types of diseases, which are used for screening diagnostic purposes on a routine basis across substantial populations, I think that will be a fascinating world, because we will be preventing an awful lot of harm and we will save health systems a lot of money. It's entirely possible.

Beyond screening diagnostics to support early identification of patients with undiagnosed conditions, there are many other examples where these methods and technologies are relevant; risk stratification is one. Once someone has been diagnosed with a condition, it's essential to understand who is more likely to experience rapid disease progression. If we can do that, then we can think about targeting more aggressive or more effective treatments for these patients earlier than we would do otherwise. The whole point is we're trying to mitigate the disease process, and early identification of people at risk of rapid disease progression is critical. People often talk about precision medicine as well, which is all about getting the right treatment to the right patient at the right time. I think AI has a role here too. But I think there are several other factors that need to come into play before we see large-scale deployments of these types of technologies and techniques.


There is also the possibility of clinical research as a care option. Imagine a patient in 2025 who enters a clinic, and the physicians are trying to diagnose a hitherto unspecified condition. The physicians are likely to run a series of tests on the patient. Meanwhile, the algorithms suggest the condition may be a rare disease for which successful treatment options do not yet exist, but for which there are clinical trials. The physician would have the option to tap into the clinical research infrastructure to further screen and treat the patient.

So, this has accomplished several things. It has helped the physicians synthesize the massive amount of data at their disposal. It has reduced the patient’s cost for screening and diagnostics, since protocols often allow sponsors to fund the screening. It has also provided the patient with access to cutting-edge therapies not yet on the market for a disease with few treatment options, and it has helped the sponsor identify hard-to-find patient populations for much-needed research.


Moe Alsumidaie, MBA, MSF, is a thought leader and expert in the application of business analytics toward clinical trials, and Editorial Advisory Board member for and regular contributor to Applied Clinical Trials.