Research, Practice, and Learning


Applied Clinical Trials

Applied Clinical Trials, 10-01-2012
Volume 21
Issue 10

What will it take for healthcare data to regularly inform research?

The Learning Health System (LHS) has been a topic of much discussion among senior researchers and thought leaders in the United States for many years. The concept was introduced as early as 2004 as part of the US Health IT Strategic Framework and explored in a series of forums sponsored by the Institute of Medicine (IOM), with ongoing support from the US Office of the National Coordinator (ONC). That work has defined a set of core principles and objectives that capitalize on ONC's primary objective: to implement and use advanced health information technology and the electronic exchange of health information in the United States. The IOM defines the LHS as a system "in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by-product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and healthcare."

Wayne R. Kubick

One of the first goals of the proposed LHS is to define a core data set of common data elements recorded for all patients that can be extracted from any healthcare system—and also made available for research. This will be used to populate a federated data repository that promotes data sharing in actionable forms to help physicians and patients make informed healthcare decisions. The 10 stated core principles address matters such as privacy, transparency, accessibility, and scientific integrity to ensure the LHS becomes a viable and useful resource for generations to come.

This view of collecting and reusing health information for multiple purposes for the benefit of all may seem utopian, but given the advanced technology available today, it also seems long overdue. Consider a patient newly diagnosed with a serious disease that can be treated in many different ways. Today such patients can surf the web, which is cluttered with inaccuracies and with information colored by bias: medical practitioners and vendors asserting the superiority of their particular methods or specialties, and bloggers with a particular ax to grind. Think of the possibilities if there were a gold standard learning health database where a patient could look at accurate, current information about others with similar profiles and conditions, and learn what therapies they chose, how they fared afterwards, and how such results have changed over time. Or think of the patients who choose emerging new therapies, and the possibility of comparing their disease progression with that of others on traditional therapies. In this way, as each of us navigates our complex and convoluted healthcare system as an individual, we all learn about new treatments and contribute to the overall improvement of health.

Of course, the LHS database has to be quite intelligent. There can be overlap, and considerable variation in data quality, between patient-entered databases, such as PatientsLikeMe or an LHS database already being accumulated by the Joseph H. Kanter Family Foundation, and the squeaky clean data collected in clinical trials. So it's necessary to understand where data come from, how they are meant to be used, and when they can logically be pooled together. The ultimate goal of the LHS would be to integrate this type of data collection into everyday healthcare practice, so that new data are continuously made available for research as an automatic by-product of everyday healthcare encounters.

When we speak of research in this context, we're talking not just about a web search by an individual patient, but also about epidemiological studies, safety data mining and analysis, comparative effectiveness research and, yes, secondary analysis of data from randomized clinical trials. Clinical trials information has traditionally been siloed for single use. We're only beginning to see the potential knowledge we can gain from sharing and reusing such information in projects such as the Critical Path Institute's Coalition Against Major Diseases, and in other data sharing projects such as the DataSphere project sponsored by the CEO Life Sciences Consortium, which is looking to collect and share clinical trial data for oncology therapies. But we've already learned enough from such projects to recognize how important it is to capitalize further on clinical research data, as long as we can provide the necessary protocol context and represent the information in a consistent way that allows us to aggregate and explore the data in a scientifically valid manner.

Since every healthcare system always collects certain data about patients, why can't we use those same data for secondary research? And when data are collected primarily for a clinical trial, why shouldn't those data be available secondarily within the healthcare record as part of the longitudinal patient history?

Recently, a research team explored a tiny corner of this challenge as part of an ONC project under the Strategic Health IT Advanced Research Projects Area 4 (SHARPn) program. The work will be discussed in detail in a forthcoming paper to be presented in October at the Association for Computing Machinery workshop on Managing Interoperability and Complexity in Health Systems. The researchers, from Mayo Clinic, Intermountain Healthcare, and CDISC, examined how much a sample of data elements overlapped between two sets of metadata: healthcare metadata, represented by Intermountain's Clinical Element Models (CEMs), and research metadata, represented by the Biomedical Research Integrated Domain Group (BRIDG) model, which in turn is tied to the standard CRFs and tabulation dataset models used for regulatory review and analysis. While it's impossible to repeat all of their findings here, this is a brief glimpse of what the researchers found:

  • For general demographics data, four of 13 (30.8%) CDISC research core data elements corresponded with the CEMs.

  • For laboratory data, 20 of 53 elements matched.

  • For medications data, 15 of 61 were consistent.
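
To put the three domains on the same footing, the counts above can be turned into match rates. The following is a minimal Python sketch, not part of the study itself. It assumes that each denominator is the number of research core elements examined in that domain, and the pooled figure is a simple aggregate computed here for illustration, not a result reported by the researchers.

```python
# Counts quoted in the list above: (matched elements, elements examined).
# Assumption: each total is the number of research core elements in that domain.
overlap_counts = {
    "demographics": (4, 13),
    "laboratory": (20, 53),
    "medications": (15, 61),
}


def match_rate(matched: int, total: int) -> float:
    """Return matched/total as a percentage, rounded to one decimal place."""
    return round(100 * matched / total, 1)


# Per-domain match rates.
rates = {domain: match_rate(m, t) for domain, (m, t) in overlap_counts.items()}

# Pooled rate across the three domains (illustrative aggregate only).
total_matched = sum(m for m, _ in overlap_counts.values())
total_elements = sum(t for _, t in overlap_counts.values())
overall = match_rate(total_matched, total_elements)

for domain, rate in rates.items():
    print(f"{domain}: {rate}%")
print(f"pooled: {overall}%")
```

On these counts, laboratory data shows the highest share of matching elements and medications the lowest, with fewer than two in five elements matching in any domain.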

While only a minority of data elements could be matched precisely during this exercise, several others were found to have "common potential": elements used in research that might also have value in healthcare. What was not explored directly is the converse: are there elements already typically collected in healthcare that might serve research, in place of separate research-only concepts? In other words, instead of asking "what elements do we need for this study?" each time we start a new trial, what if protocol developers looked at which elements are already typically present in electronic health record (EHR) systems and can be used, and then asked "what else do we need for research purposes that could be added to that set?"
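
The "EHR first" question can be pictured as a simple set comparison. The sketch below is purely illustrative: the element names and the function `plan_collection` are hypothetical, invented for this example, and are not drawn from the CEMs, BRIDG, or any CDISC standard. It also assumes that elements can be matched by name, which real semantic matching would not allow.

```python
# Hypothetical element names, invented for illustration only.
ehr_elements = {
    "birth_date", "sex", "lab_test_code", "lab_result_value",
    "medication_name", "dose",
}
protocol_elements = {
    "birth_date", "sex", "race", "lab_test_code", "lab_result_value",
    "medication_name", "dose", "dose_frequency", "randomization_date",
}


def plan_collection(ehr: set[str], protocol: set[str]) -> tuple[list[str], list[str]]:
    """Split a protocol's elements into those the EHR already holds
    and those that would have to be added for research purposes."""
    reusable = sorted(protocol & ehr)        # already present in the EHR
    research_only = sorted(protocol - ehr)   # "what else do we need?"
    return reusable, research_only


reusable, research_only = plan_collection(ehr_elements, protocol_elements)
print("Reuse from EHR:", reusable)
print("Add for research:", research_only)
```

The point of the sketch is the order of the questions: start from what healthcare already records, and treat the research-only remainder as the short list to justify.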

Now, to accomplish that goal, sponsors would have to re-evaluate the core data elements they're already using in many cases. And we all know how difficult it is to change our ways.

But fortunately, there is an opportunity for a partial do-over. As part of the requirements of the Prescription Drug User Fee Act (PDUFA) recently reauthorized by Congress, the FDA is committed to developing data standards for therapeutic areas. Already, a partnership of FDA and research professionals from a consortium of 10 of the top pharmaceutical companies, along with CDISC and the Critical Path Institute, is undertaking a project to identify the core data elements used for clinical trials in multiple therapeutic areas, how they fit together, and how to represent them for data collection, review and analysis, and sensible data pooling or sharing. This affords an opportunity to see how we can leverage work previously done, including detailed clinical data models already in use in the healthcare realm that address our needs.

A small core dataset shared between healthcare and research is not going to enable all of this. We need much more than just a few basic data points in three domains. Perhaps we can also easily add vital signs, which are already modeled quite consistently in most EHR systems, as well as problems and diagnoses, to extend the core dataset so that it would be suitable for pursuing the drug safety goals of projects like FDA Sentinel, OMOP, and IMI PROTECT. We need to understand the genetic, environmental, lifestyle, and social characteristics of each patient and represent such information in a semantically consistent manner. We need advanced big data technology to control, retrieve, and explore the data. And we need a purview that extends beyond the US healthcare realm: a global reach that eventually extends to all of humankind.

And, over time, we can continue to delve deeper into the data treasures we currently overlook, by systematically addressing the need to agree on a more structured way to represent other common, useful information that is currently collected somewhat haphazardly—such as medical history and smoking history and use.

But you have to start somewhere. Just having labs and meds recorded in a standardized way that can be used consistently for multiple purposes can already make a difference. Getting just this small subset of data right can open the door to further semantic interoperability. It means that a small part of the data relevant to every clinical trial can be more easily collected, with less room for error, and less transcription. It makes monitoring faster and remote monitoring easier. And it makes it possible for even a small collection of data in healthcare and research to be potentially combined for different types of secondary uses. Starting small is not a problem—what's important is to start at all. Momentum can come later—along with greater knowledge and improved patient health.

Wayne R. Kubick is Chief Technology Officer for the Clinical Data Interchange Standards Consortium (CDISC). He resides near Chicago, IL, and can be reached at
