The Semantics of Health Care Interoperability

October 1, 2009
Wayne Kubick

Applied Clinical Trials


Is computer system interoperability for clinical data an impending reality or just a dream?

The ongoing din of debate about health care reform and the associated transition to electronic health care records continues to influence clinical research and regulatory strategies. It's hard to dispute the belief that these two worlds should converge—the typical processes in clinical data collection are antiquated and redundant, and unnaturally separated from the process of data collection within health care itself.


To some extent, recent demonstrations of health care data connecting with research cast a positive light on the use of data standards in general—since standards are so fundamental to the harmonization of these parallel worlds of health care and clinical research. On the other hand, it's a little premature to bet the ranch on a future utopia at the expense of other, albeit lesser alternatives that may provide benefits that are already available today.

The vision for fully realizing the ultimate potential of health care standards centers on the nirvana-like goal of semantic interoperability (or, more accurately, computer semantic interoperability—the ability for computer systems to unambiguously exchange information and meaning). Wikipedia calls this "the ability of computer systems to communicate information and have that information properly interpreted by the receiving system in the same sense as intended by the transmitting system." Too many existing standards in common use are primarily syntactic in nature—they provide a way to represent the structure and format of data, but do not sufficiently convey context or meaning. Thus, data sent in such a standard may consist of perfectly well-formed sentences that don't make any sense to the person who receives them.
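The syntactic-versus-semantic distinction can be made concrete with a small, entirely hypothetical sketch (the records and schema below are invented for illustration, not drawn from any actual standard): two records that share an identical, perfectly valid structure, yet mean different things because unit semantics were never part of the exchange.

```python
# Hypothetical illustration: two syntactically identical records whose
# meaning differs because the standard conveys structure, not semantics.
record_site_a = {"test": "WEIGHT", "value": 70.0}   # site A records kilograms
record_site_b = {"test": "WEIGHT", "value": 154.0}  # site B records pounds

# Both records are "well-formed" under the same simple schema...
schema_keys = {"test", "value"}
assert set(record_site_a) == schema_keys
assert set(record_site_b) == schema_keys

# ...but a receiving system that pools them uncritically computes a
# meaningless average, because the unit semantics were never exchanged.
naive_mean = (record_site_a["value"] + record_site_b["value"]) / 2
print(naive_mean)  # 112.0 -- neither a kilogram value nor a pound value
```

The structure validates; the meaning does not survive the trip.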

The lack of semantic interoperability has been a criticism sometimes made of CDISC standards. The CDISC Study Data Tabulation Model (SDTM), for example, provides a way to represent the vast majority of commonly collected clinical trial data. But it breaks down around the edges for questions not explicitly described in the SDTM Implementation Guide, and especially for data unique to a specific therapeutic area, leaving lots of room for creativity among individual implementers. A recent research project seeking to repurpose and pool clinical trial data from several sponsors, in the hope of achieving treatment breakthroughs for two catastrophic diseases, illustrated the consequences of such creativity. Everyone was using the same CDISC SDTM domain model, but somehow, none of the data quite looked the same or fit together in the same way.
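To suggest how two implementations of the same model can fail to pool, here is a deliberately simplified, hypothetical fragment of a vital-signs (VS) style domain. It borrows a few real SDTM variable names (VSTESTCD, VSORRES, VSORRESU), but the rows and the poolability check are invented for illustration only.

```python
# Hypothetical, simplified fragment of a vital-signs (VS) domain as two
# sponsors might populate it -- both structurally valid, not poolable as-is.
sponsor_1 = [
    {"VSTESTCD": "SYSBP", "VSORRES": "120", "VSORRESU": "mmHg"},
]
sponsor_2 = [
    # Same measurement, but a different test code, with the units folded
    # into the result string -- each a plausible reading of the guide.
    {"VSTESTCD": "BPSYS", "VSORRES": "120 mmHg", "VSORRESU": ""},
]

def poolable(rows_a, rows_b):
    """Return True only if both datasets use the same set of test codes."""
    codes = lambda rows: {r["VSTESTCD"] for r in rows}
    return codes(rows_a) == codes(rows_b)

print(poolable(sponsor_1, sponsor_2))  # False: same model, divergent content
```

Both datasets would pass a purely structural check; only a shared terminology makes them truly interchangeable.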

We can help close such gaps with experience and with increased effort on the content standards that would be used to populate the SDTM—what some call the common data elements. CDISC is directly pursuing a project, CSHARE, to build a repository to collect the metadata associated with the thousands of individual questions that can be asked on Case Report Forms—and, hopefully, eventually find a way to harmonize the many different expressions of similar concepts.
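One can imagine what a single entry in such a metadata repository might hold. The sketch below is purely speculative; every field name and the lookup function are invented, and are not drawn from the actual CSHARE design.

```python
# Hypothetical sketch of a common-data-element metadata record, of the
# kind a repository like CSHARE might collect; all field names invented.
cde_entry = {
    "question_text": "What is the subject's systolic blood pressure?",
    "variable_name": "SYSBP",
    "datatype": "integer",
    "units": "mmHg",
    "synonyms": ["BPSYS", "SBP"],  # variant codings to be harmonized
}

def harmonize(code, registry):
    """Map a variant code to its canonical variable name, if registered."""
    if code == registry["variable_name"] or code in registry["synonyms"]:
        return registry["variable_name"]
    return None

print(harmonize("BPSYS", cde_entry))  # SYSBP
```

The hard work, of course, is not the lookup but agreeing on the canonical entries in the first place.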

And as such information is made available, reviewed, mapped, and put into use, and as we begin to adopt the same set of questions with the same terminologies within the same syntax, we can move in the desired direction toward semantic interoperability.

Questions at hand

But how far? Even in the absence of such tools as CSHARE, should we postpone using helpful but imperfect standards like SDTM entirely until the promise of full semantic interoperability can be achieved? Of course not. SDTM has shortcomings, but also offers many tangible benefits. Using SDTM gives companies a common reference point to begin with, and SDTM will be incrementally improved as more terminologies (including those in CSHARE), examples, and improved syntactical representations are made available.

Meanwhile, what about this visionary goal of full semantic interoperability—can it really ever be achieved? Of course it's desirable to capture clinical observations unambiguously—but how realistic is that? The HL7 standards development organization has a robust methodology ("V3") for developing messages that may eventually make it possible to achieve that interoperability,1 but it's not quite clear how close they are to achieving the vision in practice today.

Most V3 messages are not yet pervasive in clinical research, and in many cases have not even been fully formulated or tested. And in a world where even a long-married couple can easily miscommunicate over the simplest of ideas, how sure can we be that two computers will always interpret complex medical concepts exactly the same way? Meanwhile, even within HL7 the jury is still out on V3, and there are many internal debates over which approaches work best and how well they can truly interoperate within health care alone.

And in practice, how much of the medical information (or context) detected in patient care is actually included in the written record? Since interoperability depends heavily on accurate and consistent coding, how can we be sure that two different people in two different places are coding the same condition in the same way for different patients?

What about medical establishments, where the purpose of coding is primarily to drive optimal reimbursements rather than capture precise clinical information—can this common practice be truly eliminated in any reimbursable health care system? And that's before we get into the nuances of clinical research itself, an application of the scientific method which by its very nature is inventive and creative about identifying new questions to ask. How can we ever expect to consistently, accessibly, and unambiguously represent every concept we'll ever need?

What we can do instead is pose models and constraints—ranging from the CDISC standards to HL7 and beyond—to guide, without necessarily prescribing the process. To borrow an analogy used by another old friend, you can imagine the metaphor of placing a few gates on a ski slope to shape a general path. Over time you add more gates, making the path more predictable and repeatable. You're not forcing everyone to make exactly the same motions, but by limiting the degree of variation you can more easily get them moving in the same direction.

Promises of the past

Maybe that's not so unreasonable for now. We've traveled down the path of promising yet ultimately unrealized visionary technologies before. Remember Natural Language Processing (NLP)? It was supposed to make it possible to query databases using natural English phrases, and to have them respond in kind (like the matronly computer in Star Trek). There were several promising proof-of-concept projects during the 1990s, but I haven't yet found a database that can respond like Mom.

And then there's NLP's parent, Artificial Intelligence (AI), originally defined by John McCarthy as "the science and engineering of making intelligent machines." Teasingly predicted by Asimov and graphically depicted in apocalyptic science fiction (notably the Terminator series), AI is still a vigorous area of research that typically shows up in more nuanced applications such as robotics, gaming, and data mining—but we haven't quite found HAL yet—eight years after the predicted year of his demise.

So we can't yet tell how successful the efforts to achieve semantic interoperability will be, though we can be sure there will continue to be progress year after year. Meanwhile, CDISC standards like SDTM have been tested on several occasions, and though the full results are not always transparent (specifically what actually happens within the sponsor company, CRO, and regulatory agency using the data) we have enough evidence to indicate that much of it works even if some of it doesn't, and that it can be realistically implemented by almost anyone with significant benefits. That's a start. Does it get us to semantic interoperability? Not by a long shot, since there are far too many places for individual interpretation.

But perhaps the notion of achieving full semantic interoperability really follows Zeno's paradox of the Tortoise and Achilles. Before we can achieve the goal we have to get halfway there, by which time the goal itself will have moved ahead a little farther. We can come ever closer to the goal, just as a convergent series comes ever closer to its limit. But we can never quite get there.
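The arithmetic behind the paradox is a convergent geometric series: each step closes half the remaining distance, so the total covered after n steps is 1/2 + 1/4 + ... + 1/2^n, which approaches 1 without reaching it in any finite number of steps. A few lines make the point:

```python
# Zeno's halving steps as partial sums of the geometric series
# 1/2 + 1/4 + 1/8 + ...: each step closes half the remaining distance.
def distance_covered(steps):
    """Fraction of the course covered after the given number of steps."""
    return sum(0.5 ** k for k in range(1, steps + 1))

for n in (1, 10, 50):
    print(n, distance_covered(n))

# The partial sums equal 1 - 2**(-n): ever closer to 1, never quite there.
```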

Of course, according to Zeno, the Tortoise's clever argument convinced Achilles to give up without trying (even though any child knows they could outrun a tortoise without breaking a sweat). It's difficult to say whether the paradox applies ultimately to computer semantic interoperability—but we'll never know unless we try to get there.


1. C.N. Mead, "Data Interchange Standards in Healthcare IT—Computable Semantic Interoperability: Now Possible but Still Difficult, Do We Really Need a Better Mousetrap?" Journal of Healthcare Information Management, 20 (1), 71-78 (2006).
