Promising data warehousing systems like Janus are still waiting to realize their potential.
For many of us involved in clinical research, the workday world revolves around the life-cycle of an individual clinical study. Each study is defined by a unique protocol, which seeks to explore a hypothesis and answer a set of specific questions. The study is conducted, the data are collected and analyzed, and the results feed into a product marketing application while expanding the basis of experience for the next set of studies.
Once the study is over, we move on to the next one, and the cycle begins all over again. Many of the questions asked on Case Report Forms will change (even when they're testing the same experimental product), whether due to continuous improvement, range of knowledge or individual whim.
If another researcher wants to learn from our data, well, they can read the summary on the public registry, or maybe the clinical study report (if they can find a copy).
If they want to query the data to perform some retrospective or comparative analysis that expands the scope of the original study or application, well, good luck to them. They will need to learn the secret password, don the special glasses, and delve down into the dungeon archives to look back at the data through the lens of that original study—if they can find the data at all.
There must be a better way to absorb, interpret, and reuse clinical data. Many years ago, some scientists at the FDA considered one possibility: a clinical data warehouse that would enable FDA to make better use of the data submitted by sponsors, and improve the review process by having better access to data in a more consistent format.
The idea was that putting all the clinical data that FDA regularly receives in one place would make it possible to look back more easily at the learnings of the past, in this case experience with similar drugs or similar conditions and indications. And perhaps being able to distill such data would in some cases help reviewers look forward, for example by establishing a reference resource of placebo effects on patients, or by anticipating problems with new drugs that had not been recognized previously.
They named this concept "Janus" after the dual-faced Roman god of gateways and change, who is depicted as simultaneously facing opposite directions in recognition of his ability to see both past and future.
Fundamentally, a clinical data warehouse is a place to store and organize trial data. But the warehouse analogy is really much too general. For example, a warehouse vault of safety deposit boxes may consist of a collection of individually managed bins, each of which may contain anything. The warehouse manager should know how many bins there are and who's responsible for them, but not what's inside—retrieval is limited to providing access to the owner.
So it is with many clinical data repositories today. We may know little more about the data than a study number, and we have to know about the study to find out whether its data will answer a particular question. And woe to us if we need to pull together data from a whole bunch of studies to answer a complex question: this will undoubtedly require a great deal of time-consuming detective work, followed by a rigorous disassembly and reassembly operation before we can even begin to probe the knowledge within. It's not something we can easily pull off on demand.
A better analogy is that of a library (or repository) of books, each individually written in a specific language, printed in a particular alphabet, and characterized by specific attributes (its author, subject, title, publisher, and concepts represented within its pages). When we construct a library, we need to do more than allocate shelves to individual owners. We need an index and cataloging system to organize the books and their metadata, and help people find what may be relevant for their own inquiries.
A clinical data repository would have to rely on a rich set of metadata to describe the attributes of the data for each study, and a set of standards that would ensure the data is communicated in a common manner, according to a common dictionary of meanings. Quantitative data would need to be processed in a consistent way so that different people asking the same question in different ways using different tools could be assured of getting the same answer. That is justification enough for Janus.
Unfortunately, while the Janus concept has captivated many researchers, the reality has not yet lived up to the promise offered by its supernatural (albeit mythological) name.
As with many promising visions, it took time to find the necessary resources and support to get started. Meanwhile, as the original design aged, the world kept changing and the scope kept widening before the initial foundations were even in place. In this case, once the concept was transformed into a model and the model into a system, the practical realities of dealing with inconsistent clinical data proved to be intimidating.
The model required strict adherence to the CDISC Study Data Tabulation Model and define.xml standards, which, while providing many current advantages for general regulatory review, are still not used consistently enough within industry to fulfill the strict Janus design requirements. Loading data from different studies consistently enough to enable comparative review and analysis across multiple studies turned out to be more difficult than anticipated.
There are simply too many variations between trials (even for the same sponsor and product) and too many gaps, ambiguities, and anomalies in clinical data for everything to fit together. As a result, only a small number of legacy studies have been successfully loaded, mostly for testing, demonstration or exploratory purposes rather than active regulatory review.
What's more, the Janus model also depended on a great deal of information that wasn't represented directly in the existing data, such as what assessments were supposed to be performed and what types of analyses were conducted with what assumptions, imputations, and judgments. Most of this information should have been available in the protocol, analysis plan, study report or other documentation, but not in a form that could easily be translated into the structures required by Janus. So the project's accomplishments haven't quite lived up to its ambitious goals yet.
But then, how realistic is such a static data model in a rapidly changing world? The original Janus model was designed primarily to store clinical data for human drug trials, but FDA's sphere of responsibility includes both clinical and nonclinical data for other regulated products such as devices, vaccines, veterinary products, and foods. This is a broad scope, and the number of variations between the data for these separate products dwarfs the inconsistencies within clinical data for drug products alone.
Even within clinical trials alone, the aggregation and comparison of mixed study data hinges on the establishment and widespread adoption of clinical data and protocol standards and common terminologies, which are a moving target in a complex scientific discipline like clinical research (not to mention health care). Meanwhile, the volume and types of data continue to proliferate, and much of that data (including unstructured text, genomics or imaging data) doesn't always conform easily to a relational data model.
The semantic Web offers the promise of harnessing the power of some of these newer types of data, but it has not yet achieved widespread acceptance or use. And so the FDA has been developing a new, more robust Janus model that can address these expanded requirements while also being more compatible with an evolving new set of standards based on the HL7 Reference Information Model.
The FDA's Janus program continues to evolve. It's certainly possible to tweak the current model and system to be more forgiving of the realities of trials data so more current data can be loaded, and to expand the current design to address some of the features that were identified since the original model was designed. But is it sensible to do this while simultaneously developing a new model that is intended to do much more?
And it will likely be a long, long time before we even have a good sense of how close the current path will take us to the goal. Can any model be designed to accurately represent such a rapidly changing world and anticipate where, as Wayne Gretzky would say, the puck will be when we finally reach the goal?
To borrow another analogy, consider the movement to develop alternative, sustainable energy sources to alleviate our excessive dependence on fossil fuels. We find ourselves enamored of promising technologies such as hydrogen fuel cells, but the pieces aren't all there yet. In the meantime, we make do with hybrid technologies that seek to bridge the gap while we await the realization of the promises.
A hybrid based on our existing CDISC standards might be a good enough solution now, especially while we wait for a more perfect one. But we'll never get that perfect solution if we don't look forward to it. After all, most warehouses meant to house any accumulating material will be outgrown eventually. So, looking back and forward simultaneously may be a reasonable way to go when you have two faces to work with.
Wayne R. Kubick is Senior Vice President and Chief Quality Officer at Lincoln Technologies, Inc., a Phase Forward company based in Waltham, MA. He can be reached at [email protected]