 Wayne R. Kubick
I often wonder how people with jobs like mine find time to read books anymore. By the time I've gotten through a day of meetings,
e-mail, and business reading, there's not much time to absorb anything else other than a newspaper or a favorite periodical.
My solution has been to listen to audio books during my early morning workouts. I usually confine myself to non-fiction business,
science, and history books to help maintain a semblance of self-improvement, even though listening to a book does not quite
achieve the same level of understanding as reading by eye.
Once in a while, I pick up a book that is so compelling or complex that I find myself having to go back and purchase a print
edition afterwards so I can read it the traditional way. This past holiday, such a book was "The Information: A History, a
Theory, a Flood" by James Gleick. Gleick captivated me as he traced the history of communication, writing, messaging, cryptography
and, ultimately, computing from ancient times through pioneers like Charles Babbage, Claude Shannon, and Alan Turing. I was
surprised to learn that the term "information," in its modern technical sense, did not really exist prior to the middle of the
20th century (when people probably had the time not only to read real books, but also to write letters and converse at leisure),
and to see how critical information theory was in leading the way to computers, the Internet, and today's wireless computing world.
As Gleick portrayed the transition from the spoken to written word, and from paper to bits, I found myself wondering how we're
progressing as we handle information in the world of clinical research and development.
One new trend to consider in this light is "big data," which refers to databases, measured in terabytes and above, that are
too large and complex to be used effectively on conventional systems. Big data has attracted big vendors who have developed
powerful new systems that combine massively parallel hardware and software to quickly process and retrieve information from
such immense databases. In our world, big data solutions have so far been employed mostly in bench-research applications.
In such cases, scientists have already gained experience in how to represent molecular, genomic, proteomic, and other complex,
voluminous data types well enough so that they can benefit directly from the speed of processing and retrieval of big data
appliances.
But it seems that such systems would also be very useful for examining large observational healthcare databases of millions
of patients to try to identify and explore safety signals. Yet it is extremely challenging to meaningfully merge and combine
such data into a single research database, because the content, context, and structure of such data from different sources
are so heterogeneous. Continual movement toward electronic healthcare records, together with advancements in standards and
systems, may get us closer eventually, but the path will likely continue to be long and tortuous. And the current business
model of either tapping into data providers' systems one at a time or downloading local copies of each data source, coupled
with the risks involved in maintaining privacy and compliance, compounds the problem.
Projects like OMOP, Sentinel, and EU-ADR are raising interest in exploring healthcare data, and available data sources and
tools are improving all the time. For example, the UK government has recently announced its intention to make National
Health Service data available to the R&D community. Yet while projects like OMOP reflect cross-industry, cooperative efforts to better
understand methods and develop open source tools, sponsors are still locked into creating their own local copy of a research
database by contracting with individual data providers one at a time and building their own data repository.
It would be much more logical and efficient if the industry could work together to make such data of common interest available
to all—as a public research "big data commons in the cloud," which would eliminate the need for everyone to set up their own
local environment populated by the same external data sources over and over again. Of course, it would be challenging
to establish an effective cooperative legal/business model and infrastructure to serve the interests of many different stakeholders,
including pharmaceutical manufacturers, researchers, regulators, and even healthcare providers and payers.