Data Warehouses: The Future of Pharma


Applied Clinical Trials

The Pharma industry is notorious for generating large volumes of data, with the production of new facts and figures growing exponentially each year. This data overload has exposed the current limitations in turning data into meaningful, timely, and actionable information. However, these limitations are not caused by a lack of systems. Instead, the question surrounding them has always been, “What do we do with the data?”

To address this question, the industry has turned to data warehouses to help control and access all of its data. However, it is evident that a shift to a bottom-up approach to data warehouses is needed to make information available to a much wider audience, ensuring that decisions are supported by relevant and appropriate data that can be used to assess the risks associated with them.

The definition of a data warehouse can vary greatly from one organization to another. Some define them as data repositories, others as data marts, and still others as true data warehouses. How these systems retain data can also differ significantly. For instance, data is sometimes retained in elegant, highly customized data warehouses that drive business intelligence and whose original cost ran into the tens of millions of dollars. Other data may be stored on CDs and/or thumb drives, with no ability to index the data and retrieve it quickly. To understand where data warehousing is headed, it is important to understand what has been available to the market thus far.

Data repositories are inexpensive and relatively easy to create; however, these simple, one-dimensional systems will not meet the needs of Pharma going forward. This is because they act only as storage facilities, accumulating a tremendous amount of data without the capability to manipulate it. To demonstrate its true value to the Pharma industry, data must be transformed into meaningful information.

Data marts, although they can be free-standing, are usually considered subsets of data warehouses. For example, a data mart may be established around a clinical program: it might incorporate 12 individual trials, with the information available to all users who have access to the mart. Alternatively, a data mart could contain a multitude of clinical programs aligned within a therapeutic area, such as cardiovascular, biologics, or CNS. In short, data marts are typically designed around a specific operational group with a specific operational goal.
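
To make the "subset" idea concrete, a data mart can be pictured as a filtered view over warehouse records, scoped either to one clinical program or to a whole therapeutic area. The sketch below is a minimal illustration only: `TrialRecord`, `build_data_mart`, and the sample trial and program IDs are hypothetical, and a real clinical data mart would carry many more fields than a trial ID, program, and therapeutic area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialRecord:
    # Hypothetical, simplified warehouse row: one row per trial.
    trial_id: str
    program: str
    therapeutic_area: str


def build_data_mart(records, program=None, therapeutic_area=None):
    """Return the warehouse rows that belong to one program and/or therapeutic area."""
    return [
        r
        for r in records
        if (program is None or r.program == program)
        and (therapeutic_area is None or r.therapeutic_area == therapeutic_area)
    ]


# Illustrative data only; the IDs and programs are made up.
warehouse = [
    TrialRecord("T-001", "PRG-A", "Cardiovascular"),
    TrialRecord("T-002", "PRG-A", "Cardiovascular"),
    TrialRecord("T-003", "PRG-B", "Cardiovascular"),
    TrialRecord("T-004", "PRG-C", "CNS"),
]

program_mart = build_data_mart(warehouse, program="PRG-A")  # one clinical program
cv_mart = build_data_mart(warehouse, therapeutic_area="Cardiovascular")  # one therapeutic area
print(len(program_mart), len(cv_mart))  # prints: 2 3
```

The program-level mart is narrower than the therapeutic-area mart, mirroring the two examples in the text.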

Data warehouses, on the other hand, can be an amalgam of the data marts held within them. They can function dynamically, using elements of both a data repository and a data mart to optimize both the storage and the reporting of information.

There is, however, the added complexity of choosing a top-down, bottom-up, or hybrid design for a data warehouse. The three designs vary significantly in the time and cost required to build, deploy, and maintain the system. The design question therefore becomes, “Which is the most efficient and cost-effective approach?”

A top-down design has been used in the so-called traditional data warehouse models in the Pharma industry. In this approach, data is entered into the warehouse and then populated into the data marts, depending on how the framework is designed. This is a very significant design decision, because selecting a top-down approach typically means agreeing that all data entering the warehouse will be standardized. While this may seem reasonable in theory, experienced industry veterans know the inherent complexity and cost associated with the word “standardization”.

More specifically, the first step in this design phase is to establish standards for your data elements. For example, gender might always be recorded as either “Male” or “Female”. Any data sets scheduled to enter the data warehouse that coded gender as “M” or “F”, depending on how the sites designated it, would then have to be converted to “Male” or “Female”. The initial data mapping for this type of design is extensive, requiring a tremendous amount of work and cost up front. You may also find that some data sources cannot be placed into the warehouse at all because they do not fit the standards the top-down approach imposes.
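
The up-front mapping work of a top-down design can be sketched as a load step that converts every incoming code to the agreed standard before the data is accepted. This is a minimal illustration, not a production ETL design: the mapping table `GENDER_MAP`, the helper names, the field names, and the rule of setting aside rows that cannot be mapped are all assumptions made for the example.

```python
# Hypothetical mapping from site-specific codes to the agreed warehouse standard.
GENDER_MAP = {"M": "Male", "F": "Female", "Male": "Male", "Female": "Female"}


def standardize_gender(raw_value):
    """Convert a site's gender code to the standard term, or raise if no mapping exists."""
    key = raw_value.strip()
    if key not in GENDER_MAP:
        raise ValueError(f"Unmapped gender code: {raw_value!r}")
    return GENDER_MAP[key]


def load_top_down(rows):
    """Standardize every row before it enters the warehouse; set aside rows that cannot be mapped."""
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append({**row, "gender": standardize_gender(row["gender"])})
        except ValueError:
            rejected.append(row)
    return loaded, rejected


site_rows = [
    {"subject": "001", "gender": "M"},
    {"subject": "002", "gender": "Female"},
    {"subject": "003", "gender": "U"},  # a code the agreed standard does not cover
]
loaded, rejected = load_top_down(site_rows)
print([r["gender"] for r in loaded])  # prints: ['Male', 'Female']
print([r["subject"] for r in rejected])  # prints: ['003']
```

Every new source needs its codes added to the mapping before it can load cleanly, which is where the up-front cost of standardization comes from.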

The bottom-up approach is significantly different: the data marts feed up into the warehouse, the direct reverse of the top-down approach. The key benefit is that data conversion is not forced when the data is passed through to the warehouse.

As a result, the warehouse may hold different labels for the same data. Using the earlier example, it may now contain both “Male”/“Female” and “M”/“F” for the same data element. This approach not only helps end users become more skilled in understanding the data, but also avoids the significant up-front investment of time and money in standardizing data as it enters the warehouse.

Lastly, the hybrid model attempts to blend the quick start-up time of a bottom-up warehouse with the standardized data consistency of a top-down approach.
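
For contrast, a bottom-up load can be sketched as a pass-through that keeps each mart's original codes and records which mart they came from. The `harmonize` step shows one possible hybrid-style compromise, assumed here for illustration rather than prescribed by any particular design: the raw value is preserved and a standardized column is added only where a mapping already exists. All function names, mart names, and data are hypothetical.

```python
# Bottom-up: each data mart passes its rows through unchanged, tagged with its source.
def load_bottom_up(marts):
    """Combine rows from several data marts without converting any values."""
    warehouse = []
    for mart_name, rows in marts.items():
        for row in rows:
            warehouse.append({**row, "source_mart": mart_name})
    return warehouse


# Hybrid-style step (one possible approach): keep the raw value and add a
# standardized column wherever a mapping is already known.
GENDER_MAP = {"M": "Male", "F": "Female", "Male": "Male", "Female": "Female"}


def harmonize(warehouse):
    """Add a 'gender_std' column; it is None when no mapping exists yet."""
    return [{**row, "gender_std": GENDER_MAP.get(row["gender"])} for row in warehouse]


marts = {
    "cardio_mart": [{"subject": "001", "gender": "M"}],
    "cns_mart": [{"subject": "101", "gender": "Female"}],
}
warehouse = load_bottom_up(marts)
print(sorted({r["gender"] for r in warehouse}))  # prints: ['Female', 'M']
print([r["gender_std"] for r in harmonize(warehouse)])  # prints: ['Male', 'Female']
```

Because unmapped codes simply yield `None` in `gender_std`, a new source can be loaded immediately and its mapping added later, trading some initial consistency for a faster start.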

As always, the future of the Pharma industry is being defined by the strategic decisions industry leaders make today. Over the past decade, the pharmaceutical industry has been moving from a centralized organizational model to a decentralized one. More specifically, organizations have realigned themselves around business units, therapeutic areas, and/or disease states. The goal of this shift is to create operational entities that can define clinical programs aligned with the needs of present and future patients.

As the structure of Pharma shifts, so do its organizational needs. Currently, organizations are asking, “What can we leverage, and what can we integrate?” The question should instead be, “What tools do we need to accomplish our goals?” The industry needs to actively retire antiquated technology systems that lack the faster response times of more recent innovations; these old systems can no longer meet the needs of the organizations they were designed to serve.

For example, in a large Pharma organization that currently maintains a top-down, highly structured data warehouse, accessing data requires a formal request. The request is then processed by the centralized Business Intelligence (BI) department, and it could take days or weeks for the response to reach the individual who originally requested the data. In a decentralized organization, this delay would be unacceptable, because it defeats the purpose of decentralizing in the first place.

Data warehouses will become ubiquitous in the industry, since organizations large and small will require quick access to their data. It is essential that information be made available to a broad audience: if individuals can quickly access data and then act on it, their findings can provide a strategic advantage. Because top-down, highly structured data warehouses may prove prohibitive in time and expense, bottom-up or hybrid designs may be the path forward for many Pharma and Biotech companies. For many organizations, data marts could therefore become the genesis of future data warehouses, addressing the critical issue of controlling, analyzing, and using their invaluable data.

© 2024 MJH Life Sciences

All rights reserved.