The Value of Integrated Data Views for Clinical Trials

February 2, 2021
Teresa Montes

Emma DiBella

To have a successful integration of data from across clinical systems, a strong understanding of the data is critical.

Clinical trials build a profile of a compound to answer: Is it safe? Is it effective? Is it better than other offerings on the market? The cost of gaining this knowledge is high, so maximizing the utilization of the knowledge collected through the trial is essential.

Individual systems or a few integrated systems provide a reasonable view of the data collected within a small domain, but growing organizations need to bridge data from multiple systems to support analysis across the clinical operation.

Data used to drive operational decisions is generated by a variety of contributing areas, including biostats, essential documents, clinical supplies, patient-reported outcomes, protocols, disease prevalence and genetics, to name only a few. The list of data sources expands every day. Integrating data sets from different clinical systems opens paths to pursue answers to difficult questions that were previously impossible to answer using only one or two linked systems.

Clinical goals change rapidly, as clinical teams work to improve and save lives. Limits imposed by a lack of data inhibit the chances for success. In addition to assessing past and current progress, large, integrated data sets are essential to activating the power of Artificial Intelligence (AI) and Machine Learning (ML) tools to better project outcomes.

Integrating data from across clinical systems requires investment. So, the question becomes: how much integration is needed, and how big is the investment? As you weigh the integration options, building a strong understanding of the data is a critical, foundational first step. Using your integrated data successfully depends on this first step.

Build the foundation

Managing data across multiple tools requires an understanding of the data connection points. As an example, you may want to observe trends in patient data quality, across studies, for a particular clinical site. To do this, you may need to look in one system for study sites and locations, and in a different system for information on the organization. Next you gather contract information from a third system, and find the site data quality and timeliness information in yet another.

Doing this task can be difficult. You need to be certain that the site information matches correctly across systems: that the records refer to the same organization, the same physical location and the same department.

Across systems or within systems, data may be inconsistent. You may see something as common as multiple studies using the same physical site, although the site has a different site number in each study. Different names for a site, or data entry errors such as multiple versions of an organization name, can make compiling a full picture nearly impossible.
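The matching problem described above can be sketched in a few lines. The example below is a simplified illustration, not a production record matcher; all field names, normalization rules and the similarity threshold are hypothetical, and it uses only Python's standard library:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Normalize an organization name for comparison (illustrative rules only)."""
    name = name.lower().strip()
    for noise in (" inc.", " inc", " llc", ","):
        name = name.replace(noise, "")
    return " ".join(name.split())

def likely_same_site(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Flag two site records as a probable match when their normalized
    organization names are similar and the cities agree."""
    score = SequenceMatcher(
        None, normalize(rec_a["org"]), normalize(rec_b["org"])
    ).ratio()
    return score >= threshold and rec_a["city"].lower() == rec_b["city"].lower()

# Two versions of the same site, entered differently in two systems
ctms_record = {"org": "Mercy Research Center, Inc.", "city": "Dayton"}
etmf_record = {"org": "Mercy Research Center", "city": "Dayton"}
print(likely_same_site(ctms_record, etmf_record))  # True
```

In practice, candidate matches flagged this way would still be reviewed by a person; automated matching narrows the search, it does not replace data stewardship.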

When you successfully associate your data across systems, the value of your data greatly increases; you create an opportunity to track site performance without investing in new site tracking systems. Getting an understanding of your data, within and across systems, is the single most important factor in making your data aggregation work. Resources invested in standardizing data reap huge benefits.

Assembling the foundation

Determining the definition of the data (e.g., study start date: What event is the starting point for a study?) and the relationships between pieces of information (e.g., can a clinical study have one indication, multiple indications or none?) allows you to use collected system data to address a larger number of business questions. As more integrated system data becomes available from both within and outside the organization, new business insights can be harvested.

As an example, consider the potential number of available patients fitting the clinical study protocol requirements within a reasonable radius of a site. When considering funding and expected study completion timelines, this is a reasonable question to ask. An analysis of this question would consider the protocol requirements, the demographics of the local population, the transportation options for the target population and the number of competing specialists in the area around the clinical site. Working with individual systems and one-off queries of external data sets makes this task time consuming and error prone. Combining data from verified internal and external systems makes the question far easier to address.
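As a simplified illustration of the kind of estimate an integrated view enables, the sketch below combines hypothetical protocol criteria with an equally hypothetical joined site/demographics view. Every number, field name and discount rule is invented for the example:

```python
# Toy sketch: estimating the eligible patient pool near each site by reading
# a view that joins internal site data with external population data.

protocol = {"min_age": 40, "max_age": 75, "condition": "T2DM"}  # hypothetical criteria

# One row per site, as it might look after the join (all values invented)
site_view = [
    {"site": "Site 101", "pop_in_age_band": 52_000, "prevalence": 0.095, "competing_specialists": 4},
    {"site": "Site 102", "pop_in_age_band": 18_000, "prevalence": 0.110, "competing_specialists": 1},
]

for row in site_view:
    # Rough pool estimate: population in the eligible age band times disease
    # prevalence, discounted for competing specialists drawing on the same pool.
    pool = row["pop_in_age_band"] * row["prevalence"] / (1 + row["competing_specialists"])
    print(f'{row["site"]}: ~{pool:,.0f} candidate patients')
```

The arithmetic is trivial; the hard part, as the article argues, is assembling a trustworthy `site_view` in the first place.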

The task to define and test the combination of data requires resource commitment. How much will you need to start? Consider the areas of greatest need and risk, which provide an obvious avenue to begin the data integration journey. The risks you identify, such as data integrity needs across systems or data sets with limited availability, will inform choices between options for integration. 

The options

Different price points and levels of flexibility exist, depending on the scope and functionality required for a data integration. In many cases, a single organization may utilize more than one of the approaches described below.  

  1. The platform solution

    The platform solution hosts multiple functions in one tool. Why build interfaces when you can buy a tool that has consolidated the data and built the basic relationships? A platform solution provides multiple sets of functionalities, such as essential document management, trial management, study startup support and site communications, built into a single tool. Given that all of these functions relate to a clinical study, the structure is shared. Data is captured once and automatically shared across all uses. 

    With the platform solution, organizations must still manage data integrity, such as assuring there are not three versions of the same Jay Smith in the list of investigators. This solution will not be suitable for every organization, as the introduction of new systems can be costly.
  2. Integrated applications

    Integrated applications allow your tools to work together, despite being independent systems. Connectors exist to allow users within one system to access documents or data from another system, without having to leave their current interface. As an example, regulatory teams working on a submission may open a view into the eTMF system to link to a document of interest. Connectors are purchased separately and work with a limited set of systems.
  3. Point-to-point system integration

    Point-to-point system integration depends on building tools to share data. For example, product management information can be shared with a clinical study system to assure integrity is maintained in the use of the data. When the integration tool is triggered, identified data updates in the source system are pushed to the target system.

    These point-to-point integrations have been around for a long time and vary in complexity. The simplest tool is the Excel sheet, used to format data sets following strict formatting rules. The file allows users to upload data sets, and user groups trigger the updates when needed.

    A scheduled data update may be tied to the clock or result from a change in the data. An integration tool checks for the trigger. When started, the backend routine pulls data from the source and pushes the update to another system.

    A newer option in this category is Robotic Process Automation (RPA). Conceptually, the robot (in ‘Robotic’ Process Automation) learns a set of update actions, based on data provided. You may recall using macros in Excel; a user programmed steps by recording keystrokes. RPA expands on this idea by applying AI and ML. The robot learns by watching users process updates: if a particular set of conditions exists in the input data, the user normally takes a specified update action. Once trained, the RPA software is able to conduct the update steps, based on a complex pattern of inputs learned by the tool. Corrections over time hone the algorithm, improving accuracy. This approach allows the robot to take over repetitive user actions.

    The costs for point-to-point solutions align with the complexity and robustness of the solution. The Excel solution takes minimal upfront investment, although it is labor intensive and prone to input error. The RPA option is more expensive and requires training time, but after the initial learning routines, maintenance remains relatively low.
  4. Hub and spoke design

    Moving away from a direct system-A-to-system-B connection, a hub and spoke design pushes data sets to a central storage location. This central store allows other systems to consume the data, which may be provided by multiple source systems.

    In these models, data definitions and data relationship definitions are needed to provide data to consuming systems. The difference between the options below is when those definitions and relationships need to be built. 

    Traditional data warehouse models predefine all pieces of data, integrate the data into a data model (hard-wiring relationships), establish rules about reliable data sources and support the generation of data views for consumer systems. Data warehouse models rely on the data definition and relationship work being completed before storing the structured data sets. Maintenance on these systems tends to be high, as new data sets may impact the existing schema for the data.

    Data lakes provide a way to ingest structured and unstructured data sets of varying size. Like books run through a blender, all the pieces of data are held there. To be useful, they need to be assembled. Data lakes allow for the definition of data relationships after the data is stored. To extract and use the data, relationships must be built. 
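The schema-on-read idea behind the data lake can be illustrated with a small sketch. All system names, fields and records below are hypothetical; the point is only that raw records land in the lake as-is, and the relationship is defined when the data is read, not when it is stored:

```python
# Raw records from different (hypothetical) systems, stored unreconciled
lake = [
    {"source": "ctms", "payload": {"study": "ABC-001", "site_no": "101", "org": "Mercy Research"}},
    {"source": "etmf", "payload": {"study": "ABC-002", "site_no": "07", "org": "Mercy Research"}},
]

def sites_by_org(records):
    """Assemble a cross-study view at read time: organization -> (study, site_no)."""
    view = {}
    for rec in records:
        p = rec["payload"]
        view.setdefault(p["org"], []).append((p["study"], p["site_no"]))
    return view

print(sites_by_org(lake))
# {'Mercy Research': [('ABC-001', '101'), ('ABC-002', '07')]}
```

A data warehouse, by contrast, would force the organization-to-site relationship into the schema before either record was stored; the lake defers that work, at the cost of having to do it on every read.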

Leveraging data from sources both within and outside the enterprise may provide new and interesting opportunities for investment, resource planning, study planning and knowledge reuse—all essential for making the greatest use of information collected in and around clinical trials.

Teresa Montes is a Clinical Practice Lead; Emma DiBella is a Clinical Consultant; both for Daelight Solutions.