My 2020 Data Resolutions

February 13, 2020
Todd Johnson
Applied Clinical Trials

As an industry, we produce data. Lots of it. It is inherent to what we do and clearly necessary for bringing innovative therapies to patients. While much of our time is dedicated to ensuring the quality of the important safety and efficacy data in our submissions to regulatory agencies, the other data we indirectly produce, our operational data, can at times languish (unqueried for quality) in our clinical systems (or, ugh, in myriad spreadsheets). While “Big Data” is a great buzzword in management circles, small data (with big problems) never gets much attention.

Our operational data is extremely important. It is a reflection of what we do and how well we do it. We can use it to evaluate our processes and initiate discussions for improvement. We can use it to forecast and extrapolate trends. We can use it to predict what will happen next and optimize expected outcomes. We can analyze it for hidden patterns that can help direct future decisions. So, what is preventing us from using this rich resource as other industries have? Major sports organizations have invested heavily in analytics to better understand which players will succeed or which plays work best in a given situation. How can we translate this capability to our own industry to resolve challenges with unproductive clinical sites? Apart from the pharmacokinetic/pharmacodynamic (PK/PD) modeling and simulation that our industry already performs, we fail to prioritize operational data when making decisions. The question is: why? While there are likely many reasons for this (e.g., lack of a data culture and/or strategy, technology limitations, lack of adequate skill sets), I would like to focus on a very basic but important requirement: data quality.

Everything interesting we can do with our operational data relies squarely on its quality. While analyzing data is exciting, cleaning it unfortunately takes a great deal of time. A New York Times article reported that data scientists can spend up to 80% of their time on data preparation (1). We may be very familiar with the effort required to lock a clinical database, but we continue to overlook that our other data require the same rigor before they can be used for measurement and decision making.

So, as we look back on 2019 and look forward to the New Year, I would like to share some data resolutions that I will continue to evangelize throughout 2020 and beyond.

  • Exercise More. This resolution is usually at the top of everyone’s list, so let’s start here. Exercise is all about measurement, and at the end of a long week, there is nothing more rewarding than practicing the Friday Afternoon Measurement (FAM) method. This exercise, designed by the “Data Doc” and fellow statistician Tom Redman, provides a glimpse into how truly high the error rate in your data is.

    The FAM method is quite simple and can be used on any spreadsheet (or validated data extract). Assemble the most recent 100 data records, focus on 10-15 critical data fields, and then simply count the number of records that are error-free. For example, your 100 recent records could be investigative sites, and the critical data could include planned and actual visit dates (e.g., qualification, activation, monitoring, closeout) and subject parameters (e.g., screened, consented, randomized, lost to follow-up).

    Even before completing the FAM method, how many error-free records do you estimate you actually have? If your estimate is low, you are in good company: in Tom’s study, nearly half (47%) of newly created data records contained at least one critical error, and only 3% of the resulting data quality scores met even the loosest standard of acceptability (2). And note that this count does not reveal how many errors (including missing data) each flawed record contains. Virtually every dataset is dirty to some extent and requires cleaning.

  • Tell the Truth. What better way to be more honest than to establish a single, trusted source of truth? Yes, it is easier than it sounds, and it is increasingly important as companies inherit new clinical systems. Master Data Management (MDM) initiatives are not at the top of every leader’s priority list. But honestly, they need to be if leaders are serious about creating a data culture. How many different answers have you received when you simply ask, “When was first patient in (FPI)?”, “How many patients are enrolled?” or “How many sites are currently in the trial?” MDM programs not only define each of these milestones consistently (e.g., whether FPI means the first patient screened or the first patient randomized), but they also specify exactly where the data comes from. Think data dictionary and data standards. No, MDM is not the sexiest initiative to focus on, but it is incredibly important and should be initiated before any reporting initiative.

  • Save More Money. Automate, automate, automate. Being human, I have no problem trying to eliminate my fellow humans from the data collection, data aggregation and data reporting process. In a nutshell, we are not very good at any of this. Not only do we lack the requisite focus and speed, but we also insist on working in manual spreadsheets. If you have a “Metric and Reporting” task code in your time management system, you are well aware of how much time your employees spend manually creating dashboards, reports and trackers. The amount is truly incredible, and it makes the perfect return on investment (ROI) case for an automated solution.

  • Control Portion Size. Digest only what is meaningful. We all love metrics and reports. But are all of them relevant and meaningful? To everyone? Building governance around the data consumed at your company will let you begin eliminating the noise in your dashboards and reports and focus on the relevant signal. Understand that there are different consumers of data, with different appetites. Question lengthy metric lists that have no business reason or expected action behind them. Interestingly, regulators continue to push the industry to focus only on what is important for patient safety and data integrity; ICH E6(R2) clearly drives companies to adopt a risk-based approach to quality management. This is a good thing. Not only does it put the patient first, but it also makes good data sense. From vendor oversight metrics and Key Performance Indicators to Key Risk Indicators and Quality Tolerance Limits, we seem to overindulge in what we measure, inevitably leaving unconsumed data on our plates.

  • Read More. Become more data literate. Understand that building an analytical capability requires a range of skill sets. Spending time developing critical and analytical thinking skills is an important and necessary step in building a data culture. Data literacy is now required of everyone, not just the statisticians or data scientists who work with data routinely. Understand when one measure of central tendency is more appropriate than another (median vs. mean). Remind yourself what a standard deviation is. Understand that there is natural variation in your data and that this is acceptable (so that we avoid designing dashboards with overly narrow thresholds). Develop internal standards of performance. Collect. Clean. Access. Analyze. Benchmark. Question.

  • Set Meaningful Goals. Everyone loves a good performance benchmark. We are egotistical beings by nature and love to see how we compare to others. Companies pay big money to see how they actually perform against their peers. While benchmarks are important for understanding performance, the mandatory first step is a standard. A good standard includes not only an expectation of performance, but also a clear definition and other supporting information. If clear industry-standard definitions are available, use them. Unique, internally developed metrics and definitions may be provocative, but you will have a tough time comparing yourself to other folks who have no idea what it is you are measuring. TransCelerate BioPharma Inc. and the Metrics Champion Consortium (MCC) are incredible industry resources for process and metric standards. I highly recommend that any company in this industry learn more about the resources these organizations have to offer.

    If you are not able to access any source of external industry standards, make sure your metric is at least standardized within your organization. Having different definitions (and different targets within the same function) leads to confusion – especially for new team members. Lastly, your standards should be centralized, versioned and documented within your organization, with clear expectations around review and updating.

  • Increase Self-Awareness. First, look in the mirror. Now, I am really sorry to inform you of this, but your data is wrong. Exhale. Just like all datasets, the data that you are currently making decisions on is likely incorrect. It’s dirty and needs to be cleaned. Understanding that this is indeed the case is the first step to realizing how endemic this issue is in every organization. Data quality checks were covered in the first resolution, but the point here is that business and process owners should proactively be doing these checks so they can better understand the inherent biases that exist within their data. Understanding how to spot these biases is critical to creating processes that remove them.

    Do you ever wonder why the planned dates in your clinical trial management system (CTMS) match your actual dates? Despite what our optimistic selves may think, it is not because we are exceptionally good at forecasting. It may be because the planned dates were entered only after the actual dates were known.

    Another example: take a mental note the next time you are included in any kind of metric or key performance indicator (KPI) review that involves colored indicators. Why is it that the data behind the red indicators is always challenged and/or excluded? Why not challenge the green? This is confirmation bias at its finest. Humans like to confirm what they think they know to be true. A line often attributed to Mark Twain says it best: “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” Whether it is the overused (and erroneously applied) “80:20 rule” or anchoring on irrelevant data, critical thinking is required to keep biases out of our interpretation.

  • Travel More. Get out and see not only where data is being captured, but how it is being captured (insert your favorite polling joke here). Step back, take a long, hard look at your protocol and Case Report Forms (or CRF standards), and consider an external review. Collect only what is required. Just as ICH E6(R2) has been the hot guideline for the past couple of years, I expect ICH E8(R1) (and ICH E6(R3)) will soon be on the tip of everyone’s tongue. Quality by design is not just pervasive in our guidelines; it makes good data sense. Don’t collect what you won’t use, especially when you have patients to consider. ICH E8(R1) explicitly recommends patient input into study design.

    As protocols continue to grow in scope and complexity, the data required to fuel these studies will become more burdensome to understand and collect. Virtual trials alleviate some of this burden by taking the trial to the patient, and data (coupled with technology) is the backbone that makes this possible.

    On the operational side, understand how data gets into your clinical systems (CTMS, TMF, etc.) and who is responsible for entering it. Understand the gaps and pain points in these processes, since data quality is best addressed at the point of entry.

  • Skip Dessert. More specifically, pass on the pie… charts. Data visualization is a science in itself, from understanding how our brains process images and color tones to simply knowing which visualization best suits your message.

    Limited as pie charts are for displaying quantitative data, they appear in almost every PowerPoint presentation. I remember learning fractions by counting pie slices in grade school (and I do love a good pizza), but there must be an incredibly strong rationale for me to report any data using a pie chart. They are bright, shiny objects that distract while serving little purpose. And they take up valuable real estate on already information-packed dashboards and reports. Simply put, pie charts are difficult for us to read (especially with more than four categories). Our brains are not good at telling apart slices of similar size. And when the slices do differ in size, you may be able to determine which slice is larger, but certainly not by how much. Ultimately, data labels get added, and at that point the visualization itself is no longer worth the precious space it takes up.

    Don’t get me wrong, I have nothing against circular charts in general; donut charts are very effective for communicating certain messages. For the Apple Watch users out there, imagine if Apple had chosen three common pie charts instead of the donut charts used for your move/exercise/stand goals. How do you think your interface would be affected? In short, we need to make data consumers think as little as possible, so next time, use a bar chart or a table.

  • The Golden Rule. While it is important to treat people the way you would like to be treated, it is also important to care for your data so that other folks can consume it. And you should expect nothing less from them. This is the idea behind data democratization, the increasingly popular concept of simply providing enterprise-wide access to data. There is trust. There is transparency. And there is inherent quality built in. Data quality, like the data itself, is an enterprise-wide, shared responsibility. Your quality warts are exposed, as are your accomplishments. Data silos are demolished, and data literacy is fostered. This does not happen overnight, and culture plays an incredibly important role (as it does in every one of these resolutions). In this era of fake news and alternative facts, democratizing data access and data quality is paramount. The great quality management guru W. Edwards Deming once said, “Without data, you’re just another person with an opinion.” Precisely.
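
To make the Friday Afternoon Measurement from the first resolution more concrete, here is a minimal Python sketch of the tally, assuming a site-level extract held as a list of dictionaries. The field names, the three check functions and `fam_score` are all hypothetical. In Redman's exercise a person reviews each record and marks errors by hand; automated rules like these only catch the obvious ones (missing values, impossible sequences), so treat the resulting count as an optimistic upper bound.

```python
from datetime import date

# Hypothetical sketch of the FAM tally. Field names, checks and records are
# made up; in the real exercise a person reviews each record by hand.

def has(field):
    """Check that a critical field is present (not missing)."""
    return lambda record: record.get(field) is not None

def activation_not_before_qualification(record):
    q, a = record.get("qualification_date"), record.get("activation_date")
    return q is not None and a is not None and a >= q

def randomized_within_screened(record):
    s, r = record.get("screened"), record.get("randomized")
    return s is not None and r is not None and 0 <= r <= s

CHECKS = [has("site_id"), activation_not_before_qualification, randomized_within_screened]

def fam_score(records, checks=CHECKS, sample_size=100):
    """Return (error_free, reviewed) for the most recent `sample_size` records.

    Assumes `records` is ordered oldest to newest.
    """
    recent = records[-sample_size:]
    error_free = sum(1 for rec in recent if all(check(rec) for check in checks))
    return error_free, len(recent)

records = [
    {"site_id": "S01", "qualification_date": date(2019, 3, 1),
     "activation_date": date(2019, 5, 2), "screened": 12, "randomized": 9},
    {"site_id": "S02", "qualification_date": date(2019, 4, 9),
     "activation_date": date(2019, 3, 30), "screened": 5, "randomized": 4},   # activated before qualified
    {"site_id": "S03", "qualification_date": date(2019, 6, 1),
     "activation_date": date(2019, 7, 1), "screened": None, "randomized": 2},  # screened count missing
    {"site_id": "S04", "qualification_date": date(2019, 6, 15),
     "activation_date": date(2019, 8, 1), "screened": 3, "randomized": 7},    # more randomized than screened
]

error_free, reviewed = fam_score(records)
print(f"{error_free} of {reviewed} records are error-free")  # prints "1 of 4 records are error-free"
```

With a real extract you would pass the 100 most recent records and write checks for your own 10-15 critical fields; the score is simply the number of error-free records among those reviewed.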
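
The “Read More” resolution’s point about median versus mean, and about natural variation, can be illustrated with Python’s standard `statistics` module. The enrollment counts below are made up for illustration: one high-enrolling site pulls the mean well above what a typical site achieves.

```python
import statistics

# Made-up enrollment counts for ten hypothetical sites; one site enrolls far more than the rest.
enrolled_per_site = [2, 3, 3, 4, 4, 5, 5, 6, 7, 41]

mean = statistics.mean(enrolled_per_site)      # pulled upward by the single high enroller
median = statistics.median(enrolled_per_site)  # middle value, robust to that outlier
stdev = statistics.stdev(enrolled_per_site)    # sample standard deviation (n - 1 denominator)
below_mean = sum(1 for n in enrolled_per_site if n < mean)

print(f"mean={mean:.1f}, median={median:.1f}, stdev={stdev:.1f}")
print(f"{below_mean} of {len(enrolled_per_site)} sites enroll fewer patients than the mean")
```

Here the median (4.5) describes the typical site far better than the mean (8.0), and a dashboard threshold set at the mean would flag nine of the ten sites as underperforming.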
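
The planned-versus-actual pattern described under “Increase Self-Awareness” can be screened for in a CTMS extract. This is a minimal sketch under stated assumptions: the column names are hypothetical, and the `planned_entered` field assumes your system keeps an audit trail recording when each planned date was last entered.

```python
from datetime import date

# Hypothetical CTMS milestone extract. "planned_entered" assumes an audit trail
# that records when each planned date was last entered.
milestones = [
    {"site": "S01", "milestone": "activation", "planned": date(2019, 5, 1),
     "actual": date(2019, 5, 20), "planned_entered": date(2019, 2, 1)},
    {"site": "S02", "milestone": "activation", "planned": date(2019, 6, 3),
     "actual": date(2019, 6, 3), "planned_entered": date(2019, 6, 4)},
    {"site": "S03", "milestone": "activation", "planned": date(2019, 7, 8),
     "actual": date(2019, 7, 8), "planned_entered": date(2019, 3, 15)},
]

def exact_plan_matches(rows):
    """List (site, milestone, backfilled) for rows whose planned date equals the actual date.

    `backfilled` is True when the planned date was entered on or after the actual date.
    """
    return [
        (r["site"], r["milestone"], r["planned_entered"] >= r["actual"])
        for r in rows
        if r["planned"] == r["actual"]
    ]

matches = exact_plan_matches(milestones)
share = len(matches) / len(milestones)
print(f"{share:.0%} of milestones matched their plan exactly: {matches}")
```

An exact match is not proof of back-filling on its own, which is why the sketch reports both the share of exact matches and whether each plan was recorded on or after the actual date.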

These resolutions are in no particular order, as they are all important, and some are certainly more labor-intensive than others. A critical underlying component is a culture that understands the importance of data and fosters supportive behaviors around it. Just as having an exercise partner makes you more likely to go to the gym, having a leadership team (equipped with a data strategy) that evangelizes data as an important asset is the most important element of analytical success.

The question is not really if you will implement these; it is when you will need to. In 2020, the life sciences industry will inevitably see new growth in virtual trials, wearable technology, personalized medicine, artificial intelligence and machine learning, and GOOD data is the cog that makes all of this possible. Unfortunately, BAD data can go through these motions, too, and lead you to the wrong conclusion. Other than your people, data is the most valuable asset your company will produce. This year does not have to be the same as previous years when it comes to instilling a new appreciation and understanding of data. You can start today.

Todd Johnson is a Senior Consultant at Halloran Consulting Group, Inc.



1. For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights, S Lohr, The New York Times, Aug 17, 2014.

2. Only 3% of Companies’ Data Meets Basic Quality Standards, T Nagle, T Redman, D Sammon, Harvard Business Review, Sep 11, 2017.