The “citizen data scientist” is the person with no official data scientist training who uses the latest tools and technologies to handle data wrangling duties, analyze data, and create reports and models.
The world of data management today is practically synonymous with electronic data capture (EDC). Data management staff spend most of their time not directly programming systems but maintaining the system itself: completing forms or reconciling queries. As data changes (and it is changing, with massive increases in available data sources), we must consider what traditional data management activities look like on studies designed using only external data sources. As trials move to more agile means of data capture, including electronic medical record (EMR) and electronic health record (EHR) systems, biosensors, wearables, bring-your-own-device (BYOD) trials, and more, EDC expertise risks becoming outdated.
A recent Impact Report from the Tufts Center for the Study of Drug Development (CSDD) surveyed sponsors and CROs about data management and found a decrease in the prevalence of EDC as a primary point of data capture for clinical trials. The decrease isn’t stark yet, but there is a downward trend. Data management is starting to show strain from trying to combine traditional methods and approaches with more recent advances in tools, data diversity, volume, and velocity.
The same report showed that an average of six apps is now used to support each study, and that newer sources of data (EMR and EHR systems, biosensors, and wearables, to name a few) are predicted to rise two- or three-fold in the next three years. These types of devices send massive amounts of data that are too vast and complex to be distilled into a simple row-by-column spreadsheet. Data this complicated requires a multidimensional data structure that cannot be reviewed manually and needs to be addressed differently than standard EDC data.
All signs point to a shift in how data management adapts to new challenges and new tools. Companies are looking for specialized data scientists, not just data managers. These positions have a heavier technical bent, and the people who fill them are prepared to take over more complicated tasks. However, finding trained data scientists prepared to tackle these complex, multidimensional data structures is not always easy.
Hence, there’s been increased attention on a newer industry term, the “citizen data scientist.” Our industry hasn’t agreed on an official definition, but in short, the citizen data scientist is the person with no official data scientist training who uses the latest tools and technologies to handle data wrangling duties, analyze data, and create reports and models.
The percentages below reflect a breakdown of what data scientists spend the most time doing:
The tools and technology available today make it possible for non-data scientists, also known as citizen data scientists, to do some of the same work as data scientists. With the right tools and technology, a non-data scientist can manage cleaning and organizing data, leaving the trained data scientists to the more complicated work of building training sets, mining data for patterns, and refining algorithms. Examples of such tools for data visualization include SAS Visual Analytics, D3.js, Tableau, or even home-grown systems (like RhoVer), if you have in-house resources. Additionally, online data science and analytics courses, such as those from DataCamp and Coursera, can be a good resource for potential citizen data scientists.
As data management experts, we must resist the urge to wait and see which new tools and technologies will clear the next path for our staff. We need what is referred to as “adaptagility”: fluid, differentiated, unorthodox thinking that will help build a new model for data while replacing the old.
Derek Lawrence, Operational Service Lead, Clinical Data Management, Rho, Inc.