Health Data Privacy


Applied Clinical Trials

Examining data privacy and the Mosaic Effect in healthcare.

No Need to Fall to Pieces if You Encounter the “Mosaic Effect”

More and more pieces of information, including medical information about you, are being collected from multiple sources by multiple organizations. Using sophisticated techniques, some of this data is integrated in ways that make it easier, and potentially more likely, for someone to identify you.

This phenomenon is benignly called the Mosaic Effect: like tiny shards of rock and glass, bits and bytes of data are pieced together in ways that create a fuller picture of you.

For most of us, there is an expectation of privacy when we share personal information with organizations. But that belief has been shattered numerous times in high-profile cases, such as one involving a customer of AOL, the Internet company. In that instance, a New York Times reporter used anonymized search-query logs released by AOL to re-identify a woman living in Georgia. From her queries, including “dog that urinates on everything,” “numb finger,” and “60 single men,” the reporter was able to find her. From the number and type of medical queries she made, a reader could conclude she suffered from all of these maladies, but she explained that she had been doing research on behalf of friends.

In another case, supposedly anonymized data on 173 million New York City taxi trips was de-anonymized, making it possible to identify drivers and even infer where they live. It took one analyst just two hours to reverse the weak hashing and recover the cab-specific data after city officials released it in response to a public records request.

Whether the data contains information about you, or it’s your job to protect the identities of the people in it, you need to know two things: the risk is real, and there are ways to lower it to an acceptable level. Once the risk in medical data has been lowered, the data can be shared for secondary uses, particularly research into disease causes and cures, as well as into improving how the healthcare system is managed.


Protect us from breaches

It’s a legitimate question to ask whether we’re too concerned about the risk to privacy. After all, there have not been that many data breaches, and few as highly publicized as the AOL case. The answer is that even one breach is too many, and patient data is protected by law. Breaches remain data custodians’ number one concern, according to a recent survey conducted by Privacy Analytics. The proliferation of medical data from a variety of sources, including social media and wearable devices, coupled with advanced analytical techniques, further increases the risk of exposing personal information. Mosaics are becoming easier to assemble, and that trend is likely to continue.

To remain in compliance, those in possession of data and responsible for its protection must ensure that it is not used for unapproved purposes. The law does allow data to be shared for secondary purposes, such as medical and public health research, without the permission of the people whose data are included in the datasets. But the data must be de-identified unless patients give permission for it to be used for purposes other than their own care.

Different methods are used to de-identify data and lower the risk of privacy violations. No method lowers the risk to zero, but some come close. Over the years, professional and regulatory bodies have developed a rule of thumb in which a one percent risk of exposure is deemed tolerable. That is, one percent of the individuals in a given dataset may be unique based on publicly linkable information such as year of birth, ZIP code, or gender. Considering the size of some of today’s datasets, one percent represents a lot of people, raising the question: Is even one percent too much?
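
To make that rule of thumb concrete, the sketch below measures how many records in a dataset are unique on a few quasi-identifiers. It is purely illustrative, not any vendor’s tooling; the column names and the toy data are assumptions.

```python
import pandas as pd

# Hypothetical patient-level extract; column names are illustrative assumptions.
df = pd.DataFrame({
    "birth_year": [1961, 1961, 1985, 1985, 1985, 1990],
    "zip_code":   ["10001", "10001", "11201", "11201", "11201", "10451"],
    "gender":     ["F", "F", "M", "M", "F", "M"],
})

quasi_identifiers = ["birth_year", "zip_code", "gender"]

# Size of each "equivalence class": records sharing the same combination of
# quasi-identifier values. A record in a class of one is unique, and therefore
# the easiest to re-identify by linking to outside information.
class_sizes = df.groupby(quasi_identifiers)[quasi_identifiers[0]].transform("size")

pct_unique = (class_sizes == 1).mean() * 100
print(f"{pct_unique:.1f}% of records are unique on {quasi_identifiers}")
```

Against the one percent rule of thumb, this toy dataset would fail badly; on a real dataset the same calculation tells the custodian whether further de-identification is needed.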

The answer, which has become increasingly accepted, is that some risk can be tolerated when balanced against the public good created by advanced treatments for serious diseases that can save lives. Imagine the benefits, for example, if oncologists all over the country were able to share information about treatment approaches among tens of thousands of cancer patients.

Growing interest and capabilities in genomic research are an example of how advances in science create new potential. But they also create more ways in which patient data can be compromised. While the risk of your ever more finely detailed medical mosaic being viewed by others may grow in the future, tools available today can keep it acceptably low.


Solutions

These tools come in a couple of different forms. One, which includes data masking and Safe Harbor, acts as a kind of blunt instrument, eliminating so many data points that the dataset is rendered largely useless to researchers. The second is a risk-based methodology that preserves the data useful to researchers while anonymizing the rest to within an optimal risk tolerance. This is the expert determination method, which uses statistical estimates of re-identification probability to guide the de-identification process and ensure a very low risk of re-identification.
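
The difference between the two approaches can be sketched in a few lines. Under the blunt approach, identifying fields are simply dropped; under a risk-based approach, they are generalized just enough to bring the measured risk down. The fields, transformations, and data below are illustrative assumptions, not the expert determination method itself.

```python
import pandas as pd

def pct_unique(df: pd.DataFrame, cols: list[str]) -> float:
    """Fraction of records unique on the given quasi-identifiers."""
    sizes = df.groupby(cols)[cols[0]].transform("size")
    return (sizes == 1).mean()

df = pd.DataFrame({
    "birth_date": ["1961-03-02", "1961-07-15", "1985-01-20",
                   "1985-11-05", "1985-06-30", "1990-09-09"],
    "zip_code": ["10001", "10002", "11201", "11203", "11215", "10451"],
    "gender": ["F", "F", "M", "M", "F", "M"],
})

# Blunt instrument: delete the risky fields outright. Risk drops to near zero,
# but so does the analytic value of the dataset.
masked = df.drop(columns=["birth_date", "zip_code"])

# Risk-based alternative: generalize instead of delete, then re-measure risk.
df["birth_year"] = pd.to_datetime(df["birth_date"]).dt.year  # full date -> year
df["zip3"] = df["zip_code"].str[:3]                          # 5-digit ZIP -> 3-digit prefix

print(pct_unique(df, ["birth_date", "zip_code", "gender"]))  # 1.00: every record unique
print(pct_unique(df, ["birth_year", "zip3", "gender"]))      # 0.33: far fewer unique
```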

Products are available that provide a visual indication of where the risk lies, whether the dataset covers thousands of people or tens of millions. This risk histogram shows the data analyst where the areas of highest risk are in the dataset; techniques are then applied to lower the risk by obfuscating information that could cause re-identification while preserving the data points that are valuable for analytics and other secondary purposes.
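
As an illustration of the idea (not any product’s actual output), a record’s re-identification risk is often approximated as one divided by the size of its equivalence class, and the records are then bucketed into risk bands:

```python
import pandas as pd

# Illustrative dataset with already-generalized quasi-identifiers (assumed names).
df = pd.DataFrame({
    "birth_year": [1961, 1961, 1985, 1985, 1985, 1990],
    "zip3":       ["100", "100", "112", "112", "112", "104"],
    "gender":     ["F", "F", "M", "M", "F", "M"],
})
quasi = ["birth_year", "zip3", "gender"]

# A record's re-identification risk is commonly approximated as 1 / (number of
# records sharing its quasi-identifier values).
sizes = df.groupby(quasi)[quasi[0]].transform("size")
risk = 1.0 / sizes

# Bucket the records into risk bands; the analyst's attention goes to the
# right-most bands, where generalization or suppression is applied first.
bands = pd.cut(risk, bins=[0.0, 0.01, 0.05, 0.2, 0.5, 1.0])
print(bands.value_counts().sort_index())
```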

As part of the risk-based process, data flows are mapped to determine who uses the data and for what purpose. For incoming flows, the priority is assessing the risk of directly sourced data, including data obtained commercially or through collaboration with different health systems. Due diligence is applied to assess the security measures of the data recipient. If datasets are combined (the Mosaic Effect), additional assessment and de-identification procedures may be needed. Once the context of the dataset has been established, the re-identification risks in the data can be measured to determine which de-identification routines should be applied, preserving anonymity while maintaining the highest possible data quality and usability.
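
Why combined datasets need a second look is easy to demonstrate. In the hypothetical sketch below, two releases that each look safe in isolation become fully identifying once linked (simplified here with a shared key; in practice the linkage happens through overlapping quasi-identifiers):

```python
import pandas as pd

def pct_unique(df: pd.DataFrame, cols: list[str]) -> float:
    sizes = df.groupby(cols)[cols[0]].transform("size")
    return (sizes == 1).mean()

# Two hypothetical releases, each low-risk on its own.
clinical = pd.DataFrame({
    "patient_key": [1, 2, 3, 4],
    "zip3": ["100", "100", "112", "112"],
    "diagnosis": ["asthma", "asthma", "diabetes", "diabetes"],
})
wearable = pd.DataFrame({
    "patient_key": [1, 2, 3, 4],
    "birth_year": [1961, 1985, 1961, 1985],
})

print(pct_unique(clinical, ["zip3", "diagnosis"]))   # 0.0: no record unique
print(pct_unique(wearable, ["birth_year"]))          # 0.0: no record unique

# Linked together, every record becomes unique: the Mosaic Effect in miniature.
combined = clinical.merge(wearable, on="patient_key")
print(pct_unique(combined, ["zip3", "diagnosis", "birth_year"]))  # 1.0
```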

The risk-based approach and the commercially available product described here could have prevented what happened to AOL and the New York City government. You might accept that this approach works today, while insisting that blockchain will one day make the Mosaic Effect obsolete by building stronger data protections into the Internet of Things (IoT) and giving citizens more control over their own personal information, wresting it away from Google and others.

Such a remedy sounds fanciful for the immediate future; and even if it eventually materializes, there is every chance that some risk will still need to be managed.


Pamela Buffone is Director of Product Management at Privacy Analytics.
