Researchers: Your ‘Anonymous Data’ May Not Be As Anonymous After All
Americans could be signing over the keys to their identity when filling out medical forms that promise to “anonymize” their information, according to a new algorithm developed by scientists.
- By Haley Samsel
- Jul 25, 2019
When most Americans sign agreements allowing their medical records or personal information to be used for research, they are told that their data will be “anonymized” — in other words, it cannot be traced back to them. Residents who fill out Census Bureau forms, providing data that determines how government funds are distributed and may become public, are told the same thing.
But, according to research published in the journal Nature Tuesday, your data may not be as anonymous as you thought. Scientists at Imperial College London and Université Catholique de Louvain in Belgium have come up with a computer algorithm that can identify 99.98 percent of Americans from "almost any available data set with as few as 15 attributes," including gender, ZIP code or marital status, The New York Times reported.
Making the algorithm public was a difficult choice for the researchers, who wanted to alert the world to the massive amount of personal information already available via data sets that are bought and sold without regulation in many parts of the globe. Typically, a flaw like this would be reported privately to the affected country or company, but the data privacy problem is so prevalent that the authors decided to publish their findings widely.
Read more: Healthcare Industry at Highest Risk of Cybersecurity Breaches, Study Finds
“It’s always a dilemma,” Yaniv Erlich, chief scientific officer at MyHeritage, a consumer genealogy service, told the Times. “Should we publish or not? The consensus so far is to disclose. That is how you advance the field: Publish the code, publish the finding.”
The finding poses a major issue for security experts tasked with protecting consumer data, particularly when it comes to medical and health data sets. Usually, researchers “de-identify” individuals by removing attributes, substituting fake values or by releasing only parts of anonymized data.
But this isn't enough to protect people from being identified, either as individuals or as members of a household within a data set, according to the study's authors.
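The linkage risk the authors describe can be illustrated with a toy sketch. This is not the researchers' published algorithm, and every record and field name below is invented; it only shows how "quasi-identifiers" left in a de-identified data set (ZIP code, birth year, gender) can be matched against a second, identified data set:

```python
# Hypothetical example: names were removed from the medical data set,
# but ZIP code, birth year and gender remain. All records are made up.
deidentified_medical = [
    {"zip": "60614", "birth_year": 1984, "gender": "F", "diagnosis": "asthma"},
    {"zip": "60614", "birth_year": 1990, "gender": "M", "diagnosis": "flu"},
    {"zip": "10001", "birth_year": 1975, "gender": "F", "diagnosis": "diabetes"},
]

# A second, identified data set, e.g. a purchased marketing list.
marketing_list = [
    {"name": "Alice Example", "zip": "60614", "birth_year": 1984, "gender": "F"},
    {"name": "Bob Example",   "zip": "60614", "birth_year": 1990, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def reidentify(anon_rows, known_rows):
    """Link records whose quasi-identifiers match exactly and uniquely."""
    matches = []
    for anon in anon_rows:
        key = tuple(anon[q] for q in QUASI_IDENTIFIERS)
        hits = [k for k in known_rows
                if tuple(k[q] for q in QUASI_IDENTIFIERS) == key]
        if len(hits) == 1:  # a unique match re-identifies the record
            matches.append((hits[0]["name"], anon["diagnosis"]))
    return matches

print(reidentify(deidentified_medical, marketing_list))
# → [('Alice Example', 'asthma'), ('Bob Example', 'flu')]
```

With just three attributes, both "anonymous" patients are linked back to named individuals; the study's point is that with 15 attributes, a unique match becomes a near certainty.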
"We need to move beyond de-identification," Yves-Alexandre de Montjoye, a computer scientist and lead author of the paper, told the Times. "Anonymity is not a property of a data set, but is a property of how you use it."
The balance between encouraging scientific research and potentially exposing the personal information of hundreds of millions of people to cybercriminals is extremely tricky, and the data gathered about individuals is never completely private, according to the researchers.
“You cannot reduce risk to zero,” Erlich said.
de Montjoye told the Times that medical professionals now ask patients to sign forms acknowledging that their medical data could be shared with other hospitals, and with a system that might pass it on to universities, government agencies and private companies. One form he saw as a patient even said that he could be identified through the data he signed over.
“We are at a point where we know a risk exists and count on people saying they don’t care about privacy,” he said. “It’s insane.”