
Macquarie University researchers have demonstrated a new way to link personal records and protect privacy. The first application is in identifying cases of rare genetic diseases. There are many other potential applications in society.
The research will be presented at the 18th ACM ASIA Conference on Computer and Communications Security in Melbourne on 12 July.
A five-year-old boy in the US has a mutation in a gene called GPX4 that he shares with only 10 other children in the world. The condition causes abnormalities of the skeleton and central nervous system. There are probably other children with the disorder recorded in hundreds of health and diagnostic databases around the world, but we don’t know about them because the records are kept confidential for legal and commercial reasons.
But what if records of the condition could be found and counted while maintaining privacy? Researchers at Macquarie University’s Cyber Security Center have developed a technique to achieve just that. The team includes Dr Dinusha Vatsalan and Professor Dali Kaafar from the University’s School of Computer Science and the boy’s father, software engineer Mr Sanath Kumar Ramesh, who is CEO of the OpenTreatments Foundation in Seattle, Washington.
“I am very excited about this work,” says Mr. Ramesh, whose foundation initiated and supported the project. “Knowing how many people have a disease is at the heart of economic assumptions. If a condition used to be thought to have 15 patients, and now we know, after pulling data from diagnostic test companies, that there are 100 patients, that greatly increases the size of the market.
“It would have a significant economic impact. The valuation of a company dealing with the disease will rise. The value of the product will decrease. How insurance companies account for medical expenses will change. Diagnostic companies will target [the condition] more. And you can start doing epidemiology more precisely.”
Linking and counting data records may seem simple, but in reality it involves a lot of problems, says Professor Kaafar. First, since we are dealing with a rare disease, there is no centralized database and records are scattered all over the world. “Hundreds of databases in this case,” he says. “And from a business perspective, data is valuable, and the companies that own it aren’t necessarily interested in sharing.”
Then there are the technical problems of matching data that are recorded, coded and stored in different ways, and of accounting for individuals who appear more than once within and across databases. And on top of that are privacy considerations. “We are working with very, very sensitive health data,” says Professor Kaafar.
Personal details of this kind are not needed to estimate the number of patients or for epidemiological purposes. Until now, however, they have been needed to ensure that records are unique and can be linked.
Dr Vatsalan and her colleagues used a technique known as Bloom filter encoding with differential privacy. They created a set of algorithms that deliberately introduce enough noise into the data to blur precise details so that they cannot be extracted from individual records, while still allowing records relating to the same disease state to be matched and grouped together.
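As a rough illustration of the idea, and not the team's published algorithm, the Python sketch below encodes a name as a Bloom filter, flips each bit at random, and compares two noisy filters. The function names, the 256-bit filter length, the four hash functions and the epsilon value are all assumptions chosen for the example. Flipping each bit with probability 1 / (1 + e^epsilon) is the standard randomized-response mechanism, which gives an epsilon differential privacy guarantee per bit.

```python
import hashlib
import math
import random


def bigrams(value):
    """Split a normalised string into overlapping two-character tokens."""
    text = "_" + value.lower().strip() + "_"
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bloom_encode(value, num_bits=256, num_hashes=4):
    """Encode a string as a Bloom filter, returned as a list of 0/1 bits."""
    bits = [0] * num_bits
    for gram in bigrams(value):
        for seed in range(num_hashes):
            # Hypothetical hashing scheme: SHA-256 of "seed:gram".
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:8], "big") % num_bits] = 1
    return bits


def add_noise(bits, epsilon, rng):
    """Flip each bit independently with probability 1 / (1 + e^epsilon)."""
    flip_prob = 1.0 / (1.0 + math.exp(epsilon))
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in bits]


def dice_similarity(a, b):
    """Dice coefficient of two bit lists: 2 * |a AND b| / (|a| + |b|)."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * common / total if total else 0.0


# Demo with made-up names; a fixed seed keeps the run repeatable.
rng = random.Random(7)
noisy_a = add_noise(bloom_encode("christine"), epsilon=3.0, rng=rng)
noisy_b = add_noise(bloom_encode("christina"), epsilon=3.0, rng=rng)
noisy_c = add_noise(bloom_encode("robert"), epsilon=3.0, rng=rng)
print("similar names:   ", dice_similarity(noisy_a, noisy_b))
print("dissimilar names:", dice_similarity(noisy_a, noisy_c))
```

A smaller epsilon flips more bits, which strengthens privacy but makes near-identical records harder to tell apart from unrelated ones. That trade-off is what any scheme of this kind has to balance.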
The accuracy of the technique was then evaluated using North Carolina voter registration data. The results showed that the method produces a negligible error rate while guaranteeing a very high level of privacy, even with highly corrupted datasets. The technique significantly outperforms existing methods.
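The counting step that such an evaluation measures can be pictured with another small Python sketch. It is a simplified stand-in under stated assumptions, not the published method: each record is represented by a hypothetical set of Bloom filter bit positions, any two records whose Dice similarity reaches an assumed threshold of 0.8 are treated as the same person, and a union-find structure groups the matches so that each individual is counted once.

```python
from itertools import combinations


def dice(a, b):
    """Dice coefficient of two sets of bit positions."""
    if not a and not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def count_distinct(encodings, threshold=0.8):
    """Count groups of mutually linked records (one group per individual)."""
    parent = list(range(len(encodings)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(encodings)), 2):
        if dice(encodings[i], encodings[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(encodings))})


# Toy data: five encoded records pooled from several databases.
records = [
    {1, 5, 9, 12, 20, 33},   # person A
    {1, 5, 9, 12, 20, 34},   # person A again, with one corrupted bit
    {2, 7, 11, 40, 41, 50},  # person B
    {1, 5, 9, 12, 20, 33},   # exact duplicate of person A
    {3, 8, 60, 61, 62, 63},  # person C
]
print(count_distinct(records))  # 3 distinct individuals among 5 records
```

Comparing every pair of records is only workable for small examples like this one. Record linkage at scale usually adds a blocking or indexing step so that only plausible pairs are compared.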
In addition to detecting and counting rare diseases, the research has many other applications: determining awareness of a new product in marketing, for example, or tracking the number of unique views of certain social media posts in cybersecurity.
But it’s the rare disease application that Macquarie University researchers are passionate about.
“There is no better feeling for a researcher than seeing the technology they develop have a real impact and make the world a better place. In this case, it’s so real and so important.”
Professor Dali Kaafar, Faculty of Computer Science, Macquarie University
The OpenTreatments Foundation partially funded the research.
“The foundation wanted to make this project completely open source from the beginning,” adds Dr. Vatsalan. “So the algorithm we implemented is published openly.”
Journal reference:
Wu, N., et al. (2023) Privacy-Preserving Record Linkage for Cardinality Counting. ASIA CCS ’23: Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security. doi.org/10.1145/3579856.3590338.