Cybersecurity
In 1951, a patient named Henrietta Lacks underwent a routine biopsy that changed the world. The cells taken that day went on to help create the polio and HPV vaccines, HIV medications, and much, much more. No one asked Ms. Lacks’ permission – back then, consent was not an established practice in much of medical research.
As a result, Ms. Lacks’ genome has been sequenced and analysed thousands of times, and trillions of her so-called ‘immortal cells’ are held in laboratories across the world – even in outer space. In 2013, her DNA sequence was made available online, with far-reaching consequences: the proliferation of this genetic information has left Ms. Lacks’ five children and many grandchildren exposed. Her experience has underscored the importance of protecting genetic data, highlighting that in a world of affordable and informative DNA testing, genomic data – the most private and unique data we possess – is increasingly vulnerable to leaks, theft and abuse.
Is there a way to protect the privacy of these data while still using them to advance medicine? Over the past five years, researchers from the Critical and Extreme Security and Dependability (CritiX) research group at the Interdisciplinary Research Centre for Security, Reliability and Trust (SnT) have been working on a range of projects to address this problem along the entire supply chain of genomic data, from the moment it is decoded to the release of studies that use genomic information.
In their first project, GenoMask, the researchers developed a method to protect privacy-sensitive parts of the DNA immediately after they are digitised by sequencing machines. In the follow-up project, the FNR-backed GenoMask Proof of Concept, Dr. Federico Lucchetti and the CritiX alumni Dr. Maria Fernandes and Prof. Jérémie Decouchant developed the technology further into a fully-fledged product prototype that offers GDPR-compliant separation of the personal parts of genomic data from the non-personal parts for protection purposes.
However, the researchers quickly realised the need to protect genomic data not only while it is stored, but also throughout its subsequent processing. In fact, once new snippets of genomic data (reads) are acquired, the genome needs to be reconstructed by comparing the snippets with a reference genome – such as Ms. Lacks’.
“This comparison, which needs to be executed for billions of sequences, requires significant computational power and storage space, a challenge that is often solved by using cloud computing. This, however, poses significant risks in terms of privacy, when transmitting and storing data on third-party premises, possibly outside the EU,” said Prof. Marcus Völp, head of the CritiX research group at SnT.
For this reason, the team developed MaskAl, a novel approach for privacy-preserving comparison (known as “read alignment”) of DNA data. After GenoMask filters and removes sensitive parts from the reads, MaskAl aligns the sanitised reads and, in the rare cases where this is necessary, refines the alignment position using the masked-out information.
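To make the mask-then-align idea more concrete, here is a minimal Python sketch. It is an illustration only and not the team's implementation: the function names, the `N` mask symbol, the toy sequences and the use of exact matching are all assumptions made for this example, and real aligners tolerate mismatches and work on far longer sequences.

```python
MASK = "N"  # hypothetical placeholder for a masked-out base

def mask_read(read, sensitive_offsets):
    """Replace the bases at the given read offsets with the mask symbol."""
    return "".join(MASK if i in sensitive_offsets else base
                   for i, base in enumerate(read))

def candidate_positions(reference, masked_read):
    """Return every reference position where all unmasked bases match."""
    length = len(masked_read)
    hits = []
    for pos in range(len(reference) - length + 1):
        window = reference[pos:pos + length]
        if all(r == MASK or r == w for r, w in zip(masked_read, window)):
            hits.append(pos)
    return hits

def refine(reference, full_read, candidates):
    """Use the masked-out bases to pick among ambiguous candidates."""
    return [p for p in candidates
            if reference[p:p + len(full_read)] == full_read]

# Toy data, invented for this example.
reference = "ACGTACGAACGT"
read = "ACGA"
masked = mask_read(read, {3})                     # last base is sensitive
candidates = candidate_positions(reference, masked)
final = refine(reference, read, candidates)
```

In this toy case the sanitised read fits three places in the reference, and only the refinement step, which would run in a trusted environment holding the masked-out base, narrows it down to one.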
Regrettably, however, enforcing privacy-preserving storage and processing solutions is still not enough. The very genetic sequences that allow re-identification in specific contexts constitute exactly the valuable information that pharmaceutical companies need to develop, for example, personalised therapies, or cures for rare diseases.
In fact, using the very results of published genome-wide association studies (GWAS), potential adversaries are already able to launch attacks. “We showed that overlapping regions of releases (e.g., of individuals participating in more than one study) can be used by adversaries to circumvent existing privacy-protection mechanisms,” said Dr. Túlio Pascoal, former doctoral researcher at SnT.
By comparing the results of different studies, even when anonymising data as much as possible, attackers may still identify which subjects carry a certain genome variation. This is because it’s precisely these variations that are relevant to the study – even in anonymised DNA, these variations reveal enough information to re-identify the owner. This is particularly true for rare diseases, which are characterised by small data sets of genomic information.
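A toy Python example shows why overlapping releases are dangerous. It illustrates the general idea of a differencing attack, not the specific attacks studied at SnT; the cohorts, participant labels and genotype values are invented. Each study publishes only aggregate minor-allele counts per variant, yet two releases that differ by a single participant reveal that participant's genotype.

```python
def allele_counts(cohort):
    """Per-variant sum of minor-allele counts (0, 1 or 2 per person)."""
    return [sum(column) for column in zip(*cohort.values())]

def difference_attack(release_a, release_b):
    """Subtract two aggregate releases variant by variant."""
    return [b - a for a, b in zip(release_a, release_b)]

# Hypothetical cohorts: study B contains everyone in study A plus "p4".
cohort_a = {"p1": [0, 1, 0], "p2": [1, 0, 0], "p3": [0, 0, 2]}
cohort_b = dict(cohort_a, p4=[2, 0, 1])

release_a = allele_counts(cohort_a)   # published aggregate of study A
release_b = allele_counts(cohort_b)   # published aggregate of study B
leaked = difference_attack(release_a, release_b)
```

Neither release contains individual records, but `leaked` equals the genotype of the one participant who appears only in the second study. The smaller the cohorts, as in rare-disease studies, the easier such differences are to attribute to a person.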
To help solve this problem, Dr. Pascoal proposed in his thesis mechanisms that combine privacy-preserving processing with the release of genome studies. In dynamic studies, for example, his mechanisms verify that enough information has changed in the underlying data set (including when participants revoke their consent under the GDPR), so that the privacy of individuals remains protected. Another solution he proposed is GenDPR, an approach that implements a fully distributed environment in which federation members do not need to exchange genomic data to jointly perform collaborative studies. What’s more, the data remains protected even if some of the members collude to get hold of the data of others (e.g., as a consequence of cyberattacks).
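How can federation members compute a joint statistic without handing over their data? The article does not describe GenDPR's protocol, so the Python sketch below swaps in a generic, well-known technique, additive secret sharing, purely as an illustration. The modulus, function names and counts are assumptions made for this example.

```python
import secrets

MODULUS = 2**61 - 1  # assumed; only needs to exceed any possible total

def share(value, n_parties):
    """Split a value into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def federated_sum(local_counts):
    """Total the members' counts without any member revealing its own."""
    n = len(local_counts)
    # Each member splits its local count into n shares, one per member.
    distributed = [share(count, n) for count in local_counts]
    # Member j adds up the shares it received and publishes only that sum.
    partials = [sum(distributed[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    # The published partial sums add up to the true total.
    return sum(partials) % MODULUS

# Hypothetical per-site counts of carriers of some variant.
total = federated_sum([12, 7, 30])
```

Each individual share looks random, so a member learns nothing from the shares it receives. In this simplified scheme, colluding members can at most learn the combined count of the honest members, which hides individual sites as long as at least two members stay honest.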
The SnT team is currently seeking industrial partners to further develop their solutions into products to protect access to our most private and sensitive data, the very fabric of life. As the demand for genomic data increases, so will the need for GDPR-compliant solutions enabling researchers to securely manage and use this highly personal information. “Protecting individuals’ private data is key to developing a better world. Especially when most people are not fully aware that if their genomic information gets leaked, it could compromise their life,” concluded Pascoal.