
HIMSS Forum: The challenge of advancing secure AI in healthcare

During the HIMSS Cybersecurity Forum, Dr. Xiaoqian Jiang of the University of Texas Health Science Center at Houston said removing patient identifiers does not always make data anonymous.
By Anthony Vecchione

   Photo: Just_Super/Getty Images

During the HIMSS AI and Cybersecurity Virtual Forum on Tuesday, Dr. Xiaoqian Jiang, associate VP for Medical AI at the University of Texas Health Science Center at Houston, discussed how data sharing in healthcare operates on a broad spectrum.

"We are moving more and more toward a collective large consortium data sharing," Jiang said. "As the need for data sharing increases in this collective consortium, the risk and the complexity of data sharing also increases," Jiang said.

Jiang said that the challenge for the healthcare industry is to protect privacy and enable collaboration in a responsible manner.

"Simply removing identifiers does not make your data anonymous. If you take the name away, the data looks anonymous, but indeed, it is still not protected because a single attack can lead to re-identification of a patient," Jiang said.

He said that a combination of quasi-identifiers, such as ZIP code, birth date and gender, can be enough to identify an individual.
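To make the risk concrete, the following minimal sketch, using hypothetical toy records, checks a de-identified table for exactly this failure mode: any record whose ZIP code, birth date and gender combination is unique is one external database link away from re-identification.

```python
# Minimal sketch with hypothetical toy data: names are already stripped,
# yet the quasi-identifier triple (ZIP, birth date, gender) can still
# single out individual records.
from collections import Counter

deidentified_records = [
    # (zip_code, birth_date, gender)
    ("77030", "1980-04-12", "F"),
    ("77030", "1980-04-12", "M"),
    ("77005", "1975-09-30", "F"),
    ("77005", "1975-09-30", "F"),
]

counts = Counter(deidentified_records)
for quasi_id, n in counts.items():
    if n == 1:  # k = 1: a unique combination, linkable to outside data
        print("Uniquely identifiable record:", quasi_id)
```

This is the intuition behind k-anonymity: a dataset is safer when every quasi-identifier combination is shared by at least k records.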

Jiang pointed out that demographic statistics for certain cohorts can leak private information, and that the distribution of a disease can create re-identification risk.

Additionally, demographics combined with phenotypes provide strong clues to reveal individuals' information, he said.

Jiang cited a study that found that genome data privacy is vulnerable under various re-identification methods.

For example, surnames can be recovered from personal genomes by linking them to public genetic genealogy databases, and DNA information can also be used to reconstruct a 3D model of a person's face.

Jiang pointed to another study in which adversaries statistically linked phenotypes to genotypes using publicly available genotype-phenotype correlations, such as expression quantitative trait loci (eQTLs), to re-identify individuals.

However, Jiang said there are technologies that can protect against these privacy risks.

One approach is secure collaboration, in which multiple parties compute on encrypted or protected data without ever sharing the raw data, he said.
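Jiang did not walk through a specific protocol, but additive secret sharing, a standard building block of secure multiparty computation, illustrates the idea; the hospitals and counts in this sketch are hypothetical.

```python
# Illustrative sketch only: three hospitals jointly compute a total case
# count via additive secret sharing, so no party ever sees another's raw
# number. All values and parties here are hypothetical.
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a value into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

private_counts = [120, 45, 310]  # each hospital's private count
n = len(private_counts)

# Each hospital distributes one share to every party; each party sums
# the shares it holds. Individual shares reveal nothing about the inputs.
all_shares = [share(c, n) for c in private_counts]
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# Only the recombined aggregate is ever revealed.
total = sum(partial_sums) % PRIME
print(total)  # 475 -- the joint sum, with no raw count disclosed
```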

He also discussed federated learning and how it enables collaboration without sharing data.

In his presentation, Jiang noted that AI excels at merging different data modalities, but privacy and practical constraints often prevent centralizing raw data. 

He explained that federated learning is an AI approach that trains models across multiple datasets without pooling the data.
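As a rough illustration, the sketch below, with hypothetical sites and toy data, fits a simple model locally at each site and averages only the learned weights in the style of federated averaging; the raw records never leave the sites.

```python
# Minimal federated-averaging sketch (hypothetical sites and toy data):
# sites share model weights, never patient records.
import numpy as np

rng = np.random.default_rng(0)

def local_fit(X, y):
    """Local training step: ordinary least squares at a single site."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Three sites, each holding its own private dataset (toy stand-ins here)
true_w = np.array([2.0, -1.0])
site_data = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    site_data.append((X, y))

# Each site trains locally; a coordinator averages the weights (FedAvg idea)
local_weights = [local_fit(X, y) for X, y in site_data]
global_weights = np.mean(local_weights, axis=0)
print(global_weights)  # close to [2.0, -1.0], without pooling the raw data
```

Real federated systems iterate this exchange over many rounds and add protections such as secure aggregation, but the core property is the same: only model updates travel between institutions.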

"At UTHealth Houston, we are also pioneering this kind of development," Jiang said.

UTHealth Houston developed a federated learning workflow manager that allows AI model training across distributed clinical data sources in a secure, privacy-preserving way.

It allows institutions to collaboratively build AI models without directly sharing patient data with each other, Jiang said. 

Finally, when asked whether the healthcare industry is equipped to do this work at scale or whether it requires further investment in technology, Jiang said, "Some institutions have the capacity; not all institutions are ready because few of them move really fast."

"I think in the future we would expect there will be a common layer and more and more institutions moving to the cloud so we can leverage the cloud layer and manage the cloud layer to conduct this kind of collaboration," Jiang said.