Your heartbeat could reveal your identity, even in anonymized datasets

A new study has found that electrocardiogram (ECG) signals, often shared publicly for medical research, can be linked back to individuals. Researchers were able to re-identify people in anonymous datasets with surprising accuracy, raising questions about how health data is protected and shared.

Linking ECG data to real people

The research team tested how an attacker with limited information might connect public ECG data to private sources such as wearable devices, telehealth platforms, or leaked medical records. They used machine learning to compare patterns in heart signals, which are as unique as fingerprints.
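
The paper's full pipeline is not reproduced here, but the core matching idea can be sketched in a few lines. The hand-rolled features, signal lengths, and patient labels below are invented for illustration; the study itself used a learned deep model rather than this toy summary.

```python
# A minimal sketch of signal-to-identity matching, assuming a toy feature
# extractor. The actual study used a Vision Transformer embedding.
import numpy as np

def ecg_features(signal: np.ndarray) -> np.ndarray:
    """Toy feature vector: z-score the trace, then downsample to a fixed
    length. A real attack would use a learned embedding."""
    z = (signal - signal.mean()) / (signal.std() + 1e-8)
    idx = np.linspace(0, len(z) - 1, 64).astype(int)
    return z[idx]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def link(unknown: np.ndarray, enrolled: dict) -> str:
    """Return the enrolled identity whose features best match `unknown`."""
    feats = ecg_features(unknown)
    return max(enrolled, key=lambda pid: cosine(feats, enrolled[pid]))

# Synthetic per-person "signals" stand in for real recordings.
rng = np.random.default_rng(0)
templates = {f"patient_{i}": rng.standard_normal(500) for i in range(3)}
enrolled = {pid: ecg_features(t) for pid, t in templates.items()}

# A fresh, noisy recording of patient_1 still links back to patient_1.
noisy = templates["patient_1"] + 0.3 * rng.standard_normal(500)
print(link(noisy, enrolled))
```

Note how the match survives added noise: because the underlying pattern is stable, moderate perturbation barely moves the similarity score, which mirrors the study's finding that simple noise injection is not an effective defense.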

Working with data from 109 participants across several public datasets, their system correctly matched ECG signals to individuals 85 percent of the time. Even when noise was added to the data, the method remained effective. The results show that basic anonymization techniques are not enough to prevent someone from tracing a signal back to a specific person.

Ziyu Wang, a co-author of the research, told Help Net Security that current privacy assumptions are outdated. “Current practices often assume that health data, including ECG, becomes safe once it has been ‘de-identified’ by stripping names or obvious identifiers. Our findings show that this assumption no longer holds. Because ECG signals carry stable, individual-specific patterns, they can act as biometric identifiers. This means that even without names, raw ECG can be linked across datasets or back to an individual if auxiliary data exists.”

The researchers explained that ECG data retains distinctive patterns over time. Unlike demographic data, these patterns cannot be removed or generalized without harming the medical value of the dataset. This creates a difficult challenge for health organizations that want to share data for research without exposing patient identities.

How attackers could exploit ECG data

The study highlights a growing risk at the intersection of healthcare and cybersecurity. Wearable devices such as smartwatches and remote monitoring tools collect and transmit vast amounts of ECG data every day. Telehealth services combine these signals with other sensitive medical information.

[Figure: Overview of the proposed linkage attack method]

If even a small portion of this data leaks or is made public, attackers could cross-reference it with other sources to identify patients. This type of attack, known as a linkage attack, does not require access to a hospital database or insider knowledge. It relies on finding overlaps between different sets of data and using algorithms to make matches.
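
To make the mechanics concrete, here is a hedged sketch of the linkage step itself: given some similarity function over ECG-derived features, the attacker joins a de-identified research release against leaked, named records. All records, names, embeddings, and the threshold value are invented for the example.

```python
# A toy linkage attack: cross-reference two datasets that share no names,
# only similar biometric feature vectors. Everything here is fabricated.
from dataclasses import dataclass

@dataclass
class PublicRecord:               # from an anonymized research release
    record_id: str
    embedding: tuple              # stand-in for an ECG feature vector
    diagnosis: str                # the sensitive attribute being exposed

@dataclass
class LeakedRecord:               # e.g., from a breached wearable service
    name: str
    embedding: tuple

def similarity(a, b) -> float:
    # Negative squared distance: higher means closer.
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def linkage_attack(public, leaked, threshold=-0.05):
    """Pair each public record with its closest leaked identity,
    keeping the match only when similarity clears the threshold."""
    hits = []
    for pub in public:
        best = max(leaked, key=lambda l: similarity(pub.embedding, l.embedding))
        if similarity(pub.embedding, best.embedding) >= threshold:
            hits.append((best.name, pub.diagnosis))
    return hits

public = [PublicRecord("r1", (0.10, 0.90), "arrhythmia"),
          PublicRecord("r2", (0.80, 0.20), "normal sinus rhythm")]
leaked = [LeakedRecord("Alice", (0.12, 0.88)),
          LeakedRecord("Bob", (0.79, 0.22))]
print(linkage_attack(public, leaked))
# [('Alice', 'arrhythmia'), ('Bob', 'normal sinus rhythm')]
```

The attacker never needs the hospital's database: any auxiliary source that pairs a name with a signal is enough to re-attach identities to the "anonymous" release.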

Wang said policymakers need to catch up to these new risks. He recommended four steps:

1. Reclassify ECG as biometric data. ECG should be given the same sensitivity tier as other biometric identifiers, requiring higher protection standards than ordinary clinical data.

2. Mandate risk assessment and informed consent. Before health data is shared across institutions, providers should be required to estimate the re-identification risk — whether or not the data has been “de-identified” in the traditional sense. Patients should be informed that ECG functions as biometric information, and their consent should explicitly acknowledge potential linkage risks. Data consumers must also be bound by policies that prohibit attempts to cross-link records beyond approved uses.

3. Enforce cross-institution safeguards. Institutions collaborating on research should operate under controlled-access agreements and audited environments, rather than freely exchanging raw ECG files. Metadata that facilitates linkage, such as exact timestamps, device identifiers, or site codes, should be minimized or generalized (a minimal sketch of this kind of generalization follows the list).

4. Strengthen patient and data consumer awareness. Patients should give consent with clear warnings that ECG carries biometric re-identification risks, and data consumers (e.g., researchers, companies) should be bound by explicit restrictions on reuse and cross-linking.
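
Picking up the pointer in recommendation 3, here is a minimal sketch of metadata generalization before sharing. The field names (`timestamp`, `device_id`, `site_code`) and formats are assumptions for illustration, not drawn from the study.

```python
# Coarsen or drop metadata fields that make cross-dataset joins easy.
from datetime import datetime

def generalize_metadata(record: dict) -> dict:
    out = dict(record)
    # Exact timestamps allow joins against other logs; keep only the month.
    ts = datetime.fromisoformat(out.pop("timestamp"))
    out["period"] = ts.strftime("%Y-%m")
    # Device serials and site codes act as quasi-identifiers; drop them.
    out.pop("device_id", None)
    out.pop("site_code", None)
    return out

record = {"timestamp": "2025-03-14T09:26:53", "device_id": "WX-4421",
          "site_code": "S07", "ecg_file": "rec_0042.dat"}
print(generalize_metadata(record))
# {'ecg_file': 'rec_0042.dat', 'period': '2025-03'}
```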

Technical insights and limits

The team built its approach using a Vision Transformer model, a type of deep learning system that can process complex patterns in time-series data. The system worked in two stages. First, it tried to match each ECG sample to a known individual. Second, it decided whether the sample came from someone outside the known group.

To simulate real-world conditions, the researchers assumed the attacker did not know exactly which individuals were in the public dataset. Despite this limitation, the attack was still highly accurate. At a certain confidence threshold, only about 14 percent of signals were misclassified.
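
The exact model and thresholds are in the paper, but the two-stage decision logic can be illustrated with a toy function. The logits, patient names, and the 0.8 threshold below are placeholders, not values from the study.

```python
# A sketch of open-set identification: score known identities, then reject
# low-confidence matches as coming from outside the known group.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())
    return e / e.sum()

def identify(logits: np.ndarray, names: list, threshold: float = 0.8) -> str:
    """Stage 1: pick the best-matching known individual.
    Stage 2: accept only if confidence clears the threshold; otherwise
    declare the sample as coming from outside the known group."""
    probs = softmax(logits)
    best = int(np.argmax(probs))
    return names[best] if probs[best] >= threshold else "unknown"

names = ["patient_a", "patient_b", "patient_c"]
print(identify(np.array([4.0, 0.5, 0.2]), names))  # confident -> patient_a
print(identify(np.array([1.1, 1.0, 0.9]), names))  # ambiguous -> unknown
```

Raising the threshold trades missed matches for fewer false identifications; the roughly 14 percent misclassification figure reflects one such operating point.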

The study also tested scenarios where the attacker had knowledge of the dataset or where the data was noisy. Performance dropped slightly with noise, but the results were still strong enough to show that simple obfuscation would not protect privacy.

Other biosignals at risk

ECG is not the only data type vulnerable to linkage attacks. Wang said similar risks exist for other common biosignals, including photoplethysmography (PPG), the optical pulse signal most smartwatches collect. “Using a smartphone dataset, we were able to reproduce ECG-style linkage attacks on PPG with success rates comparable to our ECG findings. This highlights that PPG, which encodes stable cardiovascular features such as pulse morphology and heart-rate variability, is highly vulnerable.”

He added that voice data and EEG (electroencephalography) could also face rising threats. “Voice data represents another critical vulnerability. It has long been recognized as a biometric ‘fingerprint,’ and the availability of abundant public recordings combined with advances in voice synthesis and conversion technologies make identity linkage and misuse highly feasible and already observed in practice. EEG also carries subject-specific patterns, though current risks are lower due to smaller and more heterogeneous datasets. However, as consumer EEG and brain-computer interface devices become more widespread, the potential for EEG linkage attacks will likely grow in the near future.”

Where healthcare security goes from here

The authors argue that healthcare organizations need to adopt stronger privacy protections for biosignal data.

“For telehealth providers and wearable companies, the first step is to acknowledge that ECG and similar biosignals should be treated as biometric data, with the same sensitivity as fingerprints or voice,” Wang said. “This means implementing compliance frameworks that include privacy-specific consent. Patients must be informed that these signals carry inherent re-identification risks, and consent should reflect the actual ways data will be used and shared.”

Wang also described how companies can take immediate steps without waiting for advanced technologies to mature. “Providers should avoid ‘one-size-fits-all’ anonymization that distorts entire biosignals, since this can destroy clinically valuable patterns. Instead, privacy protection should target the most identity-revealing regions of the signal. Our group is actively exploring generative AI approaches to selectively modify or regenerate these personal patterns in ECG and PPG while preserving medically meaningful features for research and model training.”
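
The group's generative method is not public, so the sketch below only illustrates the underlying idea of targeted rather than whole-signal protection: perturb short windows around detected R-peaks, where beat morphology is most identity-revealing, while leaving the rest of the trace (and beat-to-beat timing) intact. The peak detector, window size, and noise strength are all invented for the example.

```python
# Toy "selective anonymization": modify only the identity-revealing regions
# of a synthetic ECG-like trace, preserving everything else.
import numpy as np

def find_r_peaks(ecg: np.ndarray, height: float = 0.5) -> list:
    """Crude local-maximum peak detector, for illustration only."""
    return [i for i in range(1, len(ecg) - 1)
            if ecg[i] > height and ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]]

def mask_morphology(ecg: np.ndarray, half_window: int = 5,
                    strength: float = 0.2, seed: int = 0) -> np.ndarray:
    """Perturb short windows around each detected beat, leaving the rest
    of the signal and the beat timing untouched."""
    rng = np.random.default_rng(seed)
    out = ecg.copy()
    for p in find_r_peaks(ecg):
        lo, hi = max(0, p - half_window), min(len(ecg), p + half_window + 1)
        out[lo:hi] += strength * rng.standard_normal(hi - lo)
    return out

# Synthetic trace: low-amplitude noise plus three spikes standing in for beats.
ecg = 0.05 * np.random.default_rng(1).standard_normal(300)
ecg[[50, 150, 250]] = 1.0
protected = mask_morphology(ecg)
print(np.flatnonzero(protected != ecg))  # only samples near the beats changed
```

A real system would regenerate rather than merely blur these regions, as Wang describes, but the design goal is the same: degrade the biometric fingerprint without degrading the clinical content.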

For cybersecurity professionals, this research is a warning that biometric data is spreading beyond traditional identifiers like fingerprints or facial scans. ECG signals, once seen only as medical information, are now part of the broader security landscape.

