We often hear warnings about how machine learning (ML) models may expose sensitive information tied to their training data. The concern is understandable. If a model was trained on personal records, it may seem reasonable to assume that releasing it could reveal something about the people behind those records. A study by Josep Domingo-Ferrer examines this assumption and finds that the situation is less threatening than current discussions suggest.
How regulation frames the issue
The GDPR and the EU AI Act impose strict rules on personal data used in ML. Both were drafted before the surge in GenAI, and that timing influenced how regulators treat trained models. Some policymakers work from the premise that releasing a model is tantamount to releasing the training dataset itself. Recent shifts in United States policy show that national approaches can diverge, but the core regulatory pressure remains strong in the European Union.
The study argues that trained models do not carry the same type of exposure as raw data. To learn something about a person, an attacker would first need to mount a privacy attack against the model. That extra step changes the threat landscape: it creates a barrier that does not exist in traditional database disclosure, where sensitive fields may be visible without any additional effort.
Membership inference attacks are harder than they look
Membership inference attacks are the most common type of privacy attack used to assess how training data might be exposed in ML systems. These attacks aim to determine whether a specific data point was part of the dataset used to train a model.
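As a rough illustration, and not a construction taken from the report, the simplest such attack is a loss-threshold rule: examples the model fits unusually well are guessed to have been in the training set. The function below is a minimal sketch that assumes a scikit-learn-style classifier and integer labels aligned with the columns of `predict_proba`; the threshold itself would have to be calibrated, typically with shadow models trained on similar data, which is already a non-trivial requirement for the attacker.

```python
import numpy as np

def loss_threshold_mia(model, X, y, threshold):
    """Flag an example as a suspected training member when the model's
    per-example loss on it falls below `threshold`, i.e. the model fits it
    suspiciously well. Illustrative sketch, not the report's method."""
    proba = model.predict_proba(X)
    confidence_on_true_label = proba[np.arange(len(y)), y]
    losses = -np.log(np.clip(confidence_on_true_label, 1e-12, None))
    return losses < threshold  # True = "probably a training member"
```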
Two conditions are needed before a membership inference attack can lead to firm conclusions.
First, the training data would need to be an exhaustive sample of the population, meaning that every individual of interest is included. This is rare in practice. When a dataset is not exhaustive, an attacker cannot be sure that a target person was included, because other people may share the same visible attributes. In these cases, the target can plausibly deny being part of the dataset.
Second, the confidential attributes linked to those shared visible attributes would need to be the same across all matching records. If the confidential attributes differ, the attacker cannot pinpoint the true confidential value for the target, even if the attacker suspects the target was included. The report notes that such uniformity in confidential attributes is uncommon.
Both requirements have long been studied in statistical disclosure control: sampling prevents certainty about identity, and diversity in confidential attributes prevents sensitive attribute disclosure. Most ML training datasets are non-exhaustive samples with diverse confidential values, which limits what membership inference attacks can reveal.
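A toy tabular example, with invented records and attribute names, makes the two protections concrete: the attacker sees the age band and ZIP prefix, while the diagnosis is confidential.

```python
# Invented toy data: (age_band, zip_prefix, diagnosis); diagnosis is confidential.
population = [
    ("30-39", "080", "diabetes"),
    ("30-39", "080", "asthma"),
    ("30-39", "080", "none"),
    ("40-49", "081", "diabetes"),
]
training_sample = population[:2]              # non-exhaustive: only part of the population
target_visible = ("30-39", "080")             # what the attacker knows about the target

# Sampling: the training set is not exhaustive, and several population members
# match the target's visible attributes, so the target can deny being included.
print("training set exhaustive:", len(training_sample) == len(population))   # False
matches = [r for r in population if r[:2] == target_visible]
print("people matching the visible attributes:", len(matches))               # 3

# Diversity: the matching records disagree on the confidential attribute, so even
# assuming membership, the attacker cannot pin down the target's true value.
print("distinct confidential values:", {r[2] for r in matches})              # 3 different diagnoses
```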
The analysis also reviews the technical limits of existing attacks. It lists four requirements for a membership inference attack to succeed:
- The target model must not be overfitted.
- The target model must have strong test accuracy.
- The attack must produce reliable membership signals.
- The attack must have a reasonable computational cost.
According to the report, no published membership inference attack meets all four requirements at the same time. Some attacks work only when the target model is overfitted, but such models do not perform well on their main tasks.
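The overfitting dependence is easy to reproduce on synthetic data. The sketch below, a simplified experiment and not the report's evaluation, runs the same loss-threshold attack against an overfitted model and a regularized one; the attacker is even given the non-member loss distribution to set the threshold, which is optimistic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_mem, X_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)

def per_example_loss(model, X, y):
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

models = {
    "overfitted forest": RandomForestClassifier(random_state=0),       # memorizes its training set
    "regularized logistic": LogisticRegression(C=0.1, max_iter=1000),  # small train/test gap
}
for name, model in models.items():
    model.fit(X_mem, y_mem)
    loss_mem = per_example_loss(model, X_mem, y_mem)     # training members
    loss_non = per_example_loss(model, X_non, y_non)     # non-members
    threshold = np.median(loss_non)                      # optimistic attacker knowledge
    attack_acc = 0.5 * ((loss_mem < threshold).mean() + (loss_non >= threshold).mean())
    print(f"{name}: balanced attack accuracy = {attack_acc:.2f}")
```

The memorizing model is attacked noticeably above chance, while the regularized model leaves the attacker close to 0.5, which is the tension behind the first two requirements.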
Property inference reveals very little about people
Property inference attacks aim to learn general characteristics of a training dataset. Examples include whether the dataset contains noisy images or whether a model was trained mostly on pictures of a specific demographic. These findings may reveal biases in the training data, but they do not expose sensitive information about specific people.
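A common construction in this line of work is a meta-classifier trained on the parameters of shadow models. The sketch below is a minimal version under invented assumptions: the dataset-level property is heavy class imbalance, the shadow and target models are logistic regressions, and the 85/15 split, sample sizes, and names are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def shadow_model_features(has_property):
    """Train one shadow model on synthetic data with or without the dataset-level
    property (heavy class imbalance) and return its parameters as a feature vector."""
    weights = [0.85, 0.15] if has_property else [0.5, 0.5]
    X, y = make_classification(n_samples=500, n_features=10, weights=weights,
                               random_state=int(rng.integers(1_000_000)))
    m = LogisticRegression(max_iter=1000).fit(X, y)
    return np.concatenate([m.coef_.ravel(), m.intercept_])

# Shadow models with known property labels train the meta-classifier.
property_labels = np.array([1] * 50 + [0] * 50)
features = np.array([shadow_model_features(bool(p)) for p in property_labels])
meta = LogisticRegression(max_iter=1000).fit(features, property_labels)

# The attacker applies the meta-classifier to the parameters of the released model.
target = shadow_model_features(has_property=True).reshape(1, -1)
print("inferred dataset-level property (imbalanced?):", bool(meta.predict(target)[0]))
```

Whatever the meta-classifier infers here is a statement about the training set as a whole, not about any individual record.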
The report argues that property inference should not be viewed as a privacy attack against individuals. Instead, it serves as an auditing tool. It can show whether a model producer relied on unrepresentative data or made poor data collection choices.
That information may be relevant to regulators or procurement teams, but it does not compromise anyone’s privacy unless the training set contains data from only one or two individuals. That situation can occur in federated learning, where a participant’s local dataset may be that small, but it is rare in most enterprise use cases.
Reconstruction attacks have practical boundaries
Reconstruction attacks attempt to recover parts of the original training data. Older theoretical work showed that reconstruction is possible in certain database settings when queries leak too much information. The report notes that similar ideas appear in ML when models memorize training points. Overfitting makes memorization more likely.
Some studies have shown partial success in reconstructing faces or text samples. These experiments tend to rely on models that are small, overfitted, or built with weak regularization. They also require favorable conditions, such as access to gradients in federated learning; those gradients are normally visible only to the server and not to outside attackers.
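The role of gradients is easy to see in a deliberately simple setting. The sketch below assumes a linear model with squared-error loss and an update computed on a single record, none of which is taken from the report: in that case the per-example gradient is proportional to the input, so the server can read the record off the update. Gradient inversion against deep networks is far harder, and still presupposes that server-side vantage point.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)             # the private training example
y_true = 3.0
w, b = rng.normal(size=8), 0.0     # current model parameters, known to the server

# Gradient of the per-example loss 0.5 * (w @ x + b - y_true)**2
residual = (w @ x + b) - y_true
grad_w = residual * x              # gradient with respect to the weights
grad_b = residual                  # gradient with respect to the bias

# The example can be read straight off the update the server receives.
x_recovered = grad_w / grad_b
print("exact reconstruction:", np.allclose(x_recovered, x))   # True
```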
Reconstruction based solely on observing the final model is far more difficult. To recreate a tabular record, an attacker would need to test every possible combination of attribute values, and the search space quickly becomes too large to exhaust. Even if the attacker guesses a plausible record, there is no objective way to verify that it was part of the training data. Membership inference can serve as that check, but then reconstruction inherits the weaknesses of membership inference.
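A back-of-the-envelope count, using invented attribute cardinalities rather than figures from the report, shows how quickly exhaustive enumeration becomes hopeless even for a modest tabular schema.

```python
import math

# Invented cardinalities for a small tabular schema (illustrative only).
attribute_cardinalities = {
    "age": 100,
    "zip_code": 40_000,
    "occupation": 500,
    "income_band": 20,
    "diagnosis": 1_000,
}

candidates = math.prod(attribute_cardinalities.values())
print(f"candidate records to enumerate: {candidates:,}")   # 40,000,000,000,000
```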
