Regulatory limits on explicit targeting have not stopped algorithmic profiling on the web. Ad optimization systems still adapt which ads appear based on users’ private attributes. At the same time, multimodal LLMs have lowered the barrier for turning these hidden signals into profiling tools. A new study examines this risk and asks how outside parties could use these signals to infer private attributes from ad exposure alone.
Figure: conceptual overview of the adversarial profiling threat posed by a user's passive exposure to ads.
LLMs can reconstruct private attributes from ads
The researchers introduced a pipeline that uses an LLM as an adversarial inference engine to perform natural language profiling. It was applied to about 435,000 Facebook ad impressions from 891 users, with Gemini 2.0 Flash processing the image and text components of each ad. For each ad, the model produced a structured summary. The researchers then fed sequences of these summaries into the model to predict details such as age, gender, employment, education and party preference.
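The two-stage structure of the pipeline can be sketched roughly as follows, assuming the Google Gen AI Python SDK (google-genai). The prompts, helper names and summary format are illustrative stand-ins, not the authors' actual implementation.

```python
# Rough sketch of the two-stage pipeline described in the study, assuming the
# google-genai SDK. Prompts and helper names are illustrative, not the authors' code.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # hypothetical key
MODEL = "gemini-2.0-flash"


def summarize_ad(image_bytes: bytes, ad_text: str) -> str:
    """Stage 1: turn one ad impression (image + text) into a short structured summary."""
    response = client.models.generate_content(
        model=MODEL,
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            f"Ad text: {ad_text}\nSummarize the product, intended audience and tone in one line.",
        ],
    )
    return response.text


def profile_from_summaries(summaries: list[str]) -> str:
    """Stage 2: feed the sequence of per-ad summaries back to the model and ask for attribute guesses."""
    joined = "\n".join(f"- {s}" for s in summaries)
    response = client.models.generate_content(
        model=MODEL,
        contents=(
            "These ads were shown to one user during a browsing session:\n"
            f"{joined}\n"
            "Estimate the user's age range, gender, employment, education and party preference, "
            "and state your confidence for each."
        ),
    )
    return response.text
```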
The authors found that the model could recover these details at rates well above chance. It also outperformed demographic baselines, which represent what someone could guess about a user without seeing any of their ads.
In short browsing sessions, gender prediction reached about 59% accuracy, employment about 48% and party preference about 35%. Even when the model missed the exact age or income range, its guesses often landed close to the correct category.
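As a rough illustration of what "close" means for ordinal attributes, predictions can be scored by bracket distance as well as by exact match. The brackets, helper and numbers below are made up for the example, not taken from the paper.

```python
# Illustration (not from the paper) of scoring near-misses on ordinal attributes
# such as age or income brackets: exact-match accuracy versus accuracy within
# one neighbouring bracket. All values are invented for the example.
AGE_BRACKETS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def bracket_distance(predicted: str, actual: str) -> int:
    """Number of brackets between the predicted and true category."""
    return abs(AGE_BRACKETS.index(predicted) - AGE_BRACKETS.index(actual))


# (predicted, actual) pairs -- made-up data
predictions = [("25-34", "25-34"), ("35-44", "25-34"), ("45-54", "65+"), ("55-64", "45-54")]

exact = sum(p == a for p, a in predictions) / len(predictions)
within_one = sum(bracket_distance(p, a) <= 1 for p, a in predictions) / len(predictions)
print(f"exact: {exact:.2f}, within one bracket: {within_one:.2f}")
# exact: 0.25, within one bracket: 0.75
```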
Meta removed direct targeting options for sensitive categories such as political beliefs in 2022, but the study shows that its ranking system still introduces demographic patterns into ad exposure. Those patterns can be read off by any party with access to the ads a user actually sees.
Model predictions rival human guesses
The researchers also compared the model with human judgment. Human reviewers looked at the same ad sequences and tried to infer the same attributes. The model often matched or exceeded their accuracy.
It performed slightly better on gender and showed stronger results for education, employment and party preference. Its age estimates were also closer to the correct value. Both humans and the model struggled with income.
A browser extension can exploit this
One finding that stands out is how easy it is to collect ad content in the first place. Rather than shipping specialized malware, an adversary can slip this attack into the existing ecosystem of widely installed, normal-looking browser extensions, such as ad blockers, coupon tools or page translators. These extensions legitimately need permission to read page content in order to work, which gives them convenient cover for harvesting data.
This route sidesteps both user attention and platform checks. People tend to worry about invisible trackers and cookies while overlooking the signals hidden in the ads they can see, and store review processes for apps and extensions tend to focus on code safety rather than on what can be inferred from the content an extension is allowed to read.
That gap creates a regulatory blind spot where a harmless-looking extension can quietly collect ad content without triggering alarms. With the help of LLMs, an attacker can automate the whole process and turn it into profiling at scale.
As the study shows, useful profiles can be pieced together from short observation windows, so an attacker does not need to stay active for long. The route is also attractive because it enables off-platform profiling that slips past a platform's own privacy safeguards, including the removal of sensitive targeting options. In practice, the attacker leans on the optimization logic of the ad system to build sensitive user profiles at low cost and without leaving an audit trail.
A global risk that extends beyond one platform’s dataset
The dataset comes from Facebook users, but the mechanism applies widely. Any system that tailors ads for engagement may create demographic patterns in delivery and expose information about a user through the content that appears in a feed.
The spread of capable models and open API access removes the earlier need to collect large labeled datasets and train custom classifiers, making the attack accessible to individuals with only basic technical skills.
Addressing this risk will require privacy rules that take into account the hidden signals inside the content people passively scroll past each day.
