Social data puts user passwords at risk in unexpected ways

Many CISOs already assume that social media creates new openings for password guessing, but new research helps show what that risk looks like in practice. The findings reveal how much information can be reconstructed from public profiles and how that data influences the strength of user passwords. The study also examines how LLMs behave when asked to generate or evaluate passwords based on that same personal information.

The research team from the University of Cagliari and the University of Salerno created a tool called SODA ADVANCE to study these effects. The tool rebuilds user profiles using public data and evaluates password strength with a combined set of metrics.

Overview of the modules underlying SODA ADVANCE

The researchers then paired the tool with several LLMs to test password generation and password evaluation across different scenarios. The work offers a detailed look at how syntactic features and personal context interact during password creation and assessment.

A tool built to map the personal data trail

The researchers asked 100 volunteers for a name, surname, and a photo. With only that information, SODA ADVANCE searched Facebook, Instagram, and LinkedIn for matching profiles. It used facial recognition to link accounts and merge each person’s data into a unified profile.

After completing the reconstruction, the tool evaluated user passwords and combined those results into a single score named Cumulative Password Strength. This score ranges from 0 to 1 and captures both syntax and the degree to which a password connects to the user’s publicly discoverable traits. The idea is to measure how easily someone could guess a password when armed with reconstructed personal data.
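The article does not give the exact formula behind Cumulative Password Strength, but the idea of a syntactic score discounted by overlap with a user's public data can be sketched as follows. All function names, weights, and thresholds here are illustrative assumptions, not the researchers' actual metric:

```python
import re

def syntactic_strength(password: str) -> float:
    """Rough syntactic score in [0, 1] from length and character variety (illustrative)."""
    classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^a-zA-Z0-9]"]
    variety = sum(1 for c in classes if re.search(c, password)) / len(classes)
    length = min(len(password) / 16, 1.0)  # saturate at 16 characters
    return 0.5 * variety + 0.5 * length

def personal_overlap(password: str, profile_terms: list[str]) -> float:
    """Fraction of reconstructed-profile terms that appear inside the password."""
    pw = password.lower()
    hits = sum(1 for term in profile_terms if term.lower() in pw)
    return hits / len(profile_terms) if profile_terms else 0.0

def cumulative_strength(password: str, profile_terms: list[str]) -> float:
    """Combined score in [0, 1]: syntax discounted by personal-data overlap."""
    return syntactic_strength(password) * (1.0 - personal_overlap(password, profile_terms))

profile = ["alice", "1990", "cagliari", "tennis"]  # hypothetical reconstructed traits
print(cumulative_strength("Alice1990!", profile))    # heavily discounted: reuses profile data
print(cumulative_strength("tV#9q!mRx2&z", profile))  # no overlap, scores higher
```

A password that looks complex on syntax alone still scores low here once it contains the user's publicly discoverable attributes, which is the core idea the score is meant to capture.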

Generative models put to the test

The researchers tested several LLMs, including Claude, ChatGPT, Google Gemini, Dolly, LLaMa, and Falcon. The first task asked each model to generate strong but memorable passwords based on user details without reusing those details directly. SODA ADVANCE evaluated the resulting passwords.

Claude produced the best average score with 0.82. Gemini followed with 0.75, and ChatGPT reached 0.74. Dolly, LLaMa, and Falcon generated weaker passwords, with average scores of 0.65, 0.66, and 0.66, respectively. The lower results stemmed from repetitive structures and guessable patterns.

The strongest results came from models that created varied syntax while avoiding obvious ties to user data. Models that relied on simple patterns produced passwords that looked complex but still carried predictable structure.

Models learn more as profiles grow richer

The second part of the research tested whether LLMs could evaluate password strength when user data was included in the prompt. Each model received reconstructed user information and a set of both strong and weak passwords. Claude again stood out with accuracy, precision, recall, and F1 scores all at 0.75.

The team then compared two scenarios. In the first, each model saw only minimal user data. In the second, each model saw full reconstructed profiles. Performance improved across almost every model once the richer data was provided. Falcon was the most extreme example, with precision rising from 0.48 to 0.77. ChatGPT also gained across every metric. Claude still led with accuracy at 0.77 and precision at 0.89.
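The accuracy, precision, recall, and F1 figures quoted above are standard binary-classification metrics. As a reminder of how they relate, here is a minimal sketch that treats "weak" as the positive class an evaluator should flag; the toy labels are invented for illustration, not data from the study:

```python
def metrics(y_true, y_pred, positive="weak"):
    """Accuracy, precision, recall, and F1 for a weak/strong labeling task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical labels: what the passwords really are vs. what a model said.
y_true = ["weak", "weak", "weak", "strong", "strong", "strong", "weak", "strong"]
y_pred = ["weak", "weak", "strong", "strong", "strong", "weak", "weak", "strong"]
acc, prec, rec, f1 = metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Precision answers "of the passwords the model flagged as weak, how many really were," while recall answers "of the truly weak passwords, how many did it catch"; Falcon's jump in precision means its weak-password flags became far more reliable once profile data was in the prompt.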

These results suggest that LLMs become far better at spotting risky passwords when supplied with meaningful personal context. Passwords that contain hints of birthdays, locations, hobbies, or common words linked to the user were easier for the models to flag correctly when that data appeared in the prompt.

Comparing evaluations across widely used tools

To understand how SODA ADVANCE compares against common password strength tools, the researchers selected 250 passwords from leaked datasets and asked each tool to sort them into weak, medium, or strong categories.

All tools placed most passwords into the medium group. SODA ADVANCE tended to classify more passwords as weak when they contained personal information from the reconstructed profile. Other tools often classified the same passwords as strong because they looked syntactically complex, even though they were close to user attributes.

This difference highlights the gap that appears when tools only measure complexity instead of measuring the link between a password and the user’s online presence.
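That gap is easy to demonstrate. The sketch below contrasts a naive complexity-only checker with one that also consults reconstructed profile data; the thresholds and profile terms are hypothetical, not taken from SODA ADVANCE or the other tools tested:

```python
import re

def complexity_only(password: str) -> str:
    """Naive checker: rates a password on length and character classes alone."""
    classes = sum(bool(re.search(p, password))
                  for p in (r"[a-z]", r"[A-Z]", r"[0-9]", r"[^a-zA-Z0-9]"))
    if len(password) >= 10 and classes == 4:
        return "strong"
    if len(password) >= 8 and classes >= 3:
        return "medium"
    return "weak"

def context_aware(password: str, profile_terms: list[str]) -> str:
    """Same checker, but any profile term found in the password forces 'weak'."""
    if any(term.lower() in password.lower() for term in profile_terms):
        return "weak"
    return complexity_only(password)

profile = ["marco", "salerno", "1988"]   # hypothetical reconstructed attributes
pw = "Marco@Salerno88!"
print(complexity_only(pw))           # rated strong on syntax alone
print(context_aware(pw, profile))    # flagged weak once profile data is considered
```

A password like `Marco@Salerno88!` satisfies every syntactic rule yet is trivial to guess for an attacker holding the user's reconstructed profile, which is exactly the class of password the article says complexity-only tools misjudge.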

Strong passwords measured against a targeted guessing model

The last experiment asked whether PassBERT, a targeted password guessing model, could crack strong passwords created by LLMs. The researchers tested 25,000 passwords generated for the 100 volunteers. PassBERT successfully inferred only 22, a success rate below 0.1 percent.

According to the researchers, this small number reflects the combination of semantic personalization and syntactic complexity. Although the generated passwords were inspired by user characteristics, the models still produced structures that did not match common guessing patterns.


