GenAI is exposing sensitive data at scale

Sensitive data is everywhere and growing fast. A new report from Concentric AI highlights how unstructured data, duplicate files, and risky sharing practices are creating serious problems for security teams. The findings show how generative AI tools like Microsoft Copilot are adding complexity, while old problems like oversharing and poor data hygiene continue to create exposure.

Generative AI raises new concerns

On average, Copilot accessed nearly three million sensitive data records per organization during the first half of 2025. Researchers also logged more than 3,000 Copilot interactions per organization, each of them a chance for sensitive data to be modified or shared without proper controls.

The report warns that shadow GenAI use, where employees rely on unsanctioned tools, adds further risk since organizations may not even know where their data is going.

Too many permissions, too much exposure

The report shows that excessive sharing remains a core problem. Data is frequently shared with people and systems that don’t need access.

Across all organizations in the study, an average of three million sensitive data records were shared externally, making up more than half of all shared files. Financial services firms had the highest percentage of external sharing involving sensitive data at 73 percent.

One particular risk comes from "Anyone" links, which grant access without sign-in to whoever holds the URL. In healthcare, a large share of files shared through these links contained sensitive data, and the pattern held across other industries as well.
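
For teams that want to gauge this exposure in their own tenant, the sketch below shows one way to enumerate anonymous ("Anyone") sharing links with the Microsoft Graph API. It is an illustration, not tooling from the report: the access token (requiring Files.Read.All) and drive ID are placeholders, and it only walks the top level of a drive.

```python
# Minimal sketch: flag files shared via "Anyone" (anonymous) links in a
# OneDrive/SharePoint drive using the Microsoft Graph API.
# ACCESS_TOKEN and DRIVE_ID are placeholders; obtain the token through
# your own auth flow with Files.Read.All permission.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ACCESS_TOKEN = "<token>"   # placeholder
DRIVE_ID = "<drive-id>"    # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

def list_children(drive_id: str, item_id: str = "root"):
    """Yield top-level driveItems in a folder, following @odata.nextLink
    paging. Recurse into folders for a full audit."""
    url = f"{GRAPH}/drives/{drive_id}/items/{item_id}/children"
    while url:
        resp = requests.get(url, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        yield from data.get("value", [])
        url = data.get("@odata.nextLink")

def anonymous_links(drive_id: str, item_id: str):
    """Return the item's sharing permissions whose link scope is 'anonymous'."""
    url = f"{GRAPH}/drives/{drive_id}/items/{item_id}/permissions"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return [p for p in resp.json().get("value", [])
            if p.get("link", {}).get("scope") == "anonymous"]

for item in list_children(DRIVE_ID):
    links = anonymous_links(DRIVE_ID, item["id"])
    if links:
        print(f"{item['name']}: {len(links)} Anyone link(s)")
```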

Internal sharing can be just as risky. In many organizations, a significant portion of files shared broadly within the company contained sensitive data. In retail and financial services, most files shared with personal accounts held sensitive information.

Data sprawl drives inefficiency and risk

As data grows, so does clutter. Researchers found organizations had an average of 10 million duplicate data records, with government and education organizations seeing duplication rates above 30 percent.
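
The report does not describe how duplicates were counted, but content hashing is a common way data-hygiene tools surface them. The following sketch groups files by SHA-256 digest; the scan root is a hypothetical path.

```python
# Minimal sketch: find duplicate files by hashing their content.
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under root by content hash; groups of 2+ are duplicates."""
    by_hash: defaultdict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

dupes = find_duplicates("/data/share")            # hypothetical path
redundant = sum(len(v) - 1 for v in dupes.values())
print(f"{redundant} redundant copies across {len(dupes)} duplicate groups")
```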

Stale data is also a problem. Across industries, organizations had an average of seven million stale records, with manufacturing having the highest percentage at about one-fourth of its total data. Holding onto old data not only increases costs but also makes it harder to manage risk.

The report also highlights orphaned data, which has no owner, and inactive user data, which belongs to former employees or dormant accounts. Organizations averaged four million orphaned data records and two million inactive user data records. In government and education, inactive data alone accounted for nearly 10 percent of all data.
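
As a rough illustration of these hygiene categories (not the report's methodology), the sketch below tags files as stale, orphaned, or inactive using last-modified time and a hypothetical set of active and known accounts. A real deployment would resolve ownership against an identity provider such as Entra ID rather than the POSIX file owner used here.

```python
# Minimal sketch of the hygiene categories described above: stale (not
# modified in N days), orphaned (no resolvable owner), and inactive
# (owner is a former employee or dormant account). Account sets and the
# scan root are hypothetical stand-ins.
import time
from pathlib import Path

STALE_AFTER_DAYS = 365
ACTIVE_USERS = {"alice", "bob"}          # hypothetical active accounts
KNOWN_USERS = ACTIVE_USERS | {"carol"}   # carol has left: inactive, not orphaned

def classify(path: Path) -> list[str]:
    tags = []
    age_days = (time.time() - path.stat().st_mtime) / 86400
    if age_days > STALE_AFTER_DAYS:
        tags.append("stale")
    try:
        owner = path.owner()             # POSIX owner; stand-in for real ownership metadata
    except KeyError:
        owner = None                     # uid no longer resolves to a user
    if owner is None or owner not in KNOWN_USERS:
        tags.append("orphaned")          # no resolvable owner
    elif owner not in ACTIVE_USERS:
        tags.append("inactive")          # owner's account is no longer active
    return tags

for p in Path("/data/share").rglob("*"):  # hypothetical root
    if p.is_file() and (tags := classify(p)):
        print(p, tags)
```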


