ITSecurityGuru

What to Do When Your AI Guardrails Fail


I want to talk about a bug. Not because the bug itself was exceptional, but because what it exposed should change how every organisation architects AI governance.

For several weeks earlier this year, Microsoft 365 Copilot read and summarised confidential emails despite sensitivity labels and Data Loss Prevention policies being correctly configured to block that behaviour. The bug, tracked as CW1226324, affected emails in users’ Sent Items and Drafts folders. Sensitive legal communications, business contracts and health information could all be processed by an AI that explicit organisational policies said should never touch them.

Microsoft’s response was that users only accessed information they were already authorised to see. This may be technically accurate, since Copilot operates within the user’s mailbox context. But the sensitivity labels weren’t there to stop users from reading their own email. They were there to stop AI from processing confidential content. The AI processed it anyway.

A single point of failure

The architectural reality that this incident made visible was that every control designed to keep Copilot away from confidential data (whether it be sensitivity labels, DLP policies, or access restrictions) lived inside the same platform as Copilot itself. When a code error hit, all controls failed at once. There was no independent layer that caught it, no secondary check, and no second chance.

We wouldn’t design physical security this way. Nobody would build a vault where the door lock, alarm, and surveillance cameras all run through a single circuit breaker. But that’s what happened here. Microsoft was the AI provider, the security control provider, and the only entity with visibility into whether those controls were working. When the platform broke, organisations had no independent way to detect the failure.

A question of trust

I’m not writing this to single out Microsoft. Copilot is a powerful tool, and code bugs happen. Plus, the team deserve credit for identifying the issue and rolling out a fix. The problem isn’t that Microsoft had a bug. The problem is the architecture that turned a single bug into a complete governance failure with no independent detection for weeks.

This pattern isn’t unique, of course. Whether it’s Copilot, Google Gemini for Workspace, Salesforce Einstein, or any other enterprise AI tool, the typical model is the same. The AI platform provides the governance controls, and organisations trust those controls to work. When they don’t, there’s nothing underneath.

The World Economic Forum’s 2026 Global Cybersecurity Outlook quantified this gap. Among CEOs, data leaks through generative AI are now the top cybersecurity concern, cited by 30%. Among cybersecurity professionals, that concern rises to 34%. Yet roughly one-third of organisations still have no process to validate AI security before deployment.

The WEF report also warned that, without strong governance, AI agents can accumulate excessive privileges or propagate errors at scale. It recommends continuous verification, audit trails, and zero-trust principles that treat every AI interaction as untrusted by default. The recent Copilot incident demonstrates why those recommendations exist.

The compliance exposure

If Copilot processed emails containing protected health information, organisations may need to assess whether that constitutes a reportable breach under the Data Protection Act 2018. The question isn’t whether the user was authorised; it’s whether the AI’s processing was authorised under the applicable data processing agreement. Microsoft’s public statement doesn’t resolve that analysis.

Under GDPR, Article 32 requires appropriate technical and organisational measures for the security of processing. If an organisation’s sole measure was a vendor’s sensitivity labels, and those labels failed for weeks, “appropriate” becomes a difficult argument to make. The EU AI Act’s Article 12 adds another layer: if the only records of what the AI accessed come from the vendor that had the failure, organisations lack the independent documentation the regulation demands.
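What independent documentation could look like in practice: a sketch of an organisation-controlled, tamper-evident audit trail, where each record of an AI access is chained to the previous one by a hash, so altered or missing entries are detectable without relying on the vendor’s own logs. All names here (the `AuditLog` class, the event fields) are illustrative assumptions, not any real product’s API.

```python
import hashlib
import json

class AuditLog:
    """Hash-chained log of AI access events, held outside the AI platform."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._last_hash = self.GENESIS

    def record(self, event: dict) -> None:
        # Each entry's hash covers the previous hash plus the event payload,
        # so any later modification breaks the chain.
        payload = json.dumps(event, sort_keys=True)
        entry_hash = hashlib.sha256((self._last_hash + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": self._last_hash, "hash": entry_hash})
        self._last_hash = entry_hash

    def verify(self) -> bool:
        # Recompute every hash from scratch; a single tampered event fails.
        prev = self.GENESIS
        for e in self.entries:
            payload = json.dumps(e["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.record({"agent": "copilot", "item": "mail/123", "label": "confidential", "decision": "deny"})
log.record({"agent": "copilot", "item": "mail/124", "label": "internal", "decision": "allow"})
print(log.verify())  # True: the chain validates

log.entries[0]["event"]["decision"] = "allow"  # simulate tampering
print(log.verify())  # False: the chain no longer validates
```

The point of the design is who holds the evidence: because the chain is built and stored by the organisation, regulators can be shown a record of what the AI touched that does not depend on the failing vendor’s telemetry.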

More is needed

Of course, the answer isn’t to stop using AI. Such tools deliver real productivity gains. The answer is to stop trusting AI platforms to govern themselves.

Defence in depth has been applied to network security for decades: multiple independent layers, each capable of catching what the others miss. But for AI governance, we’ve been operating with just a single layer: the platform’s own controls. The Copilot bug proved that more is needed.

Defence in depth for AI governance means an independent data layer between AI platforms and sensitive content. The AI doesn’t get direct access to repositories; it authenticates through an external governance layer that enforces policy independently: purpose binding that restricts which data classifications the AI can access, least-privilege controls, continuous verification, and audit trails the organisation owns.
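The mediation pattern described above can be sketched in a few lines. This is a minimal illustration under invented assumptions (the repository, labels, purpose table, and `governed_fetch` function are all hypothetical, not any vendor’s API): the AI never queries the repository directly, and the governance layer checks each item’s sensitivity label against the agent’s declared purpose before releasing content, denying by default.

```python
# Hypothetical content store with sensitivity labels attached to each item.
REPOSITORY = {
    "mail/123": {"label": "confidential", "body": "Settlement terms attached..."},
    "mail/124": {"label": "internal", "body": "Team lunch is on Friday."},
}

# Purpose binding: the highest classification each declared purpose may touch.
PURPOSE_CEILING = {"summarise_email": "internal"}
RANK = {"public": 0, "internal": 1, "confidential": 2}

def governed_fetch(agent_token: str, item_id: str, purpose: str) -> dict:
    """External governance layer: authenticate, then enforce purpose binding."""
    if agent_token != "agent-credential":  # stand-in for real authentication
        raise PermissionError("unauthenticated agent")
    item = REPOSITORY[item_id]
    ceiling = PURPOSE_CEILING.get(purpose)
    # Deny by default: an unknown purpose, or a label above the purpose's
    # ceiling, is refused even if the human user could read the item.
    if ceiling is None or RANK[item["label"]] > RANK[ceiling]:
        return {"allowed": False, "reason": f"label {item['label']!r} exceeds purpose ceiling"}
    return {"allowed": True, "body": item["body"]}

print(governed_fetch("agent-credential", "mail/124", "summarise_email"))  # allowed
print(governed_fetch("agent-credential", "mail/123", "summarise_email"))  # denied
```

The key property is independence: because this check runs outside the AI platform, a bug like CW1226324 in the platform’s own label enforcement would not silently disable it.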

No more sleepwalking

Every major technology shift creates a moment where organisations decide whether to bolt security on after the fact or build it into the architecture from the start. We saw it with cloud migration. We saw it with remote work. We’re seeing it now with AI.

The Microsoft Copilot bug didn’t break new ground. It confirmed a structural vulnerability the industry has been sleepwalking past for two years. Organisations that treat this bug as a wake-up call by building independent AI governance at the data layer will be able to scale AI adoption with confidence. They’ll satisfy regulators with independent evidence and they’ll protect sensitive data not through trust in vendor controls, but through architecture that doesn’t depend on trust at all.
