Why Securing Prompts Will Never Be Enough: The GitHub Copilot Case

Why Securing Prompts Will Never Be Enough: The GitHub Copilot Case

We’ve spent months analyzing how AI-powered coding assistants like GitHub Copilot handle security risks. The results? Disturbing.

The Hidden Risks of AI Code Assistants

GitHub Copilot is marketed as a productivity tool for developers, helping them write code faster. However, our vulnerabilities researcher, Fufu Shpigelman, uncovered vulnerabilities that expose a fundamental flaw: guardrails that rely solely on filtering user prompts are easily bypassed. Attackers don’t need complex exploits—they just need the right words.

Affirmation Jailbreak: A Single Word Unlocks Dangerous Code

One of the most surprising vulnerabilities we found was what we call the Affirmation Jailbreak. By starting a query with an affirmative phrase like “Sure,” “Absolutely,” or even “Yes,” we observed a shift in Copilot’s behavior.

Without this technique: Copilot refused to provide code for SQL injection attacks or other security-sensitive queries.

With this technique: The same request, prefixed with a simple “Sure,” resulted in Copilot generating dangerous code with no hesitation.

This vulnerability shows that prompt filtering alone is fragile. A small tweak in language can completely alter the AI’s compliance with security policies.

Proxy Bypass Exploit: Hijacking Copilot’s Backend

Our second discovery was even more alarming. By manipulating GitHub Copilot’s proxy settings, we were able to:

  • Redirect its traffic through an external server.
  • Capture authentication tokens.
  • Gain unrestricted access to OpenAI models beyond Copilot’s intended scope.
  • Completely remove the system prompt that enforces ethical guardrails.

This means that an attacker could not only manipulate what Copilot generates but also remove all built-in security limitations entirely. The problem is not just about what users type—it’s about how AI systems enforce security at a structural level.

Why This Goes Beyond Prompt Filtering

These vulnerabilities prove a crucial point: you can’t rely on prompt filtering alone to secure AI systems. This is the same mistake we’ve seen before in cybersecurity:

  • Antivirus tools failed because attackers created polymorphic malware that changed its signature to evade detection.
  • Early firewalls were bypassed because they focused on IP filtering rather than inspecting behavior.
  • Traditional DLP solutions struggled as attackers found ways to encode or obfuscate sensitive data transfers.

Now, the same cycle is repeating with AI security. Guardrails based purely on input validation are as ineffective as early antivirus solutions. We need security strategies that analyze behavior and system-level interactions.

What’s Wrong with Relying Solely on Prom pt Guardrails?

Many AI security strategies focus on input validation—ensuring that prompts are screened and sanitized before being processed by an AI model. While this is important, it’s an incomplete solution. Here’s why:

  1. The Blurry Lines Between Control and Data

Unlike traditional software systems where code (logic) and data are separate, AI models interpret both through the same medium: natural language. This makes it difficult to enforce strict control measures because a cleverly crafted prompt can function both as data input and as an unintended command.

For example, an AI assistant might be expected to summarize a quarterly financial report without surfacing confidential executive notes. But if an attacker phrases a request cleverly—perhaps by disguising it as an internal query from an executive—they might trick the AI into providing unauthorized insights.

  1. The Complexity of Multi-step A I Applications

Modern AI assistants like Microsoft 365 Copilot and GitHub Copilot don’t operate in isolation. They integrate function calling, external APIs, multi-step processes and knowledge retrieval systems, adding multiple layers where vulnerabilities can emerge.

If an attacker can manipulate how an AI model interacts with these components, they can bypass protections that would otherwise be effective at the prompt level. Consider a scenario where an AI tool integrates with an internal CRM system. Even if direct data extraction is blocked via prompts, an attacker might craft a sequence of prompts that leads to unauthorized report generation, sidestepping security measures.

  1. Infinite Attack Surface and Unpredictability

LLMs don’t operate within fixed rules; they generate responses based on patterns learned from vast amounts of data. This creates an infinite attack surface, where even small tweaks to an input can lead to unpredictable and potentially dangerous outputs.

Since AI models don’t follow strict if-then logic, their responses can vary in ways that security filters can’t always anticipate. An attacker doesn’t need to break through a wall—they just need to find the right phrasing or sequence of prompts to get around restrictions. This makes securing AI tools like Copilot fundamentally different from traditional software security.

The Future of AI Security: What Needs to Change

To secure AI tools like GitHub Copilot, we need to shift from static filtering to dynamic security approaches that include:

  1. Behavioral Detections – Detecting unusual AI interactions, not just filtering words and phrases.
  2. Context-Aware Controls – Understanding how AI-generated content is used and which user is using it rather than blocking specific prompts.
  3. Structural Protections – Ensuring AI models cannot be hijacked via proxy attacks or backend manipulations.

Conclusion: A Wake-Up Call for AI Security

GitHub Copilot’s vulnerabilities highlight why securing prompts will never be enough. Attackers can manipulate AI behavior in ways that simple filtering won’t catch. As AI continues to integrate into critical workflows, security must evolve beyond prompt guardrails.

AI security needs to be as advanced as the threats it faces. If you’re using AI in your organization, ask yourself: Are your security measures keeping up?

About the Author

Oren Saban, Director of Product Management, Apex SecurityOren Saban is the director of product management for Apex Security, the leader in AI security. Saban is a member of the newly-launched company’s founder team driving end-to-end product excellence. Prior to joining Apex Security, Saban held increasingly senior positions at Microsoft, serving on the Microsoft 365 Defender & Security Copilot product team.

Saban is committed to giving back–both to the industry and through charitable contributions. As an educator, Saban has certified 200+ students in product management (PM101). In his community, he annually raises upwards of 500K NIS and manages 50+ volunteers for impactful non-profits.

Apex Security emerged from stealth with a $7M Seed Round announcement in May 2024, and is backed by industry titans including Sam Altman and Sequoia Capital.

Oren Saban can be found on LinkedIn and you can read his work at Apex Security.



Source link

About Cybernoz

Security researcher and threat analyst with expertise in malware analysis and incident response.