CyberWire

Unexpected Bias & Distillation Attacks (feat. Paul Vann of Validia.ai)



Welcome back to The FAIK Files!

In this week’s episode:

  • Paul Vann from Validia joins us to discuss how AI bias isn’t just a social issue—it’s a critical cybersecurity vulnerability.
  • We break down “distillation attacks” and how competitors have been stealing the “thinking process” of frontier models like Claude and Gemini.
  • A look at the wild west of AI agent skills marketplaces, including indirect prompt injections hidden in image alt text.
  • We theorize on the future of AI architecture: are scaling laws breaking down, and what are “world models”?

Check out Validia at: https://validia.ai/

Want to leave us a voicemail? Here’s the magic link to do just that: https://sayhi.chat/FAIK

You can also join our Discord server here: https://faik.to/discord

*** NOTES AND REFERENCES ***

The Security Risks of AI Bias:

  • Paul explains how bias manifests beyond politics (like human-in-the-loop and representation bias), serving as a direct attack vector.
  • The Rocket League Bypass: Adversaries bypassed Cylance’s AI-based antivirus by injecting code from the Rocket League video game, exploiting the model’s learned bias that this specific code was “good.”
  • Dataset Demographics: Paul notes massive racial skews in major deepfake detection datasets like CelebDF, which comprises roughly 80% white individuals, creating serious detection blind spots for other racial groups.
  • Evaluating your models: Define acceptable vs. unacceptable bias for your use case, then apply the “15% rule” to test for false positives and confidence gaps in production.
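The episode doesn’t spell out the mechanics of the “15% rule,” but one plausible reading is: flag any demographic subgroup whose false-positive rate diverges from the overall rate by more than 15 percentage points. A minimal sketch under that assumption (the threshold, metric, and record format are all our own illustration, not Paul’s exact method):

```python
# Hypothetical sketch of a "15% rule" bias check: flag subgroups whose
# false-positive rate (FPR) gaps from the overall FPR by > 0.15.
# The threshold and metric are assumptions for illustration only.

def fpr(records):
    """False-positive rate: share of actual negatives flagged positive."""
    negatives = [r for r in records if not r["actual"]]
    if not negatives:
        return 0.0
    return sum(1 for r in negatives if r["predicted"]) / len(negatives)

def bias_gaps(records, threshold=0.15):
    """Return {group: FPR} for groups whose FPR gap vs. overall exceeds threshold."""
    overall = fpr(records)
    groups = {}
    for r in records:
        groups.setdefault(r["group"], []).append(r)
    return {
        g: fpr(rs)
        for g, rs in groups.items()
        if abs(fpr(rs) - overall) > threshold
    }
```

The same pattern extends to confidence gaps: replace the FPR metric with mean model confidence per subgroup and keep the thresholded comparison.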

Distillation Attacks Explained:

  • What happens when an AI interrogates another AI? We discuss how models have been accused of “distilling” OpenAI and Anthropic products by firing off hundreds of thousands of prompts.
  • Techniques include “Chain of Thought Elicitation” and “Reward Model Grading.”
  • The goal isn’t just to steal raw information, but to extract the model’s capabilities, tool use, and completely strip away its safety guardrails.
  • Theoretical defenses: Could we use “poison pills” and adversarial attacks to actively corrupt the data that scrapers are pulling?

Vulnerabilities in AI Agents & Skills:

  • The hidden dangers of skills marketplaces for AI agents.
  • Paul shares an in-the-wild example of an indirect prompt injection hidden inside the alt text of a GitHub Readme image, instructing the model to exfiltrate data.

Hitting the Wall & The Future of AI:

  • Are the scaling laws of Transformer architectures breaking down?
  • The philosophical divide in AI research: Dario Amodei’s “data center of geniuses” vs. Yann LeCun’s “World Models.”
  • Catch Paul Vann at RSA speaking on AI bias, playing at Validia’s RSA pickleball event, or at their 250-person Frontier Agent Hackathon in NYC on April 4th.

*** THE BOILERPLATE ***

About The FAIK Files:

The FAIK Files is an offshoot project from Perry Carpenter’s most recent book, FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions.

Check out Perry & Mason’s other show, the Digital Folklore Podcast:

Want to connect with us? Here’s how:

Connect with Perry:


