- *** NOTES AND REFERENCES ***
- The Security Risks of AI Bias:
- Distillation Attacks Explained:
- Vulnerabilities in AI Agents & Skills:
- Hitting the Wall & The Future of AI:
- *** THE BOILERPLATE ***
- About The FAIK Files:
- Check out Perry & Mason's other show, the Digital Folklore Podcast:
- Want to connect with us? Here's how:
- Connect with Perry:
Welcome back to The FAIK Files!
In this week’s episode:
- Paul Vann from Validia joins us to discuss how AI bias isn’t just a social issue—it’s a critical cybersecurity vulnerability.
- We break down “distillation attacks” and how competing models are stealing the “thinking process” of frontier models like Claude and Gemini.
- A look at the wild west of AI agent skills marketplaces, including indirect prompt injections hidden in image alt text.
- We theorize on the future of AI architecture: are scaling laws breaking down, and what are “world models”?
Check out Validia at: https://validia.ai/
Want to leave us a voicemail? Here’s the magic link to do just that: https://sayhi.chat/FAIK
You can also join our Discord server here: https://faik.to/discord
*** NOTES AND REFERENCES ***
The Security Risks of AI Bias:
- Paul explains how bias manifests beyond politics (like human-in-the-loop and representation bias), serving as a direct attack vector.
- The Rocket League Bypass: Adversaries bypassed an AI-based Cylance antivirus by injecting code from the Rocket League video game, exploiting the model’s bias towards that specific code being “good.”
- Dataset Demographics: Paul notes massive racial skews in major deepfake detection datasets like CelebDF, which is comprised of roughly 80% white individuals, creating massive detection blindspots for other racial groups.
- Evaluating your models: Establish acceptable vs. unacceptable bias and use the “15% rule” to test for false positives and confidence gaps in production.
Distillation Attacks Explained:
- What happens when an AI interrogates another AI? We discuss how models have been accused of “distilling” OpenAI and Anthropic products by firing off hundreds of thousands of prompts.
- Techniques include “Chain of Thought Elicitation” and “Reward Model Grading.”
- The goal isn’t just to steal raw information, but to extract the model’s capabilities, tool use, and completely strip away its safety guardrails.
- Theoretical defenses: Could we use “poison pills” and adversarial attacks to actively corrupt the data that scrapers are pulling?
Vulnerabilities in AI Agents & Skills:
- The hidden dangers of skills marketplaces for AI agents.
- Paul shares an in-the-wild example of an indirect prompt injection hidden inside the alt text of a GitHub Readme image, instructing the model to exfiltrate data.
Hitting the Wall & The Future of AI:
- Are the scaling laws of Transformer architectures breaking down?
- The philosophical divide in AI research: Dario Amodei’s “data center of geniuses” vs. Yann LeCun’s “World Models.”
- Catch Paul Vann at RSA speaking on AI bias, playing at Validia’s RSA pickleball event, or at their 250-person Frontier Agent Hackathon in NYC on April 4th.
*** THE BOILERPLATE ***
About The FAIK Files:
The FAIK Files is an offshoot project from Perry Carpenter’s most recent book, FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions.


