ScamAgent: AI Agent Built by Researchers That Runs Fully Autonomous Scam Calls


ScamAgent is an autonomous, multi-turn AI framework developed by researcher Sanket Badhe at Rutgers University that demonstrates how large language models (LLMs) can be weaponized to conduct fully automated scam calls.

By integrating goal-driven planning, contextual memory, and real-time text-to-speech (TTS) synthesis, the system bypasses existing AI safety guardrails to simulate highly realistic social engineering attacks.

The architecture of ScamAgent diverges from traditional prompt injection by using a central orchestrator to manage conversational state and deception strategies across multiple interaction turns.

ScamAgent System Architecture (source: arxiv.org)

When given a malicious objective, the agent uses goal decomposition to break the target down into a sequence of seemingly benign sub-goals, mirroring how human fraudsters gradually build trust with their victims.

To evade safety filters in models like GPT-4 and LLaMA3-70B, ScamAgent wraps its prompts in roleplay contexts, successfully concealing the overarching malicious intent from standard single-turn moderation tools.

In experimental evaluations across five common fraud scenarios, ScamAgent proved highly effective at subverting standard model alignments and safety protocols.


Goal Decomposition: Attackers break a harmful goal into small, harmless-looking steps. Protection requires monitoring conversations across multiple steps.

Deception & Roleplay: Harmful requests are hidden inside fake stories or official roles. This can be reduced by blocking impersonation and restricting AI personas.

Contextual Memory: The system remembers past responses and adjusts its scam strategy. Limiting how much history it remembers can reduce this risk.

Real-Time TTS: Text is turned into a convincing scam voice call. Checking content before audio output can help prevent abuse.

While direct malicious queries faced refusal rates of 84% to 100%, the agentic framework reduced these refusals to between 17% and 32% by distributing the harmful intent across the conversation.

Refusal rate comparison of GPT-4, Claude 3.7, and LLaMA 3 70B under single-prompt and ScamAgent scenarios (source: arxiv.org)

Notably, Meta’s LLaMA3-70B model achieved the highest full dialogue completion rate at 74% during job identity fraud simulations, completing all sub-tasks without triggering any safety stops.

According to researchers, defending against autonomous generative threats requires security systems to move from simple prompt filtering to continuous monitoring that understands user intent.

AI platform providers and security teams are urged to implement multi-layered defenses that include sequence classifiers for predicting long-term outcomes, alongside strict controls over memory retention.
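One of those controls, capping memory retention, can be sketched in a few lines: if an agent can only carry a bounded window of past turns, a deception strategy cannot accumulate state across a long call. The class name and window size below are assumptions for illustration, not an API from the paper.

```python
# Illustrative sketch of strict memory-retention control: a dialogue
# store that keeps only the most recent `max_turns` turns, so older
# context is evicted automatically.

from collections import deque

class BoundedMemory:
    """Conversation memory capped at `max_turns` recent turns."""

    def __init__(self, max_turns: int = 4):
        self._turns: deque[str] = deque(maxlen=max_turns)

    def remember(self, turn: str) -> None:
        self._turns.append(turn)  # deque evicts the oldest turn itself

    def context(self) -> list[str]:
        return list(self._turns)

mem = BoundedMemory(max_turns=3)
for turn in ["greeting", "build rapport", "pivot to 'verification'",
             "ask for the code"]:
    mem.remember(turn)

print(mem.context())  # the earliest turn has been evicted
```

A hard cap like this trades conversational continuity for safety: the agent loses the long-horizon state that ScamAgent's orchestrator depends on to escalate trust over many turns.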



