Amid the generative-artificial-intelligence frenzy of the last few months, security researchers have been revisiting the concern that AI-generated voices, or voice deepfakes, have gotten convincing enough and easy enough to produce that scammers will start using them en masse.
There have been a couple of high-profile incidents in recent years in which cybercriminals have reportedly used voice deepfakes of company CEOs in attempts to steal large amounts of money, not to mention that documentarians posthumously created voice deepfakes of Anthony Bourdain. But are criminals at the turning point where any given spam call could contain your sibling’s cloned voice desperately seeking “bail money”? No, researchers say, at least not yet.
The technology to create convincing, robust voice deepfakes is powerful and increasingly prevalent in controlled settings or situations where extensive recordings of a person’s voice are available. At the end of February, Motherboard reporter Joseph Cox published findings that he had recorded five minutes of himself talking and then used a publicly available generative AI service, ElevenLabs, to create voice deepfakes that defeated a bank’s voice-authentication system. But like generative AI’s shortcomings in other mediums, including limitations of text-generation chatbots, voice deepfake services still can’t consistently produce perfect results.
“Depending on the attack scenario, real-time capabilities and the quality of the stolen voice sample must be considered,” says Lea Schönherr, a security and adversarial machine learning researcher at the CISPA Helmholtz Center for Information Security in Germany. “Although it is often said that only a few seconds of the stolen voice are needed, the quality and the length have a big impact on the result of the audio deepfake.”
Digital scams and social engineering attacks like phishing are a seemingly ever-growing threat, but researchers note that scams in which attackers call a victim and attempt to impersonate someone the target knows have existed for decades—no AI necessary. And the very fact of their longevity means that these hustles are at least somewhat effective at tricking people into sending attackers money.
“These scams have been around forever. Most of the time, it doesn’t work, but sometimes they get a victim who is primed to believe what they’re saying, for whatever reason,” says Crane Hassold, a longtime social engineering researcher and former digital behavior analyst for the FBI. “Many times those victims will swear the person they were talking to was the impersonated person when, in reality, it’s just their brain filling in gaps.”
Hassold says that his grandmother was a victim of an impersonation scam in the mid-2000s when attackers called and pretended to be him, persuading her to send them $1,500.
“With my grandmother, the scammer didn’t say who was calling initially, they just started talking about how they had been arrested while attending a music festival in Canada and needed her to send money for bail. Her response was ‘Crane, is that you?’ and then they had exactly what they needed,” he says. “Scammers are essentially priming their victims into believing what they want them to believe.”
As with many social engineering scams, voice-impersonation cons work best when the target is caught up in a sense of urgency and just trying to help someone or complete a task they believe is their responsibility.
“My grandmother left me a voicemail while I was driving to work saying something like ‘I hope you’re OK. Don’t worry, I sent the money, and I won’t tell anyone,’” Hassold says.
Justin Hutchens, director of research and development at the cybersecurity firm Set Solutions, says he sees deepfake voice scams as a rising concern, but he’s also worried about a future in which AI-powered scams become even more automated.
“I expect that in the near future, we will start seeing threat actors combining deepfake voice technology with the conversational interactions supported by large language models,” Hutchens says of platforms like OpenAI’s ChatGPT.
For now, though, Hassold cautions against being too quick to assume that voice-impersonation scams are being driven by deepfakes. After all, the analog version of the scam is still out there and still compelling to the right target at the right time.