The Risk of AI Voice Cloning [Q&A With an AI Hacker]


Q: What Is AI Voice Cloning?

A: AI voice cloning is technology that allows anyone to take a small sample of audio, sometimes less than 30 seconds, and completely recreate the voice in that sample, making it say anything they want.

While this is a fascinating technology with many positive use cases, it can be used by scammers and attackers to trick victims into thinking they’re speaking to someone they’re not.

Q: How Easy Is it to Perform AI Voice Cloning?

A: Insanely easy.

There are tons of free and open-source models that anyone can use, regardless of their AI knowledge or audio engineering experience. They can take a short recording and quickly train an AI model on it so that it can reproduce the speaker's voice.

Once a voice is cloned, it's as simple as typing some text into a box: the trained model will say whatever you type, and it will sound very much like the original voice.
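
To make the point concrete, here's a minimal sketch of what that workflow can look like using one freely available open-source option, the Coqui TTS library with its XTTS v2 model. The file names and text are illustrative placeholders, and many other tools work similarly.

```python
# Zero-shot voice cloning sketch using the open-source Coqui TTS
# library (pip install TTS). "reference_clip.wav" is a hypothetical
# short recording of the target voice.
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (XTTS v2).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Speak arbitrary text in the voice from the reference clip.
tts.tts_to_file(
    text="Any sentence typed here is spoken in the cloned voice.",
    speaker_wav="reference_clip.wav",  # sample of the voice to clone
    language="en",
    file_path="cloned_output.wav",     # generated audio
)
```

That's essentially the entire workflow: a short reference clip in, arbitrary speech out, with no training expertise required.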

Q: How Can a Bad Actor Capture My Voice? 

A: Think about all the times people upload videos or audio of themselves on social media. This becomes a huge risk for businesses and organizations because someone like the CEO of a company has a big public persona. They are probably doing podcasts, giving talks, participating in interviews — there’s a large digital footprint of audio out there. So, it’s pretty easy for a bad actor to take that audio and do a voice clone of it.

Someone like HackerOne CEO Marten Mickos (whose voice clone is featured in the above video) has done many podcasts that produce very high-quality audio. You can take a few of those clips, run them through an AI system, and tell it to say anything you want in Marten's voice.

Q: How Can You Tell When Audio Is a Voice Clone?

A: First and foremost, be suspicious of requests or instructions that the supposed speaker would be unlikely to give. Some other signs include:

  • A robotic-sounding voice
  • A lack of natural inflection
  • Unusual or artificial background noise
  • Gibberish or unintelligible words

Even when these cues are present, it's important to remember that this is the state of the technology today, and this is the worst the quality of voice cloning will ever be; it's only going to get better and more convincing. Many of these audio cues will likely disappear as AI advances.

Q: How Can People Prepare for AI Voice Cloning?

A: AI voice cloning is a real risk, and people need to start having conversations now with coworkers and loved ones about how to prepare for a voice cloning attack and how to tell genuine audio from fake.

  • Recognize AI voice cloning as a serious risk. The sooner individuals and organizations treat it as a genuine threat, the more we can limit its negative impact.
  • Establish preset code words. A shared code word can confirm to the other person that a message is genuine when something urgent has to be communicated over the phone or another audio medium.
  • Start building internal skepticism. If someone calls or leaves a voicemail asking for a password and insists that you act urgently, that should raise red flags. Do your due diligence to confirm the person you're speaking to is who you think they are.

Q: Can AI Voice Cloning Be Used for Good?

A: This technology was originally invented for good, and it has tremendous positive applications:

  • Creating natural-sounding audio for individuals who have lost their voice for medical reasons, as in the case of Stephen Hawking.
  • Cloning the voices of entertainers who have passed away so their voices can live on in media, as with actor James Earl Jones, whose cloned voice can be used in future Star Wars films.

There is a real upside, but as with any technology, there are also important risks that individuals and organizations need to be aware of and prepare for. To learn more about HackerOne’s take on the innovation and risks of AI, check out the HackerOne AI blog.


