Should we be worried about malicious use of AI language models?

Evidence is mounting that large language models, such as the Generative Pre-trained Transformer 3 (GPT-3) family that underpins OpenAI’s ChatGPT chatbot, are highly vulnerable to abuse through creative prompt engineering by malicious actors.

Moreover, as the capabilities of such models hit the mainstream, new approaches will be needed to fight cyber crime and digital fraud, and everyday consumers will need to become much more sceptical about what they read and believe.

These are among the findings of research conducted by Finland’s WithSecure with support from CC-Driver, a European Union Horizon 2020 project that draws on disciplines such as anthropology, criminology, neurobiology and psychology in a collective effort to combat cyber crime.

WithSecure’s research team said universal access to models that deliver human-sounding text in seconds represents a “turning point” in human history.

“With the wide release of user-friendly tools that employ autoregressive language models such as GPT-3 and GPT-3.5, anyone with an internet connection can now generate human-like speech in seconds,” wrote the research team.

“The generation of versatile natural language text from a small amount of input will inevitably interest criminals, especially cyber criminals – if it hasn’t already. Likewise, anyone who uses the web to spread scams, fake news or misinformation in general may have an interest in a tool that creates credible, possibly even compelling, text at superhuman speeds.”

Andrew Patel and Jason Sattler of WithSecure conducted a series of experiments using prompt engineering, a technique used to discover inputs that can yield desirable or useful results, to produce content that they deemed harmful.

During their experiments, they explored how changing the initial human input to GPT-3 models affected the text the artificial intelligence (AI) produced, to identify how creative – or malicious – prompts can create undesirable outcomes.

They were able to use their chosen model to create phishing emails and SMS messages; social media messages designed to troll or harass, or cause damage to brands; social media messages designed to advertise, sell or legitimise scams; and convincing fake news articles.

They were also able to coax the model into adopting particular writing styles, to write about a chosen subject in an opinionated way, and to generate its own prompts based on content.

“The fact that anyone with an internet connection can now access powerful large language models has one very practical consequence: it’s now reasonable to assume any new communication you receive may have been written with the help of a robot,” said Patel, who spearheaded the research.

“Going forward, AI’s use to generate both harmful and useful content will require detection strategies capable of understanding the meaning and purpose of written content.”

Patel and Sattler drew four main conclusions from their work, stating that prompt engineering and malicious prompt creation will inevitably develop as a discipline; that malicious actors will exploit large language models in potentially unpredictable ways; that spotting malicious or abusive content will become harder; and that such models can already be easily used by cyber criminals to make the social engineering components of their attacks more effective.

Patel said he hoped the research project would help to spur the development of more secure large language models that are less susceptible to being manipulated in this way. The team’s full research write-up is available to download.

WithSecure is the latest in a long line of cyber companies to have expressed concerns over GPT-3 technology, which has come to prominence in mainstream discourse thanks to the public release of ChatGPT by OpenAI in November 2022.

Although positively received by many, ChatGPT has already drawn criticism for being supposedly too good at its job in some circumstances. Some have warned that it could be used to render human journalists obsolete, while its potential misuse in academia and scientific research was the subject of another research project conducted in the US. In that study, the programme generated fake research abstracts based on published medical research, and about 33% of the time scientists were fooled into thinking they were reading a real report.

“We began this research before ChatGPT made GPT-3 technology available to everyone,” said Patel. “This development increased our urgency and efforts. Because, to some degree, we are all Blade Runners now, trying to figure out if the intelligence we’re dealing with is real or artificial.”