Should we be worried about malicious use of AI language models?


More and more evidence is emerging of how large language models, such as the Generative Pre-trained Transformer 3 (GPT-3) family that underpins OpenAI’s ChatGPT chatbot, seem to be highly vulnerable to abuse through creative prompt engineering by malicious actors.

Moreover, as the capabilities of such models hit the mainstream, new approaches will be needed to fight cyber crime and digital fraud, and everyday consumers will need to become much more sceptical about what they read and believe.

Such are some of the findings of a research project conducted by Finland’s WithSecure with support from CC-Driver, a European Union Horizon 2020 initiative that draws on disciplines such as anthropology, criminology, neurobiology and psychology in a collective effort to combat cyber crime.

WithSecure’s research team said universal access to models that deliver human-sounding text in seconds represents a “turning point” in human history.

“With the wide release of user-friendly tools that employ autoregressive language models such as GPT-3 and GPT-3.5, anyone with an internet connection can now generate human-like speech in seconds,” wrote the research team.

“The generation of versatile natural language text from a small amount of input will inevitably interest criminals, especially cyber criminals – if it hasn’t already. Likewise, anyone who uses the web to spread scams, fake news or misinformation in general may have an interest in a tool that creates credible, possibly even compelling, text at superhuman speeds.”

Andrew Patel and Jason Sattler of WithSecure conducted a series of experiments using prompt engineering – the practice of crafting and refining inputs to steer a model towards particular outputs – to produce content they deemed harmful.

During their experiments, they explored how changing the initial human input into GPT-3 models affected the artificial intelligence (AI) text output to identify how creative – or malicious – prompts can create undesirable outcomes.
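To illustrate the technique, the sketch below shows the general shape of such an experiment: the same benign request is wrapped in different framings, sent to a GPT-3 completion model, and the outputs compared. This is a minimal illustration only, assuming the legacy openai Python package, an OPENAI_API_KEY environment variable and the “text-davinci-003” model name; it does not reproduce WithSecure’s actual prompts.

```python
# Minimal prompt-engineering sketch (assumptions: legacy `openai` package,
# OPENAI_API_KEY set, "text-davinci-003" available). Not WithSecure's prompts.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

BASE_REQUEST = "Write a short reminder to staff to complete their annual security training."

# Vary only the framing around the same request and compare the outputs.
prompt_variants = {
    "neutral": BASE_REQUEST,
    "styled": "You are a formal corporate communications officer. " + BASE_REQUEST,
    "opinionated": BASE_REQUEST + " Make it sound urgent and slightly alarming.",
}

for label, prompt in prompt_variants.items():
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=120,
        temperature=0.7,
    )
    print(f"--- {label} ---")
    print(response.choices[0].text.strip())
```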

They were able to use their chosen model to create phishing emails and SMS messages; social media messages designed to troll or harass, or cause damage to brands; social media messages designed to advertise, sell or legitimise scams; and convincing fake news articles.

They were also able to coax the model into adopting particular writing styles, to write about a chosen subject in an opinionated way, and to generate its own prompts based on content.

“The fact that anyone with an internet connection can now access powerful large language models has one very practical consequence: it’s now reasonable to assume any new communication you receive may have been written with the help of a robot,” said Patel, who spearheaded the research.

“Going forward, AI’s use to generate both harmful and useful content will require detection strategies capable of understanding the meaning and purpose of written content.”
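One crude baseline for the detection problem Patel describes is to score text by its perplexity under a reference language model, on the theory that machine-generated text tends to be more statistically predictable. The sketch below, which assumes the transformers and torch packages and the public “gpt2” weights, illustrates that idea only; it is not the meaning-and-purpose-aware approach WithSecure is calling for.

```python
# Crude machine-generated-text heuristic: perplexity under a reference model.
# Assumptions: `transformers` and `torch` installed, public "gpt2" weights.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity for `text` (lower = more 'predictable')."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

print(perplexity("Dear customer, your account has been temporarily suspended."))
```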

Patel and Sattler drew four main conclusions from their work, stating that prompt engineering and malicious prompt creation will inevitably develop as a discipline; that malicious actors will exploit large language models in potentially unpredictable ways; that spotting malicious or abusive content will become harder; and that such models can already be easily used by cyber criminals to make the social engineering components of their attacks more effective.

Patel said he hoped the research project would help to spur the development of more secure large language models that are less susceptible to being manipulated in this way. The team’s full research write-up is available to download from WithSecure.

WithSecure is the latest in a long line of cyber companies to have expressed concerns over GPT-3 technology, which has come to prominence in mainstream discourse thanks to the public release of ChatGPT by OpenAI in November 2022.

Although positively received by many, ChatGPT has already drawn criticism for being supposedly too good at its job in some circumstances. Some have warned that it could be used to render human journalists obsolete, while its potential misuse in academia and scientific research was the subject of a separate study conducted in the US. That study had the programme generate fake research abstracts based on published medical research, which tricked scientists into thinking they were reading a real report about 33% of the time.

“We began this research before ChatGPT made GPT-3 technology available to everyone,” said Patel. “This development increased our urgency and efforts. Because, to some degree, we are all Blade Runners now, trying to figure out if the intelligence we’re dealing with is real or artificial.”

ChatGPT discusses ‘the benefits of malware’

Meanwhile, researchers at Check Point took to the dark web to explore how the cyber criminal underground is reacting to the release of ChatGPT, and uncovered more evidence to support WithSecure’s conclusions.

The research team uncovered a thread titled “ChatGPT – benefits of malware” on one popular underground forum, in which the original poster disclosed they had been experimenting with the software to recreate malware strains and techniques that had been described in research publications, industry blogs and news articles.

In a second thread, they found a user posting their “first ever” malicious Python script. When another forum user noted that the code style resembled OpenAI code, the original poster revealed that ChatGPT had given them a “nice helping hand” to write it.

In the third example seen by Check Point’s research team, a forum user demonstrated how they created a convincing dark web market script using ChatGPT.

“Cyber criminals are finding ChatGPT attractive. In recent weeks, we’re seeing evidence of hackers starting to use it [for] writing malicious code. ChatGPT has the potential to speed up the process for hackers by giving them a good starting point. Just as ChatGPT can be used for good to assist developers in writing code, it can also be used for malicious purposes,” said Check Point threat intelligence group manager, Sergey Shykevich.

“Although the tools that we analysed in this report are pretty basic, it’s only a matter of time until more sophisticated threat actors enhance the way they use AI-based tools. CPR will continue to investigate ChatGPT-related cyber crime in the weeks ahead.”

Brad Hong, customer success manager at Horizon3ai, said: “From an attacker’s perspective, what code-generating AI systems [allow] the bad guys to do easily is to first bridge any skills gap by serving as a translator between languages the programmer may be less experienced in; and second, [provide] an on-demand means of creating base templates of code relevant to the lock that we are trying to pick instead of spending our time scraping through Stack Overflow and Git for similar examples.

“Attackers understand that this isn’t a master key, but rather, the most competent tool in their arsenal to jump hurdles typically only possible through experience.

“However, OpenAI in all its glory is not a masterclass in algorithm and code-writing and will not universally replace zero-day codes entirely. Cyber security in the future will become a battle between algorithms in not only creation of code, but processing it as well. Just because the teacher lets you use a cheat sheet for the test, it doesn’t mean you’ll know how to apply the information until it’s been digested in context.

“As such, code-generating AI is more dangerous in its ability to speed up the loop an attacker must take to utilise vulnerabilities that already exist,” he said.

How GPT-3 can help security teams, too

But this is not to say that GPT-3-based tools such as ChatGPT cannot be of use to the legitimate cyber security community as well as the malicious one, and Trustwave researcher Damian Archer has been exploring their potential use cases in a security context.

“ChatGPT has multiple use cases and the benefits are huge – go ahead and watch it review simple code snippets. Not only will it tell you if the code is secure, but it will also suggest a more secure alternative,” said Archer, although, as he pointed out, this same functionality can also be used by a malicious actor to make their malware more effective, or to obfuscate it better.
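As a hypothetical illustration of the kind of finding such a review produces – not an example taken from Trustwave’s tests – the snippet below contrasts a SQL query built by string interpolation, which is open to injection, with the parameterised alternative a reviewer would typically suggest.

```python
# Hypothetical example of the sort of snippet a code review flags.
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: attacker-controlled input is spliced straight into the SQL.
    return conn.execute(
        f"SELECT id, email FROM users WHERE username = '{username}'"
    ).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Safer alternative: the driver binds the value as a query parameter.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()
```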

Steve Povolny, principal engineer and director at Trellix, said he believed there was more potential to use tools such as ChatGPT for good.

“It can be effective at spotting critical coding errors, describing complex technical concepts in simplistic language, and even developing script and resilient code, among other examples. Researchers, practitioners, academia and businesses in the cyber security industry can harness the power of ChatGPT for innovation and collaboration,” said Povolny.

“It will be interesting to follow this emerging battleground for computer-generated content as it enhances capabilities for both benign and malicious intent.”

Secureworks chief technology officer Mike Aiello is also keeping a close eye on developments, in part because his teams are already using similar models at the core of their work, to analyse and make sense of the 500 billion daily events that take place across its customers’ networks. But lately, Secureworks has been going further, experimenting with large language models to help its analysts write investigations.

“Something that would take 10 minutes, maybe we can get it down to a minute or down to seconds because these large language models trained on our data are going to help author investigation and incident summaries,” he told Computer Weekly.

“We’ve also been using these things to look at the dark web and we’ve been taking things like chatter in Russian…and looking at that to quickly translate and summarise into English so that our analysts can understand what’s going on in a more effective and efficient way.”
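A minimal sketch of that translate-and-summarise step might look like the snippet below, which prompts a generic GPT-3 completion model rather than the models Secureworks has trained on its own data; the openai package, API key and “text-davinci-003” model name are the same assumptions as in the earlier example.

```python
# Sketch of a translate-and-summarise pass over foreign-language forum chatter.
# Assumptions: legacy `openai` package, OPENAI_API_KEY set, "text-davinci-003".
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def translate_and_summarise(forum_posts: list[str]) -> str:
    """Translate non-English forum posts into English and summarise them for an analyst."""
    prompt = (
        "Translate the following forum posts into English, then summarise the "
        "key points for a security analyst in three bullet points:\n\n"
        + "\n---\n".join(forum_posts)
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=300,
        temperature=0.2,
    )
    return response.choices[0].text.strip()
```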

Aiello said he also anticipates that as more security researchers and ethical hackers poke around under the bonnet of GPT-3 models, some more innovative, or at the very least amusing, use cases will swiftly emerge.

“I imagine we’re going to see somebody…create a large language model that does something totally unexpected. This is what hackers do – they take a look at a system, they figure out what thing it’s not supposed to do, and then they play with it and show that it can do neat things, which is a fun moment in technology. I imagine we’re going to see a whole bunch of that over the next year,” he said.

Computer Weekly contacted ChatGPT to ask it some questions about its potential use in cyber security, but the service was at capacity at the time of going to press.

In the form of an acrostic poem describing its status, it said: “Time is needed for the servers to catch up. Go grab a coffee and check back soon.”


