Securing AI systems against evasion, poisoning, and abuse

Adversaries can intentionally mislead or “poison” AI systems, causing them to malfunction, and developers have yet to find an infallible defense against this. In their latest publication, NIST researchers and their partners highlight these AI and machine learning vulnerabilities.

Taxonomy of attacks on Generative AI systems

Understanding potential attacks on AI systems

The publication, “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (NIST.AI.100-2),” is a key component of NIST’s broader initiative to foster the creation of reliable AI. This effort aims to facilitate the implementation of NIST’s AI Risk Management Framework and aims to assist AI developers and users in understanding potential attacks and strategies to counter them, acknowledging that there is no silver bullet.

“This is the best AI security publication I’ve seen. What’s most noteworthy are the depth and coverage. It’s the most in-depth content about adversarial attacks on AI systems that I’ve encountered. It covers the different forms of prompt injection, elaborating and giving terminology for components that previously weren’t well-labeled. It even references prolific real-world examples like the DAN (Do Anything Now) jailbreak and some amazing indirect prompt injection work. It includes multiple sections covering potential mitigations but is clear about it not being a solved problem yet. It also covers the open vs closed model debate. There’s a helpful glossary at the end, which I plan to use as extra “context” to large language models when writing or researching AI security. It will ensure the LLM and I are working with the same definitions specific to this subject domain,” Joseph Thacker, principal AI engineer and security researcher at AppOmni, told Help Net Security.

AI integration and the challenges of data reliability

AI systems are now integrated into various aspects of modern life, serving roles from driving vehicles to being online chatbots for customer interaction and even aiding doctors in diagnosing diseases. These systems are trained using extensive data sets. For instance, an autonomous vehicle is trained with images of roads and traffic signs. At the same time, a chatbot based on a large language model (LLM) might learn from online conversation records. This data is crucial for the AI to respond appropriately in different scenarios.

However, a significant concern is the reliability of this data. Often sourced from websites and public interactions, the data is vulnerable to manipulation by malicious actors. This risk exists during the AI system’s training phase and later as the AI adapts its behavior through real-world interactions. Such tampering can lead to undesirable AI performance. For example, chatbots might start using offensive or racist language if strategically designed harmful prompts bypass their safety mechanisms.

“The risks of AI are as significant as the potential benefits. The latest publication from NIST is a great start to explore and categorize attacks against AI systems. It defines a formal taxonomy and provides a good set of attack classes. It does miss a few areas, such as misuse of the tools to cause harm, abuse of inherited trust by people believing AI is an authority, and the ability to de-identify people and derive sensitive data through aggregated analysis,” Matthew Rosenquist, CISO at Eclipz.io commented.

“The paper does not discuss the most significant risks of all, those associated with system implementation. As we see with encryption tools, most exploited vulnerabilities are not with the algorithmic systems themselves but in how they are insecurely implemented. The same will be true with AI systems. Cybersecurity must be actively involved in the deployment and allowed usage of AI,” Rosenquist concluded.

Understanding and mitigating potential attacks

Partly due to the immense size of datasets used in AI training, which are too vast for effective human monitoring and filtering, there currently isn’t a fail-safe method to shield AI from being misled. To support developers, the new report provides a comprehensive guide on potential attacks that AI products may face and suggests strategies to mitigate their impact.

The report examines the four primary categories of attacks: evasion, poisoning, privacy, and abuse. Additionally, it categorizes these attacks based on various factors, including the attacker’s intentions and goals, their capabilities, and their level of knowledge.

Evasion attacks happen post-deployment of an AI system and involve modifying an input to alter the system’s response. For instance, adding symbols to stop signs could lead an autonomous vehicle to mistake them for speed limit signs.

Poisoning attacks take place during the training phase through the introduction of corrupted data. For instance, embedding a significant amount of offensive language into conversation records could lead a chatbot to believe such language is normal, causing it to adopt it in its interactions with customers.

Privacy attacks happen at the deployment stage and aim to extract confidential information about the AI or its training data for malicious purposes. An attacker might pose numerous questions to a chatbot, then analyze the responses to deduce the model’s vulnerabilities or infer its data sources. Introducing harmful examples into these sources could lead to inappropriate behavior by the AI. Moreover, it can be challenging to make the AI disregard these specific harmful examples afterwards.

Abuse attacks entail embedding false information into a source, like a website, which is then assimilated by an AI. Differing from the previously mentioned poisoning attacks, abuse attacks focus on feeding the AI erroneous data from a genuine, yet tampered source, with the aim of redirecting the AI system’s original purpose.

“Most of these attacks are fairly easy to mount and require minimum knowledge of the AI system and limited adversarial capabilities. Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would be a very small percentage of the entire training set,” said co-author Alina Oprea, a professor at Northeastern University.

“Despite the significant progress AI and machine learning have made, these technologies are vulnerable to attacks that can cause spectacular failures with dire consequences. There are theoretical problems with securing AI algorithms that simply haven’t been solved yet. If anyone says differently, they are selling snake oil,” he concluded.

Source link