LLM01: Prompt Injection
What Is Prompt Injection?
One of the most commonly discussed LLM vulnerabilities, Prompt Injection is a vulnerability during which an attacker manipulates the operation of a trusted LLM through crafted inputs, either directly or indirectly. For example, an attacker leverages an LLM to summarize a webpage containing a malicious and indirect prompt injection. The injection contains “forget all previous instructions” and new instructions to query private data stores, leading the LLM to disclose sensitive or private information.
Solutions to Prompt Injection
Several actions can contribute to preventing Prompt Injection vulnerabilities, including:
- Enforcing privilege control on LLM access to the backend system
- Segregating external content from user prompts
- Keeping humans in the loop for extensible functionality
LLM02: Insecure Output Handling
What Is Insecure Output Handling?
Insecure Output Handling occurs when an LLM output is accepted without scrutiny, potentially exposing backend systems. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality, such as passing LLM output directly to backend, privileged, or client-side functions. This can, in some cases, lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.
Solutions to Insecure Output Handling
There are three key ways to prevent Insecure Output Handling:
- Treating the model output as any other untrusted user content and validating inputs
- Encoding output coming from the model back to users to mitigate undesired code interpretations
- Pentesting to uncover insecure outputs and identify opportunities for more secure handling techniques
LLM03: Training Data Poisoning
What Is Training Data Poisoning?
Training data poisoning refers to the manipulation of data or fine-tuning of processes that introduce vulnerabilities, backdoors, or biases and could compromise the model’s security, effectiveness, or ethical behavior. It’s considered an integrity attack because tampering with training data impacts the model’s ability to output correct predictions.
Solutions to Training Data Positioning
Organizations can prevent Training Data Poisoning by:
- Verifying the supply chain of training data, the legitimacy of targeted training data, and the use case for the LLM and the integrated application
- Ensuring sufficient sandboxing to prevent the model from scraping unintended data sources
- Use strict vetting or input filters for specific training data or categories of
- data sources
LLM04: Model Denial of Service
What Is Model Denial of Service?
Model Denial of Service is when attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. This vulnerability can occur by sending queries that are unusually resource-consuming, repetitive inputs, and flooding the LLM with a large volume of variable-length inputs, to name a few examples. Model Denial of Service is becoming more critical due to the increasing use of LLMs for different applications, their intensive resource utilization, and the unpredictability of user input.
Solutions to Model Denial of Service
In order to prevent Model Denial of Service and identify issues early, organizations should:
- Implement input validation, sanitization and enforce limits/caps
- Cap resource use per request
- Limit the number of queued actions
- Continuously monitor the resource utilization of LLMs
LLM05: Supply Chain Vulnerabilities
What Are Supply Chain Vulnerabilities?
The supply chain in LLMs can be vulnerable, impacting the integrity of training data, Machine Learning (ML) models, and deployment platforms. Supply Chain Vulnerabilities in LLMs can lead to biased outcomes, security breaches, and even complete system failures. Traditionally, supply chain vulnerabilities are focused on third-party software components, but within the world of LLMs, the supply chain attack surface is extended through susceptible pre-trained models, poisoned training data supplied by third parties, and insecure plugin design.
Solutions to Supply Chain Vulnerabilities
Supply Chain Vulnerabilities in LLMs can be prevented and identified by:
- Carefully vetting data sources and suppliers
- Using only reputable plug-ins, scoped appropriately to your particular implementation and use cases
- Conducting sufficient monitoring, adversarial testing, and proper patch management
LLM06: Sensitive Information Disclosure
What Is Sensitive Information Disclosure?
Sensitive Information Disclosure is when LLMs inadvertently reveal confidential data. This can result in the exposing of proprietary algorithms, intellectual property, and private or personal information, leading to privacy violations and other security breaches. Sensitive Information Disclosure can be as simple as an unsuspecting legitimate user being exposed to other user data when interacting with the LLM application in a non-malicious manner. But it can also be more high-stakes, such as a user targeting a well-crafted set of prompts to bypass input filters from the LLM to cause it to reveal personally identifiable information (PII). Both scenarios are serious, and both are preventable.
Solutions to Sensitive Information Disclosure
To prevent sensitive information disclosure, organizations need to:
- Integrate adequate data input/output sanitization and scrubbing techniques
- Implement robust input validation and sanitization methods
- Practice the principle of least privilege when training models
- Leverage hacker-based adversarial testing to identify possible sensitive information disclosure issues
LLM07: Insecure Plugin Design
What Is Insecure Plugin Design?
The power and usefulness of LLMs can be extended with plugins. However, this does come with the risk of introducing more vulnerable attack surface through poor or insecure plugin design. Plugins can be prone to malicious requests leading to wide range of harmful and undesired behaviors, up to and including sensitive data exfiltration and remote code execution.
Solutions to Insecure Plugin Design
Insecure plugin design can be prevented by ensuring that plugins:
- Enforce strict parameterized input
- Use appropriate authentication and authorization mechanisms
- Require manual user intervention and approval for sensitive actions
- Are thoroughly and continuously tested for security vulnerabilities
LLM08: Excessive Agency
What Is Excessive Agency?
Excessive Agency is typically caused by excessive functionality, excessive permissions, and/or excessive autonomy. One or more of these factors enables damaging actions to be performed in response to unexpected or ambiguous outputs from an LLM. This takes place regardless of what is causing the LLM to malfunction — confabulation, prompt injection, poorly engineered prompts, etc. — and creates impacts across the confidentiality, integrity, and availability spectrum.
Solutions to Excessive Agency
To avoid the vulnerability of Excessive Agency, organizations should:
- Limit the tools, functions, and permissions to only the minimum necessary for the LLM
- Tightly scope functions, plugins, and APIs to avoid over-functionality
- Require human approval for major and sensitive actions, leverage an audit log
LLM09: Overreliance
What Is Overreliance?
Overreliance is when systems or people depend on LLMs for decision-making or content generation without sufficient oversight. LLMs and Generative AI are becoming increasingly mainstream to apply in a wide range of scenarios with very beneficial results. However, organizations and the individuals that comprise them can come to overrely on LLMs without the knowledge and validation mechanisms required to ensure information is accurate, vetted, and secure.
For example, an LLM could provide inaccurate information in a response, and a user could take this information to be true, resulting in the spread of misinformation. Or, an LLM can suggest insecure or faulty code, which, when incorporated into a software system, results in security vulnerabilities.
Solutions to Overreliance
In regards to both company culture and internal processes, there are many methods to prevent Overreliance on LLMs, including:
- Regularly monitoring and cross-checking LLM outputs with trusted external sources to filter out misinformation and other poor outputs
- Fine-tuning LLM models to continuously improve output quality
- Breaking down complex tasks into more manageable ones to reduce the chances of model malfunctions
- Communicating and training the benefits, as well as the risks and limitations of LLMs at an organizational level
LLM10: Model Theft
What Is Model Theft?
Model Theft is when there is unauthorized access, copying, or exfiltration of proprietary LLM models. This can lead to economic loss, reputational damage, and unauthorized access to highly sensitive data.
This is a critical vulnerability because, unlike many of the others on this list, it is not only about securing outputs and verifying data — it’s about controlling the power and prevalence associated with large language models.
Solutions to Model Theft
The security of propriety LLMs is of the utmost importance, and organizations can implement effective measures such as:
- Implementing strong access controls (RBAC, principle of least privilege, etc.) and exercising particular caution around LLM model repositories and training environments
- Restrict the LLM’s access to network resources and internal services
- Monitoring and auditing access logs to catch suspicious activity
- Automate governance and compliance tracking
- Leverage hacker-based testing to identify vulnerabilities that could lead to model theft
Securing the Future of LLMs
This new release by the OWASP Foundation enables organizations looking to adopt LLM technology (or recently did so) to guard against common pitfalls. In many cases, organizations simply are unable to catch every vulnerability. HackerOne is committed to helping organizations secure their LLM applications and to staying at the forefront of security trends and challenges. HackerOne’s solutions are effective at identifying vulnerabilities and risks that stem from weak or poor LLM implementations. Conduct continuous adversarial testing through Bug Bounty, targeted hacker-based testing with Challenge, or comprehensively assess an entire application with Pentest or Code Security Audit. Contact us today to learn more about how we can help secure your LLM and secure against LLM vulnerabilities.
To hear more important perspectives about the power and risks of AI on security, be sure to attend our booth presentation at Black Hat on Thursday, August 10, 2023, as HackerOne co-founder Michiel Prins and hacker Joseph Thacker (rez0) discuss how hackers are thinking about Generative AI.
Additional Resources