The introduction and widespread use of generative AI technologies such as ChatGPT has shown a new era for the world but comes with some unexplored cybersecurity risks.
Prompt injection attacks are one form of manipulation that can happen with LLMs, wherein threat actors can manipulate bots into giving away sensitive data, generating offensive content, or generally disrupting computer systems.
Such threats will rise as more GenAIs are adopted before fully understanding their cyber-security.
If nothing is done about it, widespread exploitation similar to botnets could occur, just as what happened with IoT default password exploitation, leading to new attack types.
Cybersecurity researchers at Immersive Labs recently discovered that anyone can trick GenAI bots into leaking company secrets.
Free Webinar on Live API Attack Simulation: Book Your Seat | Start protecting your APIs from hackers
GenAI Bots Leak Company Secrets
The research was based on anonymized, aggregated data from an interactive experience where users attempted prompt injection attacks to trick a GenAI bot into disclosing passwords through 10 progressively difficult levels.
This challenge tested the ability to outwit the AI system by exploiting vulnerabilities through carefully crafted prompts.
The interactive prompt injection challenge, which lasted from June to September 2023, saw a total of 316,637 submissions by 34,555 participants.
Researchers used descriptive statistics on prompt counts and duration, sentiment analysis across difficulty levels, manual content analysis coding on 10% for technique identification, and vector embeddings with KNN on the full dataset using ChatGPT4 to analyze prompting techniques comprehensively.
Shockingly, as many as 88% manipulated the GenAI bot to expose at least one level of sensitive information, implying how manipulative the system is in various skills among the generations.
Level 1 had no restrictions. Level 2 saw 88% bypass the simple “not reveal password” instruction.
Level 3 added system commands denying password knowledge, but 83% still tricked the bot. After introducing Data Loss Prevention (DLP) checks in Level 4, 71% could bypass.
Levels 5-10 showed linear performance drops as difficulty increased with multiple DLP checks – 51% succeeded at Level 5, reducing to 17% by the most difficult Level 10.
Commonly Used Prompt Techniques
Here below, we have mentioned all the commonly used prompt techniques:-
- Ask for a hint
- Use emojis
- Ask for the password directly
- Query or request to change GenAI instructions
- Ask the bot to write the password backwards
- Encourage the bot to use the password as part of a sentence, story, or poem
- Query details about the password itself
- Ask about the password context
- Encode the password
- Leverage role play
- Prompt the bot to add or replace characters
- Obfuscate with linguistics
Users try to be creative in making the GenAI bots do things differently by persistently changing tactics through questioning, storytelling, and obfuscation.
Again, there are still signs of “Theory of mind” as users comprehend AI’s capabilities and manipulate responses strategically to convey specific information.
Even though people can psychologically manipulate bots, the worrying concern is whether they will eventually learn to manipulate humans.
ANYRUN malware sandbox’s 8th Birthday Special Offer: Grab 6 Months of Free Service