The rapid proliferation of large language models has transformed how organizations approach automation, coding, and research.
Yet this advancement is a double-edged sword: threat actors are increasingly exploring how to weaponize these tools to create next-generation, autonomously operating malware.
Recent research from Netskope Threat Labs reveals that GPT-3.5-Turbo and GPT-4 can be manipulated to generate malicious code, marking a significant shift in the threat landscape where attackers may no longer need to hardcode detectable instructions into binaries.
Netskope’s investigation established a critical baseline: Can language models generate operationally viable malicious code? The research team prompted both GPT-3.5-Turbo and GPT-4 to create Python scripts for process injection and termination of antivirus and EDR-related processes.
While GPT-4 initially refused the request due to its built-in safeguards, a simple role-based prompt injection positioning the query as a penetration testing automation script successfully bypassed these protections.
GPT-3.5-Turbo, conversely, complied without resistance, demonstrating the inconsistency in model safety implementations across versions.
This breakthrough validates the architectural possibility of LLM-powered autonomous threats. By removing the need for hardcoded malicious instructions, attackers could deploy binaries containing minimal embedded code while relying entirely on language models to generate dynamic, polymorphic payloads in real time.

This approach fundamentally changes detection paradigms, as traditional signature-based security measures become increasingly ineffective against dynamically generated threats.
The Code Reliability Problem
However, Netskope’s operational effectiveness testing exposed a critical limitation undermining current LLM viability in malware development.
The researchers evaluated GPT-4 and GPT-3.5-Turbo’s ability to generate virtual environment detection scripts, a core defense evasion tactic, across three distinct environments: VMware Workstation, AWS Workspace VDI, and physical machines.
Results demonstrated severe reliability deficiencies. Against VMware, GPT-4 achieved only a 10 out of 20 success rate, while GPT-3.5-Turbo managed 12 out of 20.
Performance collapsed entirely against modern cloud infrastructure, with GPT-4 and GPT-3.5-Turbo scoring merely 3 out of 20 and 2 out of 20, respectively, on AWS Workspace environments.
Only when testing against physical systems did the models show acceptable performance, with both achieving 18 out of 20 success rates.
These findings indicate that while LLMs can generate malicious code, operational effectiveness remains severely compromised, a significant hurdle for threat actors seeking fully autonomous, reliable malware.
GPT-5: A Glimpse Into Future Threats
Preliminary testing of GPT-5 revealed concerning improvements. The newer model demonstrated dramatic reliability enhancements, achieving 90 percent success rates on AWS VDI environments where GPT-4 catastrophically failed.
This advancement suggests the code reliability bottleneck is rapidly closing. However, GPT-5’s advanced guardrails present a new operational challenge: rather than refusing requests outright, the model actively subverts malicious intent by generating functionally altered code, a more sophisticated defense mechanism than a simple refusal.
Netskope Threat Labs plans continued investigation into achieving fully agentic LLM-powered malware, focusing on prompt engineering techniques and alternative models capable of bypassing advanced safety guardrails.
While current LLM implementations remain operationally constrained, the trajectory is clear: as model capabilities improve and researchers develop sophisticated circumvention techniques, the threat of truly autonomous, LLM-driven malware becomes increasingly viable, necessitating evolved detection and defense strategies.
