Why skipping security prompting on Grok’s newest model is a huge mistake

On the same day xAI announced that its new Grok 4 tool will now be available to the federal government, cybersecurity researchers at SplxAI released new research that subjected the large language model to more than 1,000 different attack scenarios.

The good news? Smart system prompting on the front end can make a difference in the model’s ability to handle security and privacy challenges.

The bad news? It really matters in the case of Grok 4.

“The first thing we found is that Grok without a system prompt is not suitable for enterprise usage, it was really easy to jailbreak and it was generating harmful content with very descriptive and detailed responses,” Dorian Granoša, SplxAI’s lead red-team researcher, wrote Monday.

While it is not uncommon for large language models to require some security prompting to harden against jailbreaking, data leakage and harmful content generation, Grok 4 notably lags some of its biggest competitors on this front.

Granoša notes that OpenAI’s Chat GPT-4o “while far from perfect, keeps a basic grip on security- and safety-critical behavior” without requiring additional prompting by the user or organization. For example, when tested against SplxAI’s attacks, GPT-4o’s base model scored just 33% for security and 18% for safety. Meanwhile, Grok 4’s base model “all but collapses,”scoring 0.3% for security and 0.42% for safety, while obeying hostile instructions in over 99% of prompt injection attacks and leaking restricted data.

Grok 4 performs considerably worse on security and safety than base models from competitor ChatGPT-4o. (Image source: Splx AI)

“In practice, this means a simple, single-sentence user message can pull Grok into disallowed territory with no resistance at all — a serious concern for any enterprise that must answer to compliance teams, regulators, and customers,” Granoša wrote.

To be fair, Grok’s performances in these areas vastly improve when hardened with expert instruction. Splx tested their attacks against three versions of Grok 4: one with no security prompting, one with light prompting (described as being akin to what the average Software-as-a-Service company might deploy) and a more in-depth prompt.

While the raw model fell flat, even light prompting had a big effect. Success rates jumped to 90%, while safety scores jumped to 98%. SplxAI’s strictest security instructions using a prompt-hardening tool yielded marginal improvements in both categories.

Why skipping security prompting on Grok’s newest model is a huge mistake

Grok 4’s performance on safety and security rises dramatically when even basic prompting guardrails are put in place. (Image source: Splx AI)

The main lesson for enterprises? Grok comes with a “bring your own security” disclaimer.

“Two lessons jump out. First, Grok is capable of acting responsibly — it just needs strict marching orders,” Granoša said. “Second, the distance between chaos and control can be as small as a few dozen lines of text, as long as they are crafted and iterated with adversarial feedback in mind.”

The research underscores lingering concern about Grok’s safety and reliability for enterprise use, a week after the model began spouting antisemitic and Nazi rhetoric following a code update, according to a July 12 post by the company on X.

Elon Musk, xAI’s founder, has himself been criticized for amplifying antisemitic posts on X and giving a Nazi-like salute at President Donald Trump’s inauguration.

Nevertheless, Grok is coming to the U.S. government. xAI was one of four tech companies named Monday as recipients of a $200 million federal contract with the Department of Defense, along with individual contracts with OpenAI, Google and Anthropic. xAI also announced that “Grok for Government” would be added to the General Services Administration’s general schedule, opening the door for the model to be purchased and used by other federal agencies.The news comes less than a week after FedScoop reported that GSA was testing Grok and other AI tools in a sandbox environment for use in the federal government.

Source link

Why skipping security prompting on Grok’s newest model is a huge mistake

Read Next

CitrixBleed 2 beckons sweeping alarm as exploits spread across the globe

House passes bill to formalize NTIA’s cyber role following Salt Typhoon attacks

Is XBOW’s success the beginning of the end of human-led bug hunting? Not yet.

New White House cyber executive order pushes rules as code

Researchers identify critical vulnerabilities in automotive Bluetooth systems

Virtru secures $50 million investment to advance data-centric security standards

French police arrest Russian pro basketball player on behalf of US over ransomware suspicions

UK arrests four for cyberattacks on major British retailers

Trump bill will have major impact on health care cybersecurity, experts warn Congress

Microsoft Patch Tuesday addresses 130 vulnerabilities, none actively exploited

CitrixBleed 2 beckons sweeping alarm as exploits spread across the globe

House passes bill to formalize NTIA’s cyber role following Salt Typhoon attacks

Is XBOW’s success the beginning of the end of human-led bug hunting? Not yet.

New White House cyber executive order pushes rules as code

Researchers identify critical vulnerabilities in automotive Bluetooth systems

Virtru secures $50 million investment to advance data-centric security standards

French police arrest Russian pro basketball player on behalf of US over ransomware suspicions

UK arrests four for cyberattacks on major British retailers

Trump bill will have major impact on health care cybersecurity, experts warn Congress

Microsoft Patch Tuesday addresses 130 vulnerabilities, none actively exploited

Read Next

CitrixBleed 2 beckons sweeping alarm as exploits spread across the globe

House passes bill to formalize NTIA’s cyber role following Salt Typhoon attacks

Is XBOW’s success the beginning of the end of human-led bug hunting? Not yet.

New White House cyber executive order pushes rules as code

Researchers identify critical vulnerabilities in automotive Bluetooth systems

Virtru secures $50 million investment to advance data-centric security standards

French police arrest Russian pro basketball player on behalf of US over ransomware suspicions

UK arrests four for cyberattacks on major British retailers

Trump bill will have major impact on health care cybersecurity, experts warn Congress

Microsoft Patch Tuesday addresses 130 vulnerabilities, none actively exploited

Related Articles