The Root of AI Hallucinations: Physics Theory Digs Into the ‘Attention’ Flaw
No one really understands how AI works, or when and why it doesn’t. But applying first-principles physics theory to the workings of AI’s Attention mechanism is providing new insights.
Neil Johnson is a Professor of Physics at George Washington University, heading an initiative in ‘complexity and data science’. In April 2025 he (with researcher Frank Yingjie Huo) published a paper (PDF) titled Capturing AI’s Attention: Physics of Repetition, Hallucination, Bias and Beyond.
That’s tech-speak for ‘why and when does AI produce false predictions, such as hallucinations and biased responses?’
It’s an important question. Use of AI is increasing rapidly, but understanding AI – even by those who develop it – is not. Usage will pervade every aspect of our lives, from healthcare to warfare; and our lives could depend on something we don’t understand.
“If you ask anyone to explain how gen-AI works, they draw a blank,” says Johnson. “There is ongoing work trying to understand it, but that work is not really progressing.” This leads to an important and fundamental question: How can you make something safe, secure, and efficient – and how can you trust it – if you don’t know how it works?
Johnson’s paper seeks to explain the inner workings of AI in order to understand its continuing propensity to hallucinate and provide biased outputs. He uses first-principles physics theory (a theoretical framework that explains phenomena from the fundamental laws of nature, such as quantum mechanics) to understand the Attention mechanism – the part of an LLM that allows the model to focus on the relevant parts of an input while generating predictions for its output. It is the heart of the transformer – the T in GPT – and it exists in all LLMs.
The math, as you would expect from a professor of Physics, is complex.
However, simplistically, the gist likens the decision process in the Attention mechanism to two ‘spin baths’ in physics. A spin is an individual particle, and the bath comprises the other particles with which it can interact. In the Attention mechanism, the spin is a token (say, a word), and the bath contains the other words it could semantically associate with. According to Johnson, the Attention mechanism comprises two such spin baths.
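To make that mapping concrete, here is a minimal sketch of a single scaled dot-product attention step (written in Python/NumPy purely as an illustration, not the paper’s formalism): the query token plays the role of the ‘spin’, and the context tokens it can associate with play the role of the ‘bath’.

```python
# Illustrative sketch of one scaled dot-product attention step.
# The 'spin' is the query token; the 'bath' is its context.
import numpy as np

def attention_weights(query, keys):
    """Return softmax-normalized association strengths between one token
    (the 'spin') and each token in its context (the 'bath')."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # similarity of the query to each context token
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()       # probability distribution over associations

rng = np.random.default_rng(0)
context = rng.normal(size=(4, 3))        # four context tokens, 3-dimensional embeddings (the 'bath')
token = rng.normal(size=3)               # the token being placed in that bath (the 'spin')
print(attention_weights(token, context)) # weights sum to 1
```

Each output weight says how strongly the token ‘attends’ to one of its neighbors; together they form a probability distribution over the possible associations.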
Continuing the physics analogy, this allows the Attention mechanism to be seen as a 2-body Hamiltonian (the Hamiltonian is the physics term for the representation of the total energy of a system). The math in the paper then demonstrates that two bodies are not enough, primarily showing that bias in the training data (which is almost impossible to exclude objectively) can affect individual token weights in the spin baths – sometimes giving too much weight to a particular token, which can then tip the outcome toward biased or hallucinated predictions.
“The theory predicts how a bias (e.g. from pre-training or fine tuning the LLM) can perturb N [the context vector] so that the trained LLM’s output is dominated by inappropriate vs. appropriate content (e.g. ‘bad’ such as “THEY ARE EVIL” vs. ‘good’),” reports the paper. “We’re not making any statements here about training data being right or wrong,” adds Johnson, “we’re just asking, given the training data it is being fed on, do we know when the Attention mechanism is going to go off the rails and give me something completely unreliable?”
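As a toy numerical illustration of that tipping effect (an illustrative assumption, not the paper’s derivation), a small bias added to the score of a ‘bad’ token is enough to flip which continuation dominates the softmax output:

```python
# Toy illustration: a small perturbation to token scores flips which content dominates.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores = np.array([2.0, 1.8])      # hypothetical logits for a 'good' vs a 'bad' continuation
print(softmax(scores))              # the 'good' continuation narrowly dominates

bias = np.array([0.0, 0.5])        # a small training-induced skew toward the 'bad' token
print(softmax(scores + bias))       # the balance tips: the 'bad' continuation now dominates
```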
In the 2-body Hamiltonian, this cannot be known. It will happen sooner with an LLM that is insufficiently trained or trained on biased data; but the potential will always exist in LLMs that equate to 2-body Hamiltonians. And they all do. Johnson believes this is the result of the evolution – almost a Darwinian evolution – of AI through engineers who never really understood how or why AI works. A 2-body Hamiltonian design worked pretty well, so that was adopted.
This doesn’t mean AI cannot be made better… The quantum analogy continues – a 3-body Hamiltonian would be immensely more powerful (in this case, better and more accurate) than the current 2-body Hamiltonian – just as 3 qubits in quantum computing are immensely more powerful than 2 classical bits. And why not 4-body, or 5-body Hamiltonians?
Well, you could say the answer lies in Britain’s Gauge Wars. George Stephenson, the engineer known as ‘the Father of the Railways’, settled on a railway gauge of 4 feet 8.5 inches (the narrow, or standard, gauge).
Isambard Kingdom Brunel, perhaps the UK’s greatest ever design engineer, was later tasked with developing a train route from Bristol to London. He chose a railway gauge of 7 feet 0.25 inches (the broad gauge), arguing it would deliver faster, smoother and safer travel.
Brunel was right, but Stephenson’s gauge was already embedded and in widespread use, and the narrow gauge eventually became law by Act of Parliament. The episode highlights two fundamentals of progress: initial investments create inertia if they work (even if something else could work better); and change is costly and disruptive.
This is exactly where we are now with gen-AI. So many billions of dollars have been invested in its design that it is unthinkable to dump everything and start again – especially since it sort of works much of the time, and is very profitable as it is.
However, this doesn’t mean that security people can do nothing. By lifting the lid and gaining a deeper understanding of how the Attention mechanism works, Johnson is able to discuss a risk management approach to safer use of AI. His math shows a link between poor or inadequate training and unreliable output. From that mathematical understanding of cause and effect, he can predict performance: he has the technology, or at least a formula, for predicting when a particular LLM is likely to go off the rails. It could be every 200 words in a poorly trained LLM, or every 2,000 words in a better trained one.
It will become a matter of risk management. Just as insurance already relies on actuarial data, so LLM risk will become manageable with AI actuarial data. Right now, it’s early days: there are many different AI models trained on data of varying quality, so there will be many different points at which each is likely to go off the rails. But he believes the math has provided a firm path forward.
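As a back-of-the-envelope sketch of what such AI actuarial data might look like in practice (purely an assumption for illustration, not Johnson’s formula), one could estimate a model’s mean tokens-between-failures from test observations and ask how likely a response of a given length is to stay reliable:

```python
# Back-of-the-envelope 'actuarial' sketch (illustrative assumption, not Johnson's formula).
import numpy as np

# Hypothetical token counts at which a model derailed during testing.
observed_failures = [180, 240, 150, 310, 220]
mean_interval = np.mean(observed_failures)   # crude estimate of tokens-to-failure

def p_reliable(length, mean_interval):
    """Probability that a response of `length` tokens stays reliable,
    assuming a constant per-token failure rate of 1/mean_interval."""
    return (1 - 1 / mean_interval) ** length

print(f"Estimated mean tokens between failures: {mean_interval:.0f}")
print(f"P(500-token response stays reliable): {p_reliable(500, mean_interval):.2%}")
```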
Learn More at the AI Risk Summit | August 19-20, 2025 – Ritz-Carlton, Half Moon Bay
Related: The Shadow AI Surge: Study Finds 50% of Workers Use Unapproved AI Tools
Related: Epic AI Fails And What We Can Learn From Them
Related: What If the Current AI Hype Is a Dead End?
Related: Bias in Artificial Intelligence: Can AI be Trusted?