
Ollama provides an interface and REST API server for running and calling locally hosted large language models (LLMs). The application does not require authentication, and although it is meant for local use and binds to localhost (127.0.0.1) by default, it is often configured to listen on all network interfaces (0.0.0.0). Approximately 300,000 Ollama servers are currently exposed on the public internet, with many more reachable on local networks.
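Because the API requires no credentials, anyone who can reach the port can enumerate the server's models. As a minimal sketch, the following Python snippet probes a host for an exposed Ollama instance via the documented /api/tags route; the target address is a placeholder:

```python
import json
import urllib.request

# Placeholder target address; Ollama listens on port 11434 by default.
HOST = "192.0.2.10"
URL = f"http://{HOST}:11434/api/tags"

try:
    # /api/tags lists the models installed on the server -- no credentials needed.
    with urllib.request.urlopen(URL, timeout=5) as resp:
        models = json.load(resp).get("models", [])
        print(f"Exposed Ollama server at {HOST}, {len(models)} model(s):")
        for m in models:
            print(" -", m.get("name"))
except OSError as exc:
    print(f"No reachable Ollama API at {HOST}: {exc}")
```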
“With over 170,000 GitHub stars and 100 million Docker Hub downloads, Ollama is widely used across enterprises as a self-hosted AI inference engine,” Cyera warns, adding that the vulnerability is broadly exploitable because no authentication is required.
Only three API requests needed for exploit
Located in Ollama’s model quantization pipeline, the bug stems from how the framework loads GGUF (GPT-Generated Unified Format) files, which store the weights, metadata, and tokenizer information for local models.
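For context on what the loader has to parse, the public GGUF specification starts every file with a fixed little-endian header: a 4-byte magic, a version, a tensor count, and a metadata key/value count. Here is a minimal sketch of reading that header in Python; the file path is a placeholder:

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header per the public spec:
    4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # All header integers are little-endian.
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Example with a placeholder path:
# print(read_gguf_header("model.gguf"))
```

Everything after this header, including the metadata and tokenizer entries mentioned above, is attacker-controlled input that the quantization pipeline must parse when it loads a model file.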
