A severe security flaw, identified as CVE-2026-5760 and carrying a critical CVSS score of 9.8, has been uncovered in SGLang, a popular open-source serving framework for large language models (LLMs) and multimodal AI. This vulnerability presents a significant risk, potentially allowing attackers to achieve remote code execution (RCE) on affected systems through specially crafted AI model files.
The Looming Threat: RCE via Malicious GGUF Models
The core of this critical vulnerability lies in a command injection flaw that enables the execution of arbitrary code. SGLang, a high-performance framework with a substantial community footprint (over 5,500 forks and 26,100 stars on GitHub), is widely used for deploying and serving advanced AI models. Its widespread adoption means the impact of this RCE flaw could be far-reaching.
How the Attack Unfolds: A Malicious Template Injection
According to an advisory from the CERT Coordination Center (CERT/CC), the vulnerability specifically targets SGLang’s /v1/rerank endpoint. An attacker can exploit this by creating a malicious GPT-Generated Unified Format (GGUF) model file. This file contains a crafted tokenizer.chat_template parameter embedded with a Jinja2 server-side template injection (SSTI) payload.
The CERT/CC elaborates: “An attacker exploits this vulnerability by creating a malicious GPT Generated Unified Format (GGUF) model file with a crafted tokenizer.chat_template parameter that contains a Jinja2 server-side template injection (SSTI) payload with a trigger phrase to activate the vulnerable code path.”
The attack chain is disturbingly straightforward:
- An attacker crafts a GGUF model file, embedding a malicious Jinja2 SSTI payload within its tokenizer.chat_template.
- This template includes a specific trigger phrase (e.g., the Qwen3 reranker trigger) designed to activate the vulnerable code path within entrypoints/openai/serving_rerank.py.
- A victim downloads and loads this compromised model into SGLang, potentially from public repositories like Hugging Face.
- When a request is made to the /v1/rerank endpoint, SGLang processes the chat_template and renders it using jinja2.Environment().
- Crucially, the SSTI payload executes, allowing the attacker to run arbitrary Python code on the SGLang server, thereby achieving full RCE.
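The dangerous step in the chain above is the template rendering itself. The sketch below is illustrative only, not SGLang's actual code; the function name and payload are hypothetical. It shows why rendering an attacker-controlled chat template with an unsandboxed jinja2.Environment() is so hazardous: template expressions can traverse from an ordinary string literal into Python's object internals, the classic first step of a Jinja2 SSTI chain that ends in code execution.

```python
from jinja2 import Environment


def render_chat_template(chat_template: str, messages: list) -> str:
    """Illustrative unsafe pattern: jinja2.Environment() imposes no
    sandbox, so model-supplied template expressions can reach
    arbitrary Python attributes."""
    env = Environment()  # vulnerable: no sandboxing of attribute access
    return env.from_string(chat_template).render(messages=messages)


# A benign chat template behaves as expected...
benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
print(render_chat_template(benign, [{"role": "user", "content": "hi"}]))

# ...but a malicious one can walk from a string literal to every loaded
# Python class -- the springboard for executing arbitrary code.
payload = "{{ ''.__class__.__mro__[1].__subclasses__() | length }}"
print(render_chat_template(payload, []))  # counts reachable classes
```

Because the template string travels inside the model file, simply downloading and serving an untrusted GGUF model is enough to deliver the payload.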
The Technical Underbelly: Unsandboxed Jinja2 Environment
Security researcher Stuart Beck, credited with discovering and reporting this flaw, identified the root cause: SGLang’s use of jinja2.Environment() without proper sandboxing. Instead of employing the more secure ImmutableSandboxedEnvironment, the framework’s default configuration leaves it exposed. This oversight allows a malicious model to bypass security measures and execute arbitrary Python code directly on the inference server.
Echoes of Past Vulnerabilities: Llama Drama and vLLM
CVE-2026-5760 is not an isolated incident; it shares a vulnerability class with other recent, high-profile flaws in the AI ecosystem. Notably, it mirrors CVE-2024-34359, dubbed “Llama Drama” (CVSS 9.7), a critical RCE flaw in the llama_cpp_python Python package that has since been patched. Similarly, a related attack surface was addressed in vLLM late last year (CVE-2025-61620, CVSS 6.5).
These recurring vulnerabilities underscore a broader challenge in securing AI serving frameworks, particularly concerning how they handle user-provided model components and template rendering.
Mitigation: The Path to Security
The recommended mitigation is clear and urgent. CERT/CC advises: “To mitigate this vulnerability, it is recommended to use ImmutableSandboxedEnvironment instead of jinja2.Environment() to render the chat templates. This will prevent the execution of arbitrary Python code on the server.”
As of the advisory’s release, no official patch or response from SGLang was obtained during the coordination process, making immediate implementation of this mitigation crucial for users. Organizations deploying SGLang should prioritize reviewing their configurations and applying the recommended sandboxing to protect against potential exploitation.
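In practice, the advised mitigation amounts to swapping the environment class used for rendering. The snippet below is a minimal sketch of that fix, not SGLang's code; the function name is illustrative. Jinja2's ImmutableSandboxedEnvironment blocks access to unsafe attributes such as __class__ and raises a SecurityError instead of executing the payload:

```python
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment


def render_chat_template_safely(chat_template: str, **context) -> str:
    """Render a model-supplied chat template inside Jinja2's immutable
    sandbox, which forbids unsafe attribute access and state mutation."""
    env = ImmutableSandboxedEnvironment()
    return env.from_string(chat_template).render(**context)


# Legitimate chat templates still render normally:
print(render_chat_template_safely("Hello, {{ name }}!", name="world"))

# A typical SSTI probe is stopped cold rather than executed:
try:
    render_chat_template_safely("{{ ''.__class__.__mro__[1].__subclasses__() }}")
except SecurityError as exc:
    print("blocked:", exc)
```

The sandbox is a drop-in replacement for ordinary template rendering, which is why CERT/CC can recommend it even in the absence of an official patch.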