LiteLLM CVE-2026-42271: the AI gateway is now an RCE surface

Chained with a Starlette host-header bypass, LiteLLM CVE-2026-42271 turns the most widely deployed open-source AI gateway into an unauthenticated RCE surface via its MCP test endpoints. Why the LLM proxy is the asset nobody inventoried.

Zero Hunt Research · 8 min read

Almost every company that shipped an LLM feature in the last eighteen months put a gateway in front of it. The gateway holds the API keys to OpenAI, Anthropic, Azure, and a dozen self-hosted models; it does budget control, key rotation, request logging, and fallback routing. It is, by design, the single host that can talk to every model provider and sees every prompt. In a lot of organisations it was stood up by a platform team in an afternoon, exposed on an internal load balancer, and never added to the asset register. On 8 June 2026 CISA added CVE-2026-42271 — a command-injection flaw in LiteLLM, the most widely deployed open-source AI gateway — to its Known Exploited Vulnerabilities catalog, with a federal remediation deadline of 22 June. A week later, on 15 June, a second research team published a chain that takes a low-privilege key all the way to remote code execution. The AI gateway is no longer plumbing. It is a target.

What CVE-2026-42271 actually is

LiteLLM exposes two endpoints intended to let an operator preview a Model Context Protocol (MCP) server before saving it: POST /mcp-rest/test/connection and POST /mcp-rest/test/tools/list. To test an MCP server that uses the stdio transport, those endpoints accept a full server configuration — including the command, args, and env fields — and then spawn that command as a subprocess on the proxy host to see whether it responds. There is no allow-list. Whatever binary and arguments you submit, the gateway runs.
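To make the missing control concrete, here is a minimal sketch of the kind of allow-list check that paragraph says is absent. It is not LiteLLM's code or its actual fix: the StdioServerConfig type, the ALLOWED_COMMANDS set, and the validate_stdio_config function are all hypothetical names invented for illustration. Only the command, args, and env field names come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list: the only executables an operator may launch as a
# stdio MCP server. These names are illustrative, not LiteLLM defaults.
ALLOWED_COMMANDS = frozenset({"npx", "uvx"})

# Characters that should never appear in an argument if a shell ever sees it.
SHELL_METACHARACTERS = set(";|&$`<>\n")


@dataclass(frozen=True)
class StdioServerConfig:
    """The three fields the test endpoints are described as accepting."""
    command: str
    args: tuple[str, ...] = ()
    env: dict[str, str] = field(default_factory=dict)  # not inspected in this sketch


def validate_stdio_config(cfg: StdioServerConfig) -> list[str]:
    """Return the reasons to reject cfg; an empty list means it passes."""
    problems = []
    # Exact-match only: "/bin/sh" and even "/usr/bin/npx" are rejected.
    if cfg.command not in ALLOWED_COMMANDS:
        problems.append(f"command {cfg.command!r} is not on the allow-list")
    for arg in cfg.args:
        if SHELL_METACHARACTERS & set(arg):
            problems.append(f"argument {arg!r} contains shell metacharacters")
    return problems


if __name__ == "__main__":
    print(validate_stdio_config(StdioServerConfig("/bin/sh", ("-c", "id; whoami"))))
```

A check like this narrows the primitive rather than closing it. An allow-listed launcher such as npx can still fetch and run an arbitrary package, so the safer design is not to spawn caller-supplied commands from a preview endpoint at all.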

That is the entire bug. Per the NVD entry, it is classified CWE-77 / CWE-78 (OS command injection), scored CVSS 3.1 8.8 and CVSS 4.0 8.7, and affects LiteLLM 1.74.2 through 1.83.6. BerriAI, the maintainer, shipped the fix in v1.83.7-stable on 8 May 2026. The flaw was originally rated authenticated, because reaching the MCP test endpoints required a valid proxy API key. In a multi-tenant gateway where every developer holds a key, "authenticated" is already a thin wall. As it turned out, it was thinner than that.

From authenticated to unauthenticated: the Starlette chain

LiteLLM is built on Starlette, the ASGI framework underneath FastAPI. On 26 May 2026, CVE-2026-48710 — nicknamed "BadHost" — disclosed a host-header validation bypass in Starlette versions ≤ 1.0.0. Horizon3.ai connected the two: if a LiteLLM deployment's dependency tree pins a vulnerable Starlette, an attacker can manipulate the HTTP Host header to bypass LiteLLM's API-key gate entirely, then call the MCP test endpoints with no credentials at all.

Horizon3.ai validated the full chain on 1 June 2026 and rates it CVSS 10.0. The published impact reads like a worst-case briefing slide:

  • execute arbitrary commands on the LiteLLM host;
  • read the model-provider credentials the gateway stores;
  • siphon the API keys and secrets it manages on behalf of every downstream app;
  • move laterally into the AI infrastructure the gateway is wired into.

The remediation is two-sided: upgrade LiteLLM to 1.83.7 or later and Starlette to 1.0.1 or later. Patching only the application leaves the host-header bypass live in the dependency you did not think to check.
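The two-sided check can be sketched as a small version comparison. The thresholds are the ones quoted in this article (LiteLLM 1.74.2 through 1.83.6 affected, fixed in 1.83.7; Starlette at or below 1.0.0 affected, fixed in 1.0.1). The function names are hypothetical, and the parser simply ignores suffixes such as "-stable", which is a simplification.

```python
import re

# Thresholds as stated in the article; treat them as inputs, not gospel.
LITELLM_FIRST_AFFECTED = (1, 74, 2)
LITELLM_FIRST_FIXED = (1, 83, 7)
STARLETTE_FIRST_FIXED = (1, 0, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'v1.83.7-stable' into (1, 83, 7), ignoring any suffix."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def gateway_exposure(litellm_version: str, starlette_version: str) -> list[str]:
    """List which of the two CVEs a given version pair is still exposed to."""
    findings = []
    if LITELLM_FIRST_AFFECTED <= parse_version(litellm_version) < LITELLM_FIRST_FIXED:
        findings.append("CVE-2026-42271: LiteLLM is in the affected range")
    if parse_version(starlette_version) < STARLETTE_FIRST_FIXED:
        findings.append("CVE-2026-48710: Starlette is at or below 1.0.0")
    return findings


if __name__ == "__main__":
    # A host patched for the command injection but not for the dependency.
    print(gateway_exposure("1.83.7-stable", "1.0.0"))
```

On a real host the two inputs could come from importlib.metadata.version("litellm") and importlib.metadata.version("starlette"). The sketch says nothing about the later authorization fixes, because the article does not give exact versions for them.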

The 15 June escalation: a low-privilege key to proxy admin

The freshest piece landed yesterday. On 15 June 2026, Obsidian Security disclosed a separate, authenticated chain that reaches the same command-injection primitive from inside — useful in exactly the multi-tenant deployments where the unauthenticated bypass is patched but ordinary users still hold keys. The Hacker News covered the disclosure; the chain is rated CVSS 9.9 and stacks three flaws:

| CVE | Class | Mechanism |
| --- | --- | --- |
| CVE-2026-47101 | Authorization bypass | When a regular user mints a virtual key, LiteLLM stores the caller-supplied allowed_routes without checking it against the user's role. A non-admin can issue a key with allowed_routes: ["/*"], a wildcard reaching admin-only routes. |
| CVE-2026-47102 | Privilege escalation | /user/update lets a user edit their own record without restricting writable fields. A self-update with user_role: "proxy_admin" promotes the caller to full proxy admin. |
| CVE-2026-42271 | Command injection | With admin reach, the MCP test endpoints spawn an attacker-supplied command: the reverse shell. |

Three logic bugs, none exotic: a missing role check on a field, an over-permissive self-service update, and an endpoint that trusts its input. Chained, they convert any developer who can log into the gateway into root-equivalent on the host that holds your model keys. The full fix set landed in later 1.83.x stable builds; if you patched for the command injection in May but did not track the authorization fixes, you are not done.
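The first two links in that chain come down to two missing checks, which can be sketched in a few lines. This is an illustration of the class of fix, not the patch BerriAI shipped. The ROLE_ROUTES table, the "internal_user" role, the SELF_EDITABLE_FIELDS set, and both function names are assumptions made for the example. Only allowed_routes, user_role, and "proxy_admin" come from the disclosure as described above.

```python
from fnmatch import fnmatchcase

# Hypothetical per-role entitlements; real route names and roles will differ.
ROLE_ROUTES = {
    "internal_user": {"/chat/completions", "/embeddings", "/models"},
    "proxy_admin": {"/*"},
}

# Hypothetical set of fields a user may change on their own record.
# Note that user_role is deliberately absent.
SELF_EDITABLE_FIELDS = {"user_alias", "user_email"}


def routes_exceeding_role(role: str, requested: list[str]) -> list[str]:
    """Return the requested routes that the caller's role does not entitle."""
    entitled = ROLE_ROUTES.get(role, set())
    rejected = []
    for route in requested:
        if route in entitled:
            continue  # exact grant, wildcard or not
        if "*" not in route and any(fnmatchcase(route, p) for p in entitled):
            continue  # concrete route covered by an entitled pattern
        rejected.append(route)
    return rejected


def sanitize_self_update(payload: dict) -> dict:
    """Keep only the fields a user is allowed to edit on their own record."""
    return {k: v for k, v in payload.items() if k in SELF_EDITABLE_FIELDS}


if __name__ == "__main__":
    print(routes_exceeding_role("internal_user", ["/*"]))
    print(sanitize_self_update({"user_role": "proxy_admin", "user_alias": "dev"}))
```

The design choice in both functions is deny-by-default: a wildcard is accepted only if the role was granted that exact wildcard, and a self-update field is accepted only if it is explicitly listed.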

Why the AI gateway is the worst host to lose

Treat this as a normal RCE and you will under-rate it. The LiteLLM host is not a web server you can reimage and forget. According to its GitHub repository, LiteLLM carries roughly 47,800 stars and over 21,000 dependent projects, and runs in production at organisations including Netflix, Lemonade, and Rocket Money. The reason it is everywhere is the reason its compromise is severe: it is the one place where every provider credential is concentrated and every prompt and completion flows through.

"We rotated the OpenAI key after the incident. We did not realise the gateway also brokered the Bedrock role, the Azure deployment keys, three internal vLLM endpoints, and the MCP servers our agents call. The attacker had a thirty-minute head start on a host we'd never scanned, because it wasn't in the CMDB."

That hypothetical post-incident account is the whole problem. An attacker on the gateway does not just get the box. They get a credential vault for your entire model estate, a vantage point on every prompt (including whatever sensitive data your users paste in), and, through the very MCP machinery the bug abuses, a foothold into the tool-calling infrastructure your agents use to touch databases, ticketing systems, and internal APIs. MCP is the connective tissue of agentic AI; an RCE that lives in the MCP preview path lands you in the middle of it.

The asset nobody inventoried

The uncomfortable part is not the three logic bugs. It is the cadence gap that let them sit exposed. AI gateways have a discovery problem that older infrastructure does not:

  • They are stood up outside change control. A platform or data-science team spins up LiteLLM to unblock a feature. It rarely enters the asset register, so the quarterly scan and the annual pentest never see it.
  • They are internet-reachable more often than anyone admits. Plenty of LiteLLM instances sit behind nothing more than an API-key check — which CVE-2026-48710 removes — on hosts reachable from the public internet or a flat internal network.
  • They move fast. LiteLLM has shipped over a thousand releases; the version you assessed two months ago is not the version running today. The command-injection fix (1.83.7) and the authorization fixes shipped weeks apart. A point-in-time test that predates either is blind to it.
  • Conventional scanners don't model them. A network scanner sees an HTTP service on a port. It does not know that /mcp-rest/test/connection will spawn a subprocess, or that an allowed_routes wildcard is a privilege-escalation primitive. The vulnerability is in application logic, not a banner.

The result is a class of high-value hosts that are exposed, fast-moving, and absent from the very inventories your validation programme is built around. By the time CVE-2026-42271 reached the KEV catalog on 8 June, the relevant question for most security teams was not "is it patched" — it was "do we even know where our LiteLLM instances are."

Where this leaves continuous validation

The defensible answer to a fast-moving, un-inventoried, internet-exposed RCE surface is not a faster annual pentest. It is validation that finds the asset and proves the exploit chain before an attacker does — which is the operational question Zero Hunt's generative pentest pillar is built to answer.

Zero Hunt runs change-triggered campaigns: when a new service appears on the perimeter — a freshly stood-up LiteLLM gateway included — it triggers a full campaign within the hour, rather than waiting for the next scheduled window. The 10-agent swarm (Recon, Exploit, Web, Credential, Post-Exploit, Pivot, Tactic, Report, under an AI Controller) then does what a banner scan cannot: the Web and Exploit agents reason about application logic, probe an endpoint like /mcp-rest/test/connection for the subprocess-spawning behaviour behind it, and write a per-target exploit with a local LLM rather than pulling a generic PoC. New offensive skills are backtested in the AI Gym — against Vulhub, NYU CTF Bench, Cybench, and a CVE-based black-box corpus — before they ever touch a production environment, so the chain that gets run against your gateway has been proven safe in a sandbox first. Every finding is ECDSA-signed at write time, which matters when the host you just proved exploitable holds the keys to your entire model estate and someone will eventually ask for the evidence.

The companion pillar is wire-side. The exploitation here ends in a subprocess: a reverse shell, a curl to an external IP, an outbound beacon from a host whose entire prior life was proxying requests to model providers. Zero Hunt's AI Traffic Analysis — a deep-learning model with four inference heads running on the appliance GPU — is built to flag exactly that: a proxy that suddenly initiates an outbound session to a never-seen ASN, the lateral pivot from the gateway into connected AI infrastructure, the egress that doesn't match the host's baseline. It sees the consequence of the RCE while it is happening, not in the next morning's log review. Find the gateway before the attacker, and you patch. Miss it, and the traffic is still the witness. Both run on a 100% on-prem appliance — no cloud callbacks, no prompts or keys leaving your network to a third party, which for the host that brokers your entire AI stack is the only acceptable posture.
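The "never-seen ASN" signal can be illustrated with a deliberately simple rule. This is a toy baseline comparison, not the deep-learning model described above, and it makes no claim about how that model works. The Flow record and both function names are hypothetical, and the ASN values are taken from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One outbound session, reduced to the fields this sketch needs."""
    src_host: str
    dst_asn: int
    dst_port: int


def build_baseline(history: list[Flow]) -> dict[str, set[int]]:
    """Record, per source host, every destination ASN seen so far."""
    baseline: dict[str, set[int]] = {}
    for flow in history:
        baseline.setdefault(flow.src_host, set()).add(flow.dst_asn)
    return baseline


def novel_egress(baseline: dict[str, set[int]], observed: list[Flow]) -> list[Flow]:
    """Return flows whose destination ASN the source host has never used."""
    return [
        flow for flow in observed
        if flow.dst_asn not in baseline.get(flow.src_host, set())
    ]


if __name__ == "__main__":
    # The gateway's prior life: HTTPS to two model-provider networks.
    history = [Flow("llm-gw", 64496, 443), Flow("llm-gw", 64497, 443)]
    today = [Flow("llm-gw", 64496, 443), Flow("llm-gw", 64511, 4444)]
    print(novel_egress(build_baseline(history), today))
```

A rule this blunt will flag every flow from a host with no history and will miss an attacker who stays inside known networks, which is part of why a gateway's narrow, predictable egress makes it a good candidate for baselining in the first place.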

If you run LLM features, the action item is concrete: find every LiteLLM instance, confirm it is on 1.83.7+ with Starlette 1.0.1+, and check the authorization fixes too — not just the command-injection one. Then ask the harder question of how you would have found that host if it had never made it into the inventory in the first place. Talk to us if the honest answer is "we wouldn't have."