Red Team AI · On-Premise · Architecture

Red Team AI: Why On-Prem Beats Cloud for Enterprise Pentesting

Cloud-hosted AI pentest tools force you to ship your attack surface to a third party. We argue that on-prem AI red teams are the only viable path for regulated industries — and explain the architecture that makes it possible on a single appliance.

Zero Hunt Research · May 12, 2026, 09:15 · 4 min read

If you operate critical infrastructure, your attack surface is a state secret in everything but name. Network topology, internal hostnames, credential conventions, the names of your production databases — all of it is intelligence an attacker would pay for. So why are you sending that same intelligence to a third-party cloud every time you run a "modern AI pentest"?

That is the question Zero Hunt was built to answer.

What "cloud AI pentest" actually means

Most of the new wave of AI-driven offensive security products are SaaS. You install a thin agent inside your network. The agent uploads scan output, screenshots, parsed configs, sometimes even raw HTTP traffic, to the vendor's cloud. A fleet of GPU-backed LLMs reasons about your environment from there. The output — exploit candidates, attack paths, findings — is shipped back.

This works, technically. It also means:

Your network topology lives in someone else's S3 bucket.
A breach of the vendor is a breach of you.
Auditors under NIS2 or DORA will ask uncomfortable questions about your "third-party AI provider supply chain" — questions that get harder as the AI Act phases in.
You cannot run the tool in a classified, air-gapped, or low-connectivity environment.

For most enterprises this used to be acceptable because there was no alternative. That is no longer true.

What changed

Three things converged in the last 24 months that make on-prem AI red teaming actually viable:

Open-weight models that don't embarrass themselves. Local coding and reasoning models in the 8B to 70B parameter range now write working exploit primitives without phoning home. They are not GPT-5, but for the narrow domain of offensive security the gap is shrinking fast.
Cheap inference GPUs. A single L40S or H100 handles the load for an enterprise pentest. You do not need a cloud-scale fleet — you need one machine, idle most of the time, peaking during campaigns.
RAG over your own corpus. The reason cloud tools felt smart was not the LLM, it was the curated context they fed it. With pgvector + a local embedding model, you can build that corpus on-prem and feed it whatever you want: CVE feeds, MITRE ATT&CK, GTFOBins, LOLBAS, and crucially your own past findings.

Once those three pieces exist locally, the cloud loop becomes a liability, not a feature.
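To make the RAG piece concrete, here is a minimal sketch of local retrieval. It is an illustration under loud assumptions, not Zero Hunt's implementation: `toy_embed` is a word-count stand-in for a real local embedding model, the in-memory `LocalCorpus` stands in for a pgvector table, and the `corpus` table and column names in the SQL string are hypothetical. The one detail taken from pgvector itself is `<=>`, its cosine-distance operator.

```python
import math
from collections import Counter

# Roughly what the lookup could look like against pgvector.
# Table and column names are hypothetical; <=> is pgvector's cosine distance.
PGVECTOR_QUERY = """
SELECT source, body
FROM corpus
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def toy_embed(text):
    """Stand-in for a local embedding model: L2-normalised word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine_distance(a, b):
    # Both vectors are unit length, so distance is 1 minus the dot product.
    return 1.0 - sum(weight * b.get(token, 0.0) for token, weight in a.items())


class LocalCorpus:
    """In-memory stand-in for a pgvector-backed table."""

    def __init__(self):
        self.rows = []  # list of (source, body, embedding)

    def add(self, source, body):
        self.rows.append((source, body, toy_embed(body)))

    def search(self, query, k=2):
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine_distance(q, row[2]))
        return [(source, body) for source, body, _ in ranked[:k]]


corpus = LocalCorpus()
corpus.add("GTFOBins", "find with sudo rights can spawn a root shell")
corpus.add("MITRE ATT&CK", "T1078 valid accounts used for initial access")
corpus.add("past-finding", "intranet wiki exposed default admin credentials")

hits = corpus.search("sudo root shell", k=1)
print(hits[0][0])
```

The point of the sketch is the shape of the loop: documents from public feeds and from your own past findings sit in the same index, and the query never leaves the machine.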

The Zero Hunt architecture, in one paragraph

Zero Hunt is a hardware appliance that ships with a dedicated GPU, a local LLM, a local embedding service (Jina v3 in our default build), a 10-agent orchestrator (Recon → Exploit → Web → Credential → Post-Exploit → Pivot → Tactic → Report, all coordinated by an AI Controller), and a sandboxed Docker execution environment. Every exploit is generated by the LLM, not pulled from a database. Every finding is embedded into pgvector and recalled in future campaigns. Nothing — nothing — leaves the box.
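For readers who think in code, the orchestration described above can be sketched roughly as follows. The stage names come from the paragraph; everything else (`CampaignState`, `Controller`, the stub agents, and the strictly linear hand-off) is a hypothetical simplification. A real controller would presumably branch, retry, and loop back rather than walk the stages once.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage names as listed in the article; the linear order is an assumption.
STAGES = [
    "Recon", "Exploit", "Web", "Credential",
    "Post-Exploit", "Pivot", "Tactic", "Report",
]


@dataclass
class CampaignState:
    """Shared state that each agent reads and the controller extends."""
    target: str
    notes: dict[str, list[str]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


Agent = Callable[[CampaignState], list[str]]


def make_stub_agent(stage: str) -> Agent:
    """Placeholder agent: a real one would call the local LLM and sandbox."""
    def run(state: CampaignState) -> list[str]:
        return [f"{stage} placeholder output for {state.target}"]
    return run


class Controller:
    """Hypothetical coordinator that hands the shared state to each agent."""

    def __init__(self, agents: dict[str, Agent]):
        missing = [s for s in STAGES if s not in agents]
        if missing:
            raise ValueError(f"no agent registered for: {missing}")
        self.agents = agents

    def run_campaign(self, target: str) -> CampaignState:
        state = CampaignState(target=target)
        for stage in STAGES:
            state.notes[stage] = self.agents[stage](state)
            state.log.append(stage)
        return state


controller = Controller({stage: make_stub_agent(stage) for stage in STAGES})
state = controller.run_campaign("lab-host")
print(state.log)
```

Keeping all state in one local object is the design choice that matters here: there is no step at which intermediate results need to cross the network boundary.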

Three patterns that fall out of "on-prem first"

When you commit to on-prem as the design constraint, three engineering patterns appear that cloud products cannot easily copy.

1. Generative exploits, not signatures

Cloud tools tend to fall back on curated exploit libraries because re-generating exploits per target is expensive and trips abuse filters. On-prem, we have neither constraint. Every target gets a freshly generated script, written for its specific stack. There is no signature for blue teams to match against, because no exploit is ever reused.

2. Backtesting in a sealed gym

Each new "skill" the AI proposes is tested against a private library of vulnerable environments (Vulhub, NYU CTF Bench, Cybench, plus our own) before it ever sees production. This is impossible to do at scale in a multi-tenant cloud — you cannot give Customer A's evolved skills to Customer B without leaking what Customer A's stack looks like. On-prem the gym is yours.
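The gating logic behind a gym like this can be sketched in a few lines. This is a hedged illustration, not the product's code: `GymEnv`, `backtest`, the 0.8 promotion threshold, and the stubbed checks are all assumptions. In practice each check would launch the skill against a real vulnerable environment in the sandbox.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GymEnv:
    """One vulnerable environment; check() reports whether a skill succeeded."""
    name: str
    check: Callable[[str], bool]


def backtest(skill: str, envs: list[GymEnv], min_pass_rate: float = 0.8) -> dict:
    """Run a proposed skill against every environment and decide on promotion."""
    if not envs:
        raise ValueError("the gym needs at least one environment")
    results = {env.name: env.check(skill) for env in envs}
    pass_rate = sum(results.values()) / len(results)
    return {
        "skill": skill,
        "results": results,
        "pass_rate": pass_rate,
        "promoted": pass_rate >= min_pass_rate,  # threshold is hypothetical
    }


# Stubbed checks for illustration only; environment names are from the article.
gym = [
    GymEnv("Vulhub", lambda skill: True),
    GymEnv("NYU CTF Bench", lambda skill: True),
    GymEnv("Cybench", lambda skill: False),
]

report = backtest("example-skill", gym)
print(report["promoted"], report["results"])
```

Because the gym and its results live on the appliance, a skill that passes reveals something only about your own environments.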

3. Compliance as a side effect

Because every action, every finding, every remediation is logged locally and never leaves, you accumulate the exact corpus of evidence that an NIS2, DORA or ISO 27001 auditor will ask for. We ECDSA-sign each artefact at write time. The auditor's request for "chain-of-custody for vulnerability X" becomes a single signed export.
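A sign-at-write-time log with a per-vulnerability export might look roughly like the sketch below. One substitution is made loudly: Python's standard library has no ECDSA, so HMAC-SHA256 with a shared key stands in for the ECDSA signature. A real deployment would sign with a private key so that an auditor can verify with the public key alone. The record layout, `sign_record`, `verify_record`, and `export_for` are hypothetical names.

```python
import hashlib
import hmac
import json


def _digest(artefact: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot change the digest."""
    payload = json.dumps(artefact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()


def sign_record(key: bytes, artefact: dict) -> dict:
    """Sign at write time. HMAC-SHA256 is a stand-in for ECDSA here."""
    digest = _digest(artefact)
    sig = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return {"artefact": artefact, "digest": digest, "sig": sig}


def verify_record(key: bytes, record: dict) -> bool:
    """Recompute the digest and signature from the artefact and compare."""
    digest = _digest(record["artefact"])
    expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return digest == record["digest"] and hmac.compare_digest(expected, record["sig"])


def export_for(key: bytes, log: list[dict], vuln_id: str) -> list[dict]:
    """Collect every record for one vulnerability, refusing tampered entries."""
    selected = [r for r in log if r["artefact"].get("vuln_id") == vuln_id]
    for record in selected:
        if not verify_record(key, record):
            raise ValueError(f"signature check failed for {record['digest']}")
    return selected


key = b"demo-key-not-for-production"
log = [
    sign_record(key, {"vuln_id": "VULN-1", "event": "finding"}),
    sign_record(key, {"vuln_id": "VULN-2", "event": "finding"}),
    sign_record(key, {"vuln_id": "VULN-1", "event": "remediation"}),
]

export = export_for(key, log, "VULN-1")
print([r["artefact"]["event"] for r in export])
```

Signing when the record is written, rather than when it is exported, is what lets the export serve as evidence: any later edit to a stored artefact fails verification.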

What you give up

This is not a free lunch. On-prem AI red teaming costs you:

Initial CapEx. An appliance is not a $10/mo subscription.
Slightly weaker models. A locally hosted 70B coder is not GPT-5. We compensate with task specialisation and RAG, but a fair comparison is honest about the gap.
You operate the box. Updates, patches, the engine. We ship a sync-server so customer appliances pull signed releases, but the responsibility surface is wider than SaaS.

Most enterprises we talk to decide the trade is worth it the moment they imagine an attacker pivoting through their pentest vendor.

Where to go next

If you want to see how this works in practice, the platform overview covers the three pillars (generative pentest, AI traffic analysis, automatic compliance), the comparison matrix puts the gap against Pentera / Horizon3 / XM Cyber / Cymulate in numbers, and the features section goes deeper on the engineering primitives. You can also request a hands-on demo — no sales sequence, just the engineering team.

Until then, the punchline: if your security tool is "AI-powered" but its first action is to phone home, it is not a security tool. It is a data exfil channel you pay for.