
EDR bypass incident response — the playbook when the endpoint stack is the vector

Short definition

Step-by-step response when an endpoint security platform (Defender, Apex One, CrowdStrike, SentinelOne, etc.) is suspected of being subverted, silenced or used as an attack channel — not when it merely missed a detection.

Why this matters now

Between 20 and 21 May 2026, CISA added three exploited zero-days affecting Microsoft Defender and Trend Micro Apex One to the Known Exploited Vulnerabilities catalogue, with federal remediation deadlines of 3 and 4 June. The Qilin and Warlock ransomware crews are running BYOVD payloads that terminate 300+ EDR drivers across every major vendor. When the endpoint security platform itself is the attack surface, standard IR runbooks fail because they assume the EDR is ground truth, and supervisors under NIS2 Art. 23 and DORA Art. 19 do not accept "the EDR did not alert" as a defence.

Key points

  • Treat the EDR as **untrusted infrastructure** the moment one of three signals fires: KEV-listed CVE in your stack, telemetry gap on a known-active host, or anomalous management-plane traffic.
  • The detection-clock and the regulator-clock are different — the **regulator-clock starts when you knew or should have known**, not when the EDR finally alerted.
  • Out-of-band sources (network DPI, identity provider, cloud-control-plane logs) are the only data you can trust during the first 72 hours, across both the hour 0-4 and hour 4-72 gates.
  • Map blast radius assuming the management plane was used as a **deployment channel**, not just an evasion target — every endpoint under that console is potentially actor-controlled.
  • NIS2 Art. 23 / DORA Art. 19 reporting clocks still run; "our EDR was the vector" does not pause them — file the initial notification on out-of-band evidence.
  • Post-recovery, the platform stays in **continuous adversarial validation** scope — vendor patching alone is insufficient under the recurring-exploitation pattern CISA documents.

Scope and when this playbook fires

Use this playbook when any of the following triggers are observed:

  • A CVE affecting your deployed endpoint security platform is added to the CISA Known Exploited Vulnerabilities catalogue — recent examples include CVE-2026-41091 (Microsoft Defender LPE), CVE-2026-45498 (Defender DoS) and CVE-2026-34926 (Trend Micro Apex One on-prem agent-injection).
  • A known-active host has gone telemetry-silent for longer than your detection-platform baseline, with no host-change ticket to explain it.
  • The endpoint management console (Defender for Endpoint portal, Apex One server, CrowdStrike Falcon admin plane, SentinelOne management console, MDE policy server) shows unexpected administrative activity, configuration changes, or agent-deployment events outside change-management.
  • A peer organisation in your sector has published an IR report describing the same vendor and version you run.
  • A BYOVD pattern is observed — kernel-level driver loading from a non-vendor path, EDR service termination chains, or msimg32.dll-style side-loading consistent with the Qilin / Warlock 2026 campaigns.

Do NOT use this playbook for: an EDR that simply did not alert on a known-good detection (that is a tuning issue, not a compromise of the platform), routine vendor-patch cycles without exploitation evidence, or false-positive storms (those are operational triage, not IR).
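The first trigger, a KEV listing that touches your own stack, is the easiest to automate. A minimal sketch, assuming the catalogue has already been downloaded and parsed into a dict that follows the public KEV JSON layout (a `vulnerabilities` list whose entries carry `cveID`, `vendorProject`, `product`, `dateAdded` and `dueDate`). The `DEPLOYED_STACK` inventory and the function name `kev_hits` are hypothetical; substitute your own asset inventory.

```python
from datetime import date
from typing import Optional

# Hypothetical inventory: (vendor keyword, product keyword), lowercase.
DEPLOYED_STACK = [("microsoft", "defender"), ("trend micro", "apex one")]


def kev_hits(catalog: dict, stack=DEPLOYED_STACK, since: Optional[date] = None) -> list:
    """Return KEV entries whose vendor and product match the deployed stack."""
    hits = []
    for vuln in catalog.get("vulnerabilities", []):
        vendor = vuln.get("vendorProject", "").lower()
        product = vuln.get("product", "").lower()
        added = date.fromisoformat(vuln["dateAdded"])
        if since is not None and added < since:
            continue  # older than the monitoring window
        # Substring match is deliberately loose: a false positive costs a
        # triage decision, a false negative costs the regulator clock.
        if any(v in vendor and p in product for v, p in stack):
            hits.append({"cve": vuln["cveID"], "added": added, "due": vuln.get("dueDate")})
    return hits
```

Each hit should be written to the trigger record with its timestamp, since that record is what later shows when the organisation was on notice.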

The detection clock vs the regulator clock

Two clocks run in parallel. Conflating them is the most common operational error.

The detection clock is internal. It starts at the trigger observation (CVE published, telemetry gap, anomalous management traffic) and runs against your own SLO for triage-to-containment. Typical mature targets: under 1 hour to triage decision, under 4 hours to platform isolation, under 24 hours to blast-radius confirmation.

The regulator clock is external. Under NIS2 Art. 23 it starts at the moment of awareness of a significant incident: early warning at 24 hours, incident notification at 72 hours, final report at 1 month. Under DORA Art. 19 it starts at classification of a major ICT incident: initial notification at 4 hours from classification (with a 24-hour outer ceiling from detection), intermediate report at 72 hours, final report at 1 month. Under GDPR Art. 33 it starts at awareness of a personal-data breach, with notification due within 72 hours.

The trap: teams wait for the EDR to confirm compromise before starting the regulator clock. The regulator-side rule is "knew or should have known" — once a KEV-listed CVE in your stack is published with an exploitation report, you should have known. The defence "our EDR did not alert" actively damages your position because it confirms the EDR cannot be trusted as the source of truth supervisors expected it to be.

The discipline: start *both* clocks at the trigger. File the regulator early warning on out-of-band evidence — KEV listing, vendor advisory, observed telemetry gap. Update at the intermediate gate as the internal investigation closes the picture. Do not delay the regulator filing for internal certainty; the regulation expects preliminary submissions to evolve.
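Starting both clocks at the trigger is easier when the deadlines are computed once and pinned to the incident record. A minimal sketch using only the intervals quoted above. It makes three simplifying assumptions that you should check against your own legal reading: the awareness timestamp doubles as the DORA detection time, "1 month" is approximated as 30 days, and every interval is measured from awareness or classification rather than from an earlier filing. The function name `regulator_deadlines` is illustrative.

```python
from datetime import datetime, timedelta, timezone

MONTH = timedelta(days=30)  # assumption: "1 month" approximated as 30 days


def regulator_deadlines(awareness, classification=None, personal_data=False):
    """Map each filing to its latest submission time, earliest deadline first."""
    deadlines = {
        "NIS2 early warning": awareness + timedelta(hours=24),
        "NIS2 incident notification": awareness + timedelta(hours=72),
        "NIS2 final report": awareness + MONTH,
    }
    if classification is not None:
        # 4 hours from classification, capped at 24 hours from detection
        # (detection is assumed equal to awareness in this sketch).
        deadlines["DORA initial notification"] = min(
            classification + timedelta(hours=4),
            awareness + timedelta(hours=24),
        )
        deadlines["DORA intermediate report"] = classification + timedelta(hours=72)
        deadlines["DORA final report"] = classification + MONTH
    if personal_data:
        deadlines["GDPR Art. 33 notification"] = awareness + timedelta(hours=72)
    return dict(sorted(deadlines.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    trigger = datetime(2026, 5, 21, 9, 0, tzinfo=timezone.utc)  # example timestamp
    for filing, due in regulator_deadlines(trigger, trigger + timedelta(hours=2)).items():
        print(f"{due.isoformat()}  {filing}")
```

Passing the trigger time as `awareness` encodes the "knew or should have known" reading directly: the deadlines do not move when the EDR eventually confirms the compromise.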

Hour 0 to 4 — confirm, isolate the management plane, freeze evidence

Goal: contain the most likely subversion path before it spreads. Do not assume the agent fleet is hostile until proven; assume the management plane is hostile until proven safe.

Checklist for the first 4 hours:

  • Classify the trigger against the three categories: evasion (BYOVD, anti-tamper bypass), neutralisation (DoS, agent muting, telemetry gap) or subversion (management-plane used as deployment channel). The subversion category is the highest-severity case and requires fleet-wide assumption of breach.
  • Isolate the management console at the network layer: block the console from reaching managed endpoints (firewall ACL on the agent-update port set) while preserving read-only access for forensics.
  • Freeze management-plane state: snapshot the console VM, snapshot the underlying database, export policy and deployment history, export administrative authentication logs for the last 14 days. Sign each snapshot at write time.
  • Identify privileged sessions in flight: any administrative session active on the console between trigger time and isolation is suspect. Force-revoke, capture session tokens, alert the identity team.
  • Activate the out-of-band telemetry plane: network DPI, identity-provider (Entra/Okta) sign-in logs and cloud control-plane logs (AWS CloudTrail, Azure Activity Log, GCP Cloud Audit Logs) become primary sources. The endpoint-resident EDR is *secondary* until cleared.
  • Start the regulator clock. File the NIS2 early warning / DORA initial notification on the evidence available — KEV citation, vendor advisory, observed gap. Do not wait for internal forensics to be complete.
  • Notify the vendor. For Trend Micro Apex One-class events, the vendor IR team has playbooks specific to their management-plane abuse pattern and is contractually obliged to assist.

Failure mode at this gate: pulling the EDR agents off the fleet. That destroys forensic state and removes the only on-host instrumentation you have. Isolate the management plane from the agents; do not uninstall the agents from the endpoints.
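The "privileged sessions in flight" item in the checklist above reduces to an interval-overlap test once the console authentication log is exported. A minimal sketch, assuming the export has been parsed into records with a start time and an optional end time. The `AdminSession` shape is hypothetical and does not correspond to any vendor's log schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AdminSession:
    user: str
    source_ip: str
    start: datetime
    end: Optional[datetime]  # None means the session is still open


def suspect_sessions(sessions, trigger, isolation):
    """Return sessions whose lifetime overlaps the [trigger, isolation] window."""
    return [
        s for s in sessions
        if s.start <= isolation and (s.end is None or s.end >= trigger)
    ]
```

A session that opened before the trigger and is still open is included on purpose: it was live during the window, so it goes on the force-revoke list.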

Hour 4 to 72 — out-of-band visibility, blast-radius mapping, parallel detection

Goal: rebuild the truth-of-state without relying on the endpoint agent. File the regulator intermediate report by hour 72.

Out-of-band data sources (in priority order):

  1. Network-side DPI / NDR: encrypted traffic analysis from a passive tap or in-line appliance that does not run on the endpoints. A management-plane subversion that pushes attacker code to thousands of endpoints emits a recognisable traffic signature — fan-out timing matching the agent inventory, payload-size clustering at deployment time. This is the most reliable parallel channel because the wire does not lie about what crossed it.
  2. Identity-provider audit logs: every administrative authentication to the console — Entra, Okta, Active Directory — independent of the EDR. Cross-reference against the privileged-session list from Hour 0-4.
  3. Cloud control-plane logs: where the console runs in AWS/Azure/GCP, the platform audit log captures API calls outside the EDR's knowledge — instance modification, IAM changes, network reconfiguration.
  4. DNS resolver logs: any beaconing from compromised endpoints, including the post-subversion phone-home from attacker-deployed code, transits DNS. Centralised resolver logs see what the on-host agent does not.
  5. Configuration management tooling: Ansible, SCCM, Intune, Chef inventories give you the "known state" of each host independent of the EDR's view of itself.
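The fan-out signature described for the network-side source can be approximated from flow metadata alone. A minimal sketch, assuming NDR flow records reduced to `(timestamp, src_ip, dst_ip)` tuples. The function name `peak_fanout` and the ten-minute window are illustrative choices, and payload-size clustering is left out.

```python
from collections import deque
from datetime import datetime, timedelta


def peak_fanout(flows, console_ip, window=timedelta(minutes=10)):
    """Largest number of distinct hosts the console contacted in any window.

    Returns (distinct_host_count, window_start), or (0, None) if the console
    never appears as a source.
    """
    events = sorted((ts, dst) for ts, src, dst in flows if src == console_ip)
    recent = deque()   # (timestamp, dst) pairs inside the sliding window
    counts = {}        # dst -> number of flows currently in the window
    best = (0, None)
    for ts, dst in events:
        recent.append((ts, dst))
        counts[dst] = counts.get(dst, 0) + 1
        # Evict flows that fell out of the window ending at ts.
        while recent[0][0] < ts - window:
            _, old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        if len(counts) > best[0]:
            best = (len(counts), recent[0][0])
    return best
```

Comparing the peak against the agent inventory size gives a first estimate of whether a push reached the whole fleet or a subset. A peak far above the console's normal update cadence outside a change window is worth a closer look.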

Blast-radius mapping checklist:

  • Enumerate every endpoint the management console *could have* pushed to between trigger time and isolation. This is your suspect fleet.
  • For each suspect host, retrieve the network-side telemetry — new outbound destinations, persistence-relevant primitives (scheduled task creation, service registration, registry writes visible via Sysmon-equivalent NDR signatures).
  • Identify the silent set: hosts that went telemetry-quiet inside the trigger window with no business reason. Treat as compromised until cleared.
  • Identify the lateral set: hosts that received inbound connections from the suspect or silent set. Treat as exposed.
  • Build the cross-reference table: suspect host → identity-provider activity → cloud-platform activity → DNS resolution log → NDR verdict.
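The suspect, silent and lateral sets defined in the checklist above are plain set operations once the inputs exist. A minimal sketch with hypothetical parameter names; where each input comes from (console deployment history, telemetry heartbeat gaps, change tickets, NDR connection records) is an assumption about your environment.

```python
def map_blast_radius(pushable, went_quiet, excused, inbound):
    """Derive the three host sets used in blast-radius mapping.

    pushable   : hosts the console could have pushed to in the window
    went_quiet : hosts that went telemetry-quiet inside the trigger window
    excused    : hosts with a documented business reason for silence
    inbound    : iterable of (src_host, dst_host) connections from NDR
    """
    suspect = set(pushable)
    silent = set(went_quiet) - set(excused)
    origin = suspect | silent
    # Lateral set: hosts outside the origin set that received a connection
    # from a suspect or silent host.
    lateral = {dst for src, dst in inbound if src in origin and dst not in origin}
    return {"suspect": suspect, "silent": silent, "lateral": lateral}
```

Keeping the three sets disjoint from the lateral set makes the counts in the intermediate report add up without double-counting hosts.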

The intermediate regulator report (filed by hour 72) covers:

  • Updated assessment of the affected scope (suspect fleet size, silent set, lateral set).
  • Indicators of compromise drawn from the out-of-band sources — never solely from the suspected-compromised EDR telemetry.
  • Containment actions taken: management-plane isolation, agent-deployment pipeline frozen, privileged sessions revoked.
  • Initial root-cause hypothesis tied to the specific CVE or BYOVD primitive observed (cite the CVE ID; this is what supervisors expect).
  • Whether personal data was accessed (drives the parallel GDPR Art. 33 timeline).
  • Updated estimate of clients, transactions, or critical services impacted (DORA classification criteria).

Day 4 to Month 1 — eradication, recovery, and the final report

Goal: restore the endpoint security platform to a known-good state, validate that state against an adversary, and file the regulator final report within 1 month.

Eradication is two-layered:

  1. Platform layer: patch the console to the vendor-fixed version, rebuild the management-plane VM from a verified-clean image, rotate every administrative credential (console-local, IdP-federated, API keys), invalidate all session tokens, audit policy and deployment history for unauthorised changes, restore policy from a snapshot pre-dating the trigger.
  2. Agent layer: for the suspect-and-silent fleet, do not rely on the (now-patched) EDR's self-attestation of cleanliness. Forensically image a statistical sample, validate against the IoC set built in Hour 4-72, and either re-image the fleet or run an out-of-band hunt with a second-source tool (Velociraptor, GRR, native OS forensics) before declaring clean.
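How large the "statistical sample" in the agent layer needs to be can be estimated with a standard detection-probability argument: if a fraction p of the fleet is compromised, n randomly chosen clean images occur with probability (1 - p)^n, so n must satisfy (1 - p)^n ≤ α for a chosen miss probability α. A minimal sketch, assuming random sampling from a fleet large enough for the binomial approximation to hold and forensic imaging that always detects a compromised host. Both assumptions are optimistic, so treat the result as a lower bound.

```python
import math


def sample_size(prevalence, miss_probability=0.05):
    """Smallest n such that (1 - prevalence) ** n <= miss_probability."""
    if not 0 < prevalence < 1 or not 0 < miss_probability < 1:
        raise ValueError("prevalence and miss_probability must be in (0, 1)")
    return math.ceil(math.log(miss_probability) / math.log(1 - prevalence))


print(sample_size(0.05))  # 59 hosts to catch a 5% compromise rate with 95% confidence
```

The lower the prevalence you want to rule out, the faster the sample grows, which is why re-imaging the whole suspect-and-silent fleet is often the cheaper option for small fleets.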

Recovery validation:

  • Re-enable the management-plane-to-agent communication only after both layers are eradicated and the IoC hunt comes back negative for ≥ 7 days.
  • Run a targeted offensive validation against the rebuilt platform: replay the original attack primitive (CVE exploitation in a lab, BYOVD driver against the new policy) and confirm the new state detects or prevents it. Do not declare recovery on vendor patch alone.
  • Restore on-host detection trust gradually — promote the EDR back from "secondary, audit-only" to "primary" only after the out-of-band parallel detection has confirmed agreement for a sustained period.
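The re-enable gate in the first item above is mechanical enough to encode, which removes the temptation to round six clean days up to seven. A minimal sketch, assuming one out-of-band hunt result per calendar day stored as `True` (IoC hit) or `False` (negative). The function names are illustrative, and a day with no recorded hunt breaks the streak by design.

```python
from datetime import date, timedelta


def clean_streak_days(hunt_results, today):
    """Count consecutive days, ending today, with a recorded negative hunt."""
    streak = 0
    day = today
    # A missing day returns None, which is not False, so it ends the streak.
    while hunt_results.get(day) is False:
        streak += 1
        day -= timedelta(days=1)
    return streak


def may_reenable(hunt_results, today, platform_eradicated, agents_eradicated,
                 required_days=7):
    """Gate for restoring management-plane-to-agent communication."""
    return (
        platform_eradicated
        and agents_eradicated
        and clean_streak_days(hunt_results, today) >= required_days
    )
```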

Final regulator report content (filed by month 1):

  • Verified root cause referenced to CVE or driver name.
  • Confirmed blast radius (final counts on suspect, silent, lateral sets).
  • Full IoC list with provenance — which IoCs came from out-of-band sources, which from the post-recovery EDR.
  • Remediation status: console patch in production, management-plane rebuilt, fleet-imaging completed for X hosts, residual exposure for Y hosts (with target dates).
  • Lessons learned tied specifically to the trust model — what the organisation changed about how it weights endpoint-resident vs out-of-band telemetry.
  • Direct and indirect costs (DORA economic-impact criterion).
  • Cross-references to parallel submissions (NIS2, GDPR, sectoral, law enforcement).

Evidence checklist — what to preserve from minute zero

Across all three gates, plan to have ready (every artefact signed at write time, timestamped, chain-of-custody preserved):

  • Trigger record: CVE ID, KEV listing date, vendor advisory ID, your KEV-monitor alert (if any) — the chain of evidence proving you were on notice.
  • Management-console snapshot pair: pre-isolation snapshot (the live evidence) and post-isolation snapshot (the frozen state) of the console VM and its database.
  • Administrative session log: every authentication to the management plane for the 30 days preceding the trigger, with source IP, user agent, MFA method, and result.
  • Deployment / policy-change log: every push from the console to managed agents in the 30 days preceding the trigger.
  • Network telemetry export: NDR/DPI capture covering the trigger window plus 14 days before, with original timestamps and packet metadata preserved.
  • Identity-provider audit log export: all administrative-tier authentications across Entra/Okta/AD for the same window.
  • Cloud control-plane log export: AWS CloudTrail / Azure Activity Log / GCP Cloud Audit Logs for the management-plane environment.
  • DNS resolver log export: for every host in the suspect-and-silent fleet.
  • Forensic imaging artefacts: where statistical sampling was used to validate fleet cleanliness, the imaging methodology, chain-of-custody record, and the validation result.
  • Vendor IR communications: timestamped record of vendor advisories received, support tickets opened, vendor IR-team interaction.
  • Regulator submission receipts: every NIS2 / DORA / GDPR submission timestamp and acknowledgement.
  • Reconstructed attack timeline: signed timestamp chain from trigger observation through final report, no manual back-edits.
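"Signed at write time" and "no manual back-edits" can both be enforced with a hash-chained manifest, where each record commits to the artefact digest and to the previous record. A minimal sketch using HMAC-SHA256 from the Python standard library. HMAC is a symmetric stand-in chosen to keep the example self-contained; a production evidence chain would more plausibly use asymmetric signatures and a trusted timestamping service, and the key handling here is an assumption.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def append_record(chain, artefact_name, artefact_bytes, key, now=None):
    """Append a record that commits to the artefact and to the previous record."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "artefact": artefact_name,
        "sha256": hashlib.sha256(artefact_bytes).hexdigest(),
        "recorded_at": now.isoformat(),
        "prev": chain[-1]["mac"] if chain else "",
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    chain.append(entry)
    return entry


def verify_chain(chain, key):
    """Return True only if every record is intact and correctly linked."""
    prev = ""
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "mac"}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if body["prev"] != prev or not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Because each record embeds the previous MAC, editing or removing an earlier entry invalidates everything after it, which is the property the reconstructed timeline needs.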

The pattern peer organisations report from 2025-2026 endpoint-platform incidents: the bottleneck is not the technical response, it is producing regulator-grade evidence under deadline when the primary source of truth (the EDR) is the thing under investigation. Continuous adversarial validation of the endpoint security platform — running a generative pentest engine that produces novel exploitation primitives against the deployed EDR configuration *before* the next CVE drops — is the engineering response Zero Hunt builds for, on the AI Generative Pentest rail. The same engine that pressure-tests the platform also produces the pre-signed exploitation-attempt evidence chain that closes the post-incident question "could you have known sooner".

Common failure modes

1. Treating "no EDR alert" as evidence of safety. This is the failure mode every CVE in this class is designed to exploit. CISA documents twelve previously or actively exploited Trend Micro Apex vulnerabilities — the pattern recurs. Build the runbook to fire on KEV-listing of *your* stack, independent of agent alerts.

2. Pulling agents off the fleet at first suspicion. Removes forensic state. Isolate the management plane from the agents instead — the agents on the endpoints are the only on-host instrumentation you have.

3. Starting the regulator clock from "internal confirmation of compromise" instead of "knew or should have known". Supervisors read the rule the second way. A 12-hour gap between KEV listing in your stack and your early-warning submission is a finding, regardless of internal investigation status.

4. Single-source detection during the response. If the only source contradicting the EDR is *another* host-resident tool, you have not actually escaped the trust assumption. The parallel channel must be off-host — network-side DPI, identity provider, cloud control plane.

5. Vendor-patch-only recovery. The BleepingComputer Apex One report shows the pattern: the CVE was patched, but the management plane had been a viable subversion path. The patch closes *this* primitive; the management-plane trust question stays open. Re-validate adversarially before declaring recovery.

6. Carving the EDR out of pentest scope. The historical "do not test the EDR, it confuses the SOC" rule is exactly the gap these CVEs exploit. Bring the endpoint security platform into continuous adversarial validation scope; do not leave it as the one component that never sees an adversary except in production.

Cross-regime and cross-framework notes

A single endpoint-platform incident typically triggers multiple parallel regulator notifications:

  • EU essential or important entity under NIS2: early warning at 24 hours, notification at 72 hours, final at 1 month — the trigger is "significant incident", and a KEV-listed CVE in a fleet-wide security platform almost always crosses the significance threshold.
  • EU financial entity under DORA: 4-hour initial notification, 72-hour intermediate, 1-month final — the endpoint security platform protects critical and important functions by definition; an incident affecting it crosses Art. 7 (critical services affected) of Reg (EU) 2024/1772.
  • GDPR Art. 33 if personal data was accessed via the suspect fleet (the lateral-set hosts are the test case) — 72 hours from awareness.
  • Sectoral: in Italy, Consob (investment services), IVASS (insurance), AgID (PA) — some sectoral clocks are shorter than 24 hours, in which case the sectoral notification dominates.
  • Law-enforcement liaison: for confirmed nation-state or organised-crime attribution, parallel notification to the national CERT (ACN, BSI, ANSSI) and law enforcement.

The operational discipline: one evidence base, multiple exports. The same signed timeline, IoC list, classification worksheet and root-cause document feeds every submission in different summary forms. Building the record once is the only way the 1-month final report does not consume the team — particularly when the response is also rebuilding the management plane in parallel. The methodology angle is covered in the existing NIS2 Title 13 incident timeline and DORA 4h/72h/1-month playbooks; the cross-pillar discipline is the same here, with the added rule that the primary evidence source (the EDR) cannot be the audit witness for an incident in which it is implicated.
