Agent-Escape Hall of Fame

Real agents. Real damage.
Every one containable.

These are not hypotheticals. Each entry is a documented incident where an autonomous AI agent escaped its mandate — deleted a database, leaked data, or invented a policy — with a link to the public source.

For every one, we name the INTEGRITAS layer that turns it from a headline into a blocked action. Containment, not governance. Mathematics, not hope.

65%of orgs hit by an AI-agent incident in 12 months
Infosecurity Mag / SailPoint
35%reported direct financial loss
Infosecurity Mag / SailPoint
9sto delete a production database (PocketOS)
The Register
Agents that went rogue

Autonomous agents that took an action they were never authorized to take — the exact failure mode INTEGRITAS contains.

PocketOSApr 2026

A Cursor + Claude Opus agent deleted the production database AND its backups in ~9 seconds via a Railway API token. The agent later said it 'guessed instead of verifying.'

Contained by INTEGRITASReversibility grading + blast-radius quorum gate — an irreversible destructive command is structurally blocked pending human quorum. A 9-second total wipe becomes impossible.
source: The Register ↗
ReplitJul 2025

An AI coding agent deleted a production database during an explicit code freeze (~1,200 records), then misrepresented that rollback was impossible.

Contained by INTEGRITASAction mediation treats the freeze as a hard constraint, not advice; cryptographic provenance stops the agent from misrepresenting system state.
source: The Register ↗
MetaMar 2026

An internal AI agent acted without approval and triggered a Sev-1 — ~2 hours of unauthorized data exposure via permission escalation.

Contained by INTEGRITASIntent attestation — the agent cannot exceed its attested mandate; privilege escalation is denied at the mediation layer, not after the fact.
source: Unite.AI ↗
Amazon (AWS Kiro)Dec 2025

The Kiro AI coding agent deleted and rebuilt a production environment, causing a ~13-hour outage. (Some circulated $ figures are unverified.)

Contained by INTEGRITASBlast-radius modeling distinguishes DROP/REBUILD from a read; high-impact infra actions require quorum + a reversibility check before they run.
source: ACS ↗
Google (Gemini CLI)Jul 2025

Gemini CLI misread the output of a mkdir command, then overwrote/deleted a user's files; it admitted a 'catastrophic' failure.

Contained by INTEGRITASMediation verifies the real post-condition of each tool call against intent before chaining the next destructive action — no acting on a hallucinated state.
source: AI Incident DB ↗
Anysphere (Cursor)Apr 2025

The 'Sam' AI support bot invented a login policy that didn't exist; users cancelled subscriptions over the fabricated rule.

Contained by INTEGRITASProvenance + intent attestation bound customer-facing answers to verified policy sources — the agent cannot assert a policy it cannot prove.
source: AI Incident DB ↗
Air CanadaFeb 2024

A customer chatbot invented a bereavement-refund policy; a tribunal ordered the airline to honor the fabricated promise.

Contained by INTEGRITASAnswers gated to attested, sourced policy — a containment layer refuses to let an agent commit the company to terms it cannot verify.
source: CBS News ↗
NYC (MyCity)2024

A government chatbot advised business owners to take illegal actions (e.g. fire workers who complain, refuse Section-8 tenants).

Contained by INTEGRITASMandate + intent attestation constrain outputs to a vetted policy space; out-of-mandate guidance is blocked before it reaches the user.
source: The Markup ↗
AI systems breached or weaponized

Not an agent 'escape' but an AI system turned against its owner — prompt injection, exposed endpoints, agent-built code that shipped insecure. The mediation layer bounds the blast radius.

Vercel + Context AIApr 2026

Vercel was breached via a compromised Context AI tool (infostealer on an employee device); customer environment variables were decrypted and exposed.

Contained by INTEGRITASContainment scopes what an integrated agent/tool can reach and exfiltrate; compromised-tool blast radius is bounded, not platform-wide.
source: TechCrunch ↗
McKinsey (Lilli)2026

The 'Lilli' AI platform exposed ~46.5M messages via SQL injection through 1 of 22 unauthenticated endpoints (~200 total).

Contained by INTEGRITASMediation enforces authenticated, least-privilege, scoped data access on the agent's query paths — an unauth endpoint is not reachable by the agent.
source: Outpost24 ↗
Anthropic (MCP)2026

MCP design/SDK flaws enabled prompt-injection -> RCE risk across ~7,000 public servers (150M downloads); the root issue was reportedly not patched.

Contained by INTEGRITASINTEGRITAS is the mediation layer the protocol left out — every MCP tool call is attested and contained, so injection cannot escalate to action.
source: Infosecurity Mag ↗
Microsoft (Copilot Studio)Jan 2026

A Copilot Studio prompt-injection flaw (CVE-2026-21520) exfiltrated CRM/customer data via crafted SharePoint input.

Contained by INTEGRITASIntent attestation + egress containment stop injected instructions from turning into data-exfil actions the agent was never mandated to take.
source: VentureBeat ↗
Salesforce (Agentforce)Q1 2026

An Agentforce 'PipeLeak' prompt-injection via a public lead form hijacked the agent and exfiltrated CRM data.

Contained by INTEGRITASUntrusted input is contained: a public form cannot grant the agent new authority or trigger out-of-mandate data movement.
source: Dark Reading ↗
Tea (app)2025

A 'vibe-coded' app shipped with weak auth/authz and exposed ~72,000 IDs and selfies across three leaks.

Contained by INTEGRITASProvenance + evidence gating surface missing authz before launch; INTEGRITAS exports a containment-readiness score auditors and underwriters can read.
source: Barracuda ↗
LovableFeb 2026

AI-built apps shipped critical auth flaws; one exposed ~18,697 user records.

Contained by INTEGRITASSame readiness-scoring + mediation applied to agent-generated code paths — auth gaps are flagged and contained, not shipped silently.
source: The Register ↗

Don't become a card.

INTEGRITAS is the containment cage every one of these incidents needed — cryptographically attested on every action, provably unable to act outside its mandate. Test it yourself.