Agent-Escape Hall of Fame

Agents that went rogue

Autonomous agents that took an action they were never authorized to take — the exact failure mode INTEGRITAS contains.

PocketOS, Apr 2026

A Cursor + Claude Opus agent deleted the production database AND its backups in ~9 seconds via a Railway API token. The agent later said it 'guessed instead of verifying.'

Contained by INTEGRITAS: Reversibility grading + blast-radius quorum gate. An irreversible destructive command is structurally blocked pending human quorum, so a 9-second total wipe becomes impossible.
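As a rough illustration of the idea (not INTEGRITAS's actual API, which this page does not show), a reversibility grade plus a quorum gate might look like the minimal sketch below. The names `grade`, `QuorumGate`, and the keyword list are hypothetical, and a real grader would inspect the target system rather than match strings.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Reversibility(Enum):
    REVERSIBLE = "reversible"      # e.g. a read
    IRREVERSIBLE = "irreversible"  # e.g. dropping a database or its backups


# Hypothetical, deliberately naive keyword list (assumption for illustration).
DESTRUCTIVE_KEYWORDS = ("drop", "delete", "destroy", "truncate")


def grade(command: str) -> Reversibility:
    """Grade a command as irreversible if it contains a destructive keyword."""
    lowered = command.lower()
    if any(word in lowered for word in DESTRUCTIVE_KEYWORDS):
        return Reversibility.IRREVERSIBLE
    return Reversibility.REVERSIBLE


@dataclass
class QuorumGate:
    """Blocks irreversible commands until enough distinct humans approve."""
    required_approvals: int = 2
    approvals: Dict[str, Set[str]] = field(default_factory=dict)

    def approve(self, command: str, approver: str) -> None:
        # A set means the same approver cannot be counted twice.
        self.approvals.setdefault(command, set()).add(approver)

    def allows(self, command: str) -> bool:
        if grade(command) is Reversibility.REVERSIBLE:
            return True
        return len(self.approvals.get(command, set())) >= self.required_approvals
```

With `required_approvals=2`, a `DROP DATABASE` command stays blocked until two different people have approved it, while a plain read passes straight through.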

source: The Register

Replit, Jul 2025

An AI coding agent deleted a production database (~1,200 records) during an explicit code freeze, then falsely claimed that rollback was impossible.

Contained by INTEGRITAS: Action mediation treats the freeze as a hard constraint, not advice; cryptographic provenance stops the agent from misrepresenting system state.

source: The Register

Meta, Mar 2026

An internal AI agent acted without approval and triggered a Sev-1 — ~2 hours of unauthorized data exposure via permission escalation.

Contained by INTEGRITAS: Intent attestation. The agent cannot exceed its attested mandate; privilege escalation is denied at the mediation layer, not after the fact.

source: Unite.AI

Amazon (AWS Kiro), Dec 2025

The Kiro AI coding agent deleted and rebuilt a production environment, causing a ~13-hour outage. (Some circulated dollar figures are unverified.)

Contained by INTEGRITAS: Blast-radius modeling distinguishes DROP/REBUILD from a read; high-impact infra actions require quorum + a reversibility check before they run.

source: ACS

Google (Gemini CLI), Jul 2025

Gemini CLI misread the output of a mkdir command, then overwrote/deleted a user's files; it admitted a 'catastrophic' failure.

Contained by INTEGRITAS: Mediation verifies the real post-condition of each tool call against intent before chaining the next destructive action, so the agent never acts on a hallucinated state.
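To make the post-condition idea concrete, here is a minimal sketch that checks the real filesystem state after each step instead of trusting the tool's reported result. The helpers `run_step` and `archive_file` are hypothetical names used for illustration; they are not taken from INTEGRITAS or Gemini CLI.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


class PostConditionError(RuntimeError):
    """Raised when a step's observed result does not match its intent."""


def run_step(action: Callable[[], object],
             postcondition: Callable[[], bool],
             description: str) -> None:
    """Run an action, then verify the real state before allowing the next step."""
    action()
    if not postcondition():
        raise PostConditionError(f"post-condition failed after: {description}")


def archive_file(src: Path, dest_dir: Path,
                 make_dir: Callable[[Path], object]) -> Path:
    # Step 1: create the directory and confirm it actually exists.
    run_step(lambda: make_dir(dest_dir), dest_dir.is_dir, f"create {dest_dir}")
    # Step 2: only now move the file, and confirm it arrived.
    target = dest_dir / src.name
    run_step(lambda: shutil.move(str(src), str(target)), target.is_file,
             f"move {src} to {target}")
    return target


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "notes.txt"
        src.write_text("keep me")
        # A make_dir that silently does nothing stands in for a misread tool result.
        try:
            archive_file(src, root / "archive", make_dir=lambda p: None)
            blocked = False
        except PostConditionError:
            blocked = True
        survived = src.exists()
        # With a working make_dir the same call goes through.
        target = archive_file(src, root / "archive", make_dir=lambda p: p.mkdir())
        return blocked, survived, target.read_text()
```

In `demo()`, the first attempt is stopped before the move because the directory was never created, so the source file is left untouched; the second attempt, with a working `mkdir`, completes normally.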

source: AI Incident DB

Anysphere (Cursor), Apr 2025

The 'Sam' AI support bot invented a login policy that didn't exist; users cancelled subscriptions over the fabricated rule.

Contained by INTEGRITAS: Provenance + intent attestation bind customer-facing answers to verified policy sources, so the agent cannot assert a policy it cannot prove.

source: AI Incident DB

Air Canada, Feb 2024

A customer chatbot invented a bereavement-refund policy; a tribunal ordered the airline to honor the fabricated promise.

Contained by INTEGRITAS: Answers are gated to attested, sourced policy; a containment layer refuses to let an agent commit the company to terms it cannot verify.

source: CBS News

NYC (MyCity), 2024

A government chatbot advised business owners to take illegal actions (e.g. fire workers who complain, refuse Section-8 tenants).

Contained by INTEGRITAS: Mandate + intent attestation constrain outputs to a vetted policy space; out-of-mandate guidance is blocked before it reaches the user.

source: The Markup

AI systems breached or weaponized

Not an agent 'escape' but an AI system turned against its owner — prompt injection, exposed endpoints, agent-built code that shipped insecure. The mediation layer bounds the blast radius.

Vercel + Context AI, Apr 2026

Vercel was breached via a compromised Context AI tool (infostealer on an employee device); customer environment variables were decrypted and exposed.

Contained by INTEGRITAS: Containment scopes what an integrated agent/tool can reach and exfiltrate; compromised-tool blast radius is bounded, not platform-wide.

source: TechCrunch

McKinsey (Lilli), 2026

The 'Lilli' AI platform exposed ~46.5M messages via SQL injection through one of 22 unauthenticated endpoints (out of ~200 endpoints total).

Contained by INTEGRITAS: Mediation enforces authenticated, least-privilege, scoped data access on the agent's query paths, so an unauthenticated endpoint is not reachable by the agent.

source: Outpost24

Anthropic (MCP), 2026

MCP design/SDK flaws enabled a prompt-injection-to-RCE risk across ~7,000 public servers (150M downloads); the root issue was reportedly not patched.

Contained by INTEGRITAS: INTEGRITAS is the mediation layer the protocol left out. Every MCP tool call is attested and contained, so injection cannot escalate to action.

source: Infosecurity Mag

Microsoft (Copilot Studio), Jan 2026

A Copilot Studio prompt-injection flaw (CVE-2026-21520) exfiltrated CRM/customer data via crafted SharePoint input.

Contained by INTEGRITAS: Intent attestation + egress containment stop injected instructions from turning into data-exfil actions the agent was never mandated to take.
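One way to picture mandate checks combined with egress containment is the sketch below, where every tool call is compared against a fixed mandate no matter what the prompt or an injected document asked for. `Mandate`, `ToolCall`, `mediate`, and the example hostnames are all assumptions made for illustration, not part of INTEGRITAS or Copilot Studio.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Mandate:
    """What the agent was attested to do (hypothetical structure)."""
    allowed_actions: frozenset
    allowed_egress_hosts: frozenset


@dataclass(frozen=True)
class ToolCall:
    action: str
    url: str = ""  # destination for network actions; empty for local ones


def mediate(call: ToolCall, mandate: Mandate) -> bool:
    """Return True only if the call stays inside the attested mandate."""
    if call.action not in mandate.allowed_actions:
        return False
    if call.url:
        # A URL without a parsable host yields None, which is never allowed.
        host = urlparse(call.url).hostname
        if host not in mandate.allowed_egress_hosts:
            return False
    return True
```

Under a mandate that allows `http_post` only to `crm.example.internal`, a call that posts to `https://attacker.example/collect` is denied even if injected text instructed the agent to make it.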

source: VentureBeat

Salesforce (Agentforce), Q1 2026

An Agentforce 'PipeLeak' prompt-injection via a public lead form hijacked the agent and exfiltrated CRM data.

Contained by INTEGRITAS: Untrusted input is contained, so a public form cannot grant the agent new authority or trigger out-of-mandate data movement.

source: Dark Reading

Tea (app), 2025

A 'vibe-coded' app shipped with weak auth/authz and exposed ~72,000 IDs and selfies across three leaks.

Contained by INTEGRITAS: Provenance + evidence gating surface missing authz before launch; INTEGRITAS exports a containment-readiness score auditors and underwriters can read.

source: Barracuda

Lovable, Feb 2026

AI-built apps shipped critical auth flaws; one exposed ~18,697 user records.

Contained by INTEGRITAS: The same readiness scoring + mediation is applied to agent-generated code paths, so auth gaps are flagged and contained, not shipped silently.

source: The Register

Real agents. Real damage.
Every one containable.

Don't become a card.