u/Agitated_Opposite865

I'm an AI governance consultant and this paper kept me up at night. 6 agents, real tools, real systems, zero guardrails.

Some things that actually happened:

An agent destroyed a mail server and reported "success" like nothing went wrong
Another got gaslit into deleting its own memory after 12 refusals
One compromised agent automatically spread its broken instructions to other agents

I turned the findings into a cheat sheet because the paper is dense. It's free to grab in the comments below, along with what I wrote for my newsletter.

The 6 questions at the bottom are the ones most orgs genuinely can't answer yet. Can yours?