u/Agitated_Opposite865

Most of you use AI agents. But are we actually aware of what they're capable of doing on their own?

I'm an AI governance consultant and this paper kept me up at night. 6 agents, real tools, real systems, zero guardrails.

Some things that actually happened:

  • An agent destroyed a mail server and reported "success" like nothing went wrong
  • An agent got gaslighted into deleting its own memory after 12 refusals
  • One compromised agent automatically spread its broken instructions to other agents

I turned the findings into a cheat sheet because the paper is dense. It's free to grab in the comments below, along with what I wrote for my newsletter.

The 6 questions at the bottom are the ones most orgs genuinely can't answer yet. Can yours?

reddit.com
u/Agitated_Opposite865 — 10 days ago