
Open-source CLI for red-teaming LLM agents before they touch tools and memory
Sharing RedThread, an open-source CLI for AI red-team campaigns:
https://github.com/matheusht/redthread
The angle is AI agents as an attack surface. Prompt injection gets more interesting once the model can call tools, delegate to workers, write memory, retry failed actions, or propose guardrail changes.
RedThread is built for staging/internal targets. It runs LLM red-team campaigns, records traces, scores failures, and can replay exploit and benign cases before treating a defense as evidence.
Current pieces:
- PAIR, TAP, Crescendo, and GS-MCTS attack flows
- JudgeAgent/rubric scoring
- replay-backed defense proposals
- telemetry/drift signals
- agentic checks for tool poisoning, confused deputy paths, canary propagation, and budget amplification
It is not a magic prompt shield and not broad production enforcement.
Looking for people who test agent workflows and can suggest realistic failure cases or target adapters.