u/Apprehensive-Zone148

Open-source CLI for repeatable prompt-injection and jailbreak testing

Sharing RedThread, an open-source CLI for LLM red-team campaigns:

https://github.com/matheusht/redthread

It is meant for repeatable testing, not one-off prompt lists.

Current flow:

  • generate attacks with PAIR, TAP, Crescendo, or GS-MCTS
  • run multi-turn traces
  • score the trace with JudgeAgent/rubrics
  • generate candidate guardrails for confirmed failures
  • replay exploit and benign cases before saving evidence

It also has checks for agentic cases like tool poisoning, confused deputy behavior, canary propagation, and budget amplification.

Useful if you are testing system prompts, comparing attack strategies, or trying to turn a prompt-injection failure into a regression case.

I am looking for safe fixture categories and scoring rubrics, not raw jailbreak dumps.

reddit.com

Open-source CLI for LLM red-team campaigns with replayable evidence

Sharing RedThread, an open-source CLI for LLM/agent red-team campaigns:

https://github.com/matheusht/redthread

The project is aimed at people building LLM apps where prompt injection, RAG/tool output, or agent delegation can turn into real actions.

The workflow is campaign-oriented:

  • run PAIR, TAP, Crescendo, or GS-MCTS attacks
  • record the multi-step trace
  • score the result with rubrics
  • isolate the failure
  • generate a candidate defense
  • replay exploit and benign cases before treating the defense as evidence

The main thing I am trying to avoid is noisy "scanner found scary text" output. A useful finding should preserve the prompt path, tool/action sequence, environment assumptions, failure class, and replay result.

It is CLI-first, not a hosted guardrail service, and not claiming universal production enforcement.

Would love feedback from LLM devs on target adapters, false positives, and what evidence format would actually be useful in CI or review.

reddit.com
▲ 27 r/vibehacking+6 crossposts

Open-source CLI for red-teaming LLM agents before they touch tools and memory

Sharing RedThread, an open-source CLI for AI red-team campaigns:

https://github.com/matheusht/redthread

The angle is AI agents as an attack surface. Prompt injection gets more interesting once the model can call tools, delegate to workers, write memory, retry failed actions, or propose guardrail changes.

RedThread is built for staging/internal targets. It runs LLM red-team campaigns, records traces, scores failures, and can replay exploit and benign cases before treating a defense as evidence.

Current pieces:

  • PAIR, TAP, Crescendo, and GS-MCTS attack flows
  • JudgeAgent/rubric scoring
  • replay-backed defense proposals
  • telemetry/drift signals
  • agentic checks for tool poisoning, confused deputy paths, canary propagation, and budget amplification

It is not a magic prompt shield and not broad production enforcement.

Looking for people who test agent workflows and can suggest realistic failure cases or target adapters.