AI chatbots can pass QA and still fail badly
AI chatbots can look perfectly fine during demos and QA testing — then fail once real users start interacting with them.
Some of the issues I kept seeing while stress-testing chatbot APIs and AI agents:
- hidden instructions leaking
- support bots inventing policies
- tools/actions triggered unexpectedly
- memory/context confusion between sessions
- indirect prompt injection through retrieved content
The scary part is that many of these systems still technically “work” while producing the wrong outcome for the business or customer.
That’s why I built PromptBrake.
It stress-tests the actual AI/chatbot endpoint companies ship using repeatable adversarial scenarios to help catch risky behavior before deployment.
I also recently added a self-hosted deployment option so teams can run scans inside their own infrastructure without sending prompts, responses, or internal workflows to a third party.
I recorded a short demo showing a real chatbot API scan here: YouTube demo
Would genuinely love feedback from others building AI products:
- Are you testing chatbot behavior before launch?
- Are teams around you asking for self-hosted AI tooling yet?
- What’s the hardest part of validating AI agent behavior today?