
How do you actually test a voice AI agent without calling it yourself every time?
So we've been working on a voice bot that handles customer calls, and honestly the testing part has been brutal. We were literally calling the thing ourselves after every change to check if anything broke.
Eventually we just wrote a framework that synthesizes fake caller audio, pipes it into the agent, and checks whether the response is sane: latency, hallucinations, whether it handles interruptions, etc. It runs locally against a SQLite db, no cloud stuff.
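To make the "synthesize, pipe in, check" loop concrete, here's a minimal sketch of what one test case can look like. This is not decibench's actual code or API: `synth_caller_audio`, `stub_agent`, and `run_case` are hypothetical names, the sine tone is just a stand-in for real synthesized speech, and the 16 kHz sample rate is an assumption.

```python
import math
import struct
import time

SAMPLE_RATE = 16000  # Hz; assumed for this sketch


def synth_caller_audio(duration_s=0.5, freq_hz=440.0):
    """Return 16-bit little-endian mono PCM for a sine tone (stand-in for TTS speech)."""
    n = int(SAMPLE_RATE * duration_s)
    samples = (
        int(12000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def stub_agent(pcm_bytes):
    """Hypothetical agent; swap in a call to the real voice bot here."""
    return "Sure, I can help you with your order."


def run_case(agent, audio, must_contain, max_latency_s):
    """Send audio to the agent, time the reply, and apply two simple checks."""
    start = time.perf_counter()
    reply = agent(audio)
    latency = time.perf_counter() - start
    passed = must_contain.lower() in reply.lower() and latency <= max_latency_s
    return {"reply": reply, "latency_s": latency, "passed": passed}


result = run_case(stub_agent, synth_caller_audio(), "order", max_latency_s=1.0)
print(result["passed"], round(result["latency_s"], 4))
```

A keyword check like this only catches the obvious failures. Things like hallucinations need a judge model looking at the full reply, which is a separate step from this loop.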
It connects over WebSockets, can mock Twilio streams, and works with ElevenLabs and Vapi agents too. You can also plug in Ollama as the judge so the whole thing runs offline.
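For anyone wondering what mocking a Twilio stream involves, here's a rough sketch of building the JSON messages a fake stream would send over the socket. Again, this is not taken from decibench: `twilio_media_messages` and the `MZ_fake_stream` SID are made up for illustration, and the field names follow my understanding of Twilio's Media Streams `media` event (base64 payload of 8 kHz mu-law audio), so check them against the Twilio docs before relying on them.

```python
import base64
import json

FRAME_BYTES = 160  # 20 ms of 8 kHz mu-law audio, one byte per sample


def twilio_media_messages(mulaw_audio, stream_sid="MZ_fake_stream"):
    """Yield JSON strings shaped like inbound Twilio Media Streams 'media' events."""
    offsets = range(0, len(mulaw_audio), FRAME_BYTES)
    for seq, offset in enumerate(offsets, start=1):
        frame = mulaw_audio[offset:offset + FRAME_BYTES]
        yield json.dumps({
            "event": "media",
            "sequenceNumber": str(seq),
            "streamSid": stream_sid,  # hypothetical SID for the fake call
            "media": {
                "track": "inbound",
                "chunk": str(seq),
                "timestamp": str((seq - 1) * 20),  # ms since stream start
                "payload": base64.b64encode(frame).decode("ascii"),
            },
        })


# 0xFF is mu-law silence, so this is 60 ms of quiet "caller" audio.
silence = b"\xff" * 480
messages = list(twilio_media_messages(silence))
print(len(messages))
```

In a real test you'd send each message over the WebSocket with roughly 20 ms between frames so the agent sees realistic pacing. A real stream also sends `connected` and `start` events before any media, which this sketch leaves out.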
We open sourced it: https://github.com/unforkopensource-org/decibench
Curious how others here handle this. Are you just vibing and hoping production doesn't break, or is there a better workflow I'm missing?