Our Zendesk AI bot got worse because we made the LLM do too much
We built an automated triage bot for Zendesk tickets. Its main job is to pull order history from our internal API to answer basic "where is my order" questions. But the generated replies got messy lately: they quoted wrong dates or hallucinated package statuses, so our reps had to rewrite half of them. At first we thought it was a context window issue and wasted days tweaking system prompts and vector chunk sizes. None of that actually helped.
It finally clicked when we mapped out the data flow. We were basically dumping raw API JSON and messy search results straight into a single massive Claude prompt. The LLM was getting confused trying to parse the data arrays and write a polite email at the same time. The fix wasn't better prompt engineering, it was breaking up the workflow. So we stripped out the basic logic tasks and gave them back to standard Python scripts. There's no reason to pay an LLM to read a JSON payload and calculate whether a shipping date is delayed when simple code does it perfectly.
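A rough sketch of the kind of check that plain Python handles fine, for anyone curious. This is not the actual production code: the field names (`order_id`, `promised_delivery`, `delivered_on`) and the function name are made up for illustration, and a real order API will look different.

```python
import json
from datetime import date


def summarize_order(raw_json: str, today: date) -> dict:
    """Turn a raw order payload into a few clean facts for the drafting step.

    Assumes hypothetical fields: order_id, promised_delivery (ISO date),
    and an optional delivered_on (ISO date).
    """
    order = json.loads(raw_json)
    promised = date.fromisoformat(order["promised_delivery"])
    delivered = order.get("delivered_on")

    if delivered:
        # Already delivered: lateness is measured against the delivery date.
        status = "delivered"
        days_late = max((date.fromisoformat(delivered) - promised).days, 0)
    else:
        # Still in transit: lateness is measured against today.
        days_late = max((today - promised).days, 0)
        status = "delayed" if days_late > 0 else "on_time"

    return {
        "order_id": order["order_id"],
        "status": status,
        "promised_delivery": promised.isoformat(),
        "days_late": days_late,
    }
```

The point is that the LLM only ever sees the small dict this returns, never the raw payload, so it has nothing to miscalculate.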
We also moved the LLM calls behind a gateway so we could trace each step instead of guessing where the reply went wrong. We're using ZenMux right now, mostly for logging, routing, and keeping the multi-model flow stable. Now we use kimi2.6 to clean the context, then code handles the business logic, and deepseek V4 only drafts the final email based on the clean data. The difference was huge. Latency dropped significantly and accuracy stabilized because no single step is juggling ten tasks at once anymore.
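The shape of that three-step flow, as a minimal sketch. It assumes each stage is just a callable you pass in, so the model calls are stand-ins and nothing here is the ZenMux API or any real client; `run_pipeline` and the stage names are hypothetical.

```python
import time
from typing import Any, Callable


def run_pipeline(
    ticket_text: str,
    raw_context: str,
    clean_context: Callable[[str], Any],    # stand-in for the context-cleaning model call
    apply_rules: Callable[[Any], dict],     # plain code: business logic, no LLM
    draft_reply: Callable[[str, dict], str],  # stand-in for the drafting model call
) -> tuple[str, list[dict]]:
    """Run the three stages in order and record a trace entry for each one."""
    trace: list[dict] = []

    def step(name: str, fn: Callable, *args: Any) -> Any:
        # Time each stage and keep its output so a bad reply can be traced
        # back to the stage that produced the bad data.
        start = time.perf_counter()
        output = fn(*args)
        trace.append(
            {"step": name, "seconds": time.perf_counter() - start, "output": output}
        )
        return output

    cleaned = step("clean_context", clean_context, raw_context)
    facts = step("business_logic", apply_rules, cleaned)
    reply = step("draft_reply", draft_reply, ticket_text, facts)
    return reply, trace
```

Because every stage goes through `step`, a wrong date in the final email can be checked against the `business_logic` output first. If the facts were right there, the drafting step is the problem; if not, the bug is in code or in the cleaning step.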
Do you guys build your own custom routers for multi-step agents?