u/Doubt-Salt

Been watching the AI Engineer Europe + Miami talks from this spring, and one pattern keeps showing up across speakers: agents that compose many tools are hitting a ceiling, and "code mode" is the way through it.

The Cloudflare example is the sharpest version of it. Their full API as MCP tools is ~1.17M tokens. As an OpenAPI spec, ~2M tokens. That's most of a context window before the user has typed anything.

Their fix: expose two tools — search() and execute() — and let the agent write code against the discovered functions instead of calling each one as a tool. Token cost drops to ~1,069. 99.9% reduction.

But the real insight isn't the token math. It's where the orchestration step lives.

In tool calling, the harness owns the loop. The model picks one tool, result lands in context, model picks the next tool. Every step is an inference round trip even when the orchestration is mechanical (filter, paginate, retry, join).

In code mode, the model writes a program once, the program orchestrates the calls, and only the filtered return value reaches the model. The training story for why this works is mostly: LLMs have seen millions of real-world code projects in training, and very few tool calls. Kenton Varda from Cloudflare put it best — "Making an LLM do tasks by tool calling is like putting Shakespeare through a month of Mandarin and asking him to write a play in it."

I wrote up the full pattern: when to make the shift, when not to, what it actually costs (sandboxing, debugging, secrets).

https://x.com/sarthakarora128/status/2053966999521481083

Happy to dig into specific cases in comments if anyone's hit this ceiling.

Your agent doesn't need more tools. It needs to write code.