u/deafpigeon39

I have been banging my head against this for months. You ask DeepSeek a question, it answers fine. You ask a follow up, boom. HTTP 400. Same with Kimi, same with GLM, same with MiMo and MiniMax. I thought the models were broken. They are not. The clients are.

This is what is actually happening.

These models think before they speak. Not metaphorically, actually. They output a hidden field called reasoning_content, basically their internal notes. "User wants a weather app, I should check API docs, maybe use React..." You never see this field. It is invisible. But the model needs it back on the next turn.
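To make that concrete, here is a rough sketch of what such an assistant message could look like. Only the reasoning_content field name comes from the provider docs quoted in this post; the interface name, the helper function and the sample text are made up for illustration, and the other fields simply follow the usual OpenAI-style chat message layout.

```typescript
// Hypothetical shape of an assistant message from a thinking-mode model.
// Only `reasoning_content` is named in the post; the rest is the usual
// OpenAI-style layout and is assumed here.
interface ThinkingAssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string; // provider extension, not in the OpenAI spec
}

// Illustrative reply, reusing the post's own example of "internal notes".
const firstReply: ThinkingAssistantMessage = {
  role: "assistant",
  content: "Here is a plan for your weather app.",
  reasoning_content:
    "User wants a weather app, I should check API docs, maybe use React...",
};

// What a typical chat UI shows the user: only `content`.
function visibleText(msg: ThinkingAssistantMessage): string {
  return msg.content;
}
```

The user only ever sees what visibleText returns, which is why nothing looks wrong until the history is sent back.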

OpenCode drops it. Cursor drops it. Claude Code drops it. VS Code Copilot drops it. Every single tool built against the OpenAI spec drops it, because reasoning_content is not in the OpenAI spec. It is a proprietary extension that DeepSeek, Kimi, GLM, MiniMax and Xiaomi MiMo all require anyway.

The first turn always works because there is no history to validate. So you test one round trip, it passes, you ship it, and your real users hit the wall on turn 2. This has been sitting in the open since January. Five months.
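If you want to see the turn 1 versus turn 2 difference without hitting an API, a small local check does it. This is only an approximation of the rule the providers describe, not their actual validation code, and the names (HistoryMessage, findMissingReasoning) are hypothetical.

```typescript
type ChatRole = "system" | "user" | "assistant" | "tool";

interface HistoryMessage {
  role: ChatRole;
  content: string;
  reasoning_content?: string;
}

// Returns the indices of assistant messages that lack reasoning_content.
// An empty result means there is nothing for the provider to reject.
function findMissingReasoning(history: HistoryMessage[]): number[] {
  const missing: number[] = [];
  history.forEach((msg, index) => {
    if (msg.role === "assistant" && typeof msg.reasoning_content !== "string") {
      missing.push(index);
    }
  });
  return missing;
}

// Turn 1: only a user message, so there is no assistant history to check.
const turnOne: HistoryMessage[] = [
  { role: "user", content: "Build me a weather app" },
];

// Turn 2: the client rebuilt the assistant message without the hidden field.
const turnTwo: HistoryMessage[] = [
  ...turnOne,
  { role: "assistant", content: "Sure, here is a plan." },
  { role: "user", content: "Now add a forecast view" },
];
```

Running findMissingReasoning on turnOne finds nothing, while turnTwo flags index 1, which is exactly the message a single round-trip test never exercises.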

I know this because I logged it. My plugin has patched 12,551 messages across 200+ real sessions. Every single one of them was missing reasoning_content that should have been there. The plugin just fills the gap so the model can keep going.

The providers literally warn about this in their docs.

DeepSeek: "If your code does not correctly pass back reasoning_content, the API will return a 400 error."

Kimi: "You must keep the reasoning_content of every historical assistant message."

GLM: "When using interleaved thinking plus tools, you must explicitly preserve reasoning content."

MiMo: "Any assistant message with tool calls must preserve its full reasoning_content field, otherwise the API will return a 400 error."

MiniMax: "The complete model response must be appended to maintain reasoning chain continuity."

All five say the same thing. All five get ignored by the same clients.

I scanned Chinese, Russian and Western dev communities for evidence. The same bug shows up everywhere, independently.

15 OpenCode GitHub issues. One from January 28, 2026. Three PRs tried to fix it. None merged.

101K Russian developers read about GLM errors in OpenCode on Habr. A Russian dev patched the LangChain source himself because the maintainers said they would not add support for provider-specific fields.

31K Chinese developers viewed a cnblogs article explaining the workaround. A Tencent Cloud user wrote: "Feels like most people are hitting this. Qclaw and Workbuddy are dragging their feet, almost a month without fixing."

The CodeRouter blog put it best: "Your multi turn agent will deterministically 400 on turn 2. Affects every major agent framework."

17 platforms total. OpenCode, Cursor, VS Code Copilot, JetBrains, Roo Code, Kilo Code, n8n, Continue.dev, Claude Code Router, Codex CLI, GitHub Copilot, Make, OmniRoute, ZeroClaw, OpenClaw, Qwen Code, Hermes Agent.

That is not a provider bug. That is a protocol gap. The OpenAI spec has no slot for reasoning notes, so every client built on it silently drops them. Chinese providers built thinking mode on top anyway. The result is a five-month-old bug that breaks the cheapest and most capable models on the market.

I built a fix because I got tired of waiting. It is three layers, use what you need.

Plugin stops the crashes. 92 lines. Drop it in, restart OpenCode. Detects reasoning models and fills missing reasoning_content with empty strings. No more 400s.
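The idea behind the plugin layer can be sketched in a few lines. This is not the actual 92-line plugin and it does not use OpenCode's plugin API; the model detection rule (matching family names inside the model id) and every name below are assumptions for illustration.

```typescript
interface OutgoingMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

// Assumption: model ids contain the family name. How the real plugin
// detects reasoning models is not shown in the post.
const REASONING_FAMILIES = ["deepseek", "kimi", "glm", "mimo", "minimax"];

function isReasoningModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return REASONING_FAMILIES.some((family) => id.includes(family));
}

// Adds an empty reasoning_content to assistant messages that lack one.
// Returns the new message list and how many messages were patched.
function fillMissingReasoning(
  modelId: string,
  messages: OutgoingMessage[],
): { messages: OutgoingMessage[]; patched: number } {
  if (!isReasoningModel(modelId)) {
    return { messages, patched: 0 };
  }
  let patched = 0;
  const fixed = messages.map((msg) => {
    if (msg.role !== "assistant" || typeof msg.reasoning_content === "string") {
      return msg; // nothing to do for this message
    }
    patched += 1;
    return { ...msg, reasoning_content: "" };
  });
  return { messages: fixed, patched };
}
```

Existing reasoning text is never overwritten here, and non-reasoning models pass through untouched, so the patch only fills gaps.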

Proxy replays real thinking. 422 lines. Runs on localhost:3457. Caches actual reasoning text per session, injects it on the next turn. Your model sees its own notes and keeps going like nothing happened.
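The cache-and-inject step can be pictured like this. It is a simplified sketch, not the 422-line proxy: keying the cache by session id plus the visible assistant text is an assumption (the post does not say how the real proxy matches messages), and the class and method names are hypothetical.

```typescript
interface ProxyMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

class ReasoningCache {
  // Assumed key: session id + visible assistant text.
  private store = new Map<string, string>();

  private key(sessionId: string, content: string): string {
    return `${sessionId}\u0000${content}`;
  }

  // Called once a model reply has been fully received.
  remember(sessionId: string, content: string, reasoning: string): void {
    this.store.set(this.key(sessionId, content), reasoning);
  }

  // Called on the next outgoing request: restores cached reasoning,
  // falling back to an empty string when nothing was cached.
  inject(sessionId: string, messages: ProxyMessage[]): ProxyMessage[] {
    return messages.map((msg) => {
      if (msg.role !== "assistant" || typeof msg.reasoning_content === "string") {
        return msg;
      }
      const cached = this.store.get(this.key(sessionId, msg.content));
      return { ...msg, reasoning_content: cached ?? "" };
    });
  }
}
```

Matching on visible text is the weakest part of this sketch, since assistant messages that only carry tool calls may have empty content; a real implementation would likely need a sturdier key.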

Watchdog keeps the proxy alive. Systemd service, set and forget.

They stack. Plugin is the safety net, proxy is the optimization, watchdog is insurance.

If you maintain any tool that routes to DeepSeek, Kimi or GLM, check your message serialization. If you are building {role: "assistant", content: msg.content} from the response, you are dropping reasoning_content and your users are hitting this wall right now. They just are not telling you because they switched to Claude and moved on.

The models are fine. The spec is the problem. The fix is simple. Someone just had to ship it. You can find logs in npm - sdk@openai-compatible. Qwen does not have this problem.
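For maintainers, the difference between the lossy and the safe serialization is tiny. A minimal sketch, with hypothetical type and function names, assuming the provider reply exposes content and an optional reasoning_content:

```typescript
interface ProviderReply {
  content: string;
  reasoning_content?: string;
}

interface HistoryEntry {
  role: "assistant";
  content: string;
  reasoning_content?: string;
}

// The pattern described above: only spec fields survive.
function toHistoryLossy(msg: ProviderReply): HistoryEntry {
  return { role: "assistant", content: msg.content };
}

// Same thing, but the provider extension is carried along when present.
function toHistoryPreserving(msg: ProviderReply): HistoryEntry {
  const entry: HistoryEntry = { role: "assistant", content: msg.content };
  if (typeof msg.reasoning_content === "string") {
    entry.reasoning_content = msg.reasoning_content;
  }
  return entry;
}
```

The preserving version adds the field only when the provider sent one, so requests to models without thinking mode look the same as before.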

If you're on the OpenCode Go subscription and use anything besides Qwen, you've probably hit this:

Turn 1: You ask something. It answers.
Turn 2: You follow up. HTTP 400.
"The reasoning_content in the thinking mode must be passed back to the API"

The Go model list has 15 models. 12 of them (DeepSeek V4 Pro/Flash, Kimi K2.7/K2.6, GLM 5.2/5.1, MiMo V2.5/V2.5 Pro, MiniMax M3/M2.7/M2.5) produce reasoning traces. Every one of them needs that field present on every assistant message in history. OpenCode strips it. Qwen 3.7 Max/Plus/Plus is the only one that doesn't hit this because it doesn't expose reasoning in the API at all.

So if you're paying for Go and using anything other than Qwen, multi-turn conversations are basically broken.

DeepSeek's docs say it straight: "Between two user messages, if the model performed reasoning, the intermediate assistant's reasoning_content must be passed back to the API in all subsequent turns."

I spent a few hours checking keys, network, config, the usual stuff, before I noticed the 400 came back on turn 2 every single time, no matter what I asked. People have been complaining about this. There are open issues across OpenCode, Cline, Codex, and Copilot. Three PRs tried to fix it in OpenCode. None went through. The field is non-standard: OpenAI's spec doesn't include reasoning_content, so every tool in the chain just drops it.

I ended up with two fixes, both on GitHub (tbosancheros39/opencode-thinking-fix):

The fast one is a single TypeScript file you drop into ~/.config/opencode/plugins/. It hooks OpenCode's message pipeline and injects the missing field on every outgoing assistant message. The model doesn't see what it was thinking before, but the conversation stops breaking.

The better one is a small Node.js proxy (zero deps, uses http/https built-ins) that runs on localhost. It captures the real reasoning text as it streams back from the model, caches it, and injects the actual content on the next turn. Actually matters when you're 10 turns deep and want the model to remember its own reasoning.

For OpenCode Go specifically: the SSE field is called "reasoning" instead of "reasoning_content" in the stream. The proxy handles both.
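Handling both field names while reading the stream could look roughly like this. The sketch assumes OpenAI-style SSE chunks (data: lines carrying JSON with choices[0].delta), which the post does not spell out, and it works on an already buffered string rather than a live socket. The function name is hypothetical.

```typescript
// Collects reasoning text from buffered SSE output.
// Accepts either `reasoning_content` or `reasoning` in the delta.
function collectReasoning(sseText: string): string {
  let reasoning = "";
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;

    const payload = trimmed.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;

    let chunk: unknown;
    try {
      chunk = JSON.parse(payload);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof chunk !== "object" || chunk === null) continue;

    // Assumed chunk layout: { choices: [{ delta: { ... } }] }
    const delta = (chunk as { choices?: Array<{ delta?: Record<string, unknown> }> })
      .choices?.[0]?.delta;
    if (!delta) continue;

    const piece = delta["reasoning_content"] ?? delta["reasoning"];
    if (typeof piece === "string") reasoning += piece;
  }
  return reasoning;
}
```

A real proxy sees the stream in arbitrary network chunks, so it would also need to buffer partial lines before parsing; that part is left out here.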

OpenCode/Anomaly reported issues: #24190, #24104, #24722, #25311, #25134, #25000, #24124, #24130, #24261, #24442, #24569

I just got sick of not having a fix and wanted to share what worked.

Anyone else been dealing with this? Did I miss a simpler solution?

EDIT: https://github.com/anomalyco/opencode/issues
Please go through the issues, people. There are people reporting the problem even today: the OpenCode API is not passing the reasoning content with multiple tool calls.

If you don't orchestrate and you are not heavy on tool calling, your OpenCode will work fine, people.

Every Chinese reasoning model has the same 400 error on turn 2. www.github.com/tbosancheros39/opencoded-thinking-fix

If you pay for OpenCode Go, 12 of 15 models break on follow-ups. Here's why superior LLMs were having errors.