u/waka324

▲ 14 r/Vllm

Qwen3.6-27B-FP8 with vllm:nightly, opencode unusable?

Hey all,

I'm at my wits end here, hoping someone might have some answers.

When using opencode (or forks like kilocode), after making some tool calls, inference on the backend stops, and opencode just waits until timeout.

I'm running on 4 RTX 8000s (SM75). I've tried all the chat templates, the coder and xml tool call parser, disabling reasoning, swapping between DFlash and MTP, but nothing seems to solve this issue.

Could this be a triton bug or something silly like that? I've had access to other, newer hardware at work that doesn't seem to display the same issue on flash based kernels.

reddit.com
u/waka324 — 2 days ago