u/AnyPaleontologist932

I spent 3 hours debugging a workflow that wasn't broken.

Qwen models have an internal reasoning mode. Before they answer, they sometimes stop and think — silently. Zero output. Zero progress bar. You're just staring at a frozen node wondering if it crashed.

It didn't crash. It's reasoning. And there was absolutely no way to see it.

So I forked the Qwen plugin and built ThinkingLLM.

What it does:

Live token streaming: every token appears in the terminal as the model generates it. You can literally watch it think in real time.

RAW_TRACE output: the full inner monologue, preserved. Sometimes it's brilliant chain-of-thought. Sometimes the model decides the prompt is too easy and skips reasoning entirely. Now you can tell which is which.

Thinking toggle: let it reason before answering, or push for a direct one-shot response.
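
For anyone curious how the streaming and the trace/answer split can work in principle, here's a minimal sketch. It is not the plugin's actual code. It assumes the model wraps its reasoning in <think>...</think> tags, which is how Qwen3's thinking mode formats output; other models may use different markers. The function name stream_and_split and the fake token list are made up for illustration.

```python
import sys
from typing import Iterable, Tuple

# Assumption: reasoning is delimited by these tags (Qwen3-style thinking mode).
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def stream_and_split(tokens: Iterable[str], out=sys.stdout) -> Tuple[str, str]:
    """Echo each token as it arrives, then return (raw_trace, response).

    `tokens` can be any iterable of text chunks, e.g. a backend's streamer.
    """
    pieces = []
    for token in tokens:
        out.write(token)   # live feedback, so the node never looks frozen
        out.flush()
        pieces.append(token)
    text = "".join(pieces)  # join first, so a tag split across tokens still parses

    start = text.find(THINK_OPEN)
    end = text.find(THINK_CLOSE)

    if end == -1:
        if start != -1:
            # Opening tag but no closing tag: generation stopped mid-thought.
            return text[start + len(THINK_OPEN):].strip(), ""
        # No tags at all: the model answered directly.
        return "", text.strip()

    # Some chat templates put the opening tag in the prompt, so the output
    # may contain only the closing tag. In that case the trace starts at 0.
    trace_start = start + len(THINK_OPEN) if 0 <= start < end else 0
    raw_trace = text[trace_start:end].strip()
    response = text[end + len(THINK_CLOSE):].strip()
    return raw_trace, response


if __name__ == "__main__":
    # Hypothetical token stream standing in for a real model.
    fake_tokens = ["<thi", "nk>2+2", " is 4</think>", "\n\nThe answer is 4."]
    trace, answer = stream_and_split(fake_tokens)
    print("\n--- RAW_TRACE ---\n" + trace)
    print("--- RESPONSE ---\n" + answer)
```

An empty trace here means the model either skipped reasoning or emitted an empty think block, which is the "too easy, no thinking needed" case mentioned above.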

Supported models:

Qwen3.5, Qwen3-VL, Qwen2.5-VL, Qwen3, and Gemma 4, on both the HF Transformers and GGUF/llama.cpp backends.

Tips for using it:

Pre-process input images with a resize node so large images don't blow up the context window.

Connect the RESPONSE output to a Show Text or Show Anything node to read the answer.

Connect RAW_TRACE to a second Show Text node to see what the model was thinking.
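
On the resize tip: vision models generally turn an image into a number of tokens that grows with its pixel dimensions, so capping the longest side keeps the prompt small. Here's a rough sketch of the math a resize node does. The 1024 px cap is just an example value I picked, not a recommendation from the plugin, and fit_within is a made-up helper name.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return new (width, height) with the longest side capped at max_side.

    Aspect ratio is preserved; images already small enough are left alone.
    max_side=1024 is an illustrative default, not a tuned value.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")

    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale

    # Scale both sides by the same factor; clamp to at least 1 pixel.
    new_w = max(1, round(width * max_side / longest))
    new_h = max(1, round(height * max_side / longest))
    return new_w, new_h


if __name__ == "__main__":
    print(fit_within(4000, 3000))  # landscape photo, gets downscaled
    print(fit_within(800, 600))    # already small, unchanged
```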

It's free, open source (GPL-3.0), and installable through ComfyUI Manager.

GitHub: https://github.com/goodguy1963/ComfyUI-ThinkingLLM

Your local LLM node isn't frozen. The AI is thinking. I built a plugin so you can see it.