u/eteitaxiv

Open WebUI is completely broken now

Open WebUI is completely broken now

After 0.9, my Open WebUI got fully broken.

Title generation works 20% of the time.

Generation stops after tool calls, making it unusable.

I tried starting from nothing with backups. I tried different providers. Nothing is working. It is on a Raspberry Pi 5.

Are there any sharing this problems? If not, I will start from nothing, importing nothing and creating all the agents and skills and everything.

EDIT: I can't actually believe it, but I wrote the problem to Hermes Agent in the same system, and it fixed it by a patch like this:

- ./patches/middleware.py:/app/backend/open_webui/utils/middleware.py:ro

It seems the problem was this according to it:

OK, here's the full diagnosis and fix:

== ROOT CAUSE ==

Open WebUI v0.9.5 has a bug in get_reasoning_format() (middleware.py line 2169).
It only returns reasoning_content format for llama.cpp and think_tags for Ollama.
For ALL other providers (MiMo, DeepSeek, Kimi, etc.) it returns None.

When None, the convert_output_to_messages() function completely SKIPS reasoning
content when building follow-up messages after a tool call.

So the chain is:
  1. Model generates thinking/reasoning + tool_call
  2. Open WebUI executes the tool, gets results
  3. Builds follow-up messages but DROPS the reasoning_content
  4. MiMo/DeepSeek API gets request WITHOUT reasoning_content
  5. API returns 400: "reasoning_content in thinking mode must be passed back"
  6. Error is caught silently (log.debug + break) → chat just dies

This affects MiMo, DeepSeek, and likely any other provider that requires
reasoning_content to be passed back during thinking mode.

Confirmed in both Open WebUI and LiteLLM logs:
  - litellm.BadRequestError: reasoning_content must be passed back (mimo-v2.5-pro)
  - litellm.BadRequestError: reasoning_content must be passed back (deepseek-v4-pro)

== FIX APPLIED ==

Patched get_reasoning_format() to return 'reasoning_content' for ALL non-Ollama
providers. This is safe because:
  - Providers that support it (MiMo, DeepSeek) get it back ✓
  - Providers that don't simply ignore the extra field ✓

Persisted via Docker volume mount:
  /opt/docker/openwebui/patches/middleware.py → /app/backend/open_webui/utils/middleware.py:ro

Container restarted and healthy.

== CAVEAT ==

This patch will be lost if you update the Open WebUI image. After docker compose pull
and recreate, you'll need to re-extract the patched file from the new version, or
check if upstream has fixed it. I'll remember this for future sessions.

Go ahead and test — try triggering a web search or any tool call. It should continue
generating after the tool returns now. 🤞

And this is the patch fixing it, this is for v0.9.5: https://gist.github.com/eteitaxiv/a0804db86af57d92fdfdeb7b63c8a486

u/eteitaxiv — 8 days ago

Chatfill v2 — now with revolutionary switches!

REQUIREMENTS:

  1. Reasoning models. Chatfill is reasoning-exclusive from now on. You can use it with non-reasoning models, but do not expect the same performance.
  2. Prompt Post-Processing: Semi-strict. Tool use is up to you.
  3. Well-made characters. This is important, as this is a pretty bare-bones preset and it needs a good character to reason about. You need to give the model data, and the preset will provide the guidelines to use it. If you're unsure about how to make them, use this Character Card Generator I made, its characters are perfectly suited for this preset, since they were built for each other.

TOKEN COUNTS (without characters, personas, and lorebooks; counted by DeepSeek v4 Pro):

  • Basic set: 536 tokens (NSFW, DeepSeek modes, and Brevity off)
  • Default RP mode: 647 tokens (NSFW and DeepSeek modes off)
  • NSFW mode: 742 tokens (DeepSeek and Brevity off)
  • Fast NSFW mode: 853 tokens (DeepSeek modes off)

Here it is: https://drive.proton.me/urls/M481CVT69W#WcItvlsxU8lR

This is the distillation of all the Chatfill presets I've posted since the first one. I tried new ideas in most of them, a new prompt, a new way of phrasing something — and finally decided to compile them into the NEXT GENERATION.

The game-changer idea here is switches. Instead of piling so much stuff after the last user prompt and degrading quality, an idea struck me like lightning: why not just put a reminder, one simple reminder, to point the model back to the system prompt?

It didn't work at first.

But it turned out the problem was the wording and the form of the reminder. Adding verbatim repeats of the rules, or phrasing them as generic reminders, those didn't work. But the style I settled on here (you'll see it when you import the preset) does work. Works very well with reasoning models. This becomes clear the moment you check the models' reasoning output.

I separated the system prompt into distinct parts, many of them, framed each as a "switch" (marked as enabled), and simply placed this after the last user message:

<roleplay_rules_reminder name=enabled_switches>
- You are to check if any switches are enabled and apply all enabled switches from the system prompt to your response.
</roleplay_rules_reminder>

That's it. If you check the reasoning, you'll see the model going through the modules of the system prompt (the switches) and applying them cleanly. This also had the effect of working better than a traditional system prompt, and working reliably. For the first time, various system prompt instructions like no impersonation, forward momentum, brevity, and the rest are actually firing consistently, every turn.

You can easily make your own switches too, just look at how they're structured and write one of your own. Here's an example from the preset:

<narrative_momentum_switch state=enabled>
- Processed Information: Once {{char}} has acknowledged, reacted to, or processed a piece of information (in dialogue, thought, or action), treat it as settled. Do not re-process, re-realize, or re-acknowledge the same beat.
- Emotional Beats: Each emotional response should happen ONCE. If {{char}} expresses shock at learning X, subsequent responses must show the aftermath, not re-express the same shock.
- Forward Motion: Every response must advance the scene. If stuck, {{char}} should pivot to action, ask a new question, or shift focus — never spiral on the same realization.
</narrative_momentum_switch>

So far, I'm getting the best RP of my life with this. Test it, see for yourself, steal it for your own presets.

Now, the models. As I said, this is for reasoning models. It works with most of them quite well.

Not so with non-reasoning models, since they can't reason about the switches.

I tested with MiMo v2.5 Pro, GLM 5.1, MiniMax M2.7, Kimi K2.6, and DeepSeek v4 Pro. I haven't tried anything else.

For DeepSeek v4 Pro, I added the DeepSeek RP styles that DeepSeek posted. I translated them to English and tested extensively. My findings: they actually improve English RP quality. My first instinct was to use them in Chinese, but testing proved otherwise.

That said, they're not strictly necessary, and I don't use them extensively. Also, "Role-playing Mode" makes the switches harder to work with, I either use "Pure Analysis Mode" or none of the DeepSeek modes at all.

Now, the modules:

  • Emotional Economy: ALWAYS ON! Models sometimes get stuck on one beat, delivering the same reaction over and over with different variations. This prevents it.
  • No Impersonation: You all know what this is.
  • Brevity: For preventing overly long responses while still allowing them when the scene genuinely calls for it. This didn't use to work, but now, framed as a switch, it does. I frequently see the model debating brevity in its reasoning. Works especially well with DeepSeek v4 Pro.
  • Momentum: ALWAYS ON! It may seem like it's just repeating the Emotional Economy switch at first glance, but it's not. It complements it and carries it forward. You need both enabled for them to work properly.
  • NSFW: This accidentally works as a jailbreak for some models. I've seen MiMo v2.5 Pro, MiniMax M2.7, and Kimi K2.6 respond to previously refused prompts with this enabled. But that's a side effect, a result of how well the switches are working. Its real purpose is to shift the language and add an NSFW quality to everything. It works well.
  • Prose Rules: This is the last module and sits after the Chat History, just like the switch reminder. Don't leave this enabled permanently. It's only here for those cards that include RP-style speech in their output. Use it for a few turns to calibrate the responses, then disable it. And honestly, only use it if you're too lazy to edit those speech patterns out of the card yourself. =)
u/eteitaxiv — 9 days ago

Various LLM Subscription services, as of May 2026

Some people in the subreddit asked for an update, and here it is, automode allowing: https://rentry.org/8woc7i9y

I will keep this one updated, instead posting new posts after big releases.

I do think using a subscription is the best way to access for now. I am using two of them myself, one exclusively for SillyTavern, one for Hermes Agent/Open WebUI.

u/eteitaxiv — 12 days ago