r/OpenWebUI

▲ 100 r/OpenWebUI+2 crossposts

Open-Source Microsoft Office Extensions for Open WebUI

Ciao community di Open WebUI 👋

Sono Nick, faccio parte del team di Ianustec e siamo grandi fan di Open WebUI da molto tempo.

Apprezziamo molto ciò che questa community sta costruendo attorno all'IA open source e self-hosted, quindi volevamo dare il nostro contributo.

Al momento stiamo sviluppando una suite completamente open source di estensioni per Microsoft Office progettate per funzionare con Open WebUI, incluse integrazioni per:

  • PowerPoint
  • Word
  • Excel
  • Outlook

Il nostro obiettivo è rendere i flussi di lavoro di IA nativi all'interno di Microsoft Office, mantenendo tutto aperto, flessibile e compatibile con l'ecosistema di Open WebUI.

Alcune delle cose su cui stiamo lavorando:

  • Creazione di documenti con l'ausilio dell'IA in Word
  • Analisi e automazione di fogli di calcolo in Excel
  • Creazione e modifica di presentazioni in PowerPoint
  • Stesura e riepilogo di email in Outlook

Tutto verrà rilasciato come open source.

Ci piacerebbe anche collaborare con la community e conoscere le vostre opinioni:

  • Quali funzionalità vi sarebbero più utili?
  • Cosa renderebbe questi strumenti davvero preziosi nel vostro flusso di lavoro quotidiano?

Siamo entusiasti di collaborare con questa community e contribuire all'ecosistema 🚀

u/NicErGoblin9 — 1 day ago

Sync Directory feature for Knowledge base is poorly explained. Or is it just me?

I have used this feature for RAG ingestion, but when I go to "sync" it tells me it's going to delete the entire knowledge base and recreate it from scratch from the files in the directory. This is not what I would call "syncing".

What is the difference between syncing a directory and just deleting the old knowledge base and adding a new one pointing at the same source?

The Open WebUI documentation does not describe Sync Directory at all and I have not found any other discussions about it.

reddit.com
u/zowiebowie2 — 1 day ago
▲ 24 r/OpenWebUI+14 crossposts

I wanted to check Epstein files, without spending too much time on them. And spent too much time on them

So yeah. AI tool to talk to Epstein and his files

youtu.be
▲ 2 r/OpenWebUI+1 crossposts

I installed Open WebUI, upon installation it asked me to install Ollama, I skipped it, can I still install it now afterwards? I want to use local LLMs

Hi, I'm new to Open WebUI, I installed Open WebUI in Docker, it works well.

However, upon installation it asked me to install Ollama locally, I skipped it. Can I still install it now, afterwards? I want to use some local LLMs.

I tried to search for the answer myself but couldn't find it.

reddit.com
u/sarrcom — 2 days ago

Massive problems with 0.9.5 and Bedrock-Access-Gateway

I've decided to create a chat tool that is OpenWebUI in front of Bedrock Access Gateway.

I started with 0.9.1 and discovered that chat sharing was broken.

Then I noticed that 0.9.5 was released and downloaded it.

Now half of the Bedrock models streaming chat disappears after being visible (minimax m2.5, nova-pro-v1, glm-4.7-flash to name a few). And the "usage" toggle no longer pushes any token analytics into the postgres backend.

Does anyone have suggestions as to how to think through a stable, operational OWUI chat tool where the fundamental pieces like this will completely break with minor version changes??

Note: I upgraded BAG too. So apparently these "openai-compatible" bindings may not actually be as compatible as both BAG and OWUI make them out to be?

reddit.com
u/genobobeno_va — 2 days ago

Open WebUI + ComfyUI Image Editing Issue

Hi, I have Open WebUI setup and working perfectly for image generation but I am having issues with getting image editing to work correctly.

In Open WebUI I have image editing enabled, have uploaded the exported (API) workflow, and mapped both the prompt and image nodes.

The ComfyUI workflow works perfectly fine within ComfyUI and produces edits as asked, but in Open WebUI I get the original image back or ever so slightly modified as the prompt is not being passed correctly.

Below is my docker log from Open WebUI and from this I can see the image is being passed ok, but the prompt is being passed to the CLIPTextEncode node but under the element 'prompt' which does not exist within the actual node in ComfyUI.

I cannot work out why the prompt isn't being passed to the 'text' element.

I have deleted and recreated the CLIPTextEncode, I have tried a multi-line String hooked up to the CLIPTextEncode node but get the same result with the prompt being inserted into a newly created 'prompt' element.

Any help would be greatly appreciated.

Using Open WebUI v0.9.5 and ComfyUI v0.21.1.

(Edited to try and get the log over multiple lines to read easier.)

open_webui.utils.images.comfyui:comfyui_edit_image:247 - Workflow: {'9': {'inputs': {'filename_prefix': 'Flux2-Klein', 'images': ['80', 1]}, 
'class_type': 'SaveImage', '_meta': {'title': 'Save Image'}}, '76': {'inputs': {'image': '76037bd5-5e49-4cf1-8be9-464cb4a91dce.png'}, 'class_type': 
'LoadImage', '_meta': {'title': 'Load Image'}}, '80': {'inputs': {'empty_cache': True, 'gc_collect': True, 'unload_all_models': True, 
'image_pass': ['75:65', 0]}, 'class_type': 'VRAM_Debug', '_meta': {'title': 'VRAM Debug'}}, '81': {'inputs': {'value': ['80', 4]}, 'class_type': 
'UnloadAllModels', '_meta': {'title': 'UnloadAllModels'}}, '82': {'inputs': {'text': '', 'clip': ['75:71', 0], 'prompt': 'can you add a food bowl and some cat toys to this picutre'}, '
class_type': 'CLIPTextEncode', '_meta': {'title': 'CLIP Text Encode (Prompt)'}}, '75:61': {'inputs': {'sampler_name': 'euler'}, 'class_type': 'KSamplerSelect', 
'_meta': {'title': 'KSamplerSelect'}}, '75:73': {'inputs': {'noise_seed': 1054275450481062}, 'class_type': 'RandomNoise', '_meta': {'title': 'RandomNoise'}}, 
'75:70': {'inputs': {'unet_name': 'flux-2-klein-9b-fp8.safetensors', 'weight_dtype': 'default'}, 'class_type': 'UNETLoader', '_meta': {'title': 'Load Diffusion Model'}}, 
'75:71': {'inputs': {'clip_name': 'qwen_3_8b_fp8mixed.safetensors', 'type': 'flux2', 'device': 'default'}, 'class_type': 'CLIPLoader', '_meta': {'title': 'Load CLIP'}}, 
'75:72': {'inputs': {'vae_name': 'full_encoder_small_decoder.safetensors'}, 'class_type': 'VAELoader', '_meta': {'title': 'Load VAE'}}, '75:80': {'inputs': {'upscale_method': 
'nearest-exact', 'megapixels': 1, 'resolution_steps': 1, 'image': ['76', 0]}, 'class_type': 'ImageScaleToTotalPixels', '_meta': {'title': 'ImageScaleToTotalPixels'}}, '75:99': 
{'inputs': {'image': ['75:80', 0]}, 'class_type': 'GetImageSize', '_meta': {'title': 'Get Image Size'}}, '75:124': {'inputs': {'pixels': ['75:80', 0], 'vae': ['75:72', 0]}, 
'class_type': 'VAEEncode', '_meta': {'title': 'VAE Encode'}}, '75:82': {'inputs': {'conditioning': ['82', 0]}, 'class_type': 'ConditioningZeroOut', '_meta': 
{'title': 'ConditioningZeroOut'}}, '75:66': {'inputs': {'width': ['75:99', 0], 'height': ['75:99', 1], 'batch_size': 1}, 'class_type': 'EmptyFlux2LatentImage', 
'_meta': {'title': 'Empty Flux 2 Latent'}}, '75:62': {'inputs': {'steps': 4, 'width': ['75:99', 0], 'height': ['75:99', 1]}, 'class_type': 'Flux2Scheduler', 
'_meta': {'title': 'Flux2Scheduler'}}, '75:125': {'inputs': {'conditioning': ['82', 0], 'latent': ['75:124', 0]}, 'class_type': 'ReferenceLatent', '_meta': 
{'title': 'ReferenceLatent'}}, '75:123': {'inputs': {'conditioning': ['75:82', 0], 'latent': ['75:124', 0]}, 'class_type': 'ReferenceLatent', '_meta': {'title': 'ReferenceLatent'}}, 
'75:63': {'inputs': {'cfg': 1, 'model': ['75:70', 0], 'positive': ['75:125', 0], 'negative': ['75:123', 0]}, 'class_type': 'CFGGuider', '_meta': {'title': 'CFGGuider'}}, 
'75:64': {'inputs': {'noise': ['75:73', 0], 'guider': ['75:63', 0], 'sampler': ['75:61', 0], 'sigmas': ['75:62', 0], 'latent_image': ['75:66', 0]}, 'class_type': 
'SamplerCustomAdvanced', '_meta': {'title': 'SamplerCustomAdvanced'}}, '75:65': {'inputs': {'samples': ['75:64', 0], 'vae': ['75:72', 0]}, 'class_type': 'VAEDecode', 
'_meta': {'title': 'VAE Decode'}}}open_webui.utils.images.comfyui:comfyui_edit_image:247 - Workflow: {'9': {'inputs': {'filename_prefix': 'Flux2-Klein', 'images': ['80', 1]}, 
'class_type': 'SaveImage', '_meta': {'title': 'Save Image'}}, '76': {'inputs': {'image': '76037bd5-5e49-4cf1-8be9-464cb4a91dce.png'}, 'class_type': 
'LoadImage', '_meta': {'title': 'Load Image'}}, '80': {'inputs': {'empty_cache': True, 'gc_collect': True, 'unload_all_models': True, 
'image_pass': ['75:65', 0]}, 'class_type': 'VRAM_Debug', '_meta': {'title': 'VRAM Debug'}}, '81': {'inputs': {'value': ['80', 4]}, 'class_type': 
'UnloadAllModels', '_meta': {'title': 'UnloadAllModels'}}, '82': {'inputs': {'text': '', 'clip': ['75:71', 0], 'prompt': 'can you add a food bowl and some cat toys to this picutre'}, '
class_type': 'CLIPTextEncode', '_meta': {'title': 'CLIP Text Encode (Prompt)'}}, '75:61': {'inputs': {'sampler_name': 'euler'}, 'class_type': 'KSamplerSelect', 
'_meta': {'title': 'KSamplerSelect'}}, '75:73': {'inputs': {'noise_seed': 1054275450481062}, 'class_type': 'RandomNoise', '_meta': {'title': 'RandomNoise'}}, 
'75:70': {'inputs': {'unet_name': 'flux-2-klein-9b-fp8.safetensors', 'weight_dtype': 'default'}, 'class_type': 'UNETLoader', '_meta': {'title': 'Load Diffusion Model'}}, 
'75:71': {'inputs': {'clip_name': 'qwen_3_8b_fp8mixed.safetensors', 'type': 'flux2', 'device': 'default'}, 'class_type': 'CLIPLoader', '_meta': {'title': 'Load CLIP'}}, 
'75:72': {'inputs': {'vae_name': 'full_encoder_small_decoder.safetensors'}, 'class_type': 'VAELoader', '_meta': {'title': 'Load VAE'}}, '75:80': {'inputs': {'upscale_method': 
'nearest-exact', 'megapixels': 1, 'resolution_steps': 1, 'image': ['76', 0]}, 'class_type': 'ImageScaleToTotalPixels', '_meta': {'title': 'ImageScaleToTotalPixels'}}, '75:99': 
{'inputs': {'image': ['75:80', 0]}, 'class_type': 'GetImageSize', '_meta': {'title': 'Get Image Size'}}, '75:124': {'inputs': {'pixels': ['75:80', 0], 'vae': ['75:72', 0]}, 
'class_type': 'VAEEncode', '_meta': {'title': 'VAE Encode'}}, '75:82': {'inputs': {'conditioning': ['82', 0]}, 'class_type': 'ConditioningZeroOut', '_meta': 
{'title': 'ConditioningZeroOut'}}, '75:66': {'inputs': {'width': ['75:99', 0], 'height': ['75:99', 1], 'batch_size': 1}, 'class_type': 'EmptyFlux2LatentImage', 
'_meta': {'title': 'Empty Flux 2 Latent'}}, '75:62': {'inputs': {'steps': 4, 'width': ['75:99', 0], 'height': ['75:99', 1]}, 'class_type': 'Flux2Scheduler', 
'_meta': {'title': 'Flux2Scheduler'}}, '75:125': {'inputs': {'conditioning': ['82', 0], 'latent': ['75:124', 0]}, 'class_type': 'ReferenceLatent', '_meta': 
{'title': 'ReferenceLatent'}}, '75:123': {'inputs': {'conditioning': ['75:82', 0], 'latent': ['75:124', 0]}, 'class_type': 'ReferenceLatent', '_meta': {'title': 'ReferenceLatent'}}, 
'75:63': {'inputs': {'cfg': 1, 'model': ['75:70', 0], 'positive': ['75:125', 0], 'negative': ['75:123', 0]}, 'class_type': 'CFGGuider', '_meta': {'title': 'CFGGuider'}}, 
'75:64': {'inputs': {'noise': ['75:73', 0], 'guider': ['75:63', 0], 'sampler': ['75:61', 0], 'sigmas': ['75:62', 0], 'latent_image': ['75:66', 0]}, 'class_type': 
'SamplerCustomAdvanced', '_meta': {'title': 'SamplerCustomAdvanced'}}, '75:65': {'inputs': {'samples': ['75:64', 0], 'vae': ['75:72', 0]}, 'class_type': 'VAEDecode', 
'_meta': {'title': 'VAE Decode'}}}
reddit.com
u/ChickenLegsOG — 3 days ago

Anyone hosting openclaw without deep technical knowledge?

I came to mind to somehow experiment with openclaw casually been really curious about it since a lot mentions its too complicated and would end up days for you to learn the whole thing, well yeah i agree also looked into hostinger 1-click openclaw, pretty decent and straightforward its not something grand though but kinda easy to be understood, do you also consider this or you prefer other alternatives? maybe hit me up and i can explore it kinda new to this openclaw thing

reddit.com
u/Best_Technician47 — 3 days ago
▲ 2 r/OpenWebUI+1 crossposts

Help with using cloned voice from Chatterbox?

I'm running Llama.cpp in OpenWebUI, and have installed Chatterbox TTS to handle the voice side of it, because I really wanted to use a cloned voice locally. I've been at this for hours, trying OpenedAI for TTS, and now Chatterbox. I'm relatively new to Linux and python, etc.

Here's what I've got working:

OpenWebUI sees and uses the model running on Llama.cpp.

Chatterbox Frontend works as expected. Type text, click button, get output in cloned voice. It's a little slow,but that's likely because it's not using my rx580 for inference. Old card, I know.

Here's what looks sketchy to me:

putting in the address for the backend in the browser reveals a terminal like window that simply says: detail: "Not Found".

The actual error:

When trying to generate the TTS part in OpenWebUI, It claims the voice I've very clearly pointed to according to everything I've found, doesn't exist, then proceeds to tell me to use one of the included voices.

What I'm running (old hardware, I know):

AMD Ryzen 5 2600
32GB RAM
RX 580 8G VRAM
OS: Ubuntu 26.04 LTS

reddit.com
u/BrokeBoyFresh — 3 days ago

Update broke websearch

Hi Guys,
I believe the last update of OpenwebUI broke my websearch. I'm using a selfhosted SearXNG instance for websearch and it worked great until the last update now they all fail like this. The Search is still working by the looks of it but nothing is returned to the LLM.

https://preview.redd.it/s7j3bbeoyq1h1.png?width=275&format=png&auto=webp&s=069a5162d6ffce142fea1eb5028c0764b58bd207

Has anyone experienced similar and could point me in the right direction? I'm trying to fix it with AI at the moment but they seem helpless.

Cheers,

reddit.com
u/EngineWorried9767 — 4 days ago

Using open webui with excel

Hey evryone! I'm new to hosting local AI and overall not a very technical person so apologies for any stupid question.

I've been looking into using open webui with local model to analyze excel data. My current setup is on Ubuntu; open webui installed via snap; and local models are managed via ollama. My excel file contains mutiple worksheet for language study, where 1 worksheet for me to keep track of the vocabs, and another one for writing sample sentences.

My use cases involve using the AI to loop through the vocab worksheet for the list of learned vocab, to come up with sample sentences where I can then translate to another language, while utilizing the vocabs that have yet been used based on the sample sentences.

I have had quite a lot of success with my use case using Deepseek webapp, but I'm unable to replicate the same thing with my local setup. The local model would just tell me it is not able to tell the data structure, or have a proper access to the file.

I have tried with using both Tika and Docling (managed via docker) for content extraction, also tried using Knowledge and File Upload to give the model access to the file.

Any guidance is appreciated!

reddit.com
u/ToraGod — 5 days ago

Open WebUi for writer

Hello,

thought is would be better to ask, because googling gives me lot of technical terms and I do not understand it.

So here is my situation: I have been writing novels and brainstorming ideas with chatgpt. Lately I've been getting anxiety about my lore and plots being on openai's server, and I want to get rid of that, so I have installed ollama and mistral, and I want to give open WebUi a chance, but I'd like to get this question answered like you would answer to an idiot:

If setup is made locally to my pc, and I use only ollama/mistral, does any conversation/chat/brainstorming/questions be sent onto external server, meaning away from my pc? Or does everything I write to Open WebUI stay in my pc?

Thanks for helping middle-aged man in crisis 😃

reddit.com
u/jatsinkutsu — 5 days ago

Openwebui + comfyui

Hello, is someone succeeding in making these 2 work together? No matter what i am trying, unet loader, checkpoint… the workflow works when i type thenpromot in comfyui but as soon as i type same prompt in openwebui , i cannot manage to get an image and always get errors… i i port the fson worflow and specify prompt id and checkpoint if and model in openwebui but nothing works… is it because i use flux 1 dev fp16 ? Does it require smaller models to work ? Thanks for input !!

SOLUTION : I finally made it work using the help of qwen3.6-27b-q8 ))) so the problem is that ALL NODES ID must be filled in openwebui and also must add this command line to openwebui : ENABLE_RAG_LOCAL_WEB_FETCH=True , it was the fix for me )) now working perfectly !!!

reddit.com
u/Dolboyob77 — 7 days ago

I forked supertonic TTS and added OpenAI endpoint

Hi guys!

I watched the other day this amazing lightweight local TTS model that can run easelly on CPU that is called supertonic.

https://github.com/supertone-inc/supertonic

I test it and is very good. https://huggingface.co/spaces/Supertone/supertonic-3 here is the demo.

So i vibecoded a fork that runs on docker compose that downloads the model and creates a OpenAI endpoint so you can use it on your openwebui isntall.

https://github.com/calebrio02/supertonic

The voices are M1-5 & F1-5

blurring the endpoint since is exposed via CF Tunnels.

reddit.com
u/Brilliant_Anxiety_36 — 5 days ago

OpenRouter: what models do you like using?

I just started using Open WebUI and it's awesome! I'm trying to figure out what models on OpenRouter give the best bang for the buck. I've been choosing based on the OpenRouter rankings data: https://openrouter.ai/rankings

I've tried tencent/hy3-preview and anthropic/claude-sonnet-4.6. I'm looking for a model that's close to Sonnet's quality and Hy3's price. Any tips greatly appreciated!

u/andrewmarder — 6 days ago

Open WebUI is completely broken now

After 0.9, my Open WebUI got fully broken.

Title generation works 20% of the time.

Generation stops after tool calls, making it unusable.

I tried starting from nothing with backups. I tried different providers. Nothing is working. It is on a Raspberry Pi 5.

Are there any sharing this problems? If not, I will start from nothing, importing nothing and creating all the agents and skills and everything.

EDIT: I can't actually believe it, but I wrote the problem to Hermes Agent in the same system, and it fixed it by a patch like this:

- ./patches/middleware.py:/app/backend/open_webui/utils/middleware.py:ro

It seems the problem was this according to it:

OK, here's the full diagnosis and fix:

== ROOT CAUSE ==

Open WebUI v0.9.5 has a bug in get_reasoning_format() (middleware.py line 2169).
It only returns reasoning_content format for llama.cpp and think_tags for Ollama.
For ALL other providers (MiMo, DeepSeek, Kimi, etc.) it returns None.

When None, the convert_output_to_messages() function completely SKIPS reasoning
content when building follow-up messages after a tool call.

So the chain is:
  1. Model generates thinking/reasoning + tool_call
  2. Open WebUI executes the tool, gets results
  3. Builds follow-up messages but DROPS the reasoning_content
  4. MiMo/DeepSeek API gets request WITHOUT reasoning_content
  5. API returns 400: "reasoning_content in thinking mode must be passed back"
  6. Error is caught silently (log.debug + break) → chat just dies

This affects MiMo, DeepSeek, and likely any other provider that requires
reasoning_content to be passed back during thinking mode.

Confirmed in both Open WebUI and LiteLLM logs:
  - litellm.BadRequestError: reasoning_content must be passed back (mimo-v2.5-pro)
  - litellm.BadRequestError: reasoning_content must be passed back (deepseek-v4-pro)

== FIX APPLIED ==

Patched get_reasoning_format() to return 'reasoning_content' for ALL non-Ollama
providers. This is safe because:
  - Providers that support it (MiMo, DeepSeek) get it back ✓
  - Providers that don't simply ignore the extra field ✓

Persisted via Docker volume mount:
  /opt/docker/openwebui/patches/middleware.py → /app/backend/open_webui/utils/middleware.py:ro

Container restarted and healthy.

== CAVEAT ==

This patch will be lost if you update the Open WebUI image. After docker compose pull
and recreate, you'll need to re-extract the patched file from the new version, or
check if upstream has fixed it. I'll remember this for future sessions.

Go ahead and test — try triggering a web search or any tool call. It should continue
generating after the tool returns now. 🤞

And this is the patch fixing it, this is for v0.9.5: https://gist.github.com/eteitaxiv/a0804db86af57d92fdfdeb7b63c8a486

u/eteitaxiv — 7 days ago

Open Relay v4.1–4.3 — Terminal enhancements, folder bulk actions, app/server update notifications, massive code block/native visualizations performance updates, and a ton of polishing.

Hey everyone! This is a recap of 3 updates since 4.0 — v4.1, v4.2, and v4.3. Lots of quality-of-life and performance work, plus a couple of features I've been wanting to add for a while. Here's what's new:

App Store | GitHub

📁 Folder & Chat Management (v4.1)

Long-press any chat inside a folder to enter selection mode — you can now multi-select and then remove from folder, move to another folder, or delete in bulk. No more one-at-a-time tedium when reorganising hundreds of chats.

There's also a new "Move to Folder" button in the main chat list selection toolbar, so you can grab multiple chats from your top-level list and drop them all into a folder in one shot.

Folders now properly load all chats — previously there was a silent cap of 10 chats per folder. That's fixed.

🖥️ Terminal upgrades (v4.1 & v4.2)

The terminal got a lot of changes across both releases:

  • Fullscreen mode — tap the expand button in the terminal toolbar to open a dedicated full-screen terminal view with much more room to work.
  • Quick-action shortcut bar — Tab, ↑ (history up), ↓ (history down), and Clear buttons sit above the keyboard so you can control the terminal without hunting for special keys.
  • Faster startup — the shell now warms up as soon as you enable terminal access, so it's ready instantly when you open the panel instead of taking a second to initialise.
  • Fixed raw ANSI escape codes (e.g. [1m[96m…[0m) showing in output instead of clean formatted text.
  • Fixed terminal access leaking into chat requests even after disabling the toggle.
  • Fixed multi-server edge cases where switching servers with the toggle off could leave a stale terminal ID being silently sent.

🔔 Server Update Notifications (v4.1)

The app now checks your Open WebUI server for available updates on launch and shows a notification if one's available. There's also a manual "Check for Server Updates" button under About → Server.

📸 Native Visualizations + Images (v4.1)

Native inline visualizations now support audio output.

⚡ Code Block Performance (v4.3)

This is a big one. Code block and native visualization rendering has been drastically improved. Previously, streaming a response with 600+ line code blocks would cause visible UI lag and jitter. Now the UI stays fully responsive while streaming even large code blocks. The underlying markdown library also got a round of performance improvements.

🎹 Input & Keyboard Polish (v4.3)

A bunch of small things that were quietly annoying got fixed:

  • Long-press text selection — hold on any message text to select and copy. Previously only double-tap was available; now both work.
  • Input bar focus — tapping anywhere on the input bar now reliably opens the keyboard. There was a bug where it required multiple taps to actually focus.
  • Keyboard dismiss jitter — when scrolling to dismiss the keyboard, the input bar now glides down in perfect sync with the keyboard instead of snapping into place after it closes.
  • HTML streaming lag — fixed a conflict between the main-thread animation loop and the WebContent renderer that caused visible lag when JS-heavy HTML was streaming in.
  • Accessibility scaling — input bar buttons and icons now scale with your Accessibility UI Scale setting in both chat and channel views.

As always, full changelog on GitHub. Thanks for all the support along with the issues to make the app better day by day! Keep the issues and feedback coming. Thanks!

u/Zealousideal_Fox6426 — 7 days ago

Why separating classification from generation made my local Qwen workflow far more stable

After months of experimentation, I think I finally reached a setup that feels genuinely stable for local production-style RAG workflows.

A real operational pipeline for:

  • editorial drafting
  • legal/environmental research
  • structured retrieval
  • HTML generation
  • document synthesis
  • controlled outputs
  • anti-hallucination workflows

Hardware

  • Mac Studio M2 Max
  • 32 GB unified memory

Stack

Running in Docker, sharing the same network:

https://preview.redd.it/7q6x49hzfa1h1.png?width=1852&format=png&auto=webp&s=0211e4ff3a9ebe65071f2d973106ac8db933d4dc

  • Open WebUI
  • PostgreSQL
  • Qdrant (6Gb of data)
  • Apache Tika
  • Open Terminal
  • Nginx Proxy Manager

Inference:

  • LM Studio (latest beta at moment 0.4.13)
  • Qwen3.5-9B (unsloth 4bit 7-9GB)
  • Qwen3.6-35B-A3B Q2_XXS (~12 GB)
  • Qwen3.6-35B-A3B Q3_K_S (~17 GB)

common parameters: 16000k context, temp 0.2, top-k 20, penalty 0.95, unified KV cache on GPU

the biggest thing I learned is that local models fail because of routing drifts, retrieval gets noisy, bloated prompts, formatting errors, tool usage loops :exploding_head: or just start narrating its own reasoning

The breakthrough for me was separating the workflow into stages instead of relying on one giant "do everything" system prompt.

My pipeline now looks roughly like this:

User query
↓
GBNF classification
↓
Routing decision
↓
Tool / retrieval
↓
Guardrails
↓
Editorial synthesis
↓
Final formatting

The most important architectural decision

I use GBNF only for the fragile parts:

  • intent classification
  • routing
  • workflow decisions
  • fallback handling
  • output mode selection

NOT for final article generation.

That was a massive improvement.

Before this, the model would often:

  • over-explain
  • invent process narration
  • repeat tool calls
  • drift stylistically
  • produce inconsistent formatting
  • ignore operational constraints

Now the grammar forces highly structured outputs like:

<classification>
OPERATION=compliance_check
DOMAIN=waste_management
REQUEST_TYPE=specific_case
SOURCE_STATUS=sources_found
VERIFICATION=verified_content
CONFIDENCE=high
</classification>

Then another block decides:

  • whether tool usage is mandatory
  • which workflow to activate
  • which output format to use
  • whether fallback mode is required

This made even aggressively quantized models dramatically more reliable.

Optimize narration flow

I explicitly banned outputs like:

"I will now search..."
"I need to verify..."
"Proceeding with analysis..."

Removing process narration improved output quality far more than expected.

Especially with quantized local models.

Shorter prompts > smarter prompts

Golden rules:

  • short hard rules
  • operational constraints
  • anti-loop logic
  • explicit fallback behavior
  • slim output structures
  • deterministic formatting

The system prompt became less "literary" and much more procedural.

Less:

  • personality
  • motivational language
  • verbose instructions
  • pseudo-chain-of-thought

More:

  • routing
  • execution constraints
  • output contracts
  • retrieval discipline

RAG quality improved when I REDUCED context noise

Another thing that surprised me:

Reducing retrieval noise improved quality much more than increasing context size.

I now heavily prioritize:

  • exact query forwarding
  • minimal paraphrasing
  • retrieval discipline
  • compact chunks
  • avoiding semantic duplication
  • limiting repeated tool calls

Qwen family on Apple Silicon

Qwen3.6-35B-A3B Q2_XXS as well Qwen3.6-35B-A3B Q3_K_S performs much better than I expected on the M2 Max.

The quantization mostly hurts:

  • instruction precision
  • formatting discipline
  • operational consistency

Not raw reasoning ability.

So strong workflow constraints compensate extremely well.

The smaller Qwen3.5-9B is also extremely useful as:

  • classifier
  • router
  • lightweight editor
  • HTML formatter
  • fast operational assistant

I use it as task tool in open webui interface.

What learned

The biggest improvement wasn't the model itself, but how the system was designed.

Separating:

  • classification
  • routing
  • retrieval
  • generation
  • formatting
  • fallback handling

made the entire system feel significantly more stable than my previous "single giant prompt" approach.

On this setup I'm getting roughly:

  • 40–45 tokens/sec on low-to-medium complexity tasks (document retrieval, summary tables, document synthesis, lightweight editorial work)
  • around 30 tokens/sec on more complex workflows (regulatory comparisons, deeper analysis, long-form generation, structured drafting)

a medium query load

I'd love to hear from others running similar local workflows. I'm still experimenting and refining the entire stack, so suggestions or examples from other setups are very welcome.

reddit.com
u/liuc0j — 6 days ago
▲ 146 r/OpenWebUI+4 crossposts

Llama.cpp is getting better with every update

Last night I updated llama.cpp after like 2 or 3 weeks. The results were really exciting for someone running a 35B model on 6GB RTX 3050.

Today I was able to get stable token speeds and they didn't fall down to 9 t/s while coding 1000+ lines of code.

Now I can increase my context window to 64k range and I'm still getting 19 t/s minimum. Before it would do down drastically to 4 t/s.

But now it gives a solid 26 t/s. In high context window worflows it falls by 5-7 t/s only. This means I can do 1000$ worth of coding work on my laptop for free.

Yes. The AI bubble will pop for sure if people realizes they can locally get near same quality of the their cloud subscriptions.

reddit.com
u/Low-Alarm272 — 10 days ago

I really like OpenWebUI

I don't have anything in particular to comment on, I just wanted to post that I really, really like open-webui. I'm a sysad that's deployed it at my work center now, with several openterminal systems behind it, and it feels like witchcraft. Works far better than the SSH MCP kludge I had before it.

There's a lot of hate surrounding AI (some warranted, some not), but AI-powered IT operations is going to be a game-changer, I can smell it in the air.

Well done, team.

reddit.com
u/DHT-Osiris — 9 days ago
▲ 17 r/OpenWebUI+19 crossposts

I gave Claude Code a persistent markdown knowledge base so it stops forgetting project context between sessions

Persistent memory keeps coming up for AI coding agents. One approach I’ve found useful: treating the knowledge layer as a compiled markdown wiki rather than just stuffing more tokens into the context window.

llm-wiki-compiler ingests docs and URLs, then the LLM builds an interlinked markdown structure. Since the output is plain markdown on disk, Claude Code reads it directly. And when you run query --save, the answer gets written back into the wiki as a page — so future queries improve.

It’s not retrieval. It’s compounding. The knowledge base gets richer instead of resetting every session.

Plain markdown, no opaque vector store, fully inspectable.

How are other agent builders solving persistent memory?

reddit.com
u/riddlemewhat2 — 8 days ago