u/PromptInjection_

▲ 1 r/ollama

LLC: lightweight OpenWebUI alt - now with chat converter + custom tool calls

Posted my project here a while back and got some solid feedback via DMs. The main ask was a converter so people don't lose their existing chats when switching - that's in now.

https://preview.redd.it/mfn5i99d6c2h1.png?width=1400&format=png&auto=webp&s=10af6f8645c26d8d25b2356f98cee019c508a4d6

Quick context: LLC is a chat frontend for local LLMs. You download it, you run it, that's it - no install needed (unless you want), no dependencies, runs on pretty much anything including ancient hardware. I built it because OWUI kept feeling heavier than the models I was running.

so, what's new in v0.6:

  • Chat converter - import your OWUI history so you don't start from zero
  • Custom tool calls - you can define your own tools the model can use ( for example weather, stock market or whatever you like)

PS:
You can run the converter easily with
python convert_openwebui_to_locallightchat_v2.py webui.db --media-storage uploads (or --media-storage inline if you like it embedded with base64). The OpenWebui "uploads" folder should be in the same directory.

Link: https://www.locallightai.com/llc/
Github: https://github.com/srware-net/LocalLightChat/

reddit.com
u/PromptInjection_ — 2 days ago

LLC: lightweight OpenWebUI alt - now with chat converter + custom tool calls

Posted my project here a while back and got some solid feedback via DMs. The main ask was a converter so people don't lose their existing chats when switching - that's in now.

https://preview.redd.it/mfn5i99d6c2h1.png?width=1400&format=png&auto=webp&s=10af6f8645c26d8d25b2356f98cee019c508a4d6

Quick context: LLC is a chat frontend for local LLMs. You download it, you run it, that's it - no install needed (unless you want), no dependencies, runs on pretty much anything including ancient hardware. I built it because OWUI kept feeling heavier than the models I was running.

so, what's new in v0.6:

  • Chat converter - import your OWUI history so you don't start from zero
  • Custom tool calls - you can define your own tools the model can use ( for example weather, stock market or whatever you like)

PS:
You can run the converter easily with
python convert_openwebui_to_locallightchat_v2.py webui.db --media-storage uploads (or --media-storage inline if you like it embedded with base64). The OpenWebui "uploads" folder should be in the same directory.

Link: https://www.locallightai.com/llc/
Github: https://github.com/srware-net/LocalLightChat/

reddit.com
u/PromptInjection_ — 2 days ago

LLC: lightweight OpenWebUI alt - now with chat converter + custom tool calls

Posted my project here a while back and got some solid feedback via DMs. The main ask was a converter so people don't lose their existing chats when switching - that's in now.

https://preview.redd.it/mfn5i99d6c2h1.png?width=1400&format=png&auto=webp&s=10af6f8645c26d8d25b2356f98cee019c508a4d6

Quick context: LLC is a chat frontend for local LLMs. You download it, you run it, that's it - no install needed (unless you want), no dependencies, runs on pretty much anything including ancient hardware. I built it because OWUI kept feeling heavier than the models I was running.

so, what's new in v0.6:

  • Chat converter - import your OWUI history so you don't start from zero
  • Custom tool calls - you can define your own tools the model can use ( for example weather, stock market or whatever you like)

PS:
You can run the converter easily with
python convert_openwebui_to_locallightchat_v2.py webui.db --media-storage uploads (or --media-storage inline if you like it embedded with base64). The OpenWebui "uploads" folder should be in the same directory.

Link: https://www.locallightai.com/llc/
Github: https://github.com/srware-net/LocalLightChat/

reddit.com
u/PromptInjection_ — 2 days ago

Where can i find the imatrix dataset file for unsloth Quants?

quantize.imatrix.file Qwen3.6-35B-A3B-GGUF/imatrix_unsloth.gguf
quantize.imatrix.dataset unsloth_calibration_Qwen3.6-35B-A3B.txt

I find the gguf but not the unsloth_calibration_Qwen3.6-35B-A3B.txt file. Same for other LLMs.

reddit.com
u/PromptInjection_ — 2 days ago

Qwen 3.6-27B Dense with MTP on Strix Halo Windows - Benchmarks

Here are some results (llama.cpp - https://github.com/ggml-org/llama.cpp/releases/tag/b9190)!

Task 1: write a short poem
27B Dense: 12.5 tokens/s
27B Dense MTP: (spec-draft-n-max 6): 14.5 tokens/s
27B Dense MTP (spec-draft-n-max 3): 18.7 tokens/s

Task 2: edit a hello word html artifact
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.2 tokens/s
27B Dense MTP (spec-draft-n-max 3): 19.8 tokens/s

Task 3: create a hello world html directly in chat
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 17.9 tokens/s
27B Dense MTP (spec-draft-n-max 3): 23.2 tokens/s

It's fascinating how it varies with tasks!

https://preview.redd.it/bsrlgslasn1h1.png?width=1802&format=png&auto=webp&s=8aba6c751bf7c47494ce11697b91a4347fec79af

Settings used:

{
"name": "Qwen3.6-27B-UD-Q4_K_M",
"file": "Qwen3.6-27B-UD-Q4_K_M.gguf",
"custom": ["--mmproj", "C:/CarlAI/models/mmproj-Qwen_Qwen3.6-27B-bf16.gguf"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

{

"name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
"file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
"custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

reddit.com
u/PromptInjection_ — 5 days ago

Qwen 3.6-27B Dense with MTP on Strix Halo Windows - Benchmarks

Here are some results (llama.cpp)!

Task 1: write a short poem
27B Dense: 12.5 tokens/s
27B Dense MTP: (spec-draft-n-max 6): 14.5 tokens/s
27B Dense MTP (spec-draft-n-max 3): 18.7 tokens/s

Task 2: edit a hello word html artifact
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.2 tokens/s
27B Dense MTP (spec-draft-n-max 3): 19.8 tokens/s

Task 3: create a hello world html directly in chat
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 17.9 tokens/s
27B Dense MTP (spec-draft-n-max 3): 23.2 tokens/s

It's fascinating how it varies with tasks!

https://preview.redd.it/bqfgfha8sn1h1.png?width=1802&format=png&auto=webp&s=f1aa99fbb45a5d72f9c3c442ddee839d48cf696d

Settings used:

{
"name": "Qwen3.6-27B-UD-Q4_K_M",
"file": "Qwen3.6-27B-UD-Q4_K_M.gguf",
"custom": ["--mmproj", "C:/CarlAI/models/mmproj-Qwen_Qwen3.6-27B-bf16.gguf"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

{

"name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
"file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
"custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

reddit.com
u/PromptInjection_ — 5 days ago

Qwen 3.6-27B Dense with MTP on Strix Halo Windows - Benchmarks

Here are some results (llama.cpp)!

Task 1: write a short poem
27B Dense: 12.5 tokens/s
27B Dense MTP: (spec-draft-n-max 6): 14.5 tokens/s
27B Dense MTP (spec-draft-n-max 3): 18.7 tokens/s

Task 2: edit a hello word html artifact
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.2 tokens/s
27B Dense MTP (spec-draft-n-max 3): 19.8 tokens/s

Task 3: create a hello world html directly in chat
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 17.9 tokens/s
27B Dense MTP (spec-draft-n-max 3): 23.2 tokens/s

It's fascinating how it varies with tasks!

https://preview.redd.it/bsrlgslasn1h1.png?width=1802&format=png&auto=webp&s=8aba6c751bf7c47494ce11697b91a4347fec79af

Settings used:

{
"name": "Qwen3.6-27B-UD-Q4_K_M",
"file": "Qwen3.6-27B-UD-Q4_K_M.gguf",
"custom": ["--mmproj", "C:/CarlAI/models/mmproj-Qwen_Qwen3.6-27B-bf16.gguf"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

{

"name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
"file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
"custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

reddit.com
u/PromptInjection_ — 5 days ago

Qwen 3.6-27B Dense with MTP on Strix Halo Windows - Benchmarks

Here are some results (llama.cpp)!

Task 1: write a short poem
27B Dense: 12.5 tokens/s
27B Dense MTP: (spec-draft-n-max 6): 14.5 tokens/s
27B Dense MTP (spec-draft-n-max 3): 18.7 tokens/s

Task 2: edit a hello word html artifact
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 14.2 tokens/s
27B Dense MTP (spec-draft-n-max 3): 19.8 tokens/s

Task 3: create a hello world html directly in chat
27B Dense: 12.6 tokens/s
27B Dense MTP (spec-draft-n-max 6): 17.9 tokens/s
27B Dense MTP (spec-draft-n-max 3): 23.2 tokens/s

https://preview.redd.it/i8f0cj0zrn1h1.png?width=1797&format=png&auto=webp&s=a48dd04bdfa4ace1e9bceb8e79415971c5085e95

It's fascinating how it varies with tasks!

Settings used:

{
"name": "Qwen3.6-27B-UD-Q4_K_M",
"file": "Qwen3.6-27B-UD-Q4_K_M.gguf",
"custom": ["--mmproj", "C:/CarlAI/models/mmproj-Qwen_Qwen3.6-27B-bf16.gguf"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

{

"name": "Qwen3.6-27B-UD-Q4_K_XL_MTP",
"file": "Qwen3.6-27B-UD-Q4_K_XL_MTP.gguf",
"custom": ["-np", "1", "--spec-type", "draft-mtp", "--spec-draft-n-max", "6"],

"backend": "vulkan",

"parameters": {
"temp": 0.8,
"top_k": 20,
"top_p": 0.95,
"min_p": 0.00,
"repeat_penalty": 1.0,
"ngl": 99,
"context_length": 65000,
"jinja": true,
"flash_attn": "on"
}

},

reddit.com
u/PromptInjection_ — 5 days ago

LocalLightChat - the new portable lightweight ChatUI for LLMs

I got tired of every local AI frontend is either not portable, extremely slow and bloated- or even both. So i developed my own. It can handle even 500k+ tokens on a laptop from 2010!

LocalLightChat is a standalone chat interface for local LLMs and cloud APIs. Single binary, no installation, no dependencies. You download it, you run it, you're chatting. Works on Windows, Linux (x64/ARM64), and macOS.

What it actually does:

  • 500k+ token context – runs smooth even on old hardware
  • Full-text search across your entire chat history in under 100ms
  • Compress & Clone – squeeze 50k tokens down to 2k while keeping the stuff that matters
  • Documents & Artifacts – create and edit long-form content without drowning your chat
  • Web search built in (Serper/SearchNGX/Brave/custom) with minimal token overhead
  • Image generation via API or ComfyUI auto-detection
  • Multi-modal input – PDFs, images, CSV, YAML, XML, logs, all processed client-side
  • Full LLM parameter control – temperature, sampling, DRY, Mirostat, everything
  • Multi-user system with role-based auth if you need it

There's also a Docker image and a self-hosted option if you want to run it on your own nginx/PHP stack.

Links:

Currently at v0.5. Happy to answer questions or take feedback.

reddit.com
u/PromptInjection_ — 7 days ago
▲ 1 r/ollama

LocalLightChat - the new portable lightweight ChatUI for LLMs

I got tired of every local AI frontend is either not portable, extremely slow and bloated- or even both. So i developed my own. It can handle even 500k+ tokens on a laptop from 2010!

LocalLightChat is a standalone chat interface for local LLMs and cloud APIs. Single binary, no installation, no dependencies. You download it, you run it, you're chatting. Works on Windows, Linux (x64/ARM64), and macOS.

What it actually does:

  • 500k+ token context – runs smooth even on old hardware
  • Full-text search across your entire chat history in under 100ms
  • Compress & Clone – squeeze 50k tokens down to 2k while keeping the stuff that matters
  • Documents & Artifacts – create and edit long-form content without drowning your chat
  • Web search built in (Serper/SearchNGX/Brave/custom) with minimal token overhead
  • Image generation via API or ComfyUI auto-detection
  • Multi-modal input – PDFs, images, CSV, YAML, XML, logs, all processed client-side
  • Full LLM parameter control – temperature, sampling, DRY, Mirostat, everything
  • Multi-user system with role-based auth if you need it

There's also a Docker image and a self-hosted option if you want to run it on your own nginx/PHP stack.

Links:

Currently at v0.5. Happy to answer questions or take feedback.

PS: With Ollama, simple use:
API-URL: http://localhost:11434/v1/
API-Key: ollama

reddit.com
u/PromptInjection_ — 7 days ago

LocalLightChat - the new portable lightweight ChatUI for LLMs

I got tired of every local AI frontend is either not portable, extremely slow and bloated- or even both. So i developed my own. It can handle even 500k+ tokens on a laptop from 2010!

LocalLightChat is a standalone chat interface for local LLMs and cloud APIs. Single binary, no installation, no dependencies. You download it, you run it, you're chatting. Works on Windows, Linux (x64/ARM64), and macOS.

What it actually does:

  • 500k+ token context – runs smooth even on old hardware
  • Full-text search across your entire chat history in under 100ms
  • Compress & Clone – squeeze 50k tokens down to 2k while keeping the stuff that matters
  • Documents & Artifacts – create and edit long-form content without drowning your chat
  • Web search built in (Serper/SearchNGX/Brave/custom) with minimal token overhead
  • Image generation via API or ComfyUI auto-detection
  • Multi-modal input – PDFs, images, CSV, YAML, XML, logs, all processed client-side
  • Full LLM parameter control – temperature, sampling, DRY, Mirostat, everything
  • Multi-user system with role-based auth if you need it

There's also a Docker image and a self-hosted option if you want to run it on your own nginx/PHP stack.

Links:

Currently at v0.5. Happy to answer questions or take feedback.

reddit.com
u/PromptInjection_ — 7 days ago

LocalLightChat - the new portable lightweight ChatUI for LLMs

I got tired of every local AI frontend is either not portable, extremely slow and bloated- or even both. So i developed my own. It can handle even 500k+ tokens on a laptop from 2010!

LocalLightChat is a standalone chat interface for local LLMs and cloud APIs. Single binary, no installation, no dependencies. You download it, you run it, you're chatting. Works on Windows, Linux (x64/ARM64), and macOS.

What it actually does:

  • 500k+ token context – runs smooth even on old hardware
  • Full-text search across your entire chat history in under 100ms
  • Compress & Clone – squeeze 50k tokens down to 2k while keeping the stuff that matters
  • Documents & Artifacts – create and edit long-form content without drowning your chat
  • Web search built in (Serper/SearchNGX/Brave/custom) with minimal token overhead
  • Image generation via API or ComfyUI auto-detection
  • Multi-modal input – PDFs, images, CSV, YAML, XML, logs, all processed client-side
  • Full LLM parameter control – temperature, sampling, DRY, Mirostat, everything
  • Multi-user system with role-based auth if you need it

There's also a Docker image and a self-hosted option if you want to run it on your own nginx/PHP stack.

Links:

Currently at v0.5. Happy to answer questions or take feedback.

reddit.com
u/PromptInjection_ — 7 days ago

The "The future is fictional" problem of Gemini

Gemini (without using tools) has a problem:
It think that everything is beyond it's knowledge cutoff date would be "fictional" or "satirical".

----

Gemini:

Key Details Break-Down

1. The Source and Meta-Data

Topic: Graphics Cards / Tech Community Project

Author: Max Doll

Date: May 14, 2026 (Note: This date is in the future, meaning it is likely a typo in your source text. It was probably published in May 2023 or 2024).*

[...] This fictional article touches upon a very popular topic in the PC gaming community:

----

Anybody else noticed this?

reddit.com
u/PromptInjection_ — 8 days ago

The "the future is fictional" problem of many local LLMs

Many local models have a problem (that raised due to excessive RHLF training):
They mostly think that everything that is beyond their knowledge cutoff date would be "fictional" or "satirical".

To be fair: Even the Gemini API without web access can have this sometimes. But it stops when you give it tools. However, with many local models, sadly it even goes on WITH tools: Look at this:

>You
search the web for iran war 2026 news

gemma-4-26B-A4B-it-Q4_K_M_128k

>Based on the search results, there is a significant amount of content describing a "2026 Iran War." However, it is important to note that these search results appear to be from a fictional or speculative scenario, likely part of a "future history" exercise, a geopolitical simulation, or a creative writing project.
of today, there is no real-world war occurring or scheduled for 2026 involving Iran.

>Key Observations from the Search Results:
As The search results contain specific dates and details that suggest they are part of a narrative or simulation:

>Specific Dates: Several snippets mention specific dates in 226 (e.g., "February 28," "April 17, 2026," and "May 1, 2026") describing events like "U.S.-Israeli joint strikes."

>Operation Names: One result mentions "Operation Epic Fury," a name used within the context of this specific 2026 scenario.

>Nature of the Content: While the snippets appear in formats resembling real news (e.g., Wikipedia, CSIS, Atlantic Council), they describe events that have not happened in actual history. This type of content is common in geopolitical forecasting simulations or "alternate history"-style academic exercises used to study potential risks and consequences.

>Summary of the "Scenario" described in the results:

>If you are looking for information on this specific geopolitical simulation, the "news" within that narrative includes:

>The Conflict: A joint U.S.-Israeli military operation against Iran.

>Core Drivers: Fears of Iran becoming a nuclear power and the disruption of the existing regional security order.

>Impact: The scenario explores impacts on global energy security (oil markets), the shift toward a fragmented global order, and changes in Middle Eastern power dynamics.

>Conclusion: There is no actual news of an Iran war in 2026. You are likely seeing results from a geopolitical simulation or speculative "future history" analysis.

So the tool call worked, "Epic Fury" is there, still the model thinks it's fictional or "future". Sometimes it can be fixed with including "It is x.x.2026" in the system prompt, but it is a workaround that shouldn't be necessary.

reddit.com
u/PromptInjection_ — 8 days ago

Gemma 4 E4B is great for short transcriptions

Yes, for material that is an hour long, there is no getting around tools like Whisper - or something even better.

However, for transcribing short snippets, Gemma works very quickly and reliably- even in foreign languages.

Do you use it as well?

reddit.com
u/PromptInjection_ — 10 days ago

Gemma 4 E4B is great for short transcriptions

Yes, for material that is an hour long, there is no getting around tools like Whisper - or something even better.

However, for transcribing short snippets, Gemma works very quickly and reliably- even in foreign languages.

Do you use it as well?

reddit.com
u/PromptInjection_ — 10 days ago

How to Fine-Tune LLMs on AMD Strix Halo

After the first general general fine-tuning tutorial i posted (https://www.promptinjection.net/p/the-ultimate-llm-ai-fine-tuning-guide-tutorial) some people asked if i can't make the same for AMD Strix Halo because approach here is quite different because of RoCM.

https://preview.redd.it/g63fjundxh0h1.jpg?width=1080&format=pjpg&auto=webp&s=4ea6efb97b7306646303adc9020f0a075e08865b

I listened and here it is now:
https://www.promptinjection.net/p/how-to-fine-tune-llms-on-amd-strix-halo-ryzen-ai-max-395-sft-lora

- Linux and pure Windows (no WSL!)
- Full SFT and LoRA

reddit.com
u/PromptInjection_ — 11 days ago
▲ 2 r/ollama

How to Fine-Tune LLMs on AMD Strix Halo and Other Exotic AMD Hardware

After the first general general fine-tuning tutorial i posted here (https://www.promptinjection.net/p/the-ultimate-llm-ai-fine-tuning-guide-tutorial) some people asked if i can't make the same for AMD Strix Halo because approach here is quite different because of RoCM.

https://preview.redd.it/3sjhuadbvh0h1.jpg?width=1080&format=pjpg&auto=webp&s=cae20397da5e27e682bbb40d7987149c4f8cc472

I listened and here it is now:
https://www.promptinjection.net/p/how-to-fine-tune-llms-on-amd-strix-halo-ryzen-ai-max-395-sft-lora

- Linux and pure Windows (no WSL!)
- Full SFT and LoRA

reddit.com
u/PromptInjection_ — 11 days ago

How to Fine-Tune LLMs on AMD Strix Halo and Other Exotic AMD Hardware

After the first general general fine-tuning tutorial i posted here (https://www.promptinjection.net/p/the-ultimate-llm-ai-fine-tuning-guide-tutorial) some people asked if i can't make the same for AMD Strix Halo because approach here is quite different because of RoCM.

https://preview.redd.it/o8kv7zkuth0h1.jpg?width=1080&format=pjpg&auto=webp&s=07531d93ec5ecbccbde03c32078b32c3d7009b8c

I listened and here it is now:
https://www.promptinjection.net/p/how-to-fine-tune-llms-on-amd-strix-halo-ryzen-ai-max-395-sft-lora

- Linux and pure Windows (no WSL!)
- Full SFT and LoRA

reddit.com
u/PromptInjection_ — 11 days ago

How to Fine-Tune LLMs on AMD Strix Halo and Other Exotic AMD Hardware

After the first general general fine-tuning tutorial i posted here (https://www.promptinjection.net/p/the-ultimate-llm-ai-fine-tuning-guide-tutorial) some people asked if i can't make the same for AMD Strix Halo because approach here is quite different because of RoCM.

https://preview.redd.it/62etwj5xqh0h1.jpg?width=1080&format=pjpg&auto=webp&s=df8a4076f68400b69ed6130eb9a8e66e28a6e55d

I listened and here it is now:
https://www.promptinjection.net/p/how-to-fine-tune-llms-on-amd-strix-halo-ryzen-ai-max-395-sft-lora

- Linux and pure Windows (no WSL!)
- Full SFT and LoRA

reddit.com
u/PromptInjection_ — 11 days ago