r/ChatbotRefugees

LettuceAI 2.1 & 2.1.1 (Android & Desktop) are live

Hey everyone! I'm the developer of LettuceAI.

LettuceAI is an open-source, privacy-first, cross-platform AI chat app built for character chats, roleplay, and long conversations that actually stay coherent.

It supports both local models (built-in llama.cpp engine, Ollama, LM Studio) and external APIs with full BYOK (bring your own key) support, so you stay in control of your own setup. No forced accounts, no cloud routing through us, no vendor lock-in. Your requests go directly to the model provider you choose.

Where 2.0 was about companions and souls, 2.1 is about your hardware and your data, and 2.1.1 is about trusting both. The short version: local models now spread across every GPU in your machine, a new performance dashboard shows tokens per second on every generation, and OpenRouter models can be pinned to a specific provider endpoint. Then 2.1.1 rebuilt device sync from the ground up so your settings and memories actually arrive intact, made multi-GPU configuration honest about what it will really do, and added guided tours for the trickiest parts of the app.

What's new in 2.1

Local models & multi-GPU

Multi-GPU layer distribution: llama.cpp now splits a model's layers across every selected GPU, with automatic or manual splits, a per-model single-GPU device override, and per-GPU VRAM reservation so nothing overflows
A smart offloader sizes each device's share automatically and keeps far more of the model on the GPU instead of falling back to the CPU
KV cache placement modes (auto, split with layers, system RAM, or pinned to a main GPU), explained inline with fully localized pickers in the model editor
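For anyone curious what splitting layers across GPUs involves, here is a minimal, hypothetical Python sketch. It is not LettuceAI's actual offloader: the `Gpu` class, the `split_layers` function, the assumption that every layer needs the same amount of VRAM, and the "most headroom first" greedy rule are all illustrative choices. Each GPU's reserved VRAM is left untouched, and layers that fit nowhere fall back to the CPU.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    free_vram_gib: float
    reserve_gib: float  # VRAM kept free on purpose (per-GPU reservation)


def split_layers(n_layers: int, layer_gib: float, gpus: list[Gpu]) -> dict[str, int]:
    """Hypothetical greedy split: each layer goes to the GPU with the most
    usable VRAM left; layers that fit on no GPU fall back to the CPU.
    Assumes every layer needs the same amount of VRAM (layer_gib)."""
    # Usable VRAM per GPU after honoring its reservation.
    remaining = {g.name: max(g.free_vram_gib - g.reserve_gib, 0.0) for g in gpus}
    counts = {g.name: 0 for g in gpus}
    cpu_layers = 0
    for _ in range(n_layers):
        # Most headroom first; default=None handles the "no GPUs" case.
        name = max(remaining, key=remaining.get, default=None)
        if name is not None and remaining[name] >= layer_gib:
            remaining[name] -= layer_gib
            counts[name] += 1
        else:
            cpu_layers += 1
    counts["cpu"] = cpu_layers
    return counts


if __name__ == "__main__":
    gpus = [Gpu("gpu0", 24.0, 1.0), Gpu("gpu1", 12.0, 1.0)]
    print(split_layers(80, 0.5, gpus))
```

With a 24 GiB and a 12 GiB card, 1 GiB reserved on each, and 80 layers of 0.5 GiB, this puts 46 layers on the first GPU, 22 on the second and 12 on the CPU. Counts like these could be expressed as proportions for llama.cpp's `--tensor-split` option, though in real models layers are not all the same size.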

Performance metrics

A new local-LLM performance page graphs tokens per second and prompt/generation timing, per run and across runs, including group chats
Every generation is linked to its message, with a per-message action to open that reply's performance detail
MTP speculative-decoding stats (acceptance and draft counts) are persisted and shown on each message
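Tokens per second is simple arithmetic on token counts and timings, and the speculative-decoding stats boil down to a ratio. The Python sketch below shows one way to compute them; the `RunTiming` class, its field names, and the choice of "total tokens over total time" for the cross-run figure are assumptions for illustration, not LettuceAI's internals.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Timing for one hypothetical local generation run."""
    prompt_tokens: int
    prompt_ms: float
    gen_tokens: int
    gen_ms: float
    draft_tokens: int = 0     # tokens proposed by the speculative (MTP) draft
    accepted_tokens: int = 0  # drafted tokens the main model accepted

    def prompt_tps(self) -> float:
        # Prompt processing speed: prompt tokens read per second.
        return self.prompt_tokens / (self.prompt_ms / 1000) if self.prompt_ms else 0.0

    def gen_tps(self) -> float:
        # Generation speed: tokens produced per second.
        return self.gen_tokens / (self.gen_ms / 1000) if self.gen_ms else 0.0

    def acceptance_rate(self) -> float:
        # Share of drafted tokens that survived verification.
        return self.accepted_tokens / self.draft_tokens if self.draft_tokens else 0.0


def overall_gen_tps(runs: list[RunTiming]) -> float:
    """Across runs: total tokens over total time, so long runs weigh more
    than they would in a plain average of per-run rates."""
    total_ms = sum(r.gen_ms for r in runs)
    return sum(r.gen_tokens for r in runs) / (total_ms / 1000) if total_ms else 0.0


if __name__ == "__main__":
    runs = [RunTiming(512, 500, 200, 5000, 120, 90), RunTiming(256, 250, 100, 10000)]
    print(runs[0].gen_tps(), runs[0].acceptance_rate(), overall_gen_tps(runs))
```

Here a 40 tok/s run and a 10 tok/s run average to 20 tok/s overall rather than 25, because the slow run took twice as long.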

Providers

OpenRouter provider pinning: pick a specific provider endpoint per model, with live pricing, cache rates, uptime, and provider logos in the picker, and route every request exclusively through it
Sprout hardware probe: an open-source companion service you run on your Ollama machine, so remote-Ollama runnability is judged against the real hardware behind the endpoint instead of a guess

Mobile & portability

The Hugging Face model browser now works properly on phones: it pairs with a remote Ollama provider and pulls GGUF models straight to that host, with files and recommended settings in a slide-in drawer
Chat export and import rebuilt around the official SillyTavern JSONL format, so histories move cleanly in and out
A shared memory cycle hub unifies the memory-cycle UI across chat and group memory pages, plus a wave of memory and embedding reliability fixes underneath

What's new in 2.1.1

Device sync, rebuilt

Conflict resolution now keeps the newest data instead of the oldest: previously a fresh install could silently overwrite the host device's real settings, advanced settings, and prompt templates with its own defaults
Memory metadata survives sync: importance, categories, timestamps, and embedding versions arrive intact instead of being stripped to bare text
Large libraries sync in chunks with no more transfer size ceiling, and a sync that fails or drops mid-transfer reports a real error instead of completing silently with partial data
Companion data, creation helper drafts, and ASR learning data (custom vocabulary, corrections, dismissed suggestions) now sync between devices, and ASR learning data is included in backups too
If a past sync ever ate your settings or memories: update both devices to 2.1.1 and re-run sync from Settings; it resends everything cleanly
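"Keep the newest data" is the rule usually called last-writer-wins. Here is a minimal Python sketch of it, assuming every synced record carries an edit timestamp. `Record`, `merge_newest_wins`, and the convention that untouched defaults carry a timestamp of 0 are hypothetical and not how LettuceAI's sync is actually built; the zero-timestamp convention is what stops a fresh install's defaults from looking "newer" than a real edit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    # Hypothetical: last edit time in ms since epoch; 0 means an untouched
    # default, so a fresh install's defaults never beat a real edit.
    updated_at: int


def merge_newest_wins(local: dict[str, Record], remote: dict[str, Record]) -> dict[str, Record]:
    """Newest-wins merge: for each key keep the copy edited most recently.
    On a tie the local copy is kept."""
    merged = dict(local)
    for key, incoming in remote.items():
        current = merged.get(key)
        if current is None or incoming.updated_at > current.updated_at:
            merged[key] = incoming
    return merged


if __name__ == "__main__":
    host = {"theme": Record("theme", "dark", 1_700_000_000_000)}
    fresh = {
        "theme": Record("theme", "light", 0),  # default, never edited
        "lang": Record("lang", "en", 1_700_000_500_000),
    }
    print(merge_newest_wins(host, fresh))
```

In this example the host keeps its real "dark" theme, and the setting that only exists on the new device still comes across.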

Multi-GPU clarity

A leftover single-GPU pin can no longer silently disable multi-GPU: enabling multi-GPU takes precedence, while a deliberate per-model pin still wins where intended
The model editor now shows the effective GPU setup a model will really use, including settings inherited from global defaults, and pinned models get a visible notice with a one-tap "Remove pin"
Model browser installs persist your offload intent (auto, CPU, GPU, mixed) instead of a hardware-specific layer count, so VRAM you add later is actually used

Provider fixes

The zAI (GLM) provider actually works now: requests were sent to a nonexistent endpoint, so every call failed. Chat completions, GLM's thinking toggle, and API key verification all target the real Z.AI API. If you already added zAI with a regular API key, point the credential's base URL at https://api.z.ai/api/paas/v4 (coding-plan keys work as-is)

Chat & desktop fixes

Stopping a generation now actually stops it: previously, a stop pressed before the first word arrived was silently lost, so the "canceled" reply kept generating, was saved invisibly, and reappeared above your next message after a refresh
Linux desktop ships a working embeddings runtime again: the bundled ONNX Runtime library was a 0-byte file; builds now include the real library, and installs with the broken file repair themselves automatically on first use

Quality of life

Five new guided tours covering the local model editor, runtime defaults, the model browser's recommendation panel, group chats, and dynamic memory, each starting automatically on your first visit, in every supported language
Safe model file deletion: deleting a GGUF or mmproj warns you when a configured model still uses it, listing exactly what would break, without blocking you
Souls written by the right model: companion soul writing now respects the dedicated Soul Writer model in Settings instead of always using the character's chat model, so model A can do the roleplay while model B writes the soul

There's more in the full changelog. These are just the bits worth calling out.

If that sounds interesting, come and join our Discord server! It's the best place to follow updates, give feedback, and influence the future direction of the app.

AI usage (as requested by the subreddit moderation team): while developing LettuceAI, we used AIs such as Fable 5, Claude Opus 4.8 and GPT 5.5 to debug problems and discuss ideas. The coding and UI/UX design were carried out by humans.

Links:

Website: https://www.lettuceai.app/
Download: https://www.lettuceai.app/download
Full changelog: https://www.lettuceai.app/changelog
GitHub: https://github.com/LettuceAI/app
Discord: https://discord.gg/8eHDxEbRy4
