u/6OMPH

A few weeks ago I got fed up with some networking issues I kept hitting in Reins (https://github.com/ibrahimcetin/reins) and forked it. Upstream went quiet after 1.2.0. I fixed what was bothering me and kept going until it turned into something else: Horizon.

Claude wrote most of the code. I did the architecture, the debugging, the daily driving. Saying it upfront because I'd rather you know than find out later. GPL-3.0, commits are all there.

What it talks to:

Ollama — local servers and Ollama Cloud, bearer auth works
Claude — Anthropic Messages API, 4.x extended-thinking included
OpenAI — Chat Completions and o-series
Gemini — Google Generative Language API

Provider and model are per-chat. You can switch mid-thread.

Things that actually matter:

Primary and backup Ollama URL with failover. You set a home LAN address and an optional backup — Tailscale, VPN, whatever. It fails over on SocketException, timeout, or HttpException without you touching anything. Remembers whichever URL last worked so it's not sitting there probing a dead server for 30 seconds on every request. Your home server stays off the internet.
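Rough shape of the failover logic, sketched in Python instead of the app's Dart so it's easy to poke at. The class and method names are made up for illustration and are not lifted from the repo. Python's OSError stands in for the SocketException / timeout / HttpException trio.

```python
from typing import Callable, List, Optional


class FailoverEndpoints:
    """Try the URL that worked last time first, then fall back to the other."""

    def __init__(self, primary: str, backup: Optional[str] = None):
        self.urls: List[str] = [primary] + ([backup] if backup else [])
        self.last_good = primary  # remembered across requests

    def order(self) -> List[str]:
        # Last known-good URL goes first so a dead server is not probed every time.
        return [self.last_good] + [u for u in self.urls if u != self.last_good]

    def request(self, send: Callable[[str], str]) -> str:
        last_error: Optional[Exception] = None
        for url in self.order():
            try:
                result = send(url)
            except OSError as err:  # ConnectionError and TimeoutError are OSError subclasses
                last_error = err
                continue
            self.last_good = url
            return result
        raise ConnectionError("no Ollama endpoint reachable") from last_error
```

`send` is whatever actually performs the HTTP call for a given base URL. Passing it in keeps the failover logic separate from the transport.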

Bearer auth for Ollama. Authorization header on every request. Works with Ollama Cloud keys and any reverse proxy. Local servers without auth ignore it.

Per-chat thinking toggle. Default / On / Off, wired to Ollama's think field. For models that have a thinking phase. Models that don't just ignore it.
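For the curious, here is roughly what the bearer header and the thinking toggle look like on the wire. This is a Python sketch, not the app's Dart code. The function name and the None / True / False mapping for Default / On / Off are my assumptions for illustration, as is skipping the header when no key is configured. The /api/chat path and the think field are Ollama's.

```python
import json
from typing import Dict, List, Optional, Tuple


def build_chat_request(
    base_url: str,
    model: str,
    messages: List[Dict[str, str]],
    api_key: Optional[str] = None,
    think: Optional[bool] = None,
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, json_body) for a streaming Ollama chat call."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Sent on every request when a key is configured.
        headers["Authorization"] = f"Bearer {api_key}"

    body = {"model": model, "messages": messages, "stream": True}
    if think is not None:
        # Default (None) leaves the field out; On/Off send it explicitly.
        body["think"] = think

    return f"{base_url.rstrip('/')}/api/chat", headers, json.dumps(body)
```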

Keys in the OS keystore. flutter_secure_storage. Nothing in plaintext.

Streaming that doesn't fall apart. Rendering Markdown live during a stream gets slow — flutter_markdown reparses the whole string on every token and it compounds as messages get longer. There's a typewriter buffer now that drains at an adaptive rate, renders plain Text during the stream, swaps to MarkdownBody when done. AutomaticKeepAlive caps around 30 recent bubbles so scrolling back doesn't blow up.
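The typewriter buffer idea, as a minimal Python sketch. The real thing is Dart and tied to Flutter's frame timing, so treat the names and the "clear the backlog in about N ticks" rule as assumptions, not the actual tuning.

```python
class TypewriterBuffer:
    """Queue incoming tokens and release them in chunks sized to the backlog."""

    def __init__(self, min_chars: int = 1, catch_up_ticks: int = 10):
        self.pending = ""   # received from the stream, not yet displayed
        self.shown = ""     # what the plain-text widget currently displays
        self.min_chars = min_chars
        self.catch_up_ticks = catch_up_ticks

    def push(self, token: str) -> None:
        self.pending += token

    def tick(self) -> str:
        """Called once per UI tick; returns the chunk to append to the display."""
        if not self.pending:
            return ""
        # Adaptive rate: bigger backlog means bigger chunk (ceiling division).
        n = max(self.min_chars, -(-len(self.pending) // self.catch_up_ticks))
        chunk, self.pending = self.pending[:n], self.pending[n:]
        self.shown += chunk
        return chunk
```

Once the stream ends and `pending` is empty, `shown` holds the full message and can be handed to the Markdown renderer in one pass.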

Per-chat everything. Provider, model, system prompt, temperature, max tokens, context size, thinking mode. Model picker groups by provider.

Schema self-heal. If a chat's provider column is missing or wrong it infers provider from the model name. Old chats don't break when the schema changes.
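A sketch of what inferring the provider from the model name can look like, in Python for illustration. The prefix rules below are my guesses at sensible heuristics, not a copy of what the app does.

```python
import re
from typing import Optional

KNOWN_PROVIDERS = {"ollama", "claude", "openai", "gemini"}


def infer_provider(model: str, stored: Optional[str] = None) -> str:
    """Trust a valid stored provider; otherwise guess from the model name."""
    if stored in KNOWN_PROVIDERS:
        return stored
    name = model.lower()
    if ":" in name:
        # Ollama-style tag such as "llama3.1:8b" or "gpt-oss:20b" (assumed rule).
        return "ollama"
    if name.startswith("claude"):
        return "claude"
    if name.startswith("gemini"):
        return "gemini"
    if name.startswith("gpt-") or re.match(r"o\d", name):
        return "openai"
    return "ollama"  # local models are the fallback
```

Name-based guessing will get some cases wrong (a hosted model ID containing a colon, for example), so it only runs when the stored value is missing or unrecognized.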

OLED true-black dark theme.

GitHub Actions on every push. Signed Android APK, macOS .app, Windows .exe with VC++ runtime bundled, Linux .deb and .tar.gz. All five every release.

Compared to Reins: Reins is Ollama-only and has some edge cases where it hangs or leaks. Horizon adds three providers, hardens networking across the board, and replaces the streaming renderer. It also stops defaulting num_ctx to 2048, which was forcing Ollama to unload and reload models any time they were loaded at a different context size. Now it leaves context to the server unless you set it explicitly. The bearer auth, Cloud support, and thinking toggle all came from open issues on the Reins tracker.
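The num_ctx change boils down to "only send what the user set." A minimal Python sketch, with a hypothetical function name. The option keys (temperature, num_predict, num_ctx) are Ollama's.

```python
from typing import Dict, Optional, Union


def build_options(
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    context_size: Optional[int] = None,
) -> Dict[str, Union[int, float]]:
    """Build the Ollama "options" object, omitting anything left unset."""
    candidates = {
        "temperature": temperature,
        "num_predict": max_tokens,
        "num_ctx": context_size,  # no 2048 default; the server decides if absent
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

With no context size set, num_ctx never appears in the request, so the client is not asking for a different context size than the model is already loaded with.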

Code: https://github.com/60MilesPerHour/Horizon

Releases: https://github.com/60MilesPerHour/Horizon/releases

Known issue: Gemini is implemented and the request shape looks right but I keep hitting auth errors with every Google AI Studio key I've tried across multiple accounts. Probably something on my end — project gating, region, billing, no idea. If you have a working key and want to test the Gemini provider in v3.3.0 and tell me whether models list and stream correctly, that would be useful data.

Not affiliated with any of the projects or companies mentioned. Bugs, feature ideas, and PRs welcome.

Horizon — multi-provider Flutter chat client. Ollama (local + Cloud), Claude, OpenAI, Gemini. Android / macOS / Windows / .deb / tar.gz