u/Altruistic_Night_327

Atlarix — Agent Workstation for macOS that works with your local models (Ollama, LM Studio) and BYOK APIs

What it is: An agent workstation that sits beside your editor (VS Code, IntelliJ, Vim) instead of replacing it. The agent gets a live map of your codebase and every destructive action goes through an approval queue.

Why I built it: I got tired of AI coding tools that (1) locked me into subscriptions with moving limits, (2) replaced my editor, and (3) silently modified files without showing me first.

• macOS-specific details:

I - Native builds for Apple Silicon (arm64) and Intel (x64)

II - Works with Ollama, LM Studio, llama.cpp out of the box

III - OpenRouter BYOK support for when you need cloud models

IV - Notarized DMG (App Store coming later when we can justify the 30% cut)

V - Free solo tier with BYOK/local models

• What makes it different:

--> Live codebase map — agent navigates architecture (imports, dependencies, call graphs) instead of dumping raw text into context

--> Approval queue — every file change streams for review before execution

--> Model-agnostic — native tool contracts per provider, no proxy translation layer

--> Session persistence — task ledger survives across context resets

Download: atlarix.dev

• Pricing:

1 - Solo (free, BYOK/local, full AI workstation functionality)

2 - Studio (19/mo, managed bridge + unlimited workspaces, MCPs, and skills)

3 - Team (79/mo, team management)

(I'm the founder — happy to answer questions or take feedback.)

I built an AI coding agent where I can actually see what it's about to do before it does it

Been testing different AI coding setups over the past few months on a decently large TypeScript monorepo. Wanted to share what I found since this community seems to care about this stuff.

Blackbox AI: solid for in-editor suggestions and quick completions, and the IDE integration is smooth. Starts to struggle when the task spans many files at once.

Cursor: great if you're on VS Code. Not an option if you're not.

Atlarix: different approach entirely. Doesn't live inside your editor; it runs as a separate app. Parses your repo into a graph so it navigates structure instead of reading raw files. Better for multi-file tasks and architectural work. Free with local models.

Honestly they serve different use cases. Blackbox/Cursor are better for quick in-editor flow. Atlarix is better when you want the agent to work across the whole codebase autonomously.

What setups are others here using for larger projects?

Black box AI is great until it deletes the wrong file.

I've been using LLMs for coding for two years. The pattern is always the same: prompt → wait → hope the output is right → manually verify everything. The model is a black box. I don't know why it suggested what it suggested. I just have to trust it.

That trust breaks when the agent has agency — when it can write files, run commands, push to git. A black box with a terminal is a liability.

So I built the opposite: an agent environment where the model is still a black box (I can't inspect its weights), but its actions are fully transparent and gated.

Here's how it works:

Every action is visible before execution

The agent plans in natural language. Then it requests specific tools:

  • read_file: auto-approved, I see what it read
  • search: auto-approved, I see the query
  • edit_file: queued for approval, I see the exact diff before it writes
  • run_command: queued for approval, I see the exact command string

No "surprise, I rewrote your auth layer" moments. The black box thinks, but it can't act without my review.
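To make the gating concrete, here is a minimal sketch of an approval queue in Python. It is not Atlarix's actual implementation: the class names, the `preview` field, and the rule that unknown tools are rejected are all assumptions for illustration. Only the four tool names and the auto-approved/queued split come from the list above.

```python
import difflib
from dataclasses import dataclass, field

# Assumed policy: read-only tools run immediately, mutating tools wait for review.
AUTO_APPROVED = {"read_file", "search"}
NEEDS_APPROVAL = {"edit_file", "run_command"}


@dataclass
class ToolRequest:
    tool: str
    args: dict
    preview: str = ""  # what the reviewer sees before approving


@dataclass
class ApprovalQueue:
    pending: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def submit(self, request: ToolRequest) -> str:
        """Log every request; queue anything that can change state."""
        self.log.append(request)
        if request.tool in AUTO_APPROVED:
            return "executed"
        if request.tool in NEEDS_APPROVAL:
            self.pending.append(request)
            return "queued"
        return "rejected"  # assumption: unknown tools never run


def edit_request(path: str, old: str, new: str) -> ToolRequest:
    """Build an edit_file request whose preview is a unified diff."""
    diff = "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
    return ToolRequest("edit_file", {"path": path, "content": new}, preview=diff)
```

The point of the design is that the diff is computed before anything touches disk, so the reviewer approves exactly the change that will be written.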

The plan is inspectable

Before Build mode, the agent operates in Plan mode. It drafts a step-by-step plan:

[ ] Install express-rate-limit
[ ] Create rate limiter config
[ ] Apply to login route
[ ] Apply to register route
[ ] Update tests

I review the plan. If it's wrong, I reject and explain. The agent replans. Only when the plan is solid do I switch to Build mode and start approving individual actions.
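The Plan-then-Build gate can be sketched as a tiny state machine. This is a hypothetical illustration, assuming a `Plan` that records rejection feedback and a `Session` that refuses to enter Build mode until the plan is approved; none of these names come from the actual app.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    steps: list
    approved: bool = False
    feedback: list = field(default_factory=list)

    def reject(self, reason: str) -> None:
        """Record why the plan was rejected so the agent can replan."""
        self.approved = False
        self.feedback.append(reason)

    def approve(self) -> None:
        self.approved = True


class Session:
    def __init__(self, plan: Plan):
        self.plan = plan
        self.mode = "plan"

    def enter_build(self) -> None:
        """Switch to Build mode only once the plan has been approved."""
        if not self.plan.approved:
            raise PermissionError("plan must be approved before Build mode")
        self.mode = "build"
```

Rejecting with a reason, rather than just a boolean, gives the replanning step something to work from.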

The code map is queryable, not injected

Instead of dumping raw code into the black box (where I can't see what it actually used), the agent queries a structured code map:

  • "Where is auth middleware registered?" → returns precise file + line
  • "What calls the login handler?" → returns dependency graph

I can inspect the map myself. The agent's queries are logged. If it makes a wrong assumption, I see the faulty query and correct it.
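A rough sketch of what "queryable, with logged queries" could look like, using an in-memory map. The `CodeMap` class, its method names, and the example paths in the usage note are hypothetical; the real map is described elsewhere as a parsed graph, which this toy version only imitates.

```python
from collections import defaultdict


class CodeMap:
    """Toy code map: symbol locations plus a reverse call index."""

    def __init__(self):
        self.locations = {}               # symbol -> (file, line)
        self.callers = defaultdict(set)   # callee -> {caller, ...}
        self.query_log = []               # every query the agent made

    def add_symbol(self, name, file, line):
        self.locations[name] = (file, line)

    def add_call(self, caller, callee):
        self.callers[callee].add(caller)

    def where_is(self, name):
        """Answer 'where is X defined?' and log the query."""
        self.query_log.append(("where_is", name))
        return self.locations.get(name)

    def who_calls(self, name):
        """Answer 'what calls X?' and log the query."""
        self.query_log.append(("who_calls", name))
        return sorted(self.callers.get(name, ()))
```

For example, after `add_symbol("authMiddleware", "src/app.ts", 42)`, a call to `where_is("authMiddleware")` returns `("src/app.ts", 42)` and leaves an entry in `query_log` that a human can review later.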

Local models = inspectable models

The environment routes to any model: cloud APIs or local (Ollama, LM Studio). When I run a 7B model locally, I control the inference. No data leaves my machine. The black box is literally in my box.

What I still can't inspect:

  • The model's reasoning process (still a black box)
  • Why it chose tool A over tool B (unless I read the full context)
  • Edge case failures in complex autonomy

What I can inspect:

  • Every file it read
  • Every search it ran
  • Every diff it proposed
  • Every command it wanted to execute
  • The full plan before execution
  • The session ledger after execution

The black box thinks. I decide what it does.

The practical result:

I use Neovim. The agent workstation sits beside it. I never leave my editor. The agent never acts without my review. I get the productivity of an autonomous agent with the safety of a PR review cycle.

Solo tier is free. Local models are unlimited. macOS + Linux.

Built in Nairobi, Kenya 🇰🇪

Question for this community:

What's your trust threshold for AI agents? Full autonomy with logging? Approval queues? Or do you need something more — formal verification, deterministic outputs, human-in-the-loop for everything?

reddit.com
u/Altruistic_Night_327 — 4 days ago
▲ 4 r/devtools+1 crossposts

Built an agent workstation with first-class Ollama support — approval queue, live code map, terminal access

Been lurking here for a while, wanted to share something I've been building.

Atlarix is a desktop coding agent that treats Ollama as a first-class provider. You add your Ollama base URL in settings, pick your model, and the full agent stack works — including multi-model routing (Fast/Balanced/Thinking tier mapping to whichever Ollama models you have pulled).
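As an illustration of the tier mapping, here is a small Python sketch. The config shape, the function name, and the model tags are placeholders, not Atlarix's real settings format; `http://localhost:11434` is Ollama's default base URL.

```python
# Hypothetical settings: which pulled Ollama model backs each tier.
SETTINGS = {
    "base_url": "http://localhost:11434",  # Ollama's default address
    "tiers": {
        "fast": "qwen2.5-coder:1.5b",      # placeholder model tags
        "balanced": "qwen2.5-coder:7b",
        "thinking": "deepseek-r1:14b",
    },
}


def resolve_model(tier, tier_map, pulled):
    """Return the model for a tier, failing loudly if it is not available."""
    model = tier_map.get(tier)
    if model is None:
        raise KeyError(f"unknown tier: {tier}")
    if model not in pulled:
        raise LookupError(f"{model} is not pulled; run `ollama pull {model}`")
    return model
```

Failing on a missing model, instead of silently falling back to another one, keeps it obvious which model produced a given output.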

The thing that makes it more than just a chat wrapper:

Live Code Map: your repo gets parsed into a node/edge graph. The agent queries structure before reading files, which dramatically reduces context bloat compared to "dump everything into the prompt."

Approval queue: every `write_file` and `run_command` goes through an explicit approve/reject step. You see the diff before it runs.

Agent modes: Explore (read-only), Plan, Build, Fix, Review. Each mode has a different tool surface, so the model can't accidentally write files in read-only sessions.
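One way to picture per-mode tool surfaces is a lookup table that decides which tools are even offered to the model. This is a sketch under assumptions: only "Explore is read-only" is stated above, so the tool sets for Plan, Build, Fix, and Review here are guesses for illustration.

```python
READ_TOOLS = frozenset({"read_file", "search"})
WRITE_TOOLS = frozenset({"write_file", "run_command"})

# Assumed mapping; only Explore being read-only comes from the post.
MODE_TOOLS = {
    "explore": READ_TOOLS,
    "plan": READ_TOOLS,
    "build": READ_TOOLS | WRITE_TOOLS,
    "fix": READ_TOOLS | WRITE_TOOLS,
    "review": READ_TOOLS,
}


def tools_for(mode):
    """Tools exposed to the model in this mode, sorted for stable output."""
    return sorted(MODE_TOOLS[mode])


def is_allowed(mode, tool):
    """A tool call is valid only if the mode exposes that tool."""
    return tool in MODE_TOOLS[mode]
```

Because `write_file` is never listed in Explore mode, the model cannot call it there, which is a stronger guarantee than asking the model not to.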

Free tier is unlimited for local models. No telemetry on what you build.

Currently at v8.4.1. macOS + Linux.

https://atlarix.dev

Source for the MCP registry and agent behaviors is open: github.com/AmariahAK/atlarix-mcps

What Ollama models are you all running for coding tasks these days? Trying to get better benchmark data.

u/Altruistic_Night_327 — 4 days ago

Wanted to share an approach I've been using for retrieval-augmented generation over large codebases and get feedback from people thinking about similar problems.

The problem

Naive codebase RAG typically works by chunking files into text segments and embedding them for similarity search. This breaks down on code because semantic similarity at the chunk level doesn't capture structural relationships: a function in file A calling a type defined in file C won't surface that dependency through embedding proximity alone.

The approach: AST-derived typed graphs

Instead of chunking, I parse every file using Tree-sitter into its AST, then extract a typed node/edge graph:

  • Nodes: functions, classes, interfaces, types, modules
  • Edges: imports, exports, call relationships, inheritance, composition

This gets stored in SQLite as a persistent graph. Parse cost is one-time per project.
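A minimal sketch of how such a graph might be persisted with Python's built-in `sqlite3` module. The table and column names are assumptions, not the actual schema; the node and edge kinds are free-text here so any of the types listed above fit.

```python
import sqlite3

# Assumed schema: one row per node, one row per typed edge.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,        -- function, class, interface, type, module
    name TEXT NOT NULL,
    file TEXT NOT NULL,
    signature TEXT
);
CREATE TABLE IF NOT EXISTS edges (
    src INTEGER NOT NULL REFERENCES nodes(id),
    dst INTEGER NOT NULL REFERENCES nodes(id),
    kind TEXT NOT NULL,        -- imports, exports, calls, inherits, composes
    PRIMARY KEY (src, dst, kind)
);
"""


def open_graph(path=":memory:"):
    """Open (or create) the graph database and make sure tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_node(conn, kind, name, file, signature=None):
    cur = conn.execute(
        "INSERT INTO nodes (kind, name, file, signature) VALUES (?, ?, ?, ?)",
        (kind, name, file, signature),
    )
    return cur.lastrowid


def add_edge(conn, src, dst, kind):
    # Duplicate edges are ignored thanks to the composite primary key.
    conn.execute("INSERT OR IGNORE INTO edges VALUES (?, ?, ?)", (src, dst, kind))


def neighbors(conn, node_id, kind=None):
    """Names of nodes reachable from node_id over one outgoing edge."""
    sql = "SELECT n.name FROM edges e JOIN nodes n ON n.id = e.dst WHERE e.src = ?"
    params = [node_id]
    if kind is not None:
        sql += " AND e.kind = ?"
        params.append(kind)
    return [row[0] for row in conn.execute(sql + " ORDER BY n.name", params)]
```

Passing a file path instead of `":memory:"` gives the persistence across sessions described above.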

Retrieval: BM25 over graph nodes

At query time, instead of embedding similarity, I run BM25 scoring over node metadata (names, signatures, docstrings, file paths). Top-scoring nodes get passed to the LLM. The graph structure means a retrieved function automatically pulls in its direct dependencies via edge traversal.
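For readers who want to see the retrieval step spelled out, here is a self-contained sketch of BM25 scoring followed by one-hop edge expansion. It uses a common BM25 variant with `k1=1.5` and `b=0.75`; those parameters, the dict-based inputs, and the function names are assumptions, not the post's actual implementation.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each node's token list against the query. docs: {node_id: [tokens]}."""
    if not docs:
        return {}
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency per term
    scores = {}
    for node_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[node_id] = score
    return scores


def retrieve(query_tokens, docs, edges, top_k=2):
    """Top-k nodes by BM25, then their direct dependencies via edges."""
    scores = bm25_scores(query_tokens, docs)
    hits = [node_id for node_id in scores if scores[node_id] > 0]
    ranked = sorted(hits, key=lambda node_id: -scores[node_id])[:top_k]
    selected = list(ranked)
    for node_id in ranked:
        for dep in edges.get(node_id, ()):
            if dep not in selected:
                selected.append(dep)
    return selected
```

In this sketch a query for `login` would return the login handler plus any type it depends on, even though the type's own metadata never mentions "login".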

Empirically this lands at ~5K tokens per query on medium-large codebases that would otherwise require ~100K tokens with naive full-context approaches.

Hierarchical fallback for complex queries

For multi-file reasoning tasks:

  1. A Mermaid diagram of the full graph serves as a persistent architectural map always in context
  2. BM25 node retrieval handles targeted lookup
  3. At 70% context capacity, a fast model compresses least-relevant nodes before passing to the primary model
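Step 3 can be sketched as a budget check that compresses nodes from the least relevant upward. This is one interpretation of the 70% rule, not the actual compression layer: the whitespace token counter is a crude stand-in for a real tokenizer, and `summarize` is a placeholder for the call to the fast model.

```python
def fit_to_budget(nodes, context_limit, summarize, threshold=0.7,
                  count_tokens=lambda text: len(text.split())):
    """Compress lowest-scoring nodes until the total fits under the threshold.

    nodes: list of (score, text) pairs.
    summarize: callable standing in for the fast model (assumption).
    Returns (nodes sorted by score descending, total token count).
    """
    budget = int(context_limit * threshold)
    result = sorted(nodes, key=lambda node: node[0], reverse=True)
    total = sum(count_tokens(text) for _, text in result)
    # Walk from the least relevant node upward, compressing as needed.
    for i in range(len(result) - 1, -1, -1):
        if total <= budget:
            break
        score, text = result[i]
        short = summarize(text)
        total -= count_tokens(text) - count_tokens(short)
        result[i] = (score, short)
    return result, total
```

High-scoring nodes are left untouched unless compressing everything below them still does not fit, which keeps the most relevant code verbatim for the primary model.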

Why BM25 over embeddings here

Code identifiers (function names, type names, module paths) are highly distinctive lexically. BM25 outperforms embedding similarity on exact and near-exact identifier matching, which is the dominant retrieval pattern in code queries. Embeddings would likely help more for natural language docstring queries, though I haven't benchmarked that comparison rigorously yet.
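Lexical matching on identifiers depends heavily on how they are tokenized. The sketch below is one possible tokenizer, assumed for illustration: it keeps the whole identifier (for exact matches) and also splits camelCase, snake_case, and path separators (for near-exact matches).

```python
import re

# Splits camelCase and acronym boundaries, e.g. "HTTPServer" -> HTTP, Server.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def tokenize_identifier(text):
    """Lowercased tokens for BM25: whole identifiers plus their sub-words."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", text):
        if not part:
            continue
        tokens.append(part.lower())
        pieces = [piece.lower() for piece in _CAMEL.findall(part)]
        if len(pieces) > 1:
            tokens.extend(pieces)
    return tokens
```

With this scheme a query for `login` matches `loginHandler` through the sub-word token, while a query for `loginHandler` still gets the stronger exact-identifier match.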

Open questions I'm still thinking about:

  • Better edge-weighting strategies for the graph — currently all edges are unweighted
  • Whether re-ranking with a cross-encoder would meaningfully improve precision over BM25 alone
  • Handling dynamic languages where call graphs can't be fully resolved statically

Has anyone tackled codebase-scale RAG differently? Particularly curious if anyone's compared AST-graph approaches against embedding-based chunk retrieval on real codebases with quantitative benchmarks.

u/Altruistic_Night_327 — 22 days ago

Been building an AI coding tool and kept hitting the same wall: feeding a real codebase to an LLM burns through context fast. A medium production project hits ~100K tokens easily. That's expensive, slow, and the model starts hallucinating file relationships.

Here's the approach I landed on:

Step 1: Parse into a typed graph

A Tree-sitter AST walk over every file extracts functions, classes, interfaces, imports, exports, and call relationships. This gets stored as a node/edge graph in SQLite. One-time cost, persistent across sessions.
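To show the shape of the extraction step without requiring Tree-sitter grammars, the sketch below swaps in Python's standard `ast` module and parses Python source. It is a stand-in technique, not the Tree-sitter pipeline described above, and it only resolves direct calls to plain names.

```python
import ast


def extract_graph(source, module):
    """Return (nodes, edges) for one Python file.

    nodes: (kind, name, line) tuples.
    edges: (src, kind, dst) tuples for imports and direct calls.
    """
    tree = ast.parse(source)
    nodes, edges = [], []
    for item in ast.walk(tree):
        if isinstance(item, ast.Import):
            edges += [(module, "imports", alias.name) for alias in item.names]
        elif isinstance(item, ast.ImportFrom) and item.module:
            edges.append((module, "imports", item.module))
        elif isinstance(item, ast.ClassDef):
            nodes.append(("class", item.name, item.lineno))
        elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append(("function", item.name, item.lineno))
            # Record calls to plain names made anywhere inside this function.
            for call in ast.walk(item):
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                    edges.append((item.name, "calls", call.func.id))
    return nodes, edges
```

Method calls such as `obj.method()` are skipped here because resolving them needs type information, which is the same limitation static call graphs hit in dynamic languages.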

Step 2: BM25 scoring at query time

Instead of re-reading files, every query scores the graph nodes by relevance using BM25. Only top-scoring nodes go to the LLM. Everything else stays in the database.

Step 3: Hierarchical fallback

For complex queries: a Mermaid diagram acts as a persistent high-level codebase map, BM25 handles targeted retrieval, and at 70% context capacity a fast model compresses the least relevant nodes before passing to the main model.
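The Mermaid map is straightforward to derive from an edge list. A minimal sketch, assuming edges are `(src, kind, dst)` tuples and that node names are plain identifiers; names containing dots, slashes, or spaces would need escaping or aliasing first.

```python
def to_mermaid(edges):
    """Render (src, kind, dst) edges as a Mermaid flowchart with labeled arrows."""
    lines = ["graph TD"]
    # Deduplicate and sort so the diagram is stable between runs.
    for src, kind, dst in sorted(set(edges)):
        lines.append(f"    {src} -->|{kind}| {dst}")
    return "\n".join(lines)
```

Stable ordering matters more than it looks: if the diagram sits in context on every request, byte-identical output between runs is friendlier to prompt caching.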

Result: ~5K tokens per query instead of ~100K. Provider-agnostic — works the same whether you're on GPT-4o, Claude, Gemini, or a local Ollama model.

Happy to go deeper on any part of this — the BM25 implementation, the graph schema, or the compression layer. Anyone else tackling codebase RAG differently?

u/Altruistic_Night_327 — 22 days ago

Been building Atlarix in Electron + React + TypeScript for about a year. Wanted to share the CI/CD setup since it took a while to get right.

The pipeline handles: macOS build with Apple Notarization via xcrun notarytool, hardened runtime codesigning, Linux packaging across all three formats, and automated release publishing — all triggered on tag push.

The hardened runtime + notarization part specifically was painful. Happy to share the Actions config if anyone's fighting with that.
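For anyone fighting the same step, the core command sequence looks roughly like the sketch below, written as a Python helper that builds the argument lists. This is not the author's Actions config: the function names, paths, identity string, and the use of a stored keychain profile are assumptions. CI setups often pass notarytool credentials through other flags, and an Electron app also needs its nested helpers and frameworks signed, which packaging tools usually handle.

```python
import subprocess


def signing_commands(app_path, dmg_path, identity, entitlements, keychain_profile):
    """Argument lists for hardened-runtime signing, notarization, and stapling."""
    return [
        # --options runtime enables the hardened runtime required for notarization.
        ["codesign", "--force", "--timestamp", "--options", "runtime",
         "--entitlements", entitlements, "--sign", identity, app_path],
        # Submit the DMG and block until Apple returns a verdict.
        ["xcrun", "notarytool", "submit", dmg_path,
         "--keychain-profile", keychain_profile, "--wait"],
        # Attach the notarization ticket so Gatekeeper can verify offline.
        ["xcrun", "stapler", "staple", dmg_path],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them and stop on the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Running with `dry_run=True` first is a cheap way to check paths and identity strings before spending a notarization round trip.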

App itself is an AI coding environment with local model support (Ollama/LM Studio). atlarix.dev if curious, but mainly posting because the Electron + GitHub Actions setup might save someone a headache.

u/Altruistic_Night_327 — 25 days ago