u/Due_Anything4678 — reddlx

I got tired of watching coding sessions re-read the same files over and over. A 2,000-token file read 5 times = 10,000 tokens gone. So I built sqz.

The key insight: most token waste isn't from verbose content - it's from repetition. sqz keeps a SHA-256 content cache. First read compresses normally. Every subsequent read of the same file returns a 13-token inline reference instead of the full content. The LLM still understands it.
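A minimal sketch of the dedup idea, assuming a cache keyed by the SHA-256 of the file content. The class name, the reference string format, and the helper function are hypothetical illustrations, not sqz's actual API or wire format. Only the 13-token reference size and the 2,000-token, 5-read scenario come from the post.

```python
import hashlib


class DedupCache:
    """Toy content-addressed cache: full text on first read, short ref afterwards."""

    def __init__(self):
        self._seen = {}  # SHA-256 hex digest -> path of the first read

    def read(self, path, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._seen:
            # Hypothetical reference format; the real one may look different.
            return f"[cached {self._seen[digest]} sha256:{digest[:12]}]"
        self._seen[digest] = path
        return content


def tokens_with_dedup(file_tokens, reads, ref_tokens=13):
    """Tokens sent when only the first read carries the full content."""
    return file_tokens + (reads - 1) * ref_tokens


cache = DedupCache()
text = 'fn main() { println!("hi"); }\n'
first = cache.read("src/main.rs", text)   # full content
second = cache.read("src/main.rs", text)  # short reference

sent = tokens_with_dedup(2000, 5)  # 2000 + 4 * 13 = 2052
saved = 1 - sent / (2000 * 5)      # about 0.795
```

Because the key is a hash of the content rather than the path, an edited file hashes differently and is sent in full again. Dedup alone gives roughly 79.5% in this scenario, so the 86% figure quoted for repeated reads presumably also counts compression of the first read. That is an inference, not something the post states.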

Real numbers from my sessions:

Scenario	Savings	How
Repeated file reads (5x)	86%	Dedup cache: 13-token ref after first read
JSON API responses with nulls	7–56%	Strip nulls + TOON encoding (varies by null density)
Repeated log lines	58%	Condense stage collapses duplicates
Large JSON arrays	77%	Array sampling + collapse
Stack traces	0%	Intentional - error content is sacred

That last row is the whole philosophy. Aggressive compression can save more tokens on paper, but if it strips context from your error messages or drops lines from your diffs, the LLM gives you worse answers and you end up spending more tokens fixing the mistakes. sqz compresses what's safe to compress and leaves critical content untouched.
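A rough sketch of the "compress what's safe, leave errors alone" policy, under stated assumptions: the function names and the stack-trace heuristic are hypothetical, and TOON encoding and array sampling are deliberately left out. It only illustrates null stripping, duplicate-line collapsing, and the passthrough for error content.

```python
import json


def strip_nulls(value):
    """Recursively drop dict keys whose value is None; lists keep their length."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into 'line (xN)'."""
    out = []
    i = 0
    while i < len(lines):
        j = i
        while j < len(lines) and lines[j] == lines[i]:
            j += 1
        count = j - i
        out.append(lines[i] if count == 1 else f"{lines[i]} (x{count})")
        i = j
    return out


def looks_like_stack_trace(text):
    # Crude, hypothetical heuristic covering Python tracebacks and Rust panics only.
    return "Traceback (most recent call last)" in text or "panicked at" in text


def compress(text):
    if looks_like_stack_trace(text):
        return text  # error content passes through untouched
    try:
        parsed = json.loads(text)
    except ValueError:
        # Not JSON: treat as log output and collapse duplicate lines.
        return "\n".join(collapse_repeats(text.split("\n")))
    return json.dumps(strip_nulls(parsed), separators=(",", ":"))


api = compress('{"a": 1, "b": null, "c": {"d": null, "e": [1, null]}}')
logs = compress("ok\nretry\nretry\nretry\ndone")
trace = 'Traceback (most recent call last):\n  File "x.py", line 1\nValueError: bad\nValueError: bad'
kept = compress(trace)
```

Note that the duplicated `ValueError: bad` line in the trace is not collapsed: the trace check runs first, so anything that looks like an error skips every lossy stage.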

Works across 4 surfaces:

Shell hook (auto-compresses CLI output)
MCP server (compiled Rust, not Node)
Browser extension - Firefox approved. Works on ChatGPT, Claude, Gemini, Grok, Perplexity, GitHub Copilot
IDE plugins (JetBrains, VS Code)

Install:

cargo install sqz-cli
sqz init

Also available via npm (npm i -g sqz-cli) and pip (pip install sqz).

Track your savings:

sqz gain    # ASCII chart of daily token savings
sqz stats   # cumulative compression report

Single Rust binary. Zero telemetry. 1000+ tests including 57 property-based correctness proofs.

GitHub: https://github.com/ojuschugh1/sqz

Docs: https://ojuschugh1.github.io/sqz/

If you try it, a ⭐ helps with discoverability - and bug reports are welcome: this is v1.0.5, so rough edges exist.

Is anyone else facing this problem? Happy to answer questions about the architecture or benchmarks.

u/Due_Anything4678 — 5 days ago

▲ 1 r/LocalLLM

I kept running into the same problem with AI coding tools: every session feels disposable.

The agent forgets what it did. The next run re-reads the same files. Context gets duplicated. Claims are hard to verify. APIs drift. Dependencies get stale. So I started building a stack that treats AI like infrastructure, not just chat.

OpenHawk is the process layer of that stack. It is a local-first Agent OS in Rust that manages AI agents like real processes, with Copy-on-Write snapshots, a JSON-RPC bus, per-agent sandboxing, encrypted secrets, and a TUI dashboard for observability. The README also includes a demo GIF that shows the workflow. OpenHawk’s setup flow installs 5 companion tools automatically: Aura, SQZ, Etch, GhostDep, and ClaimCheck.

Here is the stack behind it:

Project	What it does	Numbers worth noting
OpenHawk (GitHub)	Agent OS / process kernel	demo section in README, installs 5 companion tools
Aura (GitHub)	Memory + proof + self-improving knowledge layer	23 packages, 490+ tests, 3 stars
SQZ (GitHub)	LLM context compression	176 stars, 15 releases
Etch (GitHub)	API change detection from real traffic	5 stars, includes `demo.gif`
GhostDep (GitHub)	Phantom / unused dependency detection	8 stars, supports Go, JS/TS, Python, Rust, Java
ClaimCheck (GitHub)	Verifies what AI agents actually claimed	3 stars
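To make "phantom / unused dependency detection" concrete, here is a small sketch for Python sources only. It assumes "phantom" means imported but not declared and "unused" means declared but never imported. The function names are hypothetical, and this is not GhostDep's implementation.

```python
import ast
import sys


def imported_modules(source):
    """Top-level third-party module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    # Ignore the standard library where the interpreter exposes its module list (3.10+).
    return found - set(getattr(sys, "stdlib_module_names", ()))


def audit(declared, source):
    """Compare declared dependencies against what the source actually imports."""
    used = imported_modules(source)
    declared = set(declared)
    return {"unused": sorted(declared - used), "phantom": sorted(used - declared)}


sample = "import requests\nfrom yaml import safe_load\n"
report = audit(["requests", "numpy"], sample)
```

A real tool also has to map distribution names to import names (for example, the `PyYAML` package is imported as `yaml`), which this sketch ignores.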

SQZ’s current real-session stats are the kind of thing I wanted to build around instead of hand-wavy “efficiency” claims:

SQZ metric	Value
Compressions	3,003
Tokens saved	178,442
Average reduction	24.7%
Best observed reduction	up to 92% with dedup
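A quick back-of-the-envelope reading of these numbers. The second figure holds only if the 24.7% is an aggregate ratio (tokens saved divided by original tokens) rather than a mean of per-compression ratios. The post does not say which, so treat it as an assumption.

```python
compressions = 3_003
tokens_saved = 178_442
average_reduction = 0.247

# Mean tokens saved each time a compression ran.
saved_per_compression = tokens_saved / compressions

# Assumption: 24.7% = tokens_saved / original_tokens across the whole session log.
implied_original_tokens = tokens_saved / average_reduction

print(f"{saved_per_compression:.1f} tokens saved per compression")
print(f"~{implied_original_tokens:,.0f} tokens before compression")
```

That works out to about 59 tokens saved per compression, which suggests most of the compressed inputs were small.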

The way I think about the stack is simple:

OpenHawk handles execution.
Aura handles memory and proof.
SQZ handles context efficiency.
Etch handles API truth.
GhostDep handles dependency truth.
ClaimCheck handles agent truth.

This is still early, but it is the kind of foundation I wanted from day one: local-first, process-aware, and built to compound over time instead of resetting every session. OpenHawk is the system layer, and the rest of the stack is there to make the system smarter, leaner, and more trustworthy over time.

I’d genuinely love blunt feedback from people building local AI, agent infra, or Rust tooling: what feels most useful, what feels overbuilt, and what should be cut first?

Repo: OpenHawk
Stack: Aura, SQZ, Etch, GhostDep, ClaimCheck

If any of these tools help you, please star the repo for discoverability and share your experience with others. Feedback from the open source community is very welcome.

u/Due_Anything4678 — 2 months ago

▲ 3 r/ollama

I kept running into the same frustration with AI coding tools: every session felt like starting from zero.

Local AI, Claude Code, Cursor, Gemini CLI, ChatGPT, Codex - they all remember things differently, if at all. Decisions get lost, context gets scattered, and when an AI says “I created the file” or “I installed the package,” you still have to double-check it yourself. So I built Aura - a local-first daemon that gives AI tools persistent memory, claim verification, MCP traffic observability, OWASP compliance scoring, and a self-improving knowledge wiki. It is designed to work across tools, with one binary and zero cloud dependency.

The core idea is simple: make AI sessions compound instead of reset. Aura lets you store memory once and reuse it across tools, verify whether agent claims are actually true, track what your AI sessions cost, inspect MCP traffic, and keep a knowledge base that grows over time instead of disappearing with the session.

A few things Aura currently does:
Aura can verify claims like file creation or package installation, share memory across tools, compress context before it hits the model, scan for phantom or unused dependencies, track token/cost usage, and gate destructive actions with approval. It also includes a wiki mode for ingesting docs, URLs, and folders, then querying and visualizing the resulting knowledge graph.
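As an illustration of what verifying a claim like "I created the file" or "I installed the package" can mean in practice, here is a minimal sketch. The claim kinds and function names are hypothetical, and the package check only covers the current Python environment. Aura's real checks are not described in the post.

```python
import importlib.metadata
import os
import tempfile


def verify_file_created(path):
    """True if a regular file exists at path."""
    return os.path.isfile(path)


def verify_package_installed(name):
    """True if the current Python environment has a distribution called name."""
    try:
        importlib.metadata.version(name)
        return True
    except importlib.metadata.PackageNotFoundError:
        return False


def check_claims(claims):
    """Map each (kind, target) claim to a (kind, target, verified) verdict."""
    checkers = {
        "file_created": verify_file_created,
        "package_installed": verify_package_installed,
    }
    return [(kind, target, checkers[kind](target)) for kind, target in claims]


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "notes.txt")
    with open(real, "w") as fh:
        fh.write("hello")
    missing = os.path.join(tmp, "ghost.txt")
    results = check_claims([
        ("file_created", real),
        ("file_created", missing),
        ("package_installed", "no-such-package-xyz-123"),
    ])
```

The point is that each claim is checked against the machine's actual state instead of trusting the agent's own report.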

It is still early (v1.0-dev). I am sharing it now because I want feedback from people who feel the same pain: fragmented AI context, unreliable agent actions, and no real observability into what the tool is doing.

If this problem sounds familiar, I would love feedback, ideas, and brutal honesty.

https://github.com/ojuschugh1/aura

If you try it, a ⭐ helps with discoverability - and bug reports are welcome: this is v1.0-dev, so rough edges exist.

u/Due_Anything4678 — 2 months ago