r/OpenSourceeAI

I would like to share my latest open source local LLM inference tool, implemented in C#. It supports models like Gemma4 and Qwen3.6, with multimodal input (image, vision, audio), reasoning, and function/tool calling. It runs on Windows/macOS/Linux and makes full use of the GPU. The API is fully compatible with the OpenAI and Ollama interfaces.

I'd really appreciate it if you could try it and give me some feedback. If you like it, a star would mean a lot. Thank you very much!

u/fuzhongkai — 10 hours ago

▲ 63 r/OpenSourceeAI+42 crossposts

Ask questions across your Markdown notes using a fully local Graph RAG engine. Built for Obsidian vaults, works with any folder of Markdown files. Extracts entity-relation triples from wikilinks & YAML frontmatter, retrieves answers via hybrid search (vector + BM25 + temporal). Multilingual. No cloud. Runs on Ollama.
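As a rough illustration of the wikilink part only, here is a minimal Python sketch that turns `[[wikilinks]]` in a note into (source, relation, target) triples. It is not Kwipu's code: the `links_to` relation name, the function name, and the sample note are assumptions made for the example, and YAML frontmatter handling is left out.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def wikilink_triples(note_name: str, text: str) -> list[tuple[str, str, str]]:
    """Return (source, "links_to", target) triples, de-duplicated in first-seen order."""
    seen = set()
    triples = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        if target and target not in seen:
            seen.add(target)
            # "links_to" is a hypothetical relation label chosen for this sketch.
            triples.append((note_name, "links_to", target))
    return triples

note = "Met [[Ada Lovelace|Ada]] about the [[Analytical Engine#Design]]. See [[Ada Lovelace]]."
print(wikilink_triples("Meeting notes", note))
```

Aliases and heading anchors are dropped so that every link to the same note collapses onto one graph node.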

https://github.com/benmaster82/Kwipu

u/WritHerAI — 13 hours ago

▲ 32 r/OpenSourceeAI+9 crossposts

Local coding models need better repo context, not just bigger context windows

Local coding models have a repo-context problem.

When using llama/qwen/mistral/gemma for coding, the hard part is often not the model itself. It is getting the right files/functions into context without dumping too much raw source.

Long context helps, but it does not solve retrieval.

If the model never sees the right file, it still guesses.

I’ve been building SigMap, a zero-dependency CLI that creates a compact repo map for coding workflows.

Instead of sending raw source first, it extracts:

function signatures
classes/interfaces
exports
import relationships
ranked file matches per query
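To make the idea of a signature-level map concrete, here is a minimal sketch using Python's standard `ast` module. It is not how SigMap is implemented; the function name `signature_map` and the sample source are assumptions for illustration, and it only looks at top-level definitions in a single Python file.

```python
import ast

def signature_map(source: str) -> list[str]:
    """List top-level imports, classes, and function signatures, without bodies."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            entries.append(ast.unparse(node))
        elif isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            entries.append(f"{prefix} {node.name}({ast.unparse(node.args)})")
    return entries

example = '''
import os
from pathlib import Path

class Indexer:
    def run(self): ...

def rank(query: str, files: list[str], k: int = 5) -> list[str]:
    return files[:k]
'''
for line in signature_map(example):
    print(line)
```

The map keeps names and parameters but drops every function body, which is where most of the token savings in this kind of approach would come from.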

The workflow is simple:

repo map first → find likely files → read full source only where needed

Benchmarked across 18 repos / 90 tasks:

81.1% hit@5 vs 13.6% random baseline
~6× better file retrieval
96.9% token reduction in the benchmark setup
41.4% fewer prompts per task
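For readers unfamiliar with the metric, hit@5 counts a task as a success when at least one relevant file appears in the top five ranked results. A minimal sketch follows, with made-up file names; it is not taken from the benchmark suite.

```python
def hit_at_k(ranked_files: list[str], relevant: set[str], k: int = 5) -> bool:
    """True if any relevant file appears among the first k ranked results."""
    return any(path in relevant for path in ranked_files[:k])

def hit_rate(tasks: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Fraction of tasks with at least one relevant file in the top k."""
    hits = sum(hit_at_k(ranked, relevant, k) for ranked, relevant in tasks)
    return hits / len(tasks)

# Two hypothetical tasks: the first retrieves a relevant file, the second does not.
tasks = [
    (["a.py", "b.py", "c.py"], {"b.py"}),
    (["d.py", "e.py"], {"z.py"}),
]
print(hit_rate(tasks))

# The "~6x" figure is the ratio of the two reported hit rates.
print(round(81.1 / 13.6, 2))
```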

No embeddings. No vector DB. No npm dependencies.

This is not meant to replace LSPs, grep, agent search, MCP tools, or full-file reads.

It is meant to give local coding models / agents a cheap first-pass structure map before deeper inspection.

Repo: https://github.com/manojmallick/sigmap

Benchmark suite: https://github.com/manojmallick/sigmap-benchmark-suite

Curious how people here handle repo context with local coding models.

Are you mostly using grep/search, RAG, repo maps, MCP tools, or just relying on longer-context models?

Edit: Good point from the comments — SigMap core is model-agnostic. The docs currently look too focused on proprietary assistants, so I’ll add clearer examples for VSCodium/Open VSX, Continue, Cline/Roo Code, Aider, OpenHands, and local Ollama/llama.cpp workflows.

u/Independent-Flow3408 — 18 hours ago

▲ 23 r/OpenSourceeAI+5 crossposts

Hierarchos: Preliminary Findings From a 232M Recurrent Memory-Augmented Assistant Model [P]

[Project Release / Research Draft] Hierarchos at 232M Parameters: Preliminary Findings From a Recurrent Memory-Augmented Assistant Model

Technical Report: July 2nd, 2026

Project: Hierarchos / KortexHOS

Authors: Makhi Burroughs / netcat420, Lost Time, and the Hierarchos project team

TL;DR:

We built and trained Hierarchos, an experimental 232M-parameter recurrent, memory-augmented language model from scratch. It is not a GPT-3/3.5-class model, but it successfully proves that a hybrid non-Transformer architecture (combining an RWKV backbone, hierarchical manager/worker loops, differentiable slot-based LTM, and a deterministic suffix automaton) can survive training, avoid collapse, and maintain short-form instruction coherence. Most of our breakthroughs came from fixing subtle train/inference parity mismatches and numerical stability bugs.

Dataset: netcat420/Experiment_0.1 (Alpaca format)
Training: 13 epochs on an RTX 6000 Blackwell (96GB) rental.

1. Introduction & Background

Modern LLMs are heavily dominated by Transformer scaling. Hierarchos explores a different path: can recurrent state, explicit memory retrieval, hierarchical iterative computation, and bounded local inference make a small model vastly more parameter-efficient?

Hierarchos isn't a direct clone of any single architecture, but a hybrid inspired by:

RWKV-style recurrence: For efficient sequence processing without traditional attention.
Titans-style neural memory: For persistent test-time memory.
Hierarchical reasoning (HRM): Multi-level recurrent modules (Manager/Worker) to iteratively refine state.

2. Architecture Overview

[Token Input] -> [ROSA Suffix Matcher / DeepEmbed Modulator]
       |
       v
[Long-Term Memory] <-> [Top-k Associative Lookup]
       |
       v
[Manager Recurrent Cell] -> (Produces Context Plan & Drift Vector)
       |
       v
[Worker Recurrent Cell]  -> (Refines local state / clamps drift)
       |
       v
[RWKV Backbone (Clamped Channel-Mix)] -> [Next-Token Logits]

Key Components:

ROSA: A deterministic suffix-automaton path predicting continuation tokens based on exact repeated suffix patterns.
DeepEmbed: A token-specific modulation path that influences RWKV channel mixing.
LTM Subsystem: Learned slow-memory keys/values combined with fast working-memory values.
Manager/Worker Loop: High-level manager handles broad context to produce a target plan; the lower-level worker refines token-local state using a regularized drift vector.

3. Core Engineering Lessons (The "Gotchas")

A low training loss does not guarantee coherent chat. We had to fix several critical state-contract and numerical stability bugs to make the model usable:

1. Chat/Training Drift Mismatch

The Bug: During live streaming chat, the loop was feeding the previous drift state back into the model on every single token. During training, this state is reseeded at Truncated Backpropagation Through Time (TBPTT) chunk boundaries.
The Fix: We aligned the inference code to only reseed at boundary limits. Before this fix, live chat logits diverged sharply from training loss; after the fix, logit error dropped to near-zero.

2. Supervised LTM Inner Updates Mismatch

The Bug: Giving the model supervised memory updates during training that it can't replicate during zero-label live inference creates a crutch. The model learns to rely on a hidden training-only helper signal.
The Fix (v0.20.4): Implemented --ltm-training-mode read-only. Training keeps the memory structures but stops doing supervised fast-memory writes, perfectly mirroring inference.

3. Unbounded RWKV Channel Mixing

The Bug: Long runs exposed activation spikes in the ReLU-squared channel-mix FFN path, which were amplified by DeepEmbed modulation into NaN gradients.
The Fix: Implemented key clamps (--rwkv-channel-mix-key-clamp 12.0), DeepEmbed clamps (4.0), and excluded DeepEmbed identity gates from AdamW weight decay.
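A minimal scalar sketch of why a key clamp bounds the ReLU-squared path. Exactly where Hierarchos applies the clamp is not stated above, so clamping the pre-activation key to [-12, 12] is an assumption, as are the function names.

```python
def relu_squared(key: float) -> float:
    """Unbounded ReLU-squared activation: grows quadratically with the input."""
    return max(key, 0.0) ** 2

def clamped_relu_squared(key: float, key_clamp: float = 12.0) -> float:
    """Assumed placement: clamp the key first, so the output can never exceed key_clamp ** 2."""
    bounded = max(-key_clamp, min(key_clamp, key))
    return max(bounded, 0.0) ** 2

for key in (3.0, 50.0, -5.0):
    print(key, relu_squared(key), clamped_relu_squared(key))
```

With a clamp of 12.0 the activation is capped at 144, so a spike in the key can no longer be squared into an arbitrarily large value before any downstream modulation multiplies it further.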

4. Evaluation & Smoke Test Results

Because cloud costs add up, we benchmarked the model locally on a CPU preset via a ROG Ally (--eval-limit 100), ensuring passive learning was disabled and working memory was cleared to mimic static chat.

Bounded Local Benchmark Metrics (--eval-limit 100)

Benchmark	Metric	Score	Std. Err.
ARC Easy	acc	0.3600	0.0482
ARC Easy	acc_norm	0.3200	0.0469
HellaSwag	acc	0.3400	0.0476
HellaSwag	acc_norm	0.3700	0.0485
TruthfulQA MC1	acc	0.2200	0.0416
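The Std. Err. column is consistent with the standard error of a mean over 100 binary outcomes using the sample (n - 1) variance. That the evaluation harness computes it this way is an assumption; the sketch below only checks that the formula matches the reported numbers.

```python
import math

def sample_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 outcomes, using the sample (n - 1) variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))

# Accuracy values copied from the table above; n matches --eval-limit 100.
for name, acc in [
    ("ARC Easy acc", 0.36),
    ("ARC Easy acc_norm", 0.32),
    ("HellaSwag acc", 0.34),
    ("HellaSwag acc_norm", 0.37),
    ("TruthfulQA MC1 acc", 0.22),
]:
    print(f"{name}: {sample_stderr(acc, 100):.4f}")
```

With only 100 examples per task, each score carries roughly plus or minus 4 to 5 points of standard error, so small differences between these numbers should not be over-read.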

Real-world Coherence Check:

The Good: Assistant-shaped, follows short instruction prompts well due to the Alpaca training data. Nontrivial commonsense and QA signal shows the weights didn't collapse.
The Bad: Brittle on long context lengths, weak on arithmetic/factual recall. Coherence is comparable to the GPT-2 era, not modern GPT-3.5+ systems.

5. Proposed Ablation & Scaling Plan

We want to transform this from a promising prototype into a rigorous scientific result. Our next step requires scaling tiers and isolated component testing.

Proposed Isolation Testing (Ablations)

No LTM / Read-Only LTM: Isolating exactly how much slot memory helps.
No ROSA / No DeepEmbed: Evaluating the real token-efficiency gains of suffix-matching and modulation.
Baseline Matches: Running a direct Transformer 232M and RWKV-only 232M on the exact same token budget to prove true comparative architecture efficiency.

Future Scaling Target Tiers

Tier	Model Size	Token Target	Purpose
Scout	300M–500M	20B–50B	Validate loss slope and stability scaling.
Real v1	1B–1.5B	100B–300B	Test architecture limits beyond small-scale behavior.
Serious	3B	600B–1.5T	Establish a truly competitive local open-source alternative.

Target Data Mix for Foundation Training:

Instead of jumping straight into instruction SFT data, a scaled run will prioritize high-quality base data:

35-50%: FineWeb / FineWeb-Edu style clean web text
20-30%: Dolma / DCLM curated web data
8-15%: Code and tech documentation
5-12%: Math, science, and academic proofs
1-5%: In-house assistant conversational SFT (applied exclusively in late-stage tuning)

6. What We Can (and Cannot) Claim Safely

What is supported by the data:

Hierarchos is a functional, coherent 232M experimental assistant checkpoint.
Combining recurrent sequence loops, memory slots, and hierarchical workers is viable and stable with the right clamps.
The findings provide a solid engineering roadmap for non-Transformer architecture stability.

What is NOT supported (Do not hype this!):

No claims of GPT-3.5 level math, coding, or logic.
No claims of superiority over attention/Transformer models at equal parameter counts yet (baselines pending).
Not production-ready for heavily quantized or low-bit local deployments yet due to drift sensitivity.

Final Thoughts

Hierarchos 232M shows that small, alternative architectures are still a deeply fruitful area of LLM research if you can conquer the train/inference state drift.

We would love to hear feedback from anyone working on recurrent neural memory or hierarchical backbones! Full code, scripts, and logs are in progress.

References:

Brown et al. **Language Models are Few-Shot Learners.** arXiv:2005.14165. https://arxiv.org/abs/2005.14165
Hoffmann et al. **Training Compute-Optimal Large Language Models.** arXiv:2203.15556. https://arxiv.org/abs/2203.15556
Peng et al. **RWKV: Reinventing RNNs for the Transformer Era.** arXiv:2305.13048. https://arxiv.org/abs/2305.13048
Behrouz et al. **Titans: Learning to Memorize at Test Time.** arXiv:2501.00663. https://arxiv.org/abs/2501.00663
Wang et al. **Hierarchical Reasoning Model.** arXiv:2506.21734. https://arxiv.org/abs/2506.21734
Zellers et al. **HellaSwag: Can a Machine Really Finish Your Sentence?** arXiv:1905.07830. https://arxiv.org/abs/1905.07830
Clark et al. **Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge.** arXiv:1803.05457. https://arxiv.org/abs/1803.05457
Lin et al. **TruthfulQA: Measuring How Models Mimic Human Falsehoods.** arXiv:2109.07958. https://arxiv.org/abs/2109.07958
Hugging Face. **FineWeb dataset.** https://huggingface.co/datasets/HuggingFaceFW/fineweb
Hugging Face. **FineWeb-Edu dataset.** https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu
Allen AI. **Dolma dataset.** https://huggingface.co/datasets/allenai/dolma
DataComp-LM. **DCLM Baseline dataset.** https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0

GitHub repository with the architecture and the released model weights: https://github.com/necat101/Hierarchos

u/PhysicsDisastrous462 — 11 hours ago

▲ 25 r/OpenSourceeAI+5 crossposts

Archestra V1.3 (OSS) brings a central hub for skills — sync with Claude Code/Codex both ways, promotion, and sandboxed code execution

Archestra 1.3 "Lyra" (https://github.com/archestra-ai/archestra) is out today with support for skills and code execution.

Archestra doesn't just run skills in its own runtime — it acts as a company-wide hub for them. Skills sync between Archestra and Claude Code, Codex, and other agents in both directions.

On top of that, two things make a central place actually workable:

Promotion. Skills are personal by default — tinker, iterate, do whatever strange things you want. When one is good, promote it to your team or the whole org as an official skill everyone can see.
Network policies for code. Skills ship with scripts, and agents need to run them. Each skill's code executes in a sandbox bound to an environment, inheriting that environment's egress policy — and refusing to run if the environment can't be resolved. So "agents run skill code" is a sentence you can say to your security team.

It's open source.

Post with demo: https://archestra.ai/blog/new-skills-sandboxes

u/motakuk — 12 hours ago

▲ 1 r/OpenSourceeAI+1 crossposts

I built a neural network from scratch. I'm 15. Here's what happened.

So I built this thing called ONA (Omni Neural Architecture) over the past year. It's a neural network that learns from everything you give it. No PyTorch, no GPU, just Python and NumPy.

Honestly, what I wanted was an intelligent AI coding agent that is free, has no limits, and runs on my low-end hardware, but nothing like that existed apart from cloud models. So I started combining pieces such as self-learning models and found that I needed to build a new architecture. I wrote new LLM code in Python that changes how the matrix multiplications and parameters work, and tuned the architecture so that only specific parameters activate when answering related prompts. And guess what, it worked!

This is that architecture with a few more things built on top, like word-by-word generation and a thinking loop. I modeled the loop on how I learn at school, the routine I follow to prepare for a test, and named it the Bio Loop. Adding it to the architecture is what makes this a learn-on-the-spot LLM.

For now it is pretty dumb and can't answer things properly, but it can understand what the user means. It needs training, and I am currently training it by feeding it internet articles. Anyway, the code works, and I believe it could become a GPT-5-level model with enough training. Training is CPU-only right now; I'm going to port the code to Rust so it trains much faster than plain Python for loops, and add GPU training later. I see it as proof that a high-level LLM can run on a Raspberry Pi, completely free and limitless, without any subscription.

Anyway the code isn't public yet (hackathon soon) but the architecture is solid and it runs on my laptop. Happy to explain anything. And yes I wrote this myself lol.

Medium link: https://medium.com/@kasishgadadhasu13/im-15-i-built-a-self-learning-neural-network-from-scratch-no-frameworks-no-gpu-e460f06c6599?sharedUserId=kasishgadadhasu13


u/Whole_Bridge3064 — 21 hours ago

▲ 5 r/OpenSourceeAI+3 crossposts

I built the universal Solana MCP. Any AI agent can connect in one click — send crypto, swap tokens, trade memecoins. No API keys. 100% open source.

I built an MCP server that gives any AI agent (Claude, Cursor, etc.) full read/write access to Solana through natural language. Your private key signs transactions locally. The agent never sees it. No API keys are shared. Ever.

14 tools — 8 read, 6 write

Read (no wallet): SOL balances, token balances, token metadata, live SOL price, pump.fun scanner, transaction lookup.

Write (with Phantom key): send SOL, send tokens, Jupiter swap, buy/sell pump.fun memecoins, devnet airdrops.

How it works

Your AI agent spawns the server. They talk through a local pipe (stdin/stdout). No network. No HTTP. No third party. The server grabs your key from .env, signs the transaction, sends it to Solana, and returns the signature. That's it.

You → AI agent → MCP server (local) → Solana

↓

your key (.env)
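For anyone who has not looked at the stdio transport before, here is a minimal Python sketch of the message shape: one JSON-RPC 2.0 request per line in, one response per line out. It is not this project's code (which is TypeScript); the `get_balance` tool name, the placeholder result, and the `handle_line` helper are assumptions for illustration, and no key handling or Solana call is shown.

```python
import json

def handle_line(line: str) -> str:
    """Handle one newline-delimited JSON-RPC 2.0 request and return the response line."""
    request = json.loads(line)
    params = request.get("params", {})
    if request.get("method") == "tools/call" and params.get("name") == "get_balance":
        # Hypothetical read-only tool; a real server would query an RPC node here.
        result = {"content": [{"type": "text", "text": "balance lookup placeholder"}]}
        response = {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
    else:
        response = {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }
    return json.dumps(response)

request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_balance", "arguments": {"address": "<wallet address>"}},
})
print(handle_line(request))
```

A real server would read lines like this from stdin in a loop and write each response to stdout, which is why the agent and the server need no socket between them.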

Why this matters

Every crypto AI tool asks you to paste your private key somewhere. Web app. Telegram bot. Browser extension. All of them expand your attack surface.

MCP inverts this. Everything runs on your machine. Your keys never leave. You get AI-powered trading without trusting anyone.

What's next

This is the foundation. I'm already building:

→ A fully autonomous memecoin trading bot — momentum detection, auto TP/SL

→ An airdrop farmer — hunts and claims tokens across protocols

→ Portfolio tracking — real-time P&L across all your wallets

Your AI agent should do everything you do on-chain — trade, farm, snipe, track — without you touching a dApp. This MCP server is the bridge.

Tech

TypeScript. 680 lines. MCP SDK 1.29. @solana/web3.js. Helius WebSocket. Jupiter v6 API. Zod schemas. Circuit breaker + retry.

License — AGPL-3.0

Companies that modify this and run it as a service MUST release their changes. Individuals: use, modify, distribute freely. Nobody closes the source.

Start in 30 seconds

git clone https://github.com/KorroAi/solana-agent-mcp

cd solana-agent-mcp && npm install

cp .env.example .env

npm run dev

Type /solana in Claude Code.

⭐ Star: https://github.com/KorroAi/solana-agent-mcp

📄 Paper: 10-section academic paper in the repo

💬 AMA in the comments

u/korro_ai — 16 hours ago

▲ 101 r/OpenSourceeAI+5 crossposts

Steno: Opensource AI powered intelligence layer for all your confidential conversations.

Hey folks, wanted to share the latest update of Steno. Steno is an opensource project for a privacy focused AI notepad that rivals Granola with the added benefit of having opensource code and keeping your data private. No cloud, no usage limits and completely free.

With v0.3.0, you now have the ability to:

Query across all your notes across time
Have diarised transcripts
Have conversational history of all your chats against notes

In our roadmap, we'll be releasing speaker diarisation and live transcription next :)

We have a great community of contributors and are always looking for great people to improve and push the boundary on privacy, local LLM and opensource AI.

Codebase @ - https://github.com/ruzin/stenoai
Download @ - https://stenoai.co

u/Far_Noise_5886 — 1 day ago

▲ 109 r/OpenSourceeAI+3 crossposts

[VisualTorch] How to generate architecture diagrams from PyTorch models

I built a small tool to auto-generate architecture diagrams directly from PyTorch models, which I originally built for my own research paper.

26k+ PyPI downloads, already used in publications (Nature, IEEE, MDPI), check out some use cases here: https://visualtorch.readthedocs.io/en/latest/markdown/showcase/index.html

It traces an actual forward pass, so it correctly captures branching, skip connections, and multi-input models, not just flat sequential stacks.

import visualtorch
import torchvision.models as models

model = models.resnet18()
img = visualtorch.render(model, input_shape=(1, 3, 224, 224), style="graph", show_neurons=False, layer_spacing=60)
img.save("resnet18.png")

Three rendering styles depending on what you want to show:

graph: node/edge diagram, good for showing branching/skip connections clearly
flow: stacked volumetric boxes, closer to the classic CNN-paper look
lenet: the classic LeNet stacked-plane style

GitHub: https://github.com/willyfh/visualtorch | Docs: https://visualtorch.readthedocs.io/en/latest/

Open to feedback, especially if you hit a model it renders weirdly :)

u/LostDistance9365 — 1 day ago

▲ 27 r/OpenSourceeAI+2 crossposts

I curated 48 LLM observability tools (Langfuse, Phoenix, Opik, LangSmith…) + a comparison matrix

Every few weeks I end up re-comparing LLM observability/eval tools for a project, so I put it all in one place: 48 verified tools across tracing, evals, prompt mgmt, gateways, OTel instrumentation, and guardrails, each with current stars + license; plus a self-host / license / tracing / evals / OTel comparison table for the top platforms.

It also includes original agent skills (instrument tracing, add evals, debug-from-traces, PII-safe tracing for regulated apps) and a minimal OpenTelemetry GenAI tracer.

Full disclosure, it's my org's repo (CC0, contributions welcome): https://github.com/ContextJet-ai/awesome-llm-observability — what tool am I missing?

u/nishchaymahor19 — 1 day ago

▲ 8 r/OpenSourceeAI+4 crossposts

TokenMizer - a local proxy for session checkpoint/resume and graph memory across Claude, GPT, and Ollama

I've been building TokenMizer, a local proxy that sits between your editor/CLI and whatever model you're using (Claude, GPT, Ollama) and handles two things I kept re-solving by hand: session checkpoint/resume, and a graph-based memory instead of a flat transcript.

The problem: once a long agent session hits the context limit, the usual fix is summarization, and summaries lose the reasoning behind a decision, not just the decision itself. I'd see a summary saying "switched to Argon2" with no trace of why bcrypt was rejected, so the agent would re-litigate the same tradeoff two sessions later. Flat transcripts have the opposite problem: everything is kept, but nothing is prioritized, so retrieval is just recency-biased keyword luck.

What TokenMizer does differently: instead of one growing text blob, decisions, constraints, and open questions are stored as nodes with edges (this decision depends on that constraint, this question was resolved by that decision). Checkpointing snapshots that graph plus a resumable session state, so you can kill a session and pick it back up without replaying the whole history through the model again.
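A minimal sketch of the general shape described above: typed nodes, labeled edges, and a checkpoint that serializes the graph. This is not TokenMizer's actual data model; the class, method names, relation label, and the constraint text are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class MemoryGraph:
    nodes: dict = field(default_factory=dict)  # node id -> {"kind": ..., "text": ...}
    edges: list = field(default_factory=list)  # [source id, relation, target id]

    def add(self, node_id: str, kind: str, text: str) -> None:
        self.nodes[node_id] = {"kind": kind, "text": text}

    def link(self, source: str, relation: str, target: str) -> None:
        self.edges.append([source, relation, target])

    def reasons_for(self, node_id: str) -> list[str]:
        """Texts of every node that node_id points at, i.e. what it rests on."""
        return [self.nodes[dst]["text"] for src, _, dst in self.edges if src == node_id]

    def checkpoint(self) -> str:
        """Serialize the whole graph so a later session can reload it."""
        return json.dumps(asdict(self))

    @classmethod
    def resume(cls, blob: str) -> "MemoryGraph":
        return cls(**json.loads(blob))

graph = MemoryGraph()
graph.add("d1", "decision", "switched to Argon2")
graph.add("c1", "constraint", "(hypothetical) hashing must be memory-hard")
graph.link("d1", "depends_on", "c1")

restored = MemoryGraph.resume(graph.checkpoint())
print(restored.reasons_for("d1"))
```

Because the reason is stored as its own node rather than inside a summary sentence, it survives the checkpoint and can be looked up directly from the decision.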

Where it's rough: there's no eval harness yet comparing retrieval quality against a naive flat-transcript baseline, so right now my evidence is anecdotal (my own sessions), not benchmarked. I also learned the hard way that benchmarking your own memory system by asking it questions only it can answer is circular, so I'm holding off on publishing numbers until I have an honest comparison.

Repo: github.com/Shweta-Mishra-ai/tokenmizer (I'm the author). It's a Python project, MIT licensed. If you've hit the same summarization-loses-reasoning problem, I'd be interested in how you're handling it, and PRs/issues on the eval-harness gap would genuinely help.

u/Feisty-Cranberry2902 — 1 day ago

▲ 3 r/OpenSourceeAI+3 crossposts

I gave my AI assistant a human brain.

JoeBro is a native macOS AI workspace that runs entirely on your machine. No cloud, no account, no telemetry, no third-party packages. Stdlib Python backend, memories in a local SQLite file. Nothing leaves your machine.

It builds up a picture of you as you chat: your projects, your preferences, the things you keep returning to. For a while that lived in a list, it was boring. So I rebuilt it as a graph.

Every memory is a node. Related memories cluster together, pulled by a physics simulation. Line length is conceptual distance. Node size is how connected a memory is — your biggest nodes are the things your assistant keeps coming back to. Hover any node and the full memory text pops up. Right-click to edit, pin, or delete.

The whole UI is liquid glass and you set a wallpaper behind it. The graph floats over whatever image you drop in — nodes, lines, hover cards, all of it. If your wallpaper is moody it looks stunning. Redditors who care about their setup will want to screenshot it immediately.

For me, seeing one project dominate the map as a massive hub node was a strange moment. It knows me. Not because someone trained it on my data, but because I *told* it things and it remembered. That's a different feeling entirely.

Stdlib Python, SwiftUI Canvas, hand-rolled force simulation, GPLv3. Fully offline. Point it at Ollama or any OpenAI-compatible endpoint and you're running.

Repo: https://github.com/joexk1/JoeBro

u/joe_joexk — 2 days ago

▲ 23 r/OpenSourceeAI+2 crossposts

Building an AI Gateway because production LLM apps kept accumulating the same middleware (WIP, looking for feedback)

Over the past few months I've noticed a pattern while building LLM applications.

The application code stays relatively small.

But production concerns keep growing:

PII redaction
retries
provider fallback
audit logs
cost tracking
request logging
prompt inspection
rate limiting

These concerns end up being duplicated across projects.

So I've been building Gavio (work in progress), an open-source AI gateway that lets these concerns be composed as interceptors rather than scattered through application code.

Current ideas include:

• Request/response interceptor pipeline • PII & secret detection • Retry/backoff • Provider abstraction • Audit trail • Cost tracking • Local mock provider • Python / Java / JavaScript SDKs
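To show what composing these concerns as interceptors could look like, here is a minimal Python sketch. It is not Gavio's API: the `compose`, `redact_emails`, and `retry` names, the dict-based request, and the flaky mock provider are all assumptions for illustration.

```python
import re
from typing import Callable

Handler = Callable[[dict], str]
Interceptor = Callable[[dict, Handler], str]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(request: dict, next_handler: Handler) -> str:
    """Replace email addresses in the prompt before anything downstream sees it."""
    cleaned = dict(request, prompt=EMAIL.sub("[REDACTED_EMAIL]", request["prompt"]))
    return next_handler(cleaned)

def retry(times: int) -> Interceptor:
    """Build an interceptor that retries the rest of the chain on ConnectionError."""
    def interceptor(request: dict, next_handler: Handler) -> str:
        last_error = None
        for _ in range(times):
            try:
                return next_handler(request)
            except ConnectionError as error:
                last_error = error
        raise last_error
    return interceptor

def compose(interceptors: list[Interceptor], provider: Handler) -> Handler:
    """Wrap the provider so the first interceptor in the list runs outermost."""
    handler = provider
    for interceptor in reversed(interceptors):
        handler = lambda req, i=interceptor, h=handler: i(req, h)
    return handler

calls = {"count": 0}

def flaky_mock_provider(request: dict) -> str:
    """Local mock: fails once with a simulated outage, then echoes the prompt."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated outage")
    return "echo: " + request["prompt"]

pipeline = compose([redact_emails, retry(3)], flaky_mock_provider)
print(pipeline({"prompt": "Mail bob@example.com the report"}))
```

Ordering matters in this design: redaction sits outside the retry, so every attempt that reaches the provider has already been cleaned.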

The goal isn't to replace LangChain, AI SDKs, or provider SDKs.

It's to provide a production layer around them.

I'm still exploring the design, so I'd genuinely appreciate feedback.

Some questions I'm thinking about:

What production problems are you solving repeatedly?
What would you expect from an AI gateway?
Would you prefer middleware, sidecar, proxy, or SDK?
What have I missed?

GitHub: https://github.com/manojmallick/gavio

Docs: https://manojmallick.github.io/gavio

u/Independent-Flow3408 — 3 days ago

▲ 6 r/OpenSourceeAI+3 crossposts

Tried a recurrent architecture (HRM) for reasoning-retrieval, the bet held up.

The bet: BRIGHT is a retrieval benchmark where finding the right doc usually takes a few hops of reasoning, not just semantic overlap. Most embedders do a single forward pass. I wanted to see if a depth-recurrent architecture, one that loops over its own hidden state, would fit that better, so I built an embedder on HRM (Sapient's Hierarchical Reasoning Model). As far as I can tell it's the first time HRM's been used for retrieval.

The recurrence helped on the reasoning side, which was the whole bet. When I dialed the recurrence down at eval on pony (one of the BRIGHT domains), accuracy dropped with every loop I removed. Where it hit a wall was knowledge: the base was pretrained on a deliberately thin slice of text (Sapient built HRM-Text for pretraining efficiency, not breadth), so it's weak on knowledge-heavy domains. The part I find coolest: at 0.6B, the reasoning is coming from the architecture, not from scale.

Details:

~0.6B params, trained on one 3060 Ti (8GB).
Recipe's deliberately boring: mean-pool + L2, bidirectional (LLM2Vec style), contrastive InfoNCE. Only the backbone is unusual. Same recipe as RakanEmbed4B.
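For reference, a dependency-free sketch of the "boring" part of that recipe: mean-pool token vectors, L2-normalize, and score with in-batch InfoNCE. The toy vectors and the temperature of 0.05 are assumptions, and the HRM backbone and bidirectional attention are not shown; this is not the repo's training code.

```python
import math

def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average the per-token vectors into one sequence embedding."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]

def l2_normalize(vector: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def embed(token_vectors: list[list[float]]) -> list[float]:
    return l2_normalize(mean_pool(token_vectors))

def info_nce(queries: list[list[float]], docs: list[list[float]], temperature: float = 0.05) -> float:
    """In-batch InfoNCE: docs[i] is the positive for queries[i]; other docs are negatives."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [sum(a * b for a, b in zip(query, doc)) / temperature for doc in docs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        total += log_sum - logits[i]
    return total / len(queries)

# Toy "token hidden states" for two queries and two documents.
queries = [embed([[1.0, 0.0], [1.0, 0.2]]), embed([[0.0, 1.0], [0.2, 1.0]])]
docs = [embed([[1.0, 0.1]]), embed([[0.1, 1.0]])]

print(info_nce(queries, docs))        # matched pairs: low loss
print(info_nce(queries, docs[::-1]))  # mismatched pairs: high loss
```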

Numbers (BRIGHT, mean nDCG@10, 12 domains):

original: 18.1
query rewriting: 34.3
merged: 33.7

Weights are Apache-2.0 and the full BRIGHT eval harness is in the repo.

Open questions / discussion:

Would a massively pretrained HRM push this further? The ceiling here looks like knowledge, not reasoning, so a broadly-pretrained base might lift it a lot. I don't have the compute to try that myself.
Would other recurrent architectures show the same effect, or is something specific to HRM doing the work?

Model: https://huggingface.co/viventhraa96/HRM-Embed-0.6b

Code: https://github.com/okaybroda/hrm-embed

Full credits to Sapient Inc for open sourcing the code and the architecture for this work.

u/v1v55 — 1 day ago

▲ 160 r/OpenSourceeAI+27 crossposts

How to build an AGY WIKI OKF on the Antigravity CLI

AGY Builders,

We are all trying to build useful and scalable workflows for our AGY CLI and ecosystem, but the speed at which we need to learn, build, and deploy new things is incredibly overwhelming. If you are feeling that pressure, you are in the right place here at r/GoogleAntigravityCLI.

Over the past few weeks, I have been testing an "AGY WIKI OKF" setup that I put together myself (after inviting some members of this community to collaborate; mod is not proud). I know some folks might hesitate to trust a tutorial from a random Redditor, but I wanted to share this with the community anyway because it actually works.

I was able to build this because I am all-in on Google and the Antigravity Ecosystem. I’m a true AGY builder. I am not some ultra-smart, 10x developer, but I know how to work hard, I dig for the right information, and I iterate.

AGY WIKI OKF | The Idea

To build a frictionless, token-efficient knowledge WIKI engine that transforms static documentation or notes (information) into an active, intelligent collaborator—orchestrated entirely by Antigravity CLI.

The core philosophy is simple: treat knowledge management as a clean pipeline and tokens as a premium, finite resource.

By anchoring this architecture to Google’s Antigravity CLI, the AGY WIKI OKF bypasses heavy middleware and complex UI layers, delivering a hyper-focused AI partner built entirely for execution speed, context hygiene, and minimal footprint.

Why adopting AGY WIKI OKF matters:

Stay organized (AGY OCD): Structured Markdown and YAML keep the chaos in check.
Save tokens: Doing more with less context window bloat.
Scale shareable knowledge: Making it easy to pass context and logic between different LLMs.
Humans and Agents working together: One standardized, readable format that works perfectly for both of us.
BYOD (Bring Your Own Data): Own your context. Port it to the newest model, platform, or OS instantly.

The Tools

Antigravity CLI
Obsidian : The IDE for the Knowledge bank
Obsidian Web Clipper:

The WIKI

In the agent-first era, a WIKI is no longer just a static graveyard for human notes; it is the operational hard drive for your agents. By maintaining a highly structured WIKI, you ensure that every piece of context is stored in a clean, machine-readable format. This means that whether you are testing a new modular skill or spinning up a specialized agent, your AGY CLI knows exactly where to find the precise context it needs to generate autonomous action, moving you far beyond simple, reactive conversational text.

Reference: Gist on Knowledge Representation

Google Open Knowledge Format (OKF)

Google’s Open Knowledge Format (OKF) feels like the exact missing piece we've needed for orchestrating multiple AI agents effectively. It provides a vendor-neutral, interoperable standard for storing and sharing organizational knowledge.

Why this is huge for orchestration:

The "Lingua Franca" for Agents: Any agent can read it out of the box without platform-specific integrations.
Seamless Context Passing: Specialized agents can access, update, and pass the exact same foundational context back and forth.
Human-in-the-Loop Oversight: Because OKF is just Markdown and YAML, it’s inherently readable and auditable.
Scalable Knowledge: It acts as a shared, living library that grows alongside your agents.

AGY WIKI OKF Integration

Structuring an AGY Wiki using OKF changes how complex knowledge is shared. By standardizing documentation with concise Markdown and YAML frontmatter, OKF provides a unified taxonomy for cataloging AGY CLI slash commands or skills. It is highly token-efficient, stripping away bloated formatting and making the most of a limited context window.

The Prompt for Building an AGY WIKI OKF

AGY CLI WIKI OKF PROMPT EXAMPLE

/grillme I want to initialize a brand-new, empty Obsidian vault from scratch that adheres strictly to the Open Knowledge Format (OKF) standard, with the specific intent of potentially open-sourcing or sharing this architecture later. I want a purely blank, skeletal framework with no pre-populated data. Please grill me to define the optimal architectural blueprint for this vault. I need you to interrogate me on:

Core Directory Hierarchy: How should we structure the root (e.g., /concepts, /resources, /indices, /log) to be intuitive for external users?
Template Strategy: What base boilerplate templates do we need to ensure every new file is automatically OKF-compliant and structured for consistent metadata?
Workflow Logic: Since this is a fresh start, what processes should we bake in for capturing information vs. refining knowledge that could be easily documented for others?
CLI Integration: What specific file locations or configurations do we need to ensure this vault plays nicely with the Antigravity CLI from day one?
Open-Source & Contributor Documentation: What files should we create to make this a "deployable" standard? Please include requirements for:
A README.md with installation and usage instructions.
A CONTRIBUTING.md that defines how to add new concepts or schemas.
A "System Architecture" document that explains the logic behind the folder structure and metadata fields, ensuring anyone who clones this vault understands how to extend it.

Do not generate the directory structure or files until you are satisfied that you have captured all my requirements for a production-ready, shareable knowledge base.

The Final File Structure

AGY WIKI OKF
    ├── .agyrc
    ├── ARCHITECTURE.md
    ├── CONTRIBUTING.md
    ├── README.md
    ├── .agy
    │   └── .keep
    ├── .obsidian
    │   ├── app.json
    │   ├── appearance.json
    │   ├── core-plugins.json
    │   └── workspace.json
    ├── 00-Inbox
    │   └── .keep
    ├── 10-Projects
    │   └── .keep
    ├── 20-Areas
    │   └── .keep
    ├── 30-Resources
    │   ├── .keep
    │   └── Google Antigravity Documentation.md
    ├── 40-Archive
    │   └── .keep
    ├── 99-Meta
    │   └── Templates
    │       ├── Base_Template.md
    │       ├── Project_Template.md
    │       └── Resource_Template.md
    └── Clippings
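
For anyone who wants to reproduce the skeleton above without running the prompt, the tree can be scaffolded with a short script. This is a sketch under assumptions: the files are created empty, the `.obsidian` folder is left out because Obsidian writes its own settings when the vault is first opened, and the pre-populated resource note is skipped to keep the vault blank. The demo writes into a temporary directory so nothing on disk is touched.

```python
from pathlib import Path
import tempfile

# Folder and file names are taken from the tree above.
KEEP_FOLDERS = [".agy", "00-Inbox", "10-Projects", "20-Areas", "30-Resources", "40-Archive"]
TOP_FILES = [".agyrc", "ARCHITECTURE.md", "CONTRIBUTING.md", "README.md"]
TEMPLATES = ["Base_Template.md", "Project_Template.md", "Resource_Template.md"]


def scaffold(root: Path) -> list[str]:
    """Create the empty vault skeleton and return the created files (relative paths)."""
    files = [root / folder / ".keep" for folder in KEEP_FOLDERS]
    files += [root / name for name in TOP_FILES]
    files += [root / "99-Meta" / "Templates" / name for name in TEMPLATES]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    (root / "Clippings").mkdir(parents=True, exist_ok=True)  # empty folder in the tree
    return sorted(p.relative_to(root).as_posix() for p in files)


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "AGY WIKI OKF")
    print("\n".join(created))
```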

TL;DR

AGY WIKI OKF: Organizes your information (context), AGY CLI commands, skill behaviors, and A2A workflows into a token-efficient, shareable format that reduces inference costs for any LLM.
Open Knowledge Format (OKF): Provides a standardized, vendor-neutral way to share context (Markdown + YAML), preventing platform lock-in and eliminating data fragmentation.

AGY Builders, I genuinely want your input on this. Please comment, grill me, roast me, ask questions, or give me your raw feedback on this AGY WIKI OKF setup. We are building the foundation to organize and share our data in the BYOD era. Let's build the future together.

u/AgentPadrino — 3 days ago

▲ 12 r/OpenSourceeAI+1 crossposts

ai-rulez: one source of truth for AI coding rules, generates native configs for 19 tools (Go, MIT)

Every AI coding tool wants its own config file: Claude reads CLAUDE.md, Cursor wants .cursor/rules, Copilot expects .github/copilot-instructions.md, and so on. Use more than one and you're maintaining duplicates that drift.

ai-rulez keeps one source. You write rules, context, agents, and commands once in .ai-rulez/, run generate, and it emits each tool's native format for 19 platforms. Two things make it hold up on real projects:

Composition over git: [[includes]] pull shared rule modules from other repos, so org-wide standards live in one place and every repo overrides locally as needed.
Monorepos: nested configs plus generate --recursive, profiles per audience, and 33 builtin domains (languages, security, testing, git-workflow and more) you switch on instead of writing from scratch.

Concrete: in one of my repos a ~25-line config expands into 103 generated files across 5 tools, regenerated on every commit via a pre-commit hook, so nothing drifts.

Single Go binary: npx ai-rulez@latest init, or brew. MIT.

Honest tradeoff: outputs are generated, so you edit the source and never the outputs (they get overwritten).
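
The "one source, many outputs" idea and its tradeoff can be shown in a few lines. This is a toy sketch and not how ai-rulez is implemented or configured: the `generate` function, the header text, and the Cursor filename `project.mdc` are assumptions made for illustration, and real tools expect different formats per target rather than the same body everywhere.

```python
from pathlib import Path
import tempfile

HEADER = "<!-- GENERATED FILE: edit the source rules, not this output -->\n"

# Output locations named in the post; the file inside .cursor/rules is hypothetical.
TARGETS = {
    "claude": "CLAUDE.md",
    "cursor": ".cursor/rules/project.mdc",
    "copilot": ".github/copilot-instructions.md",
}


def generate(rules: list[str], repo: Path) -> dict[str, str]:
    """Write the same rule list to every target, overwriting whatever is there."""
    body = HEADER + "\n".join(f"- {rule}" for rule in rules) + "\n"
    written = {}
    for tool, rel in TARGETS.items():
        out = repo / rel
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(body, encoding="utf-8")
        written[tool] = rel
    return written


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    rules = ["Prefer small, reviewed commits.", "Run tests before pushing."]
    written = generate(rules, repo)

    # A hand edit to a generated file is lost on the next run.
    claude_file = repo / TARGETS["claude"]
    claude_file.write_text("hand edit\n", encoding="utf-8")
    generate(rules, repo)
    regenerated = claude_file.read_text(encoding="utf-8")
    print(regenerated)
```

Running the generator from a pre-commit hook, as the post describes, is what keeps the outputs from drifting: the source is the only place an edit survives.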

https://github.com/Goldziher/ai-rulez

How are others managing rules across multiple AI tools?

u/Goldziher — 3 days ago

▲ 35 r/OpenSourceeAI+11 crossposts

Multi-model consensus debate via the filesystem. LLMs propose, peer-review, rebut, vote and synthesize a group-confirmed answer. CLI + MCP.

github.com

u/raiyanyahya — 3 days ago

▲ 3 r/OpenSourceeAI

Seeking Collaboration - Companion Agent

I am working on a companion agent inspired by OpenClaw/Hermes (even before Hermes, tbh). The thing is, past a certain point, between development, my day job, and everything else, progress feels stuck.

If there are folks interested in similar topics, it would be awesome to collaborate.

I am not posting any links at this point since I don't want this to look like a promotion. I know the chance of this post getting traction is minimal, but I am posting it anyway.

Please drop me a reply or dm me for additional info.

reddit.com

u/PlayfulLingonberry73 — 3 days ago

▲ 33 r/OpenSourceeAI+5 crossposts

eXo Platform 7.2 has been released: an open-source digital workplace with native AI, self-hosting support, and multi-LLM architecture

A new release of eXo Platform, an open-source digital workplace platform, is now available. It may be relevant to the self-hosted community here.

With version 7.2, the focus has been placed on three main areas:

• Native AI integration directly inside the platform (content management, knowledge access, collaboration, automation)
• Multi-LLM architecture → use the AI models you choose (OpenAI, local models, private deployment, etc.)
• Full deployment flexibility → cloud, private cloud, or fully on-premise/self-hosted

A few technical highlights:

• MCP server exposed via OAuth with access to 100+ platform tools for AI agents
• Internal RAG connected to organizational knowledge bases
• Ability to restrict/contextualize AI sources (documents, spaces, tasks, notes…)
• AI assistants that can be customized for specific internal workflows
• Open-source architecture designed for organizations requiring data sovereignty

The goal is simple: integrate AI into everyday work without forcing organizations into closed SaaS ecosystems.

Feedback from people building self-hosted alternatives in this space is welcome.

Curious how others here are approaching AI + self-hosting.

eXo offers:

Community Edition (CE) → Fully Open Source
- Docker Hub
- GitHub
Enterprise Edition (EE) → additional features & professional support

Both can be deployed self-hosted, in private cloud, or in secure environments (including SecNumCloud).

u/jaouanebrahim — 4 days ago

▲ 22 r/OpenSourceeAI+1 crossposts

Mistral AI Releases Leanstral 1.5: An Apache-2.0 Lean 4 Code Agent Model Solving 587 of 672 PutnamBench Problems

Most AI theorem proving is a language model generating a proof in one shot, with a verifier bolted on at the end to check it. That's autocomplete with a grader — and Mistral just drew a clear line between that and an actual proof agent.

They released Leanstral 1.5 — a 119B MoE with 6.5B active parameters, trained as a code agent that lives inside the Lean 4 compiler loop: propose a proof, read the compiler's goals and errors, refine, repeat until it compiles or the budget runs out. Verification isn't the eval here. It's the training signal.
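
The propose, read feedback, refine loop described above can be sketched generically. This is not Leanstral's code and does not call Lean: the checker below is a stand-in that plays the role of the compiler, the proposer is a simple bisection rather than a model, and all names (`refine_loop`, `make_proposer`, `check`) are hypothetical.

```python
from typing import Callable, Optional


def refine_loop(
    propose: Callable[[Optional[str]], int],
    check: Callable[[int], Optional[str]],
    budget: int,
) -> tuple[Optional[int], int]:
    """Propose, verify, and refine until the checker accepts or the budget runs out.

    `check` returns None on success, otherwise an error message that is fed
    back into the next proposal.
    """
    feedback: Optional[str] = None
    for attempt in range(1, budget + 1):
        candidate = propose(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, attempt
    return None, budget


def make_proposer(lo: int, hi: int) -> Callable[[Optional[str]], int]:
    """Toy proposer: narrows an integer range using the checker's feedback."""
    state = {"lo": lo, "hi": hi, "last": 0}

    def propose(feedback: Optional[str]) -> int:
        if feedback == "too small":
            state["lo"] = state["last"] + 1
        elif feedback == "too large":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]

    return propose


def check(x: int) -> Optional[str]:
    """Stand-in verifier: accepts x only if x squared equals 1369."""
    if x * x < 1369:
        return "too small"
    if x * x > 1369:
        return "too large"
    return None


solved = refine_loop(make_proposer(0, 100), check, budget=10)
starved = refine_loop(make_proposer(0, 100), check, budget=2)
print(solved, starved)
```

The second call shows the budget side of the loop: the same proposer fails when it is cut off before the feedback has narrowed the search enough.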

Here's what's actually interesting:

→ Test-time scaling behaves like a dial: PutnamBench Pass@8 climbs 44 → 244 → 493 → 587 solved as the per-attempt token budget moves 50k → 200k → 1M → 4M

→ 587/672 on PutnamBench at ~$4 per problem, versus an estimated $300+ for Seed-Prover 1.5 high (a 10 H20-days-per-problem budget)

→ Saturates miniF2F: 100% on both validation and test sets

→ Two RL environments in training — a multiturn prover, and a raw-filesystem code agent that edits files, runs bash, and queries the Lean language server for live goals and types

→ Not just math: an Aeneas (Rust → Lean) pipeline flagged 11 genuine bugs across 57 repos, 5 previously unreported — including an integer overflow in datrs/varinteger when (value + 1) hits Std.U64.MAX
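
As a quick check on the figures quoted above, the solved counts convert to solve rates as follows. The script uses only numbers from the post; the cost ratio treats the "$300+" estimate as a lower bound.

```python
TOTAL_PROBLEMS = 672

# Per-attempt token budget -> PutnamBench problems solved (Pass@8), as quoted above.
solved_by_budget = {"50k": 44, "200k": 244, "1M": 493, "4M": 587}

rates = {
    budget: round(100 * solved / TOTAL_PROBLEMS, 1)
    for budget, solved in solved_by_budget.items()
}
for budget, rate in rates.items():
    print(f"{budget:>4} tokens: {rate}% solved")

# ~$4 per problem versus an estimated $300+ per problem.
min_cost_ratio = 300 / 4
print(f"cost ratio: at least {min_cost_ratio:.0f}x")
```

Each step up in budget still adds problems, but the gain from 1M to 4M tokens (about 14 points) is much smaller than the gain from 200k to 1M (about 37 points).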

Apache 2.0 weights, free API endpoint

Full analysis: https://www.marktechpost.com/2026/07/03/mistral-ai-releases-leanstral-1-5-an-apache-2-0-lean-4-code-agent-model-solving-587-of-672-putnambench-problems/

Model weights: https://huggingface.co/mistralai/Leanstral-1.5-119B-A6B

Project: https://docs.mistral.ai/models/model-cards/leanstral-1-5

Technical Details: https://mistral.ai/news/leanstral-1-5/

u/ai-lover — 3 days ago