r/machinelearningnews

Who've told you that distributed training is impossible? Democratizing AI: The Psyche Network Architecture

It seems that not only it is totally possible without incurring in unfeasible excessively narrow train data transfer bottlenecks but that several models have already been trained using this method. It mostly depends on how many GPUs join such kind of network.

See here: https://psyche.network/runs

nousresearch.com

u/DevelopmentBorn3978 — 6 hours ago

▲ 31 r/machinelearningnews

NVIDIA HORIZON: A Hands-Free Agent that Evolves Git Worktrees and Hits 100% RTL Benchmark Completion

We covered a new paper from NVIDIA Research that moves agentic coding into hardware design.

HORIZON treats hardware design as repository-level code evolution. A human writes a Markdown harness. A bootstrap agent compiles it into a project pack, then a hands-free loop evolves an isolated git worktree until an acceptance gate passes.

Here's what's actually interesting:

Git is the interface, not bookkeeping

Each accepted repair becomes a commit. Git notes carry the evaluator verdict and reward. Rejected attempts are logged as negative examples. The repository history becomes the experience buffer.

The verifier harness is the real contract

The project pack bundles an executable evaluator, an acceptance predicate, a git policy, and domain skills. For RTL that means compile, simulate, coverage, and assertion checks. Any backbone can plug in.

The results

→ 100% completion across ChipBench, RTLLM-2.0, Verilog-Eval, and nine CVDP categories

→ 47.8% aggregate pass rate at the first iteration, before the loop closes the gap

→ 82 iterations for the hardest category (RTL code completion), its long tail the single largest cost

→ ~210M tokens total, ~91% cached input

→ GPT-5.3 as a fixed backbone, single-agent, hands-free

My takeaway: once executable feedback makes correctness converge, the bottleneck shifts to token efficiency and verification quality, not pass rate.

Full analysis: https://www.marktechpost.com/2026/07/04/nvidia-horizon-a-hands-free-agent-that-evolves-git-worktrees-and-hits-100-rtl-benchmark-completion/

Paper: https://arxiv.org/pdf/2606.28279

u/ai-lover — 1 day ago

▲ 28 r/machinelearningnews

NVIDIA AI Introduces ASPIRE: A Self-Improving Robotics Framework Reaching 31% Zero-Shot on LIBERO-Pro Long Tasks

Most robot-coding agents throw away everything they learn. Solve a task, discard the fix, start the next one cold — the agent on its 100th task is no smarter than on its first. NVIDIA's ASPIRE draws a clean line between that and an agent whose experience actually compounds.

They introduced ASPIRE (Agentic Skill Programming through Iterative Robot Exploration) — a code-as-policy system where a coding agent (Claude Code, Claude Opus 4.6, 1M-token context) writes and debugs its own robot programs against a fixed perception/planning/control API, and distills every validated fix into a reusable skill library, with no fixed perception-plan-execute pipeline anywhere in the loop.

Here's what's actually interesting:

→ The execution engine logs per-primitive multimodal traces — RGB keyframes, grasp candidates, object poses, motion plans, return status — so the agent localizes the failing primitive, not just the failed rollout

→ Validated fixes distill into a text skill library (failure signature + when-to-apply guard + repair sketch), not weights — and the agent is barred from reading sim ground truth, so the skills transfer to real hardware

→ Evolutionary search proposes K candidate programs per round, conditioned on surviving programs + residual failure traces — beyond single-trajectory tuning

→ LIBERO-Pro Object under perturbation: 98 vs 22 for CaP-Agent0

→ Robosuite bimanual handover: 92 vs 20 for CaP-Agent0

→ LIBERO-Pro Long zero-shot: 31 vs 4 for prior methods (skills learned on LIBERO-90, no test-time retries)

On a real bimanual robot with a different embodiment and API (OpenAI Codex GPT-5.5), transferred skills took soda-can lifting to 19/20 at ~10x fewer tokens, and drawer opening from 0/20 to 11/20.

The core bet: compound debugging experience into an explicit skill library, not the weights.

Full analysis: https://www.marktechpost.com/2026/07/03/nvidia-ai-introduces-aspire-a-self-improving-robotics-framework-reaching-31-zero-shot-on-libero-pro-long-tasks/

Paper: https://arxiv.org/pdf/2607.00272

Project page: https://research.nvidia.com/labs/gear/aspire/

u/ai-lover — 2 days ago

▲ 23 r/machinelearningnews+1 crossposts

Mistral AI Releases Leanstral 1.5: An Apache-2.0 Lean 4 Code Agent Model Solving 587 of 672 PutnamBench Problems

Most AI theorem proving is a language model generating a proof in one shot, with a verifier bolted on at the end to check it. That's autocomplete with a grader — and Mistral just drew a clear line between that and an actual proof agent.

They released Leanstral 1.5 — a 119B MoE with 6.5B active parameters, trained as a code agent that lives inside the Lean 4 compiler loop: propose a proof, read the compiler's goals and errors, refine, repeat until it compiles or the budget runs out. Verification isn't the eval here. It's the training signal.

Here's what's actually interesting:

→ Test-time scaling behaves like a dial: PutnamBench Pass@8 climbs 44 → 244 → 493 → 587 solved as the per-attempt token budget moves 50k → 200k → 1M → 4M

→ 587/672 on PutnamBench at ~$4 per problem, versus an estimated $300+ for Seed-Prover 1.5 high (a 10 H20-days-per-problem budget)

→ Saturates miniF2F: 100% on both validation and test sets

→ Two RL environments in training — a multiturn prover, and a raw-filesystem code agent that edits files, runs bash, and queries the Lean language server for live goals and types

→ Not just math: an Aeneas (Rust → Lean) pipeline flagged 11 genuine bugs across 57 repos, 5 previously unreported — including an integer overflow in datrs/varinteger when (value + 1) hits Std.U64.MAX

Apache 2.0 weights, free API endpoint

Full analysis: https://www.marktechpost.com/2026/07/03/mistral-ai-releases-leanstral-1-5-an-apache-2-0-lean-4-code-agent-model-solving-587-of-672-putnambench-problems/

Model weights: https://huggingface.co/mistralai/Leanstral-1.5-119B-A6B

Project: https://docs.mistral.ai/models/model-cards/leanstral-1-5

Technical Details: https://mistral.ai/news/leanstral-1-5/

u/ai-lover — 2 days ago

▲ 20 r/machinelearningnews+3 crossposts

Meet WebBrain: An Open-Source, Local-First AI Browser Agent That Reads Pages and Automates Tasks in Chrome and Firefox

WebBrain lives inside your browser and can run entirely on your own local model — no cloud, no account, no data leaving your machine.

Most "AI browser agents" are a chat box that pastes your page into someone else's server. That's not an agent that lives where you browse — and WebBrain draws a very clear line between the two.

It's an open-source (MIT), local-first browser agent for Chrome and Firefox. It runs inside your existing authenticated session, on a model you pick — so with llama.cpp or Ollama, nothing leaves your machine.

Here's what's actually interesting:

→ Two modes, cleanly separated. Ask reads the page (read-only, content scripts). Act clicks and types through the Chrome DevTools Protocol (chrome.debugger) — trusted input events that modern sites honor, reaching cross-origin iframes and shadow DOM.

→ UI-first by design. For anything that submits, sends, or buys, it drives the visible UI and refuses to hit REST/GraphQL endpoints directly. It starts read-only and asks before consequential actions.

→ Bring any model. llama.cpp, Ollama, LM Studio, vLLM — or OpenAI, Claude, Gemini, DeepSeek, Groq, OpenRouter. Recommended local: Qwen 3.6 35B (Qwen3.6-35B-A3B), which beat Gemma 4 on the project's screenshot benchmark.

→ Tuned for cost and privacy. Token-conscious screenshots, oldest-first context trimming, a dedicated vision model, 40+ tools (~20 in Compact mode). No telemetry. No accounts.

Full analysis: https://www.marktechpost.com/2026/07/02/meet-webbrain-an-open-source-local-first-ai-browser-agent-that-reads-pages-and-automates-tasks-in-chrome-and-firefox/

GitHub Repo: https://pxllnk.co/wdva98c

Chrome Extension: https://pxllnk.co/p4mn8

Firefox Add-on: https://pxllnk.co/m6k7c5w9

Portal: https://pxllnk.co/rlifl7h

u/ai-lover — 3 days ago

▲ 30 r/machinelearningnews+2 crossposts

MiCA is now part of Hugging Face PEFT

Glad to share that MiCA, short for Minor Component Adaptation, has now been merged into the HuggingFace PEFT library.

It is not yet included in the latest PyPI release, but you can already install it directly from PEFT main:

pip install --upgrade git+https://github.com/huggingface/peft.git@main

Then using MiCA is minimal:

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    init_lora_weights="mica",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()

That’s it. MiCA is exposed through the existing LoRA interface via:

init_lora_weights="mica"

The idea behind MiCA is simple: instead of adapting along the dominant singular directions of a pretrained weight matrix, MiCA uses the minor singular subspace.

For a weight matrix:

W = U Σ Vᵀ

MiCA initializes:

B = U[:, -r:]
A = 0

So the adapter starts as a no-op, because B A = 0

The base model output is preserved exactly at initialization. During training, MiCA keeps B frozen and only trains A.

Why is this useful?

The intuition is that the major singular directions already encode much of the pre-trained model’s existing behavior. The minor directions are less used by the original model and may provide a more plastic subspace for injecting new knowledge.

In our experiments, MiCA showed in average over two experiments and three models:

about 90% higher knowledge uptake on average
about 20% less catastrophic forgetting
about 80% fewer trainable parameters compared with LoRA in the tested setup

See the paper for the full experimental details.

A practical rule of thumb:

If you have a LoRA setup that works well, try MiCA with:

r_mica ≈ r_lora / 2
learning_rate_mica ≈ 2 × learning_rate_lora

Because MiCA trains only one of the two LoRA matrices, you often need fewer parameters and can use a somewhat higher learning rate.

Best practice:

MiCA is mainly intended for continued pretraining / domain-adaptive pretraining.

A recommended workflow is:

Start from the base model, not the instruct/chat model.
Train the MiCA adapter on domain text.
Merge the adapter into the model.
Use the merged model as the adapted base for later instruction/chat tuning.

In many cases, merging or transferring the adapter into the corresponding instruct/chat model can work better; see the MiCA paper for details.

We tested MiCA primarily for continued pretraining and supervised fine-tuning. Early RL results look promising. Instruction fine-tuning alone was not the most useful setting in our experiments.

Huge thanks to Sebastian Raschka for the collaboration, and to the Hugging Face team (Lewis Tunstal and Benjamin Bossan) for review and integration.

Preprint: https://arxiv.org/abs/2604.01694

https://preview.redd.it/rbqi05lrb6ah1.png?width=1672&format=png&auto=webp&s=0f62e0f43b3926eb6ef0079fcd1fe4af38f1b831

reddit.com

u/Majestic-Explorer315 — 4 days ago

▲ 66 r/machinelearningnews+2 crossposts

I built a fully offline, private AI creative studio that runs on a cheap old 6GB GPU — should I open-source it?

Hey everyone,

Over the last weeks I've been building a 100% local, offline, private AI studio on my own PC — no cloud, no API keys, no subscriptions, nothing leaves the machine. It started as a personal project because I didn't want my data on someone else's servers, and it kind of grew into a full creative suite.

The thing I'm most happy about: it's all wrapped in one clean desktop app (single window, desktop icon). No ComfyUI node spaghetti, no terminal — my non-technical friends can actually use it. Under the hood it's all open-source tools glued together.

What it does right now (all offline):

Image generation — FLUX (GGUF) + several SD1.5 models, with a built-in prompt optimizer (a local LLM rewrites your casual/German text into a proper English prompt)
4K upscaling — 4x-UltraSharp + tiled Ultimate SD Upscale (real added detail, not just resize)
img2img reworking
Image → 3D model (TripoSR / Hunyuan3D) for .obj export
Face-swap (ReActor) and lip-sync / talking photos (LivePortrait) — fully offline
Local chat — Ollama (Qwen3.5, DeepSeek-R1, a vision model, etc.) behind an Open WebUI dashboard
Local coding agent — Aider + local models, with an auto test→repair loop and a little "auto-splitter" that breaks one big prompt into small steps so weaker local models don't choke
Code-RAG — Qdrant + embeddings for semantic search across my own projects
Context size auto-scales to whatever GPU is installed — zero manual tuning

The fun part — the hardware: most of this runs on a GTX 1060 6GB (yeah, an ancient Pascal card). It's slow, sure, but it works. I'm about to drop in an RTX 3060 12GB + 32GB RAM and add local video (LTX-2 / Wan 2.2), text-to-music, voice cloning (TTS), and local LoRA training.

Why I built it: I think people should be able to run this stuff for free, on their own hardware, with their data staying home. It's not trying to beat cloud models on raw quality — it's about ownership.

My question to you:

Is something like this worth open-sourcing on GitHub? Would anyone actually use a "one-click private AI studio" that bundles these tools behind a simple UI? If yes:

What would you want most (better docs, an installer, specific features)?
Any advice on license (MIT? GPL?) given it wraps a bunch of other open-source projects?
Would you rather have the launcher/UI as the open-source piece, since the underlying models/tools are already public?

Happy to share screenshots/a demo if there's interest. Not selling anything — just want to know if it's useful to more than just me. Cheers 🙏

Edit// Thank you to the community <3. You will find the project on ai.overlkd.com

u/SubjectNo2985 — 5 days ago

▲ 594 r/machinelearningnews+3 crossposts

Qwythos-9B-Claude-Mythos-5 Fine Tune with 1M Context has been released!

We have just released our Claude Mythos Fine Tune based on synthetic CoT generated from Fable-5 and Mythos-5 session logs.

You can find the model here: https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M

GGUFs are also available here:
https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF

We also have some sample outputs here for you: https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M/blob/main/evals/sample_generations.md

We hope you can find some use in it! :)

u/EmperoAI — 7 days ago

▲ 9 r/machinelearningnews+2 crossposts

PostGreSQL MCP Server

Hi Everyone,

I am requesting feedback on MCPg : postgreSQL MCP server, and if possible, collaboration as well ! Please feel free to report issues on git itself or feature requests

u/Professional-Clerk30 — 4 days ago

▲ 6 r/machinelearningnews

I mapped the "Dynamic Grammar" of LLMs: How hidden states move, stabilize, and decide

Hi everyone,

I’m an independent researcher (no lab affiliation) who has spent the last year diving deep into the internal dynamics of Transformers. Instead of looking at outputs or attention heads, I’ve been tracking the geometric trajectories of hidden states layer-by-layer during inference.

I wanted to share my latest findings (preprints linked below) because they reveal a structured "dynamic grammar" that seems universal across architectures, from GPT-2 to Llama-3.2.

The Core Idea

Most observability tools treat LLMs as static input-output machines. I treat them as dynamic systems. By measuring metrics like trajectory curvature (ct_t), functional capacity, and state transitions, I found that LLMs don’t just "generate text"—they navigate a latent space through specific, reproducible phases.

Key Findings (V20–V24)

A Universal Dynamic Grammar (V24)

Across 7 models (GPT-2, OPT, Qwen, TinyLlama, Phi-1.5, Llama-3.2, DistilGPT2), I observed a conserved sequence of internal states:

B (Branching/Hesitation): Initial exploration.

A (Adaptive/Stable): The main processing phase (an attractor state).

D (Decision/Bifurcation): Final commitment to a token.

Result: B → A → D appears to be the "standard cognitive path" for coherent generation. Deviations from this path often correlate with errors or hallucinations.

Geometry > Neurons (V22)

Using orthogonal rotation controls, I proved that functional information (syntax, decision, stabilization) is encoded in the relative geometry of the representation space, not in individual neurons. If you rotate the latent space, the information remains decodable. This suggests LLMs think in shapes, not just activations.

Ambiguity Changes the Path, Not the Chaos (V23)

When prompts are ambiguous, models don’t necessarily become "chaotic." Instead, they delay commitment. They spend more time in the exploration phase (B) and less time rushing to decision (D). Phi-1.5, interestingly, shows a unique oscillating pattern (B↔A) during reasoning tasks, distinct from the smoother convergence of other models.

Architecture Matters More Than Size (V20)

Models cluster by their dynamic signatures (e.g., GD_ratio), not just parameter count. Small models like Qwen-0.5B show distinct stability regimes compared to GPT-2, despite similar sizes.

The Preprints (Open Access)

[June 2026] A Runtime Trajectory Dynamics Framework (V20): Introduces the 5-state taxonomy (Stable, Turbulence, Branching, Bifurcation, Committed) and the bicephalic operator.

Link: https://doi.org/10.5281/zenodo.20602685

[May 2026] Dynamic-Layer Controllability (V21): Shows how perturbations affect recovery and proves that emergent organization dominates architectural skeleton.

Link: https://doi.org/10.5281/zenodo.20400171

[May 2026] Conditional Dynamic Signatures (V22): Audits normalization effects and variance decomposition. Explicitly documents falsified claims.

Link: https://doi.org/10.5281/zenodo.20361289

[May 2026] Four Dynamical Regimes (V19/V20): Introduces ct_t (curvature × displacement) as a predictor of collapse and instability.

Link: https://doi.org/10.5281/zenodo.20348878

Why I’m Posting This

I’m not selling a product. I’m building an open framework (LIMEN) to make LLM internals auditable and controllable. I believe that if we want safe AI, we need to monitor its "vital signs" (dynamic stability) in real-time, not just its output.

I’d love feedback from the community, especially on:

Have you seen similar "universal motifs" in larger models (>7B)?

Critiques on the methodology (normalization, probe training).

Ideas for causal interventions based on these dynamic states.

reddit.com

u/Turbulent-Metal-9491 — 5 days ago

▲ 216 r/machinelearningnews+1 crossposts

Open-source models are under threat.

Anthropic is fine with open source AI as long as it’s not good enough to threaten their monopoly.

https://x.com/i/status/2070798718027141253

reddit.com

u/TheVault5 — 8 days ago

▲ 15 r/machinelearningnews+1 crossposts

Google AI Introduces TabFM: A Hybrid-Attention Tabular Foundation Model for Zero-Shot Classification and Regression

Most tabular ML in production is still XGBoost plus hours of hyperparameter tuning and feature engineering. That's not a foundation-model workflow — and Google Research just brought the zero-shot idea to tables.

They introduced TabFM — a foundation model for tabular classification and regression that reads your entire dataset as a single prompt and predicts in one forward pass, with no per-dataset training, tuning, or feature engineering anywhere in the loop.

Here's what's actually interesting:

→ In-context learning, not fine-tuning: training rows and test rows go in as one context, and the model learns the task at inference time

→ Hybrid attention: alternating row/column attention (TabPFN-style) → row compression into a dense vector → in-context learning over compressed rows (TabICL-style)

→ Trained entirely on hundreds of millions of synthetic datasets generated by structural causal models — no proprietary tables required

→ TabArena (38 classification + 13 regression datasets, 700–150,000 samples): Google reports it consistently outperforms heavily tuned supervised baselines

Full analysis: https://www.marktechpost.com/2026/07/01/google-ai-introduces-tabfm-a-hybrid-attention-tabular-foundation-model-for-zero-shot-classification-and-regression/

Technical Details: https://research.google/blog/introducing-tabfm-a-zero-shot-foundation-model-for-tabular-data/

Repo: https://github.com/google-research/tabfm

https://preview.redd.it/eam5uqurqkah1.png?width=2026&format=png&auto=webp&s=aa79748af4ea0353ec930d645c5f91a2963c0939

reddit.com

u/ai-lover — 5 days ago

▲ 19 r/machinelearningnews+1 crossposts

NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone

Most diffusion language models make one network do two jobs at once — represent the clean context and denoise the noisy tokens. Those two goals pull the same weights in different directions. NVIDIA just split them apart.

They released Nemotron-Labs-TwoTower — a block-wise autoregressive diffusion model built on the Nemotron-3-Nano-30B-A3B hybrid Mamba-2/attention/MoE backbone. It runs two towers: a frozen autoregressive context tower that processes clean tokens causally, and a trainable diffusion denoiser tower that refines noisy blocks via cross-attention to that context. Only the denoiser is trained — on ~2.1T tokens, a fraction of the backbone's 25T.

Here's what's actually interesting:

→ Two towers, not one: a frozen AR context tower and a trainable diffusion denoiser, connected layer-by-layer — denoiser layer i attends to context layer i, not just the last hidden state

→ 98.7% of the autoregressive baseline's quality at 2.42× generation throughput (γ=0.8, block size 16, 2×H100)

→ It commits multiple tokens per denoising step early in decoding — that's where the one-token-per-step AR bottleneck breaks

→ One checkpoint, three decoding modes: mask diffusion, mock-AR, and standard AR

→ Ablations: causal Mamba beats bidirectional Mamba, and tying the two towers under a joint loss is substantially worse

Full analysis: https://www.marktechpost.com/2026/07/01/nvidia-releases-nemotron-labs-twotower/

Paper: https://arxiv.org/pdf/2606.26493

Weights: https://huggingface.co/collections/nvidia/nemotron-labs-twotower

https://reddit.com/link/1ukfnsq/video/t43wdu4gukah1/player

reddit.com

u/ai-lover — 5 days ago

▲ 9 r/machinelearningnews

Local LLM Long-Context problems

We could finally have a 'light at the end of the tunnel'. It looks like we have a workaround for long context on our local machines. The keyword is RIS-Kernel. I would really like to hear your opinions on it. They said it was tested on several subjects, and it worked just fine for all of them. In my opinion, if it is really true, it would be a waste that such a solution is not broadly known by the machine learning community.

reddit.com

u/minerinvocal — 7 days ago

▲ 13 r/machinelearningnews+9 crossposts

Open handoff: Thought Tree, a markup/spec idea for modular LLM workflows

I’m releasing an open handoff draft of a framework I’ve been developing called the Thought Tree AI Framework.

At its core, the framework uses a simple pattern:

Data Units → Operations → Data Units

A Thought Tree program applies this recursively. Complex cognitive work is decomposed into named artefacts, transformations, contracts, modules and traces.

It came out of experiments with Auto-GPT-style agents, creative production pipelines and the need to separate what LLMs are good at from what deterministic code should handle.

I don’t currently have time to continue developing it properly, so I’m releasing it as an open handoff for anyone who wants to critique, fork, implement or reinterpret it.

The repo includes:

- a concise README;

- one-page summary;

- draft TTML schema;

- minimal example workflow;

- roadmap;

- original long-form explainer.

I’m especially interested in whether people see value in Thought Tree as:

- an intermediate representation for LLM workflows;

- a design vocabulary for structured AI production;

- a small open-source executor;

- or something that could map onto LangGraph / LlamaIndex / other orchestration tools.

Repo: https://github.com/RobertBateman/thoughttree-framework

Feedback, criticism, forks and maintainers welcome.

u/xavier1764 — 6 days ago

▲ 0 r/machinelearningnews

One internship vacancy for software domain in my group for 3 months only for girls

reddit.com

u/Other-Funny6369 — 6 days ago

▲ 17 r/machinelearningnews+1 crossposts

🌍 OlmoEarth v1.2 switches to RoPE for cleaner satellite-image embeddings

Today we're releasing OlmoEarth v1.2, the latest in our family of open foundation models for Earth observation. 🌍

OlmoEarth processes satellite images into tiles (patches), representing each as an embedding the model uses for downstream tasks—a numerical representation. Earlier versions tagged patches with a fixed position signal that surfaced as unwanted artifacts in those embeddings.

V1.2 switches to rotary positional embeddings (RoPE), which reduces artifacts in the embeddings & gives a small performance boost. Instead of adding a position signal to each patch, it rotates the vectors the model compares in attention by angles defined by each patch's position. The result is cleaner embeddings and better performance on downstream tasks: across all model sizes, we see consistent improvement on our kNN/linear-probe evals.

This update came directly from partners asking for cleaner embeddings. OlmoEarth v1.2 comes in Nano, Tiny, Small, & Base—all open source + available now.

🤗 Models: https://huggingface.co/collections/allenai/olmoearth
💻 Training & fine-tuning code: https://github.com/allenai/olmoearth_pretrain
📄 Tech report: https://allenai.org/papers/olmoearth-v1-2

u/ai2_official — 6 days ago

▲ 15 r/machinelearningnews+11 crossposts

Mistikguard – Lightweight Python library for memory integrity in LLM applications

## What My Project Does

Mistikguard is a small Python library designed to reduce memory fabrication in LLM-based applications. It provides:

- Provenance tracking for facts (`confirmed` vs `inferred`)

- A write gate that blocks contradictions of confirmed facts and self-narration

- Support for correction tombstones, so once a user corrects something, it is not silently reintroduced

- An optional grounding audit that detects memory claims in responses and validates them against stored memory

The core functionality works with almost zero external dependencies.

## Target Audience

This library is intended for **Python developers** who are building applications with long-term memory using LLMs. This includes:

- People building AI companions

- Developers creating autonomous agents

- Anyone working on RAG or memory-heavy LLM systems

It is a **library**, not a full application. It is meant to be integrated into other projects. It is currently in an early stage (v0.1) and is more suitable for personal projects and experimentation than large production systems without additional safeguards.

## Comparison

Unlike most memory systems that blindly store model output, Mistikguard actively tries to protect memory integrity by:

- Distinguishing between user-stated facts and model-generated inferences

- Preventing certain types of invalid writes through a deterministic gate

- Making user corrections more persistent using tombstones

It is lighter and more focused than full agent frameworks (such as LangChain or LlamaIndex memory modules) while being more structured than simple in-memory dictionaries or basic vector stores.

GitHub: https://github.com/obscuraknight/mistikguard

u/MistikAII — 8 days ago

▲ 7 r/machinelearningnews+6 crossposts

Avatar artificial living organism

Beyond Transformers: Why Artificial Life Needs Physics, Not Just Data

The current era of artificial intelligence is entirely dominated by static pattern recognition. We have built massive, highly capable models that can predict the next token with astonishing accuracy. But for all their complexity, these models are frozen in time. They lack temporal continuity, they lack physical grounding, and most importantly, they lack life.

If our goal is to build truly autonomous digital organisms, we cannot rely solely on the discrete, feed-forward nature of standard transformer architectures. We need systems that experience continuous time, manage internal energy states, and adapt dynamically to their environments.

This is the exact problem I set out to solve with Avatar, an open-source Artificial Life framework designed from the ground up to integrate theoretical physics with machine learning.

The Illusion of Life in Modern AI

Most AI agents today operate on discrete timesteps. They are fundamentally reactive: an input is provided, a computation is performed, and an output is generated.

Biological life does not operate this way. A living organism is a continuous, self-maintaining system (an autopoietic system). It possesses internal states—hunger, fatigue, curiosity—that continuously evolve over time, driving embodied learning and behavior even when there is no external prompt. To replicate this digitally, we need a fundamentally different mathematical foundation.

Enter the Avatar Architecture

Avatar shifts the paradigm from "data processing" to "embodied simulation" by relying on two major architectural pillars:

1. Continuous-Time Dynamics via Hamiltonian Neural ODEs

Instead of updating discrete neural network layers, Avatar models the organism's internal states using Ordinary Differential Equations (ODEs). Specifically, by structuring these equations around Hamiltonian mechanics (\mathcal{H}), the system inherently respects physical principles like energy conservation.

This means the organism doesn't just "decide" to move; its movement is a continuous mathematical evolution governed by its internal energy constraints. If the agent runs out of energy (fatigue), the Hamiltonian dynamics naturally dictate a change in its behavioral trajectory to seek sustenance.

2. Cognitive Topology via MERA Tensor Networks

To handle the complex, hierarchical nature of sensory processing and decision-making, Avatar utilizes Multi-scale Entanglement Renormalization Ansatz (MERA) tensor networks. Originally developed in quantum many-body physics to manage complex correlations, MERA provides a highly efficient way to structure cognitive tiers.

Instead of a flat neural network, the organism's brain processes sensory flux through a dimensional hierarchy. Lower tiers handle immediate, high-frequency sensory inputs, while higher tiers abstract this data into long-term behavioral goals.

Why Build This?

Building Avatar has been an exercise in pushing the boundaries of what is possible when we stop treating AI as a software product and start treating it as a synthetic biological complex. It is a proof-of-concept that artificial life can, and should, be mathematically grounded in the physics of the natural world.

As I finalize the avalanche power law metrics and prepare the late-breaking abstract for the upcoming ALife 2026 conference in Waterloo, I am opening the core repository for community review and collaboration.

If you are a researcher, physicist, or developer interested in emergent systems, autopoietic design, or continuous-time neural networks, I invite you to explore the codebase and run the simulations yourself.

Explore the Repository here: https://github.com/linga009/Avatar

u/linga009 — 9 days ago

▲ 16 r/machinelearningnews

Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference

Most "edge AI" is a big cloud model, quantized down and hoped for the best. A 230M model designed to run the agent loop on the phone itself is a different thing — and Liquid AI just shipped one.

They released LFM2.5-230M — their smallest model yet. It's a 230M-parameter, open-weight model on the LFM2 architecture (8 double-gated LIV convolution blocks + 6 GQA layers), pre-trained on 19T tokens, then post-trained by distilling from the larger LFM2.5-350M.

Here's what's actually interesting:

→ 213 tok/s decode on a Galaxy S25 Ultra CPU, 42 tok/s on a Raspberry Pi 5 — at a 293–375 MB memory footprint (4-bit)

→ Beats Qwen3.5-0.8B and Gemma 3 1B IT, both larger, on instruction following — IFEval 71.71 vs 59.94 vs 63.49

→ Tool use holds up: BFCLv4 21.03, ahead of Qwen3.5-0.8B's 18.70

→ Runs a Unitree G1 humanoid on-device on a Jetson Orin, turning one instruction into a sequence of tool calls via NVIDIA's SONIC framework

Full analysis: https://www.marktechpost.com/2026/06/27/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference/

Model on HF: https://huggingface.co/LiquidAI/LFM2.5-230M

Docs: https://docs.liquid.ai/lfm/models/complete-library

Technical details: https://www.liquid.ai/blog/lfm2-5-230m

u/ai-lover — 8 days ago

r/machinelearningnews

Who've told you that distributed training is impossible? Democratizing AI: The Psyche Network Architecture

NVIDIA HORIZON: A Hands-Free Agent that Evolves Git Worktrees and Hits 100% RTL Benchmark Completion

NVIDIA AI Introduces ASPIRE: A Self-Improving Robotics Framework Reaching 31% Zero-Shot on LIBERO-Pro Long Tasks

Mistral AI Releases Leanstral 1.5: An Apache-2.0 Lean 4 Code Agent Model Solving 587 of 672 PutnamBench Problems

Meet WebBrain: An Open-Source, Local-First AI Browser Agent That Reads Pages and Automates Tasks in Chrome and Firefox

MiCA is now part of Hugging Face PEFT

I built a fully offline, private AI creative studio that runs on a cheap old 6GB GPU — should I open-source it?

Qwythos-9B-Claude-Mythos-5 Fine Tune with 1M Context has been released!

PostGreSQL MCP Server

I mapped the "Dynamic Grammar" of LLMs: How hidden states move, stabilize, and decide

Open-source models are under threat.

Google AI Introduces TabFM: A Hybrid-Attention Tabular Foundation Model for Zero-Shot Classification and Regression

NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone

Local LLM Long-Context problems

Open handoff: Thought Tree, a markup/spec idea for modular LLM workflows

One internship vacancy for software domain in my group for 3 months only for girls

🌍 OlmoEarth v1.2 switches to RoPE for cleaner satellite-image embeddings

Mistikguard – Lightweight Python library for memory integrity in LLM applications

Avatar artificial living organism

​Beyond Transformers: Why Artificial Life Needs Physics, Not Just Data

​The Illusion of Life in Modern AI

​Enter the Avatar Architecture

​Why Build This?

Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference

Beyond Transformers: Why Artificial Life Needs Physics, Not Just Data

The Illusion of Life in Modern AI

Enter the Avatar Architecture

Why Build This?