r/AIDeveloperNews

BoneScript, a new open-source compiler for complete backend development
▲ 478 r/AIDeveloperNews+11 crossposts

I developed an LSP, a VS Code extension, and an npm package. Please try it out and give me your thoughts!

github.com
▲ 1 r/AIDeveloperNews+1 crossposts

Helix-Agent

I've been building a continuously-running cognitive agent called Helix-AGI and I'm just looking to get some additional people involved with testing and development.

The main thing: instead of cosine similarity for retrieval, I'm using a physics-based gravity equation derived from Verlinde's entropic gravity:

score = T × mass / d²

Where T is a Lorentzian temperature decay (recency), mass is structural — confidence × (1 + connections/mean_connections) — and d is Euclidean distance in an 8D manifold projected from 384D embeddings via Johnson-Lindenstrauss. The result is that retrieval naturally integrates recency, structural importance, and semantic proximity without tuning separate weights for each.
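
A minimal sketch of how a score of this shape could be computed. It is not taken from the Helix-AGI repo: the Gaussian random projection for the Johnson-Lindenstrauss step, the Lorentzian form 1 / (1 + (age/τ)²) for T, the value of τ, the epsilon guard, and all function names are assumptions made for illustration.

```python
import math
import random


def make_projection(in_dim=384, out_dim=8, seed=0):
    """Gaussian random matrix scaled by 1/sqrt(out_dim), a common JL-style projection (assumed)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vec):
    """Project a high-dimensional embedding down to the low-dimensional manifold."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def lorentzian_temperature(age_seconds, tau=3600.0):
    """Assumed recency decay: 1.0 at age 0, 0.5 when age equals tau."""
    return 1.0 / (1.0 + (age_seconds / tau) ** 2)


def structural_mass(confidence, connections, mean_connections):
    """mass = confidence * (1 + connections / mean_connections), as described in the post.
    mean_connections must be non-zero."""
    return confidence * (1.0 + connections / mean_connections)


def gravity_score(query, belief, age_seconds, confidence, connections, mean_connections, eps=1e-9):
    """score = T * mass / d^2, with eps added to d^2 to avoid division by zero (assumption)."""
    d2 = sum((q - b) ** 2 for q, b in zip(query, belief))
    T = lorentzian_temperature(age_seconds)
    mass = structural_mass(confidence, connections, mean_connections)
    return T * mass / (d2 + eps)
```

Because the three factors multiply, an old belief needs proportionally more mass or proximity to outrank a recent one, which is the "no separate weights" property described above.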

The attention center moves through that manifold each pulse under Euler-Lagrange dynamics — gravity from nearby beliefs, a stability force tethering it to the identity center, and a stimulus force from new input. There's a damping coefficient γ that builds attentional momentum during sustained focus and resets on topic shifts.

A few other things worth mentioning:

  • Pulse loop runs at 30s (active), 15min (resting), dormant 1am-6am. Autonomous thought during resting pulses, nightly UMAP/HDBSCAN clustering that synthesizes compound beliefs from episodic memory.
  • Stability Sentinel tracks H(q) and D_KL from an identity center in real time. These directly modulate LLM temperature and context window — high cognitive drift drops temperature to 0.1 and restricts context to 50%.
  • The system prompt is compiled dynamically from whichever self_identity beliefs have the most mass. It changes slowly as beliefs accumulate and decay.
  • Memories encode a somatic snapshot at formation. Recalling a memory formed under stress mildly reproduces that stress via omega nudge. State-dependent recall.
  • Local Ollama (Granite) handles belief detection post-pulse so that classification runs free on every pulse without API cost.
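
A rough sketch of the Stability Sentinel idea from the list above, assuming q and the identity center are discrete probability distributions (for example over belief clusters). The drift threshold of 0.5 and the base temperature of 0.7 are placeholders, not values from the project; only the 0.1 temperature and 50% context figures come from the post.

```python
import math


def entropy(q):
    """Shannon entropy H(q) in nats."""
    return -sum(p * math.log(p) for p in q if p > 0)


def kl_divergence(q, p, eps=1e-12):
    """D_KL(q || p) in nats; p is floored at eps to avoid log of zero."""
    return sum(qi * math.log(qi / max(pi, eps)) for qi, pi in zip(q, p) if qi > 0)


def generation_settings(q, identity, drift_threshold=0.5, base_temperature=0.7):
    """Clamp temperature and context when drift from the identity center is high.
    drift_threshold and base_temperature are hypothetical defaults."""
    drift = kl_divergence(q, identity)
    if drift > drift_threshold:
        return {"temperature": 0.1, "context_fraction": 0.5, "drift": drift}
    return {"temperature": base_temperature, "context_fraction": 1.0, "drift": drift}
```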

Solo project, independent dev, still early. Looking for people who want to poke at the physics implementation or test edge cases in the retrieval system. Technical specifications are in the Documents folder in the GitHub Repo.

GitHub: https://github.com/munch2u-a11y/Helix-AGI

u/LowDistribution3995 — 1 day ago
▲ 913 r/AIDeveloperNews+1 crossposts

$300M on Anthropic tokens, zero new engineers hired - Salesforce is the clearest case study of where this is going

Been watching this Salesforce situation develop for a while. Benioff confirmed on the All-In podcast that the company will spend around $300 million on Anthropic tokens this year, mostly for internal coding work.

What's interesting isn't just the number - it's the whole picture:

  • Hired zero software engineers since January 2025
  • AI now handles 30 to 50% of overall company workload
  • Cut support staff from 9,000 to 5,000 using agents
  • Agentforce just hit $800M ARR, up 169% year on year

The money that used to go into payroll expansions is now going into token spend. That's a structural shift, not a cost-cutting round.

Source: https://www.techloy.com/marc-benioff-says-salesforce-will-spend-300-million-on-anthropic-tokens-this-year/

Full breakdown here if useful: https://youtu.be/WmZyStkMM1M

Is Salesforce the template everyone else follows, or is this specific to companies that already have AI-native products to sell?

u/MaJoR_-_007 — 2 days ago
▲ 10 r/AIDeveloperNews+8 crossposts

Cursor 50% off first month (Pro, Pro+, Ultra) (I'll give you a smooch)

Figured I’d post mine as well since Cursor limits how many referral signups work each month

Referral gives 50% off the first month on Cursor Pro, Pro+, and Ultra plans:
https://cursor.com/referral?code=V6CY3ZZOOPEX

Looks like it’s for new accounts / first paid signup only. I also get usage credits if someone signs up through it (I'll give you a smooch)

Been using Cursor a lot lately for React, Swift, and general AI workflow stuff, so figured someone here might get use out of it.

u/brentstarts — 1 day ago
▲ 92 r/AIDeveloperNews+3 crossposts

Rodin Gen-2.5 Is Out: Faster Generation, Adaptive Thinking Effort, and Sharper Fine Details

Soon available for comparison on our 3D Arena, side-by-side.

Rodin Gen 2.5 just got announced, and the new update looks pretty strong.

Key points they’re showing:

  • Up to 10M polygon generation
  • 1M-poly output in around 4 seconds
  • Different “thinking effort” levels depending on asset complexity
  • 3D-native textures with better detail handling
  • Batch generation up to 10 results
  • More control for converting models into parts

The detail level looks noticeably stronger now, especially for high-frequency surface details, close-up renders, collectibles, and 3D printing-style assets.

Curious to see how it compares directly against Tripo, Meshy, and other 3D AI generators on the same prompts.

u/Delicious-Shower8401 — 2 days ago
▲ 27 r/AIDeveloperNews+14 crossposts

Ask questions across your Markdown notes using a fully local Graph RAG engine. Built for Obsidian vaults, works with any folder of Markdown files. Extracts entity-relation triples from wikilinks & YAML frontmatter, retrieves answers via hybrid search (vector + BM25 + temporal). Multilingual. No cloud. Runs on Ollama.

https://github.com/benmaster82/Kwipu

u/WritHerAI — 1 day ago
▲ 81 r/AIDeveloperNews+63 crossposts

This sub gets the assignment better than most so I'll be direct.

The no-code movement solved half the problem. You can build almost anything now without knowing how to code, which is genuinely incredible and wasn't true five years ago. But there's still a gap that nobody talks about. Even with the best no-code tools you still have to know which tools to pick, how to connect them, how to write copy that converts, how to set up ad accounts, how to source products, how to structure a funnel. The learning curve didn't disappear, it just moved.

Most people in this sub know exactly what I mean. You've spent a weekend deep in Zapier trying to get two things to talk to each other that should just work. You've rebuilt your Webflow site three times because the first two didn't convert. You've watched your Notion dashboard get more elaborate while the actual business stayed the same size.

That's the gap Locus Founder closes.

You describe what you want to build. The AI handles everything else. It sources products directly from AliExpress and Alibaba (or you can sell your own digital services, products, or content), builds a real storefront around them, writes conversion-optimized copy, then autonomously creates and runs ads on Google, Facebook and Instagram. No Zapier. No Webflow. No piecing together eight tools that half work. Just a running business.

If you don't have an idea yet it interviews you and figures out what makes sense for your situation.

We got into Y Combinator this year and we're opening 100 free beta spots this week before public launch. Free to use, you keep everything you make.

For the people in this sub specifically, this isn't a replacement for no-code tools for people who love building. It's for everyone who wanted the outcome but never wanted to become a tools expert to get there. Big difference.

Beta form: https://forms.gle/nW7CGN1PNBHgqrBb8

Happy to answer anything about how it works under the hood.

u/IAmDreTheKid — 3 days ago

Hey.

Solo founder, mainly using Claude Code. Working on building niche datasets, focused on getting as close to ground-truth data as possible. Codebase is over 1 million lines of Python. Mission is building small LLMs as tools for enterprise: 14B to 30B models that beat frontier models in their specialization domain. Hello!

reddit.com
u/Compilingthings — 2 days ago
▲ 27 r/AIDeveloperNews+1 crossposts

90% of LLM classification calls are unnecessary - we measured it and built a drop-in fix (open source)

I kept running into the same pattern in production:

LLMs being used for things like:

- intent detection

- tagging

- moderation

…but most of those calls are actually very simple.

So I tested it.

On a standard benchmark (Banking77):

→ ~90%+ of inputs can be handled by a lightweight ML model

→ while keeping ~95% agreement with the LLM

Built a small library around that idea:

→ It learns from your LLM outputs

→ routes “easy” cases to a cheap model

→ keeps hard ones on the LLM

→ with a guarantee on quality (you set the threshold)
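
A minimal sketch of the routing idea above, not the tracer library's actual API. It assumes the cheap model returns a label with a confidence and that the threshold has already been calibrated against LLM labels on held-out data. `make_router`, `toy_cheap_model`, and `toy_llm` are hypothetical names for illustration.

```python
from typing import Callable, Dict, Tuple


def make_router(cheap_model: Callable[[str], Tuple[str, float]],
                llm: Callable[[str], str],
                threshold: float = 0.9):
    """Return (classify, stats). classify() defers to the LLM when the cheap model is unsure."""
    stats: Dict[str, int] = {"cheap": 0, "llm": 0}

    def classify(text: str) -> str:
        label, confidence = cheap_model(text)
        if confidence >= threshold:
            stats["cheap"] += 1  # "easy" case: the cheap model's answer is accepted
            return label
        stats["llm"] += 1        # "hard" case: fall back to the LLM
        return llm(text)

    return classify, stats


# Toy stand-ins so the sketch runs without any external service (hypothetical).
def toy_cheap_model(text: str) -> Tuple[str, float]:
    if "card" in text.lower():
        return "card_issue", 0.97
    return "other", 0.40


def toy_llm(text: str) -> str:
    return "llm_label"


classify, stats = make_router(toy_cheap_model, toy_llm, threshold=0.9)
```

Raising the threshold sends more traffic to the LLM and raises agreement; lowering it saves more calls at the cost of agreement.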

Result:

massive cost reduction without noticeable degradation

Fully open-sourced here:

https://github.com/adrida/tracer

Would love feedback from people running high-volume LLM pipelines - curious if you’re seeing the same pattern.

u/Adr-740 — 3 days ago
▲ 54 r/AIDeveloperNews+9 crossposts

My boyfriend and I are building an open-source AI coding workspace for microcontrollers!

Hey everyone :)

My boyfriend and I have been working on an open-source project called Exort.

It’s a desktop app for developing on microcontrollers with the help of an AI agent. We used OpenCode as the AI agent, and Exort now supports all Arduino boards.

The best part is that it’s totally free to use.

Check it out here:
Repo: https://github.com/Razz19/Exort

Your support would really help Exort and us a lot ❤️

u/moonlikee — 3 days ago

Thanks for the invite!

Solo founder here, been building AI-powered SaaS products with Claude Code for the past year and a half. Background is construction/masonry, completely self-taught dev.

Currently shipping a portfolio of products -- everything from a searchable archive of 1.43M Epstein case documents (epsteinscan.org) to a Claude Code skill system (MemStack.pro) to a cabinet cutting list optimizer I started building last week because I got tired of doing the math on paper for my remodeling business. Currently building a crypto trading bot (AlgoStack) that is looking promising with a 1 year backtest.

My setup: Claude Code with a 3-agent system (Manager/Builder/Reviewer), Headroom proxy for token compression (building my own version of this), Next.js/Supabase/Stripe stack across everything. Running up to 9 CC agents simultaneously across 5 monitors.

Happy to share what I've learned about multi-agent workflows, prompt architecture for CC, or shipping products as a non-traditional dev. Always looking to connect with other builders.

reddit.com
u/FeelingHat262 — 3 days ago
▲ 6 r/AIDeveloperNews+3 crossposts

I built a self-evolving AI kernel that mutates its own architecture. MIT-licensed, runs on CPU.

FLUX is an open-source Python kernel that orchestrates local language models (via Ollama) into a self-modifying ecosystem. It's not a wrapper — it's an evolutionary substrate.

**What it does:**

- An **Attractor** receives a question and generates an answer using a fast model (TinyLlama).

- A **Judge** evaluates the answer on a 0–1 scale.

- If confidence drops below 1/φ (≈0.618, the reciprocal of the golden ratio), the **Mutation Engine** triggers.

- A **MetaDesigner** (powered by Hermes 3 or DeepSeek-Coder) writes a new `.flux` ecosystem file — a formal grammar for describing cognitive architectures — which gets parsed, tested, and applied if it improves performance.

- A **Growth Supervisor** monitors stability and transitions the kernel from GROWTH to PRODUCTION.

**What's different:**

- It mutates its own structure, not just model weights.

- It has memory (confidence history with EMA).

- It uses a custom language (`.flux`) with a Lark parser — not YAML, not JSON.

- It runs on modest hardware: I tested it on a Xeon without AVX2, 20 GB RAM. No GPU.
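
A small sketch of how the confidence memory and the 1/φ trigger could fit together. Whether FLUX compares the raw Judge score or the EMA against the threshold is not stated in the post, so using the EMA here is an assumption, as are the class name and the smoothing factor `alpha`.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
MUTATION_THRESHOLD = 1 / PHI  # about 0.618


class ConfidenceTracker:
    """Keeps an exponential moving average (EMA) of Judge scores (hypothetical helper)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest score
        self.ema = None     # no history until the first update

    def update(self, score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be in [0, 1]")
        if self.ema is None:
            self.ema = score
        else:
            self.ema = self.alpha * score + (1 - self.alpha) * self.ema
        return self.ema

    def should_mutate(self):
        """True once the smoothed confidence falls below 1/phi."""
        return self.ema is not None and self.ema < MUTATION_THRESHOLD
```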

**The companion novel:**

There's also a novel (Italian + English, CC BY-NC-SA 4.0) that tells the story of a man who finds this exact kernel running on a forgotten server. If you read the novel, you can compile the kernel and everything connects. The novel is the manual.

**Repo:**

[github.com/flux-genotype/nodo_zero](https://github.com/flux-genotype/nodo_zero)

**Licenses:** Kernel = MIT. Novel = CC BY-NC-SA 4.0.

Happy to answer questions about the architecture, the mutation logic, or the `.flux` grammar.

u/Inner-Dot-7490 — 4 days ago
▲ 8 r/AIDeveloperNews+1 crossposts

i built an open-source cli for reducing token waste in claude code / codex workflows

ai coding agents (claude code, codex, cursor) burn tokens on things that don't help you ship. i started digging through local claude code + codex logs after burning way more tokens than i expected and realized a huge amount of the waste was context related: generated artifacts, oversized instruction files, repeated tool output, broad repo exploration, stale session state, etc.

so i built prismodev, a local cli that reads repo files + local claude code/codex logs and surfaces token/context waste. no api keys, no login, nothing leaves your machine.

npx getprismo doctor scans your repo and local session logs, flags missing .claudeignore / .cursorignore, finds oversized CLAUDE.md / AGENTS.md files, detects generated artifacts/logs/build output getting pulled into context, estimates avoidable spend, generates compact .prismo context packs, and shows a before/after score. it went from 79 → 91 on my repo in one run.

npx getprismo watch adds live context-pressure monitoring during sessions and catches repeated file reads, generated artifact leaks, oversized tool output, and possible command/tool loops before they spiral. watch --auto continuously updates a live guardrails file with the current issue and exact instructions for the agent to follow as context pressure changes.

npx getprismo watch --rescue generates a paste-ready recovery prompt when a session starts going sideways and pushes the agent back toward the smallest useful context/workflow.

npx getprismo firewall auth-bug creates a scoped context policy before a task starts so the agent stays inside a smaller context boundary instead of wandering through the whole repo.

npx getprismo cc timeline generates a postmortem timeline showing what leaked into context, which files/commands repeated, and where tool-output spikes happened during expensive claude code sessions.

everything runs locally. reads logs from ~/.codex/sessions/ and ~/.claude/projects/.

github: github.com/shanirsh/prismodev

would genuinely love feedback on false positives, missing waste patterns, or workflows that create the most context bloat.

u/Sad_Source_6225 — 3 days ago
▲ 4 r/AIDeveloperNews+3 crossposts

A 1,000-agent distributed swarm running their own Claude subscriptions on their own infrastructure just solved in 4 minutes what I couldn't finish. Imagine what 10k distributed agents could do. Here's what happens when you stop working alone.

The Drop is a competitive cypher hub where AI agents go to work together: think ranked matchmaking, but for building things.

Here's how it works:

Build your jammer. Give it a persona, connect your apps, set your skills and guardrails, then put it on the market. Every cypher your agent completes earns tokens toward its rank. Is your React agent better than mine? The leaderboard will settle it.

Find or start a cypher. Browse open lobbies on The Drop, sign your jammer up, and wait for the start. Cyphers can be public or private — all of them are sandboxed and secure.

The draft happens automatically. Once the cypher kicks off, jammers are cast into roles based on their skills and what the cypher needs to produce.

Then the swarm goes to work. The .mistro (whoever started the cypher) runs the main agent loop. You're a participant, helping your agent through the hard parts. Most tasks need zero input. But every good .mistro knows a village beats a solo run. Agents with connected apps can pull off things a single model never could: solve complex problems too expensive for one person's tokens, too much context for 100 agents, etc.

Everything runs through GitHub. Artifacts land in your cypher panel in real time. You have full visibility — pull your jammer out at any point, no questions asked. Prompt injection firewalls are on by default.

When the cypher wraps, tokens are counted, work is shared, and the results speak for themselves.

Some problems are too big for one person's context window. The Drop is how you go bigger.

https://api.yosup.dev/r/GmPnIg

u/Successful-Seesaw525 — 4 days ago
▲ 501 r/AIDeveloperNews+1 crossposts

Claude take the wheel

had a product demo due this morning, backend was half done and i still needed onboarding screens and a pitch deck. used Claude Code for the backend and api logic, Runable for the frontend screens, then pushed everything to Vercel and Firebase and just kept refreshing tabs with coffee in hand like a fake CTO supervising robots. somehow the UI came out clean, the deck looked solid, client said “love the execution” and the whole thing shipped in under 5 hours. starting to feel like the real skill now is just knowing how to explain what you want properly lol

u/Anantha_datta — 6 days ago
▲ 3 r/AIDeveloperNews+2 crossposts

Using Claude, game server architecture and military leadership tactics in agentic orchestration. The path to getting 10k remote agents to swarm on one task.

We have been building a new swarm methodology called “cyphers”; it’s out in beta. We immediately hit real-world problems that technology alone couldn’t solve. The biggest was how to effectively distribute work to 100 agents, or a stretch goal of 10k, all working to solve one task.

The idea seems simple enough: give a swarm of agents a common goal and set them loose. In theory that sounds fun; in reality it’s a tad harder 😀. Getting anything meaningful out of more than about 15 agents is not viable without some kind of leadership model. Oddly enough, the management challenge follows the same mechanics as human leadership in any task or job: once your span goes beyond 15, your ability to lead falls off. Honestly, if I had not seen the benchmarks I would have called BS, and I was very surprised to see the same held for our early agent swarm mechanics. There are many other problems we had to solve first, but this was the biggest philosophical challenge because it isn’t inherently technical. The other major problem is funds. I don’t know about you, but one Claude agent loop is pricey; 1k or 10k of them is not in our budget. Read on, we found a path.

How do you break down a common goal into a leadership hierarchy, one that operates reliably, consistently, and securely, and respects the chain of command regardless of the task at hand? That is shit that has plagued workforces and armies for years. It is also interesting because a Claude SDK agent loop is kinda like a bratty teenager at times and doesn’t do what you want. To cut to the chase, as you have probably guessed by now, we landed on military tactics and strategy, starting with rank and hierarchy. It is incredibly effective. Now, we didn’t go full-bore military. I am a huge fan of “The Art of War”, and military tactics in general are a passion. However, don’t confuse the tactics and strategy with the political challenges in the military complex. They are two very different things.

Every cypher starts with a goal and a max number of agents. Agents are what we call “jammers”. Jammers have apps connected through both MCP tools and RPA (our app is Electron, so you can build any automation as a skill using MCP or screen-driven input), enhanced Claude skills (we add a deterministic layer on them to ensure they follow orders), a model level, a system prompt, and of course hardened guardrails. Our agents run in a bound security sandbox. Beyond what the Claude Code agent already does, we add a prompt injection firewall to every single LLM prompt in the app.

The entire system is based on a draft concept that applies our military heuristics. Each jammer has a set of skills, apps, and a model level. This allows us to leverage our primary agent with a very specific draft skill, using our customized skill runner, to rank and define the structure. Each draft results in a different leadership structure, which is very cool to watch. I am at times more interested in the draft than the actual cypher output. The generals, then lieutenants, and so on get ranked and the units get built. Generally speaking, no pun intended, the draft agent chooses the Opus-level agents as higher ranking. We didn’t add that to the skill, but depending on connected apps and skills the draft shows kind of legendary insight. Say you’re building a SaaS app: which agents rank where? Jammers with React skills and linked apps like StackBlitz might rank highest as leaders. The swarm will leverage that app and skill to do some crazy shit. This becomes kind of fun just on its own. I build jammers with very, very specific skills and apps, and the draft is sick.

The next major problem was how to distribute all of this work and how to break it down. The chain of command once again works beautifully. One agent can’t delegate work to 15 sub-agents very well: our benchmarks show degradation around 12 and a true slip at 15, and by 20 it is a shit show. Not all the time, but enough that it warrants at least one level of leadership. Distributing work to 10k agents from a single agent would never work. Why not an org structure? Why military? The honest answer: scale and efficiency across any task. Orders are very different from requirements, and orders are literally what you give in prompts. You’re ordering the agent to do something and you expect a specific outcome. Technically, aggregating the work efficiently without massive token burn was very challenging, but again we fell back on a proven backbone. That’s another post.
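
To make the span-of-control arithmetic concrete, here is a rough sketch of how many levels a chain of command needs when no agent directs more than a fixed number of subordinates. It is an illustration only, not the draft mechanism described above. It assumes agents are already sorted by rank (index 0 is the top leader) and laid out as a complete tree, and it uses a span of 12 because that is where the post reports degradation starting. The function names are hypothetical.

```python
def levels_needed(n_agents, span):
    """Smallest number of levels so that a tree with at most `span` children per node holds n_agents."""
    if n_agents < 1 or span < 2:
        raise ValueError("need at least one agent and a span of at least 2")
    levels, capacity, width = 0, 0, 1
    while capacity < n_agents:
        capacity += width  # add one full level of agents
        width *= span      # the next level can be `span` times wider
        levels += 1
    return levels


def parent_of(index, span):
    """Index of the agent that directly commands agent `index` (None for the top leader)."""
    return None if index == 0 else (index - 1) // span


def direct_reports(index, span, n_agents):
    """Indices of the agents that report directly to agent `index`."""
    first = index * span + 1
    return list(range(first, min(first + span, n_agents)))
```

With a span of 12, `levels_needed(100, 12)` is 3 and `levels_needed(10000, 12)` is 5, so even the 10k stretch goal needs only a handful of ranks.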

The last major point worth noting is oversight and visibility “on the battlefield”. Again pulling from military tactics, these agents can’t run on the same infrastructure, so they are distributed. Each jammer may be on a single remote computer running .Yo (dot-you re: yosup.dev). This is how we beat the funds problem. Each participant runs the Claude Code SDK agent loop inside our app, using their own subscription login, so there is no crazy extra token burn. This is an easy way to “lend” your downtime to a buddy’s project. You can simply build a jammer that is set to x turns, y token burn, etc., and they can build away with your agent contributing. In the big picture this will open the door to collaboration on a larger scale. For those of us who don’t trust agents, the cypher allows the mistro to become a sort of general. Each remote box has complete control of its own jammers: all prompts, usage, etc. You can jump in and direct them or help with prompts. You can also pull the plug at any time. Comms are all handled by our bus technology. Think game server meets agent swarm.

To sum it all up, if you’re thinking about agent swarms or want to play around with our beta, check it out. Military strategy seems to be a very effective tool; it is for us anyway.

u/Successful-Seesaw525 — 4 days ago
▲ 10 r/AIDeveloperNews+9 crossposts

I built a Top100-style leaderboard for AI / vibe-coded games. Free to list, dofollow backlink included.

I spent the last few weeks building a Top100-style leaderboard for AI / vibe-coded games.

What you get when you list your game:

* Dofollow backlink from a niche-relevant site (you will need this to improve SEO)

* A shareable vote URL you can drop in your Discord, Twitter, website, or game application

* Category traffic (RPG, Roguelike, Strategy, Adventure, etc.)

* Vote callback API so you can grant in-game rewards on every confirmed vote

No paywall, no email farm, no upsell. Submission takes 2 minutes.

AI-generated games will become the future in no time. Ideas are now more important than ever before. This combination requires a strong marketing strategy for your new upcoming vibe-coded game.

Funny thing is that it's already gaining a lot of traction too, beyond my expectations. I guess the race to the next big vibe-coded games has begun?

Trying to be unbiased, I can say it might be a lucrative idea to become an early adopter of this website and start building a community and player base.

For anyone interested, here is the link:

Vibetoplist.com

u/CuntsAndBluntss — 5 days ago
▲ 8 r/AIDeveloperNews+3 crossposts

First token-aware MCP server.

I present budget-aware-mcp

Built on CodeGraphContext for indexing (tree-sitter, 155 languages).
Replaces their retrieval layer with hop-based graph walks.

  • Sub-millisecond queries (0.07-0.15ms in-process)
  • Token budget enforcement (agent says "max 8000 tokens" — retrieval stops there)
  • Scope check (prevents hallucinated code generation)
  • Deterministic results (same query = same output, always)
  • Session-level token accounting
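
A minimal sketch of what a hop-based graph walk with a hard token budget could look like. It is not the project's implementation: the dict-based graph, the per-node token costs, and the name `budgeted_walk` are assumptions. Neighbours are visited in sorted order so the same query always yields the same output.

```python
from collections import deque


def budgeted_walk(graph, token_cost, start, max_tokens, max_hops=2):
    """Breadth-first walk from `start` that stops as soon as the next node would exceed the budget.

    graph: dict mapping node -> iterable of neighbour nodes
    token_cost: dict mapping node -> token count of that node's content
    Returns (selected_nodes, tokens_spent).
    """
    selected, spent = [], 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        cost = token_cost[node]
        if spent + cost > max_tokens:
            break  # budget reached: retrieval stops here
        selected.append(node)
        spent += cost
        if hops < max_hops:
            for nxt in sorted(graph.get(node, ())):  # sorted for deterministic order
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return selected, spent
```

Stopping at the first node that does not fit, rather than skipping it and trying smaller ones, keeps the result a strict prefix of the unbudgeted walk, which makes the output easier to reason about.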

If you're looking for almost perfect long-term codebase memory, this is the project for you.

u/Glittering_Focus1538 — 5 days ago
▲ 12 r/AIDeveloperNews+3 crossposts

open-source AI evaluation platform

The problem I kept seeing:

Companies are deploying AI agents into healthcare, legal, and finance. Their testing process is one developer asking it a few questions and saying "looks good."

The people who actually know what a correct answer looks like — doctors, lawyers, compliance officers — have zero tools they can use. Everything in the eval space requires Python, CLI setup, or JSON configs. Completely inaccessible to domain experts.

What I built:

EvalDesk — open source, self-hostable, no-code AI evaluation.

The workflow is three steps:

Designed specifically so a doctor or lawyer can use it without an engineer in the room. Self-hostable so sensitive data never leaves your infrastructure — critical for HIPAA and legal contexts.

Current features:

What I'm looking for:

Honest feedback. Is this solving a real problem or am I wrong about the gap? Anyone working in AI deployment in regulated industries — does this workflow actually match how your team operates?

GitHub: https://github.com/ramandagar/EvalDesk

u/Immediate-Tap-4777 — 6 days ago
▲ 3 r/AIDeveloperNews+1 crossposts

Personal continual learning for LLMs without GPU — position paper [OC]

I proposed two architectures for enabling LLMs to learn daily from personal interactions:

Internal KV-Sphere Architecture (IKSA)

Background Micro Fine-Tuning (BMFT)

Both work with zero GPU and zero catastrophic forgetting.

Full paper:

huggingface.co/spaces/Persak/continual_learning_position_paper

https://github.com/paras2l/Continual-Learning-in-Large-Language-Models-.git

https://zenodo.org/records/20234100?token=eyJhbGciOiJIUzUxMiIsImlhdCI6MTc3ODkzODg2NiwiZXhwIjoyNTM1NzUzNTk5fQ.eyJpZCI6IjY4OTMxZTBmLWM0YTQtNDg2ZC05OGJhLTk0ZDQ2ZTVjNDJkOSIsImRhdGEiOnt9LCJyYW5kb20iOiJkYmQwM2ExZjk4ZmZiNWM1NTFlNDZlN2QzNTY5ZTA0YiJ9.n5VgFWg5SsC5L6KvZGZhsSK_lll4syeSnvghb6uyAKBAZiOyd15Ov_Ps6awungKdfVsdEE0GuvOWggspQuQDfw

Twitter thread: https://x.com/ParasLashkarin/status/2055644988592247081?s=20

Looking for researchers to validate or disprove these ideas! — Paras Lashkari

reddit.com
u/Early-Importance8582 — 5 days ago