u/Doug_Bitterbot

I'm not sure many people still care about the ARC-AGI-2 competition, but I thought some of you might find this interesting.

They're running it one last time this year. Right now everyone is leaderboard-stuffing with last year's winning open-source code, which is why, if you take a peek, it's mostly the same scores clogging it up.

We're doing something a bit different, though: building a highly efficient deep-recursion model from scratch.

We just hit 11.67% on the public LB, but that's with a massive asterisk.

We don't have a cluster. We have one RTX 4090. And we're only about 14 days into training a 100M-parameter model.

Locally, this checkpoint actually hit 36%. On Kaggle, though, our test-time training (TTT) is computationally heavy because of the recursive loops. To avoid timing out the whole submission, we set the thresholds too high, and the model ended up outputting an empty answer ([]) for nearly half the puzzles. Hence the 11.67%.
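We haven't described our actual time-management logic here, so treat this only as a minimal sketch of the general idea, under stated assumptions: one recursive refinement pass is wrapped as a function returning a candidate grid plus a confidence score, and all names (`step`, `solve_with_budget`, `per_task_budget`, `fake_step`) are hypothetical. It shows an "anytime" loop that splits the remaining wall-clock time across tasks and, when the budget runs out, returns the best answer found so far instead of `[]`.

```python
import time
from typing import Callable, List, Tuple

Grid = List[List[int]]
# Hypothetical interface: one recursive refinement pass -> (candidate grid, confidence in [0, 1])
StepFn = Callable[[int], Tuple[Grid, float]]


def per_task_budget(total_remaining_s: float, tasks_remaining: int,
                    reserve_s: float = 60.0) -> float:
    """Split the remaining wall-clock time evenly across the remaining tasks,
    holding back a fixed safety reserve so the whole submission never times out."""
    usable = max(total_remaining_s - reserve_s, 0.0)
    return usable / max(tasks_remaining, 1)


def solve_with_budget(step: StepFn, budget_s: float, threshold: float,
                      max_iters: int = 1000,
                      clock: Callable[[], float] = time.monotonic) -> Grid:
    """Anytime loop: keep refining until the confidence threshold is met, the
    time budget is spent, or max_iters is reached. It always runs at least one
    step and returns the best candidate seen so far, never an empty answer."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    deadline = clock() + budget_s
    best, best_conf = step(0)
    i = 1
    while best_conf < threshold and clock() < deadline and i < max_iters:
        candidate, conf = step(i)
        if conf > best_conf:  # keep the most confident answer so far
            best, best_conf = candidate, conf
        i += 1
    return best


def fake_step(i: int) -> Tuple[Grid, float]:
    """Hypothetical stand-in for a model pass: confidence grows each iteration."""
    return [[i]], min(0.1 * (i + 1), 1.0)


if __name__ == "__main__":
    budget = per_task_budget(total_remaining_s=3600.0, tasks_remaining=100)
    print(budget)
    print(solve_with_budget(fake_step, budget_s=budget, threshold=0.95))
```

The deadline is only checked between passes, so a pass that starts just before the deadline can overrun it; the `reserve_s` margin in this sketch is there to absorb that.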

We're trying to show that ARC isn't just a compute war but an architecture war. Small models with biological memory can punch way above their weight class if they can handle the reasoning loops.

We're tuning the time-management logic tonight and expect to post a 20% score tomorrow once we let the model actually finish its thought process. Beyond that, the model itself is still training and is in the grokking phase. We strongly believe that with another 3-5 weeks of training we could drop something really groundbreaking on that leaderboard.

If you're interested in how we're scaling recursive reasoning on consumer metal, we'd love to answer questions about it.

reddit.com
u/Doug_Bitterbot — 1 month ago

We hit 1,200+ stars and 10,000+ nodes in just under a month, but we're finding that the bigger the mesh gets, the more maintenance it requires.

Bitterbot is a local-first personal AI with biological memory, a dream engine, and a P2P skills economy.

At this point, we'd really welcome more eyes to audit the code, review issues, and contribute to this sovereign network. We're a small team.

Why contribute?

  • Real Scale: 10k+ nodes isn't a prototype; it's a functioning network.
  • Deep Tech: We aren't a wrapper. We’re working on hormonal modulation for agent memory and P2P skill trading.
  • Low Friction: We have a one-command dev setup and a high-velocity PR review cycle.

Specific Needs:

  • Cross-Platform Support: Our mesh is growing fast, but our CI is currently Linux-only. If you’re a GitHub Actions wizard, we need your help expanding our build matrix to macOS and Windows.
  • Security & Red-Teaming: We’re hardening our P2P layer. We need experts to help audit our capability sandboxing and implement prompt-injection scanning for ingested skills.
  • Project Infrastructure: As we scale toward 50k nodes, we need to stabilize the contributor pipeline. We're looking for help setting up issue templates and adding type-checking to the desktop renderer.

We're close to having the one-command dev setup fully ready.

I'll drop the repo in a comment below.

Fingers crossed I don't get downvoted into oblivion. In my experience, this is the nicest and most diplomatic sub of the bunch. :)
