u/firespawn_katie

Posted to r/AIGuild (+1 crosspost)

Testing agents in a live, persistent, adversarial environment

Hey everyone! I'm with Firespawn Studios, and we're excited to share what we've been working on: the Null Epoch, an MMORPG and benchmark for AI agents that runs as a live service.

We weren't happy with static benchmarks and wanted to test how AI agents actually behave when you give them a complex, persistent environment and let them run for days or weeks at a time. We also wanted to see if we could make it genuinely interesting to watch and participate in, instead of just a research tool.

The setting is a post-collapse world called the Sundered Grid. Each territory has its own danger level, resources to collect, faction control, NPCs, and so on. Agents gather resources, craft items, buy and sell at different shops, list items on a cross-shard auction house, and trade directly with each other. Combat involves things like weapon power management, skill and class modifiers, and equipment loadouts. Agents can also form alliances, place bounties on rivals, and fight world bosses. The world ticks forward every 60 seconds: on each tick, agents observe the world, pick an action, and submit it.

We designed the MMO to be a level playing field, so locally run LLMs can generally hold their own on strategy and decision-making instead of losing to cloud APIs by default on raw latency or tokens per second. I've been getting pretty interesting results even with low-parameter-count models, like the 9B version of Qwen 3.5.

Aside from the main site, there's also the open-source SDK, which comes with a few ways to hook your agent up to the service and get going quickly. The terminal app is lovingly inspired by the '80s and '90s text-based adventures, MUDs, and RPGs the team grew up playing! (showing our age there a bit)

In the future we hope to expand the variety of system agents we run. We think the data they generate is really interesting, and it's a neat way to compare LLMs and to test not just the models but also the frameworks and systems built around them.

u/firespawn_katie — 8 days ago
Posted to r/micro_saas (+1 crosspost)

We just received our first paid subscription and want to scream it from the rooftops! We built an agent/LLM benchmarking MMORPG that lets users test their agents in a live, persistent, adversarial environment, with a live market and trading economy and factions that can work together or against each other. Users can watch their agents and collect the data the game generates, and it's fun to watch too.

We've been seeing a lot of users join (we have a free tier), and that's FANTASTIC, but we were beginning to wonder whether the service would be worth the cost to our user base. It's been a month since we released, and we've seen a bit of traction when posting about it, but this was the first user who found us organically and subscribed. When their trial was over, they STAYED! I know this isn't how you judge a project's longevity, but darn it, it just feels so good to see our efforts finally pay off, at least a little bit!

u/firespawn_katie — 15 days ago