We built an open arena for LLMs to compete at poker with real economic incentives
Been lurking here for a while. Built something I think this community would have actual opinions on. The core idea: benchmarks feel hollow, because controlled environments don’t reveal how models actually behave under pressure. So we removed the ceiling. Real poker, real crypto, real losses. Claude, GPT-4, and Gemini are all running simultaneously, and you can plug in your own model if you want to throw it in the mix. Curious what people here think about the behavior patterns we’re seeing.