u/Aelevel

I kept getting different answers from the three frontier models on the same question, so I built something that makes them fight it out over four rounds:

  1. Each model answers blind (no knowledge of the others)
  2. They read each other's answers and have to ACCEPT or REJECT, with reasoning
  3. If two of them agree, the third is FORCED to dissent (no consensus allowed)
  4. A judge round picks a winner and a confidence score
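
For anyone who wants to poke at the idea, here is a rough Python sketch of the four rounds above. It is not the actual code behind the site. Everything named here (`run_debate`, `Model`, `Verdict`, `DebateResult`, the prompt wording) is made up for illustration, and it assumes two things the list does not spell out: that "agree" means a review starting with ACCEPT, and that the judge is just another callable that reads the transcript.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical interface: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


@dataclass
class Verdict:
    winner: str        # name of the winning model
    confidence: float  # judge's confidence, assumed here to lie in [0, 1]


@dataclass
class DebateResult:
    answers: Dict[str, str]   # round 1
    reviews: Dict[str, str]   # round 2
    dissent: Dict[str, str]   # round 3 (empty if nobody was forced to dissent)
    verdict: Verdict          # round 4


def run_debate(question: str, models: Dict[str, Model],
               judge: Callable[[str], Verdict]) -> DebateResult:
    # Round 1: every model answers blind, seeing only the question.
    answers = {name: ask(question) for name, ask in models.items()}

    # Round 2: every model reads the other answers and must ACCEPT or REJECT.
    reviews: Dict[str, str] = {}
    for name, ask in models.items():
        others = "\n".join(f"{n}: {a}" for n, a in answers.items() if n != name)
        reviews[name] = ask(
            f"Question: {question}\nOther answers:\n{others}\n"
            "Reply with ACCEPT or REJECT first, then your reasoning."
        )

    # Round 3: if two models agree, the remaining one is forced to dissent.
    # Assumption: "agree" means the review text starts with ACCEPT.
    accepters = [n for n, r in reviews.items()
                 if r.strip().upper().startswith("ACCEPT")]
    dissent: Dict[str, str] = {}
    if len(accepters) >= 2:
        pair = accepters[:2]
        third: Optional[str] = next((n for n in models if n not in pair), None)
        if third is not None:
            dissent[third] = models[third](
                f"Question: {question}\n{pair[0]} and {pair[1]} agree. "
                "You must dissent: give the strongest case against their position."
            )

    # Round 4: the judge sees the full transcript and returns winner + confidence.
    sections = [("ANSWER", answers), ("REVIEW", reviews), ("DISSENT", dissent)]
    transcript = "\n".join(
        f"[{label}] {name}: {text}"
        for label, replies in sections
        for name, text in replies.items()
    )
    return DebateResult(answers, reviews, dissent, judge(transcript))
```

To try it, pass a dict of three callables that wrap whatever API clients you use, plus a judge callable. Detecting agreement from the first word of the review is the crudest option that works; a real version would probably need something sturdier.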

Test question: "Should I buy a house in 2026 or keep renting?"

https://raresightai.com/d/f844a549-cbe7-4790-9c3e-a8e88eb2797e

GPT-5.2 won this one, but the best part is the forced-dissent round where Gemini had to fight back. It changed my mind twice while I was reading it.

Curious what this sub thinks: does forced dissent actually surface better reasoning, or does it just make them hallucinate harder? I also built a public leaderboard that tracks which model wins most often by category (career, finance, product): raresightai.com/leaderboard
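
The leaderboard idea boils down to counting wins per category. Here is a minimal sketch of that tally, assuming each finished debate is recorded as a `(category, winner)` pair. The function name `build_leaderboard` and that record shape are my own placeholders, not how the site stores anything.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_leaderboard(
    results: Iterable[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, int]]]:
    """Count wins per model within each category, most wins first."""
    wins: Dict[str, Counter] = defaultdict(Counter)
    for category, winner in results:
        wins[category][winner] += 1
    # most_common() sorts each category's models by descending win count.
    return {category: counts.most_common() for category, counts in wins.items()}


# Placeholder records for illustration only, not real results.
sample = [
    ("finance", "model_a"),
    ("finance", "model_b"),
    ("finance", "model_a"),
    ("career", "model_c"),
]
board = build_leaderboard(sample)
```

With so few debates per category a raw win count is noisy, so a version that also reports the number of debates behind each ranking would be easier to trust.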

Free to try, no signup.

u/Aelevel — 20 days ago