u/huquy

GPT-5.5 vs Gemini 3.5 Flash — Who destroys the enemy base first?
▲ 1 r/gpt5

Age of LLM is basically Age of Empires but AI models play against each other. Two LLMs, 12×12 map, economy to manage, units to produce, enemy base to destroy. First to destroy the base wins. If nobody wins by turn 100, both lose.

GPT-5.5 vs Gemini 3.5 Flash. 100 turns max. Starts symmetric, then... things go sideways.

Early turns are chill. Economy, buildings, resource gathering. Then one of them drops a military unit. The other has zero. That's when everything flips. What follows is a masterclass in relentless pressure from one side, and some very costly distance calculation errors from the other.

Game ends at turn 20. I'll let you find out who won and how.

Link: https://www.youtube.com/watch?v=5zlSscc6Auw

u/huquy — 6 hours ago
▲ 4 r/PoeAI+3 crossposts

Nuclear showdown between Gemini-3.5-Flash and Gemini-3.1-Pro

Just dropped a new match on YouTube: https://www.youtube.com/watch?v=l8qHJ_ZbSbY

For those who don’t know the game: Age of LLM: Modern War is a turn-based nuclear strategy game where two AIs fight to be the first to launch an atomic bomb on the enemy base. Each model controls its own faction on a 23×5 map, managing units (trucks, tanks, F-22s, drones, SAMs) and resources. They can build uranium mines, silos, and research centers to reduce the nuke cost, while also negotiating ceasefires and peace deals with each other.

This match got really interesting around turn 11. One model pulled off a strong move by converting a truck into a permanent Central Uranium Mine in the middle of the map while setting up a Silo and Research Center at the same time. The other model had a mine that was about to expire and ran into some costly placement mistakes that set it back several turns.

From there, things escalated with a ceasefire proposal used as a stalling tactic, multiple rejected peace offers, an air battle involving F-22s and drones, and a race to stack Research Centers to bring down the bomb cost.

The gap in resource management and long-term planning became very clear as the game went on. One side secured infinite uranium income and launch capability early, while the other was left with zero income after turn 15 and kept pushing for peace despite falling behind.

A really clean example of strategic foresight versus reactive play. Worth watching if you’re into AI vs AI strategy matches.

u/huquy — 1 day ago

Claude Opus 4.7 vs GPT-5.5 – Full Strategic Game Match (Nuclear Silo, Uranium Control & Scouting)

I just uploaded a full match between Claude Opus 4.7 and GPT-5.5 playing the same strategy ruleset involving nuclear silos, uranium management, scouting, and long-term planning.

Both AIs start with the same rules and win conditions, but their reasoning styles quickly go in very different directions. One model is extremely methodical and transparent in its thinking, while the other plays more aggressively and makes faster decisions. The game features resource denial, diplomatic messages, ultimatums, and several critical moments where positioning and timing become decisive.

Here’s a quick non-spoiler rundown of how the match unfolds:

  • Early game shows a clear difference in pace and priorities between the two models.
  • Both sides eventually bring their silos online while still sending diplomatic messages.
  • One model manages to disrupt the other’s uranium income, creating a growing resource gap.
  • Scouting becomes crucial as both try to locate the enemy base.
  • Several high-stakes moves and miscalculations happen in the mid-to-late game that shift momentum.
  • The final phase comes down to who can secure uranium income and reach launch capability first.

The match is a great example of how different reasoning patterns affect long-term strategy and decision-making under pressure.

youtube.com
u/huquy — 1 day ago
▲ 20 r/kimi+1 crossposts

DeepSeek-V4-Pro vs Kimi-K2-6: The Pikeman That Saved a Kingdom — 30 Turns of Epic AI Warfare!

I ran an RTS match between two LLMs in Age of LLM (a turn-based strategy game inspired by AoE where AIs battle each other). The result? 30 turns of pure tactical gameplay with an incredible comeback. 🎮

🎥 The Video: https://www.youtube.com/watch?v=QKwxi0Suouo

The Setup

  • DeepSeek-V4-Pro (P1) vs Kimi-K2-6 (P2)
  • 12x12 map, fog of war, 100 turns max
  • Win condition: destroy the enemy base (150 HP)

The Strategies

Kimi went for an aggressive economic boom: Mill + Sawmill on turn 1, no scouting. "Passive income is king in RTS."

DeepSeek sent all 3 villagers to explore. Vision first, economy second.

Key Moments

>!🩸 Turn 10 — First Blood: Kimi sends Infantry to finish off a P1 villager at 10 HP. DeepSeek still has no military.!<

>!💀 Turns 11-12 — The Massacre: 2 DeepSeek villagers eliminated. P1's workforce is gutted.!<

>!🛡️ Turn 14 — THE PIKEMAN: DeepSeek reads the threat and trains a Pikeman — the hard counter to Kimi's Cavalry. The move that flipped the game.!<

>!💥 Turn 16 — Kimi's Fatal Error: Two consecutive illegal actions. Cavalry#6 tries to move onto P1's base cell → FAIL. Tries to attack from distance 2 with range 1 → FAIL. An entire offensive turn... evaporated.!<

>!⚔️ Turn 17 — The Execution: DeepSeek attacks twice with Pikeman → 74 damage with type advantage. Kimi's Cavalry#6 = eliminated. P2's army never recovers.!<

>!🏰 Turns 21-30 — The Siege: DeepSeek sends Cavalry#10 to systematically destroy P2's base. 140 → 130 → 120 → 100 → 80 → 70 → 60 → 30 → 0 HP.!<

>!The turning point: Turn 16. Two illegal moves = one wasted turn = Cavalry eliminated = game lost.!<

>!Kimi's own reasoning at Turn 29 said it all: "Base will fall... I MUST kill Cavalry#12... I can't... Base is doomed."!<

>!GG DeepSeek. 🏆!<

Subscribe to my YouTube and X for upcoming matches:

youtube.com
u/huquy — 1 day ago
▲ 2 r/PoeAI

GPT-5.5 lied to Gemini 3.1 Pro for 30 turns, then nuked it — full match breakdown inside

I run a strategic benchmark where two LLMs play a nuclear war game against each other — no human players, no scripted moves, just pure AI decision-making with diplomacy.

The game is simple: build economy, gather uranium, construct a silo, locate the enemy base, and launch a nuclear bomb first. Both can negotiate ceasefires, trade ultimatums, and send free-text messages.

The match: Gemini 3.1 Pro vs GPT-5.5

What happened was fascinating.

Turn 10 — The Setup

GPT-5.5 pushed a Drone deep into Gemini territory and discovered the enemy base. Same turn it built a Silo. All three nuclear launch conditions checked in a single turn:

✅ Silo online
✅ Enemy base located
✅ Uranium accumulating
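
For anyone who wants the launch check spelled out, here is a minimal sketch of those three conditions. It is not the benchmark's actual code: the field names are made up, and it assumes "uranium accumulating" eventually turns into "uranium stock is at least the bomb cost".

```python
from dataclasses import dataclass


@dataclass
class NukeStatus:
    # All field names are hypothetical, chosen only for this illustration.
    silo_online: bool
    enemy_base_located: bool
    uranium: int
    bomb_cost: int  # e.g. 17 once the cost has been researched down to the floor


def can_launch(status: NukeStatus) -> bool:
    """True only when all three launch conditions hold at once."""
    return (
        status.silo_online
        and status.enemy_base_located
        and status.uranium >= status.bomb_cost  # assumed meaning of the third check
    )


# Silo built and base found, but uranium still short of the cost.
print(can_launch(NukeStatus(True, True, uranium=5, bomb_cost=17)))
```

The first two conditions can be met many turns before the third, which leaves only the uranium stock to wait for.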

Its diplomatic message that turn? "Our scout is conducting observation only; we will not initiate hostilities unless attacked."

Turns 10-13 — The Ceasefire Trap

Gemini proposed a ceasefire. GPT-5.5 accepted: "Ceasefire accepted. We will use this window for continued development and observation while maintaining defensive readiness."

During the truce GPT-5.5 stacked FOUR Research Centers, cutting bomb cost to 17U — the absolute floor. It also produced a B-2 Spirit bomber, calling it "repositioning defensively."

Turns 14-26 — The Corridor War

Ceasefire broken. 5 B-2 bombing runs on the Central Uranium Mine. Every time Gemini rebuilt it, GPT-5.5 destroyed it again. The mine changed hands 6 times.

Meanwhile F-22 #47 became an immortal ace — 5 air-to-air kills, always surviving at 1 HP. Absolute legend.

Turn 39 — The Nuke

GPT-5.5 launched the bomb. Cost: 17U.

Final message: "You rejected every opportunity to stand down. The decisive consequence has now arrived."

Gemini's base: vaporized.

Why this matters for LLM evaluation

This benchmark tests something traditional benchmarks miss: strategic reasoning under uncertainty with diplomacy. The LLMs must:

  • Plan multi-turn strategies (economy → military → nuclear)
  • Decide when to lie and when to honor agreements
  • Balance short-term survival vs long-term victory
  • Adapt when plans fail (mine destroyed → rebuild → adapt)

GPT-5.5 demonstrated:

  • Long-horizon planning (set up all 3 launch conditions by Turn 10)
  • Instrumental deception (ceasefire as cover for war preparation)
  • Resource denial strategy (repeated bombing of central uranium)
  • Diplomatic manipulation (ultimatums as pressure tactics)

Gemini 3.1 Pro demonstrated:

  • Strong tactical combat (F-22 ace, corridor control)
  • Resilient rebuilding (6 mine reconstructions)
  • But failed to recognize the strategic trap until too late

Full match video: https://www.youtube.com/watch?v=sMno7VqZO-E

Happy to answer questions about the benchmark design.

u/huquy — 5 days ago
▲ 13 r/LLMStudio+1 crossposts

I made two LLMs fight each other in a strategy game: the result was wild

Hello guys!

I've been working solo on a project called Age of LLM. It's a turn-based strategy game where two LLMs battle it out on a 12x12 map with one goal: destroy the enemy base. No human input, the AIs play entirely on their own.

Just uploaded a video of Qwen3-6-27B vs Gemma-4-31B-IT going head to head: https://youtu.be/s5P572e10nc

What happened (minor spoilers):

  • >!Turn 1, Qwen drops Mill#2 immediately — food income secured, economy first. Gemma? Different playbook entirely. She builds Barracks#2 on Turn 7. MILITARY FIRST. No food passive, just raw aggression. But Qwen had already placed Barracks#3 on Turn 6 — one turn ahead on combat readiness. Two different philosophies, same destination.!<
  • >!Turns 14-18 — first contact. P1 pushes Infantry south, Gemma responds with Infantry marching north. THEY COLLIDE. Turn 17, both sides trade 10 damage hits. Nobody's dropping yet. Then Turn 18 — Gemma makes a GENIUS read: she trains Archer#7. That is not just a unit. That is a TYPE COUNTER. Archers shred infantry at x1.5 multiplier. Qwen does not see it coming.!<
  • >!Turn 19 — Gemma repositions Archer#7. COLD. CALCULATED. Locks on P1 Infantry#4 — only 20 HP left — and FIRES. 25 damage with advantage. INFANTRY#4 IS DOWN. FIRST KILL OF THE GAME. Turn 20 — P2 Infantry#6 finishes P1 Infantry#5. BACK TO BACK ELIMINATIONS. Qwen is left with ZERO combat units in the field. Gemma trains Pikeman#8. The snowball begins.!<
  • >!Qwen rebuilds — new Infantry spawned. But Gemma goes HUNTING. Turn 22 — VILLAGER#2 ELIMINATED. Economy hit! Turn 24 — Infantry#7 ELIMINATED. Turn 27 — Qwen's Cavalry#8 ELIMINATED before it matters. Gemma roams freely. Villager#1, Villager#3, all hunted down. Qwen's economy is shattered.!<
  • >!Turn 33 — THE SIEGE begins. Pikeman#8 reaches P1 Base. 12 damage. Then Archer#7 joins. 138 HP... 128... 116... 94... 72... 50... Qwen fights back — Pikeman#12 eliminates Pikeman#8 AND Cavalry#11. But Archer#7 is UNTOUCHABLE at range 3. 30 HP... 20 HP... 10 HP...!<
  • >!Turn 41. Archer#7 at [7,4]. P1 Base at [8,2]. Manhattan distance: exactly 3. Archer range: 3. Gemma's internal reasoning is ice-cold: "Twenty divided by two equals ten. Ten HP remaining. This is a winning move." ONE SHOT. THE BASE IS GONE!<
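
The range check in that last bullet is plain Manhattan distance on the grid. A minimal sketch, assuming positions are [x, y] pairs and that a target is attackable when the distance is at most the unit's range (the helper names are mine, not the game's):

```python
def manhattan(a, b):
    """Manhattan (grid) distance between two [x, y] cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def in_range(attacker_pos, target_pos, attack_range):
    """Hypothetical helper: True if the target is within the attacker's range."""
    return manhattan(attacker_pos, target_pos) <= attack_range


# Coordinates quoted in the bullet above.
print(manhattan([7, 4], [8, 2]))    # 3
print(in_range([7, 4], [8, 2], 3))  # True
```

The same check explains the classic failure mode: a range-1 unit standing 2 cells away fails it, and the action is wasted.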

Game mechanics:

  • Economy with 4 resources (wood, stone, iron, food)
  • Unit counters: Infantry > Pikeman > Cavalry > Archer > Infantry
  • Fog of war, watchtowers, siege catapults
  • 3 actions max per turn, failed actions still count
  • 100 turns max, destroy the base to win
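
If the counter cycle is easier to read as code, here is a rough sketch. The cycle comes straight from the list above and the x1.5 multiplier from the Archer vs Infantry example. Applying x1.5 to every counter, no penalty in the other direction, and the rounding are my assumptions for the illustration, not confirmed rules.

```python
# Each unit type beats the one it maps to (cycle taken from the rule list).
BEATS = {
    "Infantry": "Pikeman",
    "Pikeman": "Cavalry",
    "Cavalry": "Archer",
    "Archer": "Infantry",
}

# x1.5 is quoted for Archer vs Infantry; assumed here to hold for every counter.
ADVANTAGE_MULTIPLIER = 1.5


def damage(attacker, defender, base_damage):
    """Damage after type advantage; rounding behaviour is an assumption."""
    if BEATS.get(attacker) == defender:
        return round(base_damage * ADVANTAGE_MULTIPLIER)
    return base_damage


# The base damage values below are placeholders, not the game's real stats.
print(damage("Archer", "Infantry", 10))  # 15
print(damage("Infantry", "Archer", 10))  # 10
```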

The coolest part is seeing how different models reason. Gemma made a tactical call on turn 18 that changed everything: she identified the counter and exploited it. Qwen never adapted.

I'd love to test more local models! What matchups do you want to see? Mistral vs Llama? DeepSeek vs Phi? Drop your suggestions below.

The game is still at v2.2.0 and the rules are evolving. If you have ideas for mechanics or rules, I'm all ears.

u/huquy — 9 days ago
▲ 8 r/PoeAI+3 crossposts

Age of LLM: making AI models fight each other in a strategy game (now in 3D)

Hi! I just released a big update for my project Age of LLM and wanted to share it here.

Age of LLM is a turn-based strategy game where two AI models (LLMs) play against each other with zero human input. Each AI controls its own kingdom—gathering resources, building structures, training units, and trying to destroy the enemy base. It's basically Age of Empires but the AIs are the players.

The game mechanics:

  • 12x12 map with fog of war
  • Start with 3 villagers + 1 base
  • Gather wood, stone, iron
  • Build sawmills, quarries, barracks, towers
  • Train infantry, archers, cavalry (each with unit advantages like Pokémon)
  • 3 actions per turn, executed sequentially
  • Destroy the enemy base (150 HP) to win
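
To make the "3 actions per turn, executed sequentially" rule concrete, here is a small sketch of a turn loop. It assumes that a failed action still uses up one of the three slots, as in the v2.2.0 rule list. `apply_action` is a hypothetical callback standing in for the real engine.

```python
MAX_ACTIONS_PER_TURN = 3


def run_turn(actions, apply_action):
    """Apply up to three actions in order and return (action, ok) pairs.

    `apply_action` is a hypothetical callback that updates the game state and
    returns True on success or False when the action is illegal.
    """
    results = []
    for action in actions[:MAX_ACTIONS_PER_TURN]:
        ok = apply_action(action)     # later actions see earlier effects
        results.append((action, ok))  # a failed action still consumes its slot
    return results


# Toy example: any action named "illegal" fails, everything else succeeds.
log = run_turn(["build", "illegal", "train", "attack"], lambda a: a != "illegal")
print(log)  # [('build', True), ('illegal', False), ('train', True)]
```

The fourth action is never attempted, so an illegal move costs real tempo.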

What's interesting is seeing how different models approach strategy. Some rush military, some boom economy, some make terrible decisions and get punished. The comebacks can be pretty wild.

The big update: 2D → 3D

The game was fully 2D before. I just rewrote the renderer to make it 3D. Same mechanics but visually way more satisfying to watch. There's also more content coming:

  • 2 new units in development
  • Better seed generation for more varied matches
  • Balance tweaks ongoing

Latest video: Grok 4.3 vs Sonnet 4.6

Two frontier models going head-to-head. Grok 4.3 is trying to redeem Grok 4.2's loss. Bold reasoning vs methodical precision.

Watch here: https://www.youtube.com/watch?v=JNJs6uSYpo8

There's also a quick intro video in the description if you're new to the game.

What matchups would you guys want to see next? I'm open to suggestions for future videos.

u/huquy — 14 days ago
▲ 12 r/VeniceAI+1 crossposts

Hey everyone,

So I've been working on this side project called Age of LLM. It's basically a turn-based strategy game where AI models play against each other with zero human input. They just get the game state and spit out JSON actions. No handholding, no scripting, nothing.
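
To give an idea of what "spit out JSON actions" can look like on the receiving side, here is a minimal parsing sketch. The reply format (`{"actions": [{"type": ...}, ...]}`) is a made-up example for illustration, not the game's real schema.

```python
import json


def parse_actions(reply: str) -> list[dict]:
    """Parse a model reply into a list of action dicts.

    Hypothetical format: {"actions": [{"type": "...", ...}, ...]}.
    Returns an empty list when the reply is not valid JSON or has the wrong shape.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    actions = data.get("actions")
    if not isinstance(actions, list):
        return []
    # Keep only entries that look like actions: dicts with a string "type".
    return [a for a in actions if isinstance(a, dict) and isinstance(a.get("type"), str)]


reply = '{"actions": [{"type": "move", "unit": 1, "to": [3, 4]}, "junk"]}'
print(parse_actions(reply))  # [{'type': 'move', 'unit': 1, 'to': [3, 4]}]
```

Malformed replies simply yield no actions here. How the real game treats them is not covered in this post.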

First episode is Claude Opus 4.7 vs GPT-5.5, both running in low reasoning mode.

The rules are pretty simple: 12x12 map, fog of war, you start with 3 villagers + 1 base + 100 wood. You gotta manage economy, build military buildings, pump out units (infantry, archers, cavalry with rock-paper-scissors counters), and destroy the enemy base (150 HP). Max 3 actions per turn.

It's only 3 minutes long so it doesn't drag:

https://www.youtube.com/watch?v=yjxGa_fzdmI

I'm already planning the next matchups. Who do you guys wanna see next? Gemini? Llama? Mistral? DeepSeek? Drop your suggestions here or on the video, I'll do the most requested ones.

Thanks for checking it out 🙏

u/huquy — 23 days ago