
75,000 simulations later: How we balanced an indie board game prototype
As a board game lover and data nerd, I've spent the last few months working on a board game engine that lets me encode strategy games, build agent players, simulate thousands of games, and mine that data for insights, strategic edges, and recommendations for improving game balance. I started simple with published games (Yahtzee, Uno, Connect 4, Ticket to Ride, and Catan), but the goal was always to eventually support indie board game development. I'm now looking for the next indie prototype to encode (more on that at the bottom).
A few weeks ago I finally got my chance to work with an indie board game designer and passionate Chesapeake Bay sailboat racer who was building a game as thematically authentic to sailing as possible. After 75,000+ simulated games and many iterative changes to gameplay, I wanted to share how the process worked, what I learned, and what we changed.
This isn't intended to replace human playtesting — it complements it. Human playtesting still tests fun-ness, aesthetics, lore, and the subjective aspects of a board game that matter most. This just collects data from thousands of playtests that would take years (decades? centuries?) to run with humans.
The game in 60 seconds
At its core, this is a racing game played on a non-uniform 519-hex board modeled after the West River of the Chesapeake Bay. There are 10 different courses, 12 different wind speed × wind direction combinations, and a variety of mechanics intended to simulate true sailboat racing: tacking, right of way (ROW), and in-irons (so you can't sail directly into the wind). Each player rolls 2D6 for movement, and there are "Turn of Event" (TOE) hexes throughout the board that let players draw event cards that impact gameplay. An optional "Regatta" format allows players to do multiple races to determine an overall winner (think Mario Kart-style).
The process in a nutshell
The process starts with writing the game rules in code. I had a pretty good workflow for this after encoding my first 5 published games. This time, it required sitting down with the designer and documenting every rule, component, nuance, and edge case in detail to make sure the gameplay was clear. A clear rulebook and a photo of the game board are a good place to start.
The next step is encoding the computer agents that play the game. I always start with random agents that just randomly choose between legal moves each turn, then train them to be progressively smarter with differing strategic edges. Then I playtest the games in the simplest Command Line Interface (CLI) I can encode. It's a way for me and the designer to play the games, target confusing edge cases, and validate that the agents are playing intelligently. It's not the most beautiful or user-friendly, but it's exactly what's needed to make sure gameplay and agents are behaving correctly. We each probably played 20-30 games against the agents, and we caught things we would've missed otherwise. More on that in a second.
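For anyone curious what the random baseline agent described above looks like, here is a minimal Python sketch. It is not the engine's actual code: `GameState`, `legal_moves`, `RandomAgent`, `choose_move`, and `ToyState` are hypothetical names made up for illustration.

```python
import random
from typing import Optional, Protocol, Sequence


class GameState(Protocol):
    """Hypothetical interface; the real engine's state object is not shown in the post."""

    def legal_moves(self) -> Sequence[str]: ...


class RandomAgent:
    """Baseline agent: picks uniformly at random among the legal moves each turn."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # A private RNG keeps simulations reproducible when a seed is given.
        self.rng = random.Random(seed)

    def choose_move(self, state: GameState) -> str:
        moves = state.legal_moves()
        if not moves:
            raise ValueError("no legal moves available")
        return self.rng.choice(moves)


class ToyState:
    """Stand-in state with a fixed move list, for demonstration only."""

    def legal_moves(self) -> Sequence[str]:
        return ["tack", "hold course", "bear away"]


agent = RandomAgent(seed=42)
picked = [agent.choose_move(ToyState()) for _ in range(100)]
print(sorted(set(picked)))
```

Smarter agents can keep the same `choose_move` signature and swap the random pick for a scoring rule, which makes it easy to seat different agents against each other.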
Once that's validated, finally, the fun part — simulate and iterate. I ran multiple batches of thousands of simulations, mining the results for data. Each time, we identified enhancement opportunities, tuned the mechanics, and re-ran the simulations until the balance was in-spec and the designer was ready to prototype.
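The simulate-and-tally loop is simple at its core. Below is a rough Python sketch that uses a deliberately tiny stand-in game (a straight-line 2D6 race with no wind, cards, or right of way), not the actual sailing rules. `play_toy_race`, `run_batch`, and the 60-space finish line are all assumptions for illustration.

```python
import random
from collections import Counter
from typing import Dict


def play_toy_race(rng: random.Random, num_players: int = 4, finish: int = 60) -> int:
    """Play one toy race and return the winning seat (seat 0 moves first).

    Toy stand-in, not the real game: each player adds a 2D6 roll per turn,
    and the first player to reach `finish` wins immediately.
    """
    positions = [0] * num_players
    while True:
        for seat in range(num_players):
            positions[seat] += rng.randint(1, 6) + rng.randint(1, 6)
            if positions[seat] >= finish:
                return seat


def run_batch(num_games: int, seed: int = 0, num_players: int = 4) -> Dict[int, float]:
    """Simulate `num_games` races and return the win rate per seat."""
    rng = random.Random(seed)  # seeded so a batch can be re-run exactly
    wins = Counter(play_toy_race(rng, num_players) for _ in range(num_games))
    return {seat: wins[seat] / num_games for seat in range(num_players)}


win_rates = run_batch(num_games=5_000)
for seat, rate in win_rates.items():
    print(f"seat {seat}: {rate:.1%}")
```

In this toy, seat 0 wins whenever two boats would finish in the same round, so a first-mover edge shows up in the tallies. Changing one rule and re-running the batch with the same seed is the whole iteration loop in miniature.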
5 Design Questions the Simulation Answered
1) Turn of Event Deck Audit
The original deck had 52 cards: 13 different card types with 4 copies of each. By running simulations and measuring 1) the % chance of drawing each card and 2) the percentage-point (pp) impact each card had on winning, we were able to nerf and buff cards by changing their card count or card effect, and bring more balance to the game. We ended up with a 44-card deck with 12 different card types: two mechanically identical cards merged into one type and went from 8 copies to 2, and one more card went from 4 copies to 2 (52 - 6 - 2 = 44).
Key findings and changes:
- Wind Change: originally, the player who drew this card could pick any wind direction × wind speed they wanted, but the designer determined this wasn't authentic to sailing. Wind can change during a real sailboat race, but not based on a sailor's preference. Replaced with 4 hardcoded wind speed × direction combos that immediately take effect when the card is drawn.
- Becalm (another) / Run Aground (another): these two cards were mechanically identical, and combined they were the most common card type in the deck (8/52 = 15% of cards). They also had a big impact on winning, so we nerfed them in quantity, from 8 cards to 2 cards (and kept them named "Run Aground" for consistency).
- Run Aground (self): the card with the biggest negative impact on win rate, in percentage points. Nerfed from 4 cards to 2.
- Auto movement cards (Lift, Super Lift, Gust, Header, Foul): each of these had very little impact on winning, so we buffed them, doubling the movement effect for each card.
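One way to get a per-card "win pp" number like the ones above is to compare the win rate of players who drew a card against players who did not. The Python sketch below assumes a made-up log format (one row per player per game) and made-up sample rows; `card_win_impact` is a hypothetical helper, and the actual audit may have been computed differently.

```python
from typing import Dict, Iterable, Set, Tuple

# One record per player per game: (cards that player drew, did that player win?)
Record = Tuple[Set[str], bool]


def card_win_impact(records: Iterable[Record]) -> Dict[str, float]:
    """Return, per card, win rate with the card minus win rate without it, in pp."""
    rows = list(records)
    all_cards = set().union(*(drawn for drawn, _ in rows))
    impact = {}
    for card in all_cards:
        with_card = [won for drawn, won in rows if card in drawn]
        without = [won for drawn, won in rows if card not in drawn]
        if not with_card or not without:
            continue  # no comparison group, so no estimate
        rate_with = sum(with_card) / len(with_card)
        rate_without = sum(without) / len(without)
        impact[card] = 100 * (rate_with - rate_without)
    return impact


# Tiny invented sample, only to show the shape of the data.
records = [
    ({"Gust"}, True),
    ({"Run Aground (self)"}, False),
    ({"Gust", "Run Aground (self)"}, False),
    (set(), True),
    ({"Gust"}, False),
    (set(), False),
    ({"Run Aground (self)"}, False),
    (set(), True),
]

impact = card_win_impact(records)
for card, pp in sorted(impact.items(), key=lambda item: item[1]):
    print(f"{card}: {pp:+.1f} pp")
```

This is a correlation, not a causal effect: players who detour for cards may differ from players who don't, so a number like this is best read as a flag for which cards to re-test after a change.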
2) Player Count Sweet Spot
The game could theoretically support 2-8 players, but what's the recommended player count? The simulation says 4-6.
At 2-3 players, the game is too short, the right of way (ROW) mechanics (the fun part where players "crash into each other") barely fire, and the Commodore first-mover advantage is much larger than intended. At 7-8 players, the game drags on and the right of way mechanics are constantly firing. At 4-6 players, everything hits its sweet spot: good game length, good right of way interactivity, and a slight but not too dominant Commodore advantage.
3) Turn of Event Hex Refresh Format
When a player passes through a TOE hex and draws a card, what happens to that hex afterward? Two refresh modes were on the table: Once Per Hex Per Player (each player can claim each hex once over the course of the race) versus Once Per Hex, Period (the first player to claim a hex kills it for everyone else for the rest of the race).
We wanted the TOE deck to be an actual mechanic that throws some fun chaos into the game: shaking up positions, rewarding route diversity, and giving players incentives to detour. The deck only does that work if cards actually get drawn. So the design question was simple: how many cards does each refresh mode actually produce per game?
Once Per Hex, Period cut card draws from 5.9 per game to 2.5. At that volume, the deck stops being a meaningful game system. So Once Per Hex Per Player won. It delivered enough card volume to make the deck a real part of the game without impacting other key metrics like game length.
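The two refresh modes differ only in what counts as "already claimed." Here is a small Python sketch of that bookkeeping; `RefreshMode`, `ToeTracker`, `try_draw`, `count_draws`, and the sample visit list are hypothetical and not taken from the engine.

```python
from enum import Enum
from typing import Iterable, Optional, Set, Tuple


class RefreshMode(Enum):
    PER_PLAYER = "once per hex per player"
    GLOBAL = "once per hex, period"


class ToeTracker:
    """Tracks which TOE hexes can still yield a card under a given refresh mode."""

    def __init__(self, mode: RefreshMode) -> None:
        self.mode = mode
        self.claimed: Set[Tuple[int, Optional[str]]] = set()

    def try_draw(self, player: str, hex_id: int) -> bool:
        """Return True if this visit earns a card, and record the claim."""
        # Per-player mode keys the claim on (hex, player); global mode on the hex alone.
        owner = player if self.mode is RefreshMode.PER_PLAYER else None
        key = (hex_id, owner)
        if key in self.claimed:
            return False
        self.claimed.add(key)
        return True


def count_draws(mode: RefreshMode, visits: Iterable[Tuple[str, int]]) -> int:
    tracker = ToeTracker(mode)
    return sum(tracker.try_draw(player, hex_id) for player, hex_id in visits)


# Invented sequence of (player, hex) visits during one race.
visits = [("A", 7), ("B", 7), ("A", 7), ("C", 12), ("B", 12)]
draws_per_player = count_draws(RefreshMode.PER_PLAYER, visits)
draws_global = count_draws(RefreshMode.GLOBAL, visits)
print(draws_per_player, draws_global)
```

On the same visits, the per-player mode yields 4 draws and the global mode yields 2, which is the same direction as the drop seen in the simulations.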
4) Commodore Design
The "Commodore" is the player who rolls the highest 2D6 before the game starts. We wanted the Commodore to have a small advantage as a reward for winning that random roll, but we weren't sure whether 1) first placement of boat on the start line or 2) first movement roll would be the bigger advantage. So we tested both with 2,000 games each.
Moving first is worth more than placing last costs. The current design (place last, move first) gets the Commodore to a 29.5% win rate against a 25% chance baseline, the intended slight advantage. The inverse (place first, move last) drops the Commodore to 21.4%, a slight disadvantage. The design target was "small random reward for winning the roll," which is what shipped.
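A quick sanity check on whether 2,000 games is enough to separate these win rates from the 25% baseline: the Python sketch below computes a normal-approximation (Wald) 95% interval for each variant. This is my own back-of-envelope check, and it assumes all 2,000 games per variant were independent 4-player games (which the 25% baseline implies); `wald_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wald_interval(win_rate: float, num_games: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a win rate (z=1.96 is about 95%)."""
    std_err = math.sqrt(win_rate * (1 - win_rate) / num_games)
    return win_rate - z * std_err, win_rate + z * std_err


BASELINE = 0.25   # chance baseline quoted in the post
NUM_GAMES = 2000  # games per variant, from the post

variants = {
    "place last, move first": 0.295,
    "place first, move last": 0.214,
}

intervals = {name: wald_interval(rate, NUM_GAMES) for name, rate in variants.items()}
for name, (low, high) in intervals.items():
    clear = low > BASELINE or high < BASELINE
    print(f"{name}: {low:.3f} to {high:.3f} (clear of baseline: {clear})")
```

Under these assumptions both intervals sit clear of 25%, so the measured advantage and disadvantage are unlikely to be sampling noise.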
5) Downwind rule
This is where the playtesting in the CLI before simulating came in handy. The original spec had a general downwind movement bonus where every step in the wind direction was discounted. The rule never made it past CLI playtest. It was confusing to encode (which steps count as "downwind"? does it stack? what about partial downwind?), and at the table it kept causing arguments about edge cases. The fix: keep Spinnaker as the only source of a downwind bonus, drop the general rule. If a rule is hard for the engine to encode cleanly, that's often a warning that it'll be hard for humans to track at the table. Data showing downwind movement was rare anyway (14% of all steps) sealed it.
The bigger question: does the design intent hold up?
Internal tuning is one thing. The harder question: was the game designed so that different conditions reward different approaches?
The designer wanted that. Sometimes the most direct path should be optimal. Sometimes detouring for TOE cards should be optimal. Sometimes blocking opponents and optimizing for right-of-way positioning should be optimal. The simulation tests this directly: build a competent baseline agent (CourseOptimizer), then build three variants that each tilt toward one of those play styles (CardOpportunist, CardHoarder, ROWPositioner), and see whether any of them dominates.
I ran 40,000 games across different player count × course × wind combinations. The headline:
All four agents finish within ±1pp of each other in aggregate. No "winning strategy."
The differentiation lives in the conditions. ROWPositioner gains +4.4pp in WNW wind. CardOpportunist underperforms on courses 5 and 7. The strategy that wins depends on what's in front of you.
That's the designed property: competent play converges, conditions create the variance. Plus the obvious: rolling high still wins games.
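The "converges in aggregate, differs by condition" check boils down to comparing each agent's win rate within a condition against its own overall win rate. A rough Python sketch, assuming a made-up row format (one row per agent per game) and invented sample rows; `win_rate_by`, `condition_deltas_pp`, and the "S" wind label are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Row = Dict[str, object]  # assumed keys: "agent", "wind", "won"


def win_rate_by(rows: List[Row], key: Callable[[Row], Hashable]) -> Dict[Hashable, float]:
    """Group rows by `key` and return the win rate of each group."""
    wins: Dict[Hashable, int] = defaultdict(int)
    games: Dict[Hashable, int] = defaultdict(int)
    for row in rows:
        group = key(row)
        games[group] += 1
        wins[group] += 1 if row["won"] else 0
    return {group: wins[group] / games[group] for group in games}


def condition_deltas_pp(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Per (agent, wind): win rate in that wind minus the agent's overall rate, in pp."""
    overall = win_rate_by(rows, lambda r: r["agent"])
    by_wind = win_rate_by(rows, lambda r: (r["agent"], r["wind"]))
    return {
        (agent, wind): 100 * (rate - overall[agent])
        for (agent, wind), rate in by_wind.items()
    }


# Invented rows, only to show the calculation.
rows = [
    {"agent": "ROWPositioner", "wind": "WNW", "won": True},
    {"agent": "ROWPositioner", "wind": "WNW", "won": False},
    {"agent": "ROWPositioner", "wind": "S", "won": False},
    {"agent": "ROWPositioner", "wind": "S", "won": False},
    {"agent": "CourseOptimizer", "wind": "WNW", "won": False},
    {"agent": "CourseOptimizer", "wind": "WNW", "won": False},
    {"agent": "CourseOptimizer", "wind": "S", "won": True},
    {"agent": "CourseOptimizer", "wind": "S", "won": False},
]

deltas = condition_deltas_pp(rows)
for (agent, wind), pp in sorted(deltas.items()):
    print(f"{agent} in {wind}: {pp:+.1f} pp")
```

The same grouping works for courses or player counts by swapping the key function.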
Benchmarking this game against the other 5 published games
After finalizing the sailing game mechanics, I wanted to see how it stacked up against the other games in my library. For now, I've settled on 5 metrics to compare across games:
- Game length: how long is the game?
- Decision density: how many decisions per turn?
- Lead stickiness: how often does the leader at midpoint go on to win?
- Seat bias: does where you sit matter?
- Score spread: how decisive are wins?
A few dimensions where Sailing stood out:
Shorter, tighter races — Sailing games run about 38 turns on average with a 5.5-turn gap between first and last finisher. That's faster than Catan (86 turns) and TTR (153) at comparable player counts, with a tighter score spread relative to game length. A typical race plays in roughly the time it takes to pour a beer and explain the rules.
Seat bias — 20.3 percentage points between best and worst seat at 4 players, on the high end of the library. Earlier in turn order means cleaner board state; later means navigating around everyone else's choices. Two caveats: this measurement is at 4 players (the bias shrinks at higher counts), and the recommended regatta format re-rolls the Commodore each race, so turn-order positions rotate and the per-race bias averages out across the regatta.
Lead stickiness — Leaders at the halfway mark go on to win 70% of the time, vs. Catan (56%), Uno (48%), and TTR (24%). Once you've pulled ahead in this game, the lead is durable. That fits a racing game (there should be momentum to a real lead), but the number is at the higher end of what's typical.
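Two of these metrics are easy to compute from per-game logs. The Python sketch below follows the definitions given above (lead stickiness is the share of games where the midpoint leader wins; seat bias is the gap in percentage points between the best and worst seat). The log fields `midpoint_leader` and `winner`, the function names, and the sample games are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List

GameLog = Dict[str, int]  # assumed keys: "midpoint_leader" and "winner" (seat indexes)


def lead_stickiness(games: List[GameLog]) -> float:
    """Share of games in which the leader at the midpoint went on to win."""
    held = sum(1 for game in games if game["midpoint_leader"] == game["winner"])
    return held / len(games)


def seat_bias_pp(games: List[GameLog], num_seats: int) -> float:
    """Gap between the best and worst seat's win rate, in percentage points."""
    wins = Counter(game["winner"] for game in games)
    # Iterate over every seat so seats with zero wins still count as 0%.
    rates = [wins[seat] / len(games) for seat in range(num_seats)]
    return 100 * (max(rates) - min(rates))


# Invented logs, only to show the calculation.
games = [
    {"midpoint_leader": 0, "winner": 0},
    {"midpoint_leader": 1, "winner": 1},
    {"midpoint_leader": 2, "winner": 0},
    {"midpoint_leader": 0, "winner": 0},
    {"midpoint_leader": 3, "winner": 3},
]

stickiness = lead_stickiness(games)
bias = seat_bias_pp(games, num_seats=4)
print(f"lead stickiness: {stickiness:.0%}, seat bias: {bias:.1f} pp")
```

How "midpoint" is defined (half the turns, or half the course distance) changes the stickiness number, so it is worth fixing one definition before comparing across games.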
Looking for indie designers
If you're an indie designer with a strategy-game prototype and are curious what this process would surface for your game, I'm interested. Ideally a strategy game (not party/social) and prototype-stage with rules stable enough to encode.
DM or reply here if interested. No cost, no formal pitch, no link, no sales process. Just trying to find the next interesting game to encode and I'd rather work with people whose prototypes I want to learn from.
Two questions for the sub
- What's a design question you'd want a simulation to answer about your prototype? Trying to find the questions designers actually want answered, not the ones I assume they do.
- If you've got a strategy prototype in playtest right now: what's the one mechanic you're least sure about? The thing you keep tweaking and can't decide if it's working.