u/DrewGrgich

For the last several months, I've been working on making a set of games using a unique deck configuration that I've fallen in love with. As part of this process, I've really tried to explore my tastes in games and decide what makes a game fun to me. I determined that I like games that have a high number of decisions - think chess - but don't end in lopsided scores. I dig games that have lots of "Wow!" moments that you'll remember for weeks afterward. I used these two criteria, along with nine others, as a scoring method to rank some old chestnuts and was stunned to find that, for me, chess and gin rummy scored as solid C's while checkers and hearts scored B's. My goal was to figure out how to make games for my tastes that were - if not better than these classics - more fun for me to play.

I discussed this at length with all of the major LLMs and along the way, created a skill for Claude Code and Claude Cowork to help me out. The skill takes a game ruleset, along with descriptions of any necessary components such as dice or additional cards, and writes a Python simulator for that game. These simulations don't just cover the statistical elements of the game; they also create players who play the game in differing styles such as aggressive, defensive, chaotic, skilled, amateur, and so forth. The simulations can be run thousands of times within minutes to surface degenerate play methods, worthless scoring methods, and first player advantages. The skill also includes the ability for the simulator to output narrated games that discuss why each player made the moves that they did. This allows me to then work with Claude to find ways to improve the game.
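To give a feel for the shape of one of these simulators, here is a minimal sketch. It is not output from the skill: the toy race game, the `TARGET` score, the move payoffs, and the three strategy functions are all made up for illustration. It pits play styles against each other over many games and reports the first player's win rate and the average winning margin.

```python
import random
from collections import Counter

TARGET = 20  # hypothetical winning score for this toy race game


def aggressive(my_score, opp_score, rng):
    """Always gamble on the swingy move."""
    return "risky"


def defensive(my_score, opp_score, rng):
    """Always take the guaranteed points."""
    return "safe"


def chaotic(my_score, opp_score, rng):
    """Pick a move at random."""
    return rng.choice(["safe", "risky"])


def play_game(first, second, rng):
    """Play one game and return (winner_index, winning_margin)."""
    players = [first, second]
    scores = [0, 0]
    turn = 0
    while True:
        move = players[turn](scores[turn], scores[1 - turn], rng)
        if move == "safe":
            scores[turn] += 2  # guaranteed, small gain
        else:
            scores[turn] += rng.choice([0, 0, 3, 3, 4, 5])  # swingy gain
        if scores[turn] >= TARGET:
            return turn, scores[turn] - scores[1 - turn]
        turn = 1 - turn


def simulate(first, second, games=5000, seed=0):
    """Run many games and summarize first-player win rate and average margin."""
    rng = random.Random(seed)
    wins = Counter()
    total_margin = 0
    for _ in range(games):
        winner, margin = play_game(first, second, rng)
        wins[winner] += 1
        total_margin += margin
    return {
        "first_player_win_rate": wins[0] / games,
        "avg_margin": total_margin / games,
    }


if __name__ == "__main__":
    styles = {"aggressive": aggressive, "defensive": defensive, "chaotic": chaotic}
    for name_a, strat_a in styles.items():
        for name_b, strat_b in styles.items():
            print(name_a, "vs", name_b, simulate(strat_a, strat_b))
```

Even this toy shows the kind of problem the real simulators flag: when both players are `defensive`, the first player reaches 20 one turn ahead every single time, so the first player wins every game.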

It is critical to point out (although I've likely lost a significant portion of you already) that this methodology does not replace human playtesting. LLMs can't really measure if a game is fun, of course...see the scores above...but to very loosely paraphrase Feynman, they may be dumb but they sure are fast. As such, I can tell within minutes if my proposed game has a first player advantage problem, something that could take hours to find out at the table. I might be able to find out that a cool reward I want players to chase turns out to be worthless at the price I have it set at. I can have the simulator weigh in on better card distributions or different faction abilities. The goal is to find these problems before I get to the prototype phase and throw the game on the table.
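As a rough illustration of why volume matters for the first player question, here is a small sketch that puts a 95% confidence interval around a first-player win rate. The function name and the flagging rule are assumptions for this example, not part of the skill; it uses the standard normal approximation and assumes a two-player game with no draws.

```python
import math


def first_player_advantage(first_wins, games, z=1.96):
    """Estimate the first-player win rate with a normal-approximation interval.

    Flags the game only when the whole interval sits on one side of 50%.
    """
    rate = first_wins / games
    half_width = z * math.sqrt(rate * (1 - rate) / games)
    low, high = rate - half_width, rate + half_width
    return {
        "rate": rate,
        "low": low,
        "high": high,
        "flagged": low > 0.5 or high < 0.5,
    }


if __name__ == "__main__":
    # A 51-49 split over 100 games is well within noise.
    print(first_player_advantage(51, 100))
    # A 56% rate over 10,000 games is a real imbalance.
    print(first_player_advantage(5600, 10000))
```

A hundred games at the table can't separate a real imbalance from luck, while ten thousand simulated games can, which is the whole appeal of fast, dumb players.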

This skill set also does not create art or final game manuals. I'm a firm believer that AI art does not belong on any published game and that humans should be behind every major piece of art or writing created for a game.

These heuristic AIs can't determine if the game is fun to be in but they can measure the things that I like and allow me to pump up those areas of the game as well as to help me identify the weaker parts of the game. The scoring of C for chess - long endgames, decisive blowouts - is a tuned measurement of something that matters to me and not a judgement of the quality of the game. Most of what these skills helped me vet were pretty bad games. The major value for me wasn't that these skills produced amazing games but instead, they helped me find the stinkers that much faster.

I saw articles posted by u/Independent-Soft2330 and u/Hot-Rooster1675 and many more on the subject over the last couple of months and wanted to contribute my part. My hope is that you'll give these skills a try - link in the comments - and weigh in on ways to help improve these tools. Users will be able to tune them as they see fit and make these even more useful.

I've made many, many, many terrible games with these skills, but this allowed me to move on from those quickly and try others. I'm extremely pleased with the games that these skills have helped me bring to the table and I look forward to hopefully playing many more!

Introducing Throughline, an AI tool that helps human GMs run tabletop sessions. The system listens to your session live, thinks roughly 8 fictional minutes ahead, and renders that future as a small grid of images — a storyboard for what the players would encounter if they take a particular branch. You glance at the dashboard, pick the path you want to steer toward, and the brain gets the next stretch of thinking time to plan the checkpoint after that.

The thing it deliberately doesn't do: narrate to your players, run combat, voice NPCs, or appear at the table at all. Players never see anything Throughline produces. The GM does all the live performance — voicing, improvising, reading the room. Throughline's job is the long-horizon planning, so the GM can focus on running the table.

Why this exists. Ted (the engineer behind it) has a friend, Ben — math PhD, the best GM Ted's played with. Ben preps three hours per session, voices a dozen NPCs, plans coherent arcs in big worlds, and adapts brilliantly when players surprise him. Ben moved away. The next-best GM in their group is Ted, and Ted's the first to admit he doesn't have Ben's prep time or practice. The original goal was just for Ted to be a better GM. The wedge that emerged: what if the prep stat could be made available to GMs whose strength is the social side of the table — improv, NPC voices, table feel — but who don't have years of long-term narrative planning under their belt?

What's worked so far (six live playtests + a lot of internal testing):

The image-storyboard format actually beats prose at table speed. Glanceable, no reading lag while the table is waiting on you.
One-shots are reliable. Premise → opening narration → live storyboards → clean ending.
The system catches and pays off callbacks better than expected. (We did not anticipate this — it was an emergent property of running ahead of the table.)

What we're still uncertain about:

Multi-session campaigns. Compressed-transcript persistence is the architecture, but we don't have enough sessions to claim it works cleanly across multiple weeks.
Whether GMs other than us-and-our-friends can pick it up cleanly. That's exactly what this playtest is for.
The right branching cadence — should the system propose 1 branch, 3, 5? We've moved this around and don't have a settled answer.

The ask. We're opening access to about ten outside GMs to use it for their own sessions and tell us what works and what doesn't. Fit: GMs strong on the social side (improv, voices, in-the-moment narration) but who either don't have time to prep extensively or don't have years of long-term planning practice. If you're already a great GM who enjoys prep and does it well, Throughline probably isn't for you.

It's a web app. Sign in with Google, no GitHub, no terminal setup. You can run a homebrew world by giving it the lore, or a setting you already love from books / games / shows. We'll monitor and pay LLM API costs, which run about $0.50 per hour of live play, so a weekly three-hour session runs around $6–$10/month. The intention with playtesting is to have you eat our dog food, so we'll provide the dog food, within reason. :)
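For anyone who wants to check the cost math against their own schedule, here is a quick back-of-the-envelope sketch. The function name is made up for this example, and the only figure taken from the post is the roughly $0.50 per hour rate.

```python
COST_PER_HOUR = 0.50  # approximate LLM API cost per hour of live play, from the post


def monthly_cost(hours_per_session, sessions_per_month, cost_per_hour=COST_PER_HOUR):
    """Estimate monthly API spend for a regular game."""
    return hours_per_session * sessions_per_month * cost_per_hour


if __name__ == "__main__":
    # A weekly three-hour game lands on 4 or 5 sessions depending on the month.
    for sessions in (4, 5):
        print(sessions, "sessions:", monthly_cost(3, sessions))
```

Four sessions comes to $6.00 and five to $7.50, so the top of the quoted range presumably leaves headroom for longer sessions or heavier use.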

There will be bugs. We want testers who find that interesting rather than frustrating, who'll be in active conversation with us, and who have an eye for game design. Design feedback is the main thing we want — not early customers, not business partners.

Site has the longer writeup of how the system works and the design thinking behind it. Link in the comments. Sign up for the alpha there.

Discussion question (regardless of testing interest): what's the part of GMing you'd most want a tool like this to not touch? We've drawn the line at the social layer — improv, voices, table chemistry — those stay entirely human. Curious whether that line matches where you'd draw it, or whether you'd put it somewhere else.

— Drew, on the Throughline team

Most of my designs are bad. I've gotten faster at finding out.

Built a "co-DM" AI that runs alongside the human GM (not replacing) — looking for ~10 playtesters