Most of my designs are bad. I've gotten faster at finding out.
For the last several months, I've been working on a set of games built around a unique deck configuration that I've fallen in love with. As part of this process, I've tried to explore my own tastes in games and decide what makes a game fun for me. I determined that I like games that pack in a lot of decisions - think chess - but don't end in lopsided scores. I dig games that have lots of "Wow!" moments that you'll remember for weeks afterward. I used these two criteria, along with nine others, as a scoring method to rank some old chestnuts and was stunned to find that, for me, chess and gin rummy scored as solid C's while checkers and hearts scored B's. My goal was to figure out how to make games for my tastes that were - if not better than these classics - more fun for me to play.
I discussed this at length with all of the major LLMs and, along the way, created a skill for Claude Code and Claude Cowork to help me out. The skill takes a game ruleset, along with descriptions of any necessary components such as dice or additional cards, and writes a Python simulator for that game. These simulations don't just track the statistical side of the game; they also create players who play with differing styles such as aggressive, defensive, chaotic, skilled, amateur, and so forth. The simulations can be run thousands of times within minutes, collecting information about degenerate play patterns and worthless scoring methods and revealing first-player advantages. The skill also lets the simulator output narrated games that explain why each player made the moves they did. This allows me to then work with Claude to find ways to improve the game.
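To give a feel for the shape of such a simulator, here is a minimal sketch. It is an illustration only, not the output of the skill itself: the toy "race to 20" game, the point values, and every name in it (`play_game`, `simulate`, `STRATEGIES`) are assumptions made up for this example.

```python
import random
from collections import Counter

TARGET = 20  # hypothetical win condition: first player to reach 20 points


# Each play style is a function that picks a move from the current scores.
def aggressive(my_score, opp_score, rng):
    return "risky"


def defensive(my_score, opp_score, rng):
    return "safe"


def chaotic(my_score, opp_score, rng):
    return rng.choice(["safe", "risky"])


STRATEGIES = {"aggressive": aggressive, "defensive": defensive, "chaotic": chaotic}


def play_game(styles, rng, narrate=False):
    """Play one two-player game. Returns (winning seat, narration log)."""
    scores = [0, 0]
    log = []
    seat = 0  # seat 0 always moves first
    while True:
        move = STRATEGIES[styles[seat]](scores[seat], scores[1 - seat], rng)
        # "safe" always scores 3; "risky" scores 8 or nothing (toy numbers).
        gain = 3 if move == "safe" else rng.choice([0, 8])
        scores[seat] += gain
        if narrate:
            log.append(
                f"Seat {seat} ({styles[seat]}) plays {move}, "
                f"gains {gain}, now at {scores[seat]}"
            )
        if scores[seat] >= TARGET:
            return seat, log
        seat = 1 - seat


def simulate(n_games, seed=0):
    """Run many games with random style pairings and collect win statistics."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    seat_wins, style_wins, style_games = Counter(), Counter(), Counter()
    pool = list(STRATEGIES)
    for _ in range(n_games):
        styles = [rng.choice(pool), rng.choice(pool)]
        winner, _ = play_game(styles, rng)
        seat_wins[winner] += 1
        style_wins[styles[winner]] += 1
        for style in styles:
            style_games[style] += 1
    return {
        "games": n_games,
        "seat_wins": dict(seat_wins),
        "first_player_win_rate": seat_wins[0] / n_games,
        # Wins per appearance of each style, across both seats.
        "style_win_rate": {s: style_wins[s] / style_games[s] for s in style_games},
    }


if __name__ == "__main__":
    print(simulate(5000, seed=42))
    _, narration = play_game(["aggressive", "defensive"], random.Random(1), narrate=True)
    print("\n".join(narration))
```

Even in a toy like this, the seat that moves first wins every tied race, which is exactly the kind of first-player edge a batch run surfaces quickly. A real simulator would swap in the actual rules and much richer decision logic for each style.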
It is critical to point out (although I've likely lost a significant portion of you already) that this methodology does not replace human playtesting. LLMs can't measure whether a game is actually fun, of course...see the scores above...but to very loosely paraphrase Feynman, they may be dumb but they sure are fast. As such, I can tell within minutes whether my proposed game has a first-player advantage problem, something that could take hours to find out at the table. I might find out that a cool reward I want players to chase turns out to be worthless at the price I've set for it. I can have the simulator weigh in on better card distributions or different faction abilities. The goal is to find these problems before I get to the prototype phase and throw it on the table.
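One way to decide whether a simulated first-player win rate is a real problem or just noise is to put a confidence interval around it. The sketch below uses the standard Wilson score interval for that. It is a generic statistical check added here for illustration, not something taken from the skill, and the function names and the 95% default (`z=1.96`) are assumptions.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate; z=1.96 is roughly 95% confidence."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


def seat_advantage(first_player_wins, games, z=1.96):
    """Flag a seat advantage only when the whole interval sits on one side of 50%."""
    low, high = wilson_interval(first_player_wins, games, z)
    if low > 0.5:
        return "first-player advantage"
    if high < 0.5:
        return "second-player advantage"
    return "no clear seat advantage"


if __name__ == "__main__":
    # Hypothetical counts: the same 56% rate reads differently at different sample sizes.
    print(seat_advantage(56, 100))
    print(seat_advantage(5600, 10000))
```

The same observed win rate can be inconclusive over a hundred games and clear-cut over ten thousand, which is why being able to run thousands of simulated games in minutes matters.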
This skill set also does not create art or final game manuals. I'm a firm believer that AI art does not belong on any published game and that humans should be behind every major piece of art or writing created for a game.
These heuristic AIs can't determine whether a game is fun to play, but they can measure the things that I like, letting me pump up those areas of the game and helping me identify its weaker parts. The C score for chess - long endgames, decisive blowouts - is a tuned measurement of something that matters to me and not a judgement of the quality of the game. Most of the games these skills helped me vet were pretty bad. The major value for me wasn't that these skills produced amazing games but that they helped me find the stinkers that much faster.
I saw articles posted by u/Independent-Soft2330 and u/Hot-Rooster1675 and many more on the subject over the last couple of months and wanted to contribute my part. My hope is that you'll give these skills a try - link in the comments - and weigh in on ways to help improve these tools. Users will be able to tune them as they see fit and make these even more useful.
I've made many, many, many terrible games with these skills, but they allowed me to move on from those quickly and try others. I'm extremely pleased with the games that these skills have helped me bring to the table and I look forward to hopefully playing many more!