I had an AI play Skribbl.io against real humans with no help — it won first place. So I wrote a research paper about it.
I've been experimenting with Comet, an LLM-based AI agent by Perplexity, and decided to throw it into one of the most chaotic casual games I could think of: Skribbl.io. No scripted help, no API access to the game. Just the AI reading screenshots, parsing the DOM, and acting like a player.
Here's what happened:
- It competed against 5 real human players across 3 full rounds
- It finished 1st place with 2,165 points
- It achieved 67% word-guessing accuracy
- It actually attempted to draw — with mixed results (turns out controlling a mouse pixel by pixel is hard)
What blew me away wasn't just that it won — it's how it reasoned. It used letter constraints to narrow down words, ranked vocabulary by frequency, and adapted its strategy round by round. It also ran into genuinely funny failure modes, like accidentally drawing with the eraser the whole time.
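To give a feel for the "letter constraints plus frequency ranking" idea, here's a minimal Python sketch. This is only an illustration, not what Comet actually runs internally: the tiny word list, the frequency scores, and the `candidates` function are all made up for the example.

```python
import re

# Hypothetical mini-vocabulary with made-up relative frequency scores.
# A real guesser would need a much larger word list.
VOCAB = {
    "apple": 90,
    "angle": 40,
    "table": 85,
    "cable": 30,
    "fable": 10,
    "tiger": 60,
}


def candidates(hint, vocab=VOCAB, already_guessed=()):
    """Return words that fit a hint like '_a__e', most frequent first."""
    # Each underscore is one unknown letter; revealed letters must match exactly.
    pattern = re.compile(
        "".join("[a-z]" if ch == "_" else re.escape(ch) for ch in hint.lower())
    )
    # Keep only words that fit the whole hint and have not been tried yet.
    matches = [
        word
        for word in vocab
        if pattern.fullmatch(word) and word not in already_guessed
    ]
    # Try the most common words first.
    return sorted(matches, key=lambda word: vocab[word], reverse=True)


if __name__ == "__main__":
    print(candidates("_a__e"))
    print(candidates("_a__e", already_guessed=("table",)))
```

With the hint `_a__e`, the sketch returns `table`, `cable`, `fable` in that order. Once `table` has been tried and rejected, the next call skips it and leads with `cable`. Each newly revealed letter shrinks the list further, which is roughly the narrowing behavior described above.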
I ended up writing a full research paper analyzing the whole session: the methodology, results, failure modes, and what this might mean for using games like Skribbl.io as AI benchmarks going forward. It's been submitted to SSRN and published on Academia.edu.
Full paper here: https://docs.google.com/document/d/e/2PACX-1vQMPjRYBeFTF0376cjADgkiKOwlKQK9YPXnhimGlq5eKAdK0nv0hBjS-W3OOY_uIhjHvsP56hzMruJ0/pub