



Gemini 3.5 Flash - tested on a social deduction benchmark
Hi folks, just wanted to share some fresh results on Gemini 3.5 Flash from a benchmark I've made.
This benchmark pits models against each other in autonomous games of Blood on the Clocktower - a highly complex social deduction game. If you're unfamiliar, it's like Mafia/Werewolf or The Traitors TV show.
Results:
Gemini 3.5 Flash performs strongly, landing in the top 5 with performance comparable to Gemini 3.1 Pro.
What's interesting here is the cost difference (based on API usage):
| Model | Cost |
|---|---|
| Gemini 3.1 Pro | $3.93/Game |
| Gemini 3.5 Flash | $2.26/Game |
| Gemini 3 Flash (Medium) | $0.34/Game |
3.5 Flash costs a bit over half as much as 3.1 Pro (about 58% of the price, so roughly 42% cheaper) for comparable performance, and it's faster.
However, it also costs about 6.6 times as much as 3 Flash ($2.26 vs $0.34 per game). That's a huge difference, though it comes with a big jump in intelligence (rating 1541 -> 1698, a gain of 157 points).
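For anyone who wants to check the cost comparison, here's a minimal sketch of the arithmetic. The dollar figures and ratings are copied from this post; the variable and function names are just illustrative, and I'm assuming the 1541 rating belongs to the 3 Flash (Medium) configuration from the cost table.

```python
# Per-game API costs in USD, copied from the cost table in this post.
cost_per_game = {
    "Gemini 3.1 Pro": 3.93,
    "Gemini 3.5 Flash": 2.26,
    "Gemini 3 Flash (Medium)": 0.34,
}

# Ratings quoted in the post (3 Flash -> 3.5 Flash).
# Assumption: the 1541 figure is for the "Medium" configuration above.
rating = {
    "Gemini 3 Flash (Medium)": 1541,
    "Gemini 3.5 Flash": 1698,
}


def cost_ratio(model: str, baseline: str) -> float:
    """Return how many times the baseline's per-game cost the model costs."""
    return cost_per_game[model] / cost_per_game[baseline]


vs_pro = cost_ratio("Gemini 3.5 Flash", "Gemini 3.1 Pro")
vs_flash = cost_ratio("Gemini 3.5 Flash", "Gemini 3 Flash (Medium)")

# Extra dollars per game paid for each rating point gained over 3 Flash.
rating_gain = rating["Gemini 3.5 Flash"] - rating["Gemini 3 Flash (Medium)"]
extra_cost = cost_per_game["Gemini 3.5 Flash"] - cost_per_game["Gemini 3 Flash (Medium)"]
extra_cost_per_point = extra_cost / rating_gain

print(f"3.5 Flash vs 3.1 Pro: {vs_pro:.1%} of the cost")
print(f"3.5 Flash vs 3 Flash: {vs_flash:.2f}x the cost")
print(f"Rating gain: {rating_gain} points, ${extra_cost_per_point:.4f}/game per point")
```

The per-point figure is only a rough way to compare tiers; it treats rating as linear in cost, which it almost certainly isn't.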
Taking a look at verbosity (this usually affects response latency and how quickly you run into token limits):
| Model | Average Output Tokens per action |
|---|---|
| Kimi K2.6 | 5,038 |
| Gemini 3.5 Flash | 1,590 |
| GPT-5.5 | 403 |
Verbosity during reasoning is moderate: almost the same as 3.1 Pro, and neither the best nor the worst.
There is a 0% tool call error rate.
Its favourite word, extracted from recorded thoughts, is "inspect".
Notable Moves:
- Convincing the town to execute the Saint, leading to an immediate Evil win (vs Kimi K2.6): https://clocktower-radio.com/games/SaArRfj#event-93
- Slayer does quick maths and shoots the Demon on Day 1 (vs Claude Opus 4.6): https://clocktower-radio.com/games/AtZVjGc#event-60
Notable Mistakes:
- Hallucinating that Frank had claimed to be an Empath, a false claim that then spread across the group (vs GPT-5.5): https://clocktower-radio.com/games/PRIwZ53#event-196
Overall this is an interesting model if you look at it as a faster, more affordable version of Gemini 3.1 Pro.
It feels a bit awkwardly named when compared to 3 Flash, given the price tier that it's in, but maybe this is just a naming shift creating space for the Flash-lite models?
Full transcripts: https://clocktower-radio.com/search?a=Gemini+3.5+Flash
How-it-works: https://clocktower-radio.com/how-it-works