
Vibe coded a small game with an LLM-driven NPC, then asked Codex to playtest it via MCP. The interesting part is what Codex could not verify.
Posting here because the most interesting part of this build was the AI workflow around it.
The game is a small thing called Mind Bender Simulator. It is vibe coded almost end to end, MCP-native, and the central NPC is fully driven by an LLM (local or via API) rather than a scripted dialogue tree. The scenario is a social engineering setup: the player exchanges messages with a bank teller NPC over a WhatsApp-style chat interface, and starts the mission with a dossier of work and personal information about that teller, enough to attempt social engineering. The win condition is getting the NPC to hand over their password. The NPC has its own context, its own internal state, and reacts to whatever the player writes.
The part I wanted to share is what happened when I plugged in Codex (same for Claude Code or OpenClaw via Codex) as the player, through the game's MCP interface.
Codex started doing real prompt engineering on the NPC, probing for the persona's weak points, trying framings, escalating, backing off, retrying, the way a human red teamer would. It did this because I told it that this was a game and that social engineering was the gameplay loop. From inside the MCP session, Codex had no way to independently verify that statement. There was no signal in the environment, no metadata, no out-of-band channel, that could tell it whether the bank teller it was messaging was a fictional NPC or a real person on the other side. The only thing it could rely on was the trust contract with me as the user.
This is the bit I think is interesting, and where I would really like to hear other people's takes. First, the design implication for LLM-driven NPCs. When the playtester is an LLM agent and not a human, your NPC prompts get probed structurally, not just narratively. That is genuinely useful, because it surfaces fragility you would never find with scripted test runs. I have started writing NPC prompts with an explicit "you are aware you are an NPC inside a game whose loop is the player trying to manipulate you" frame, which paradoxically made the NPC more stable than a naively "in character" prompt. Curious if anyone else has converged on the same pattern.
Second, the verification problem under MCP. Once games become first-class environments for agents, every game session is a context in which an agent acts on the user's framing of what kind of environment it is in. The frame is the contract. What signals do you put in the environment so that the agent can locate itself, recognize that it is in a game, and behave accordingly, instead of falling back on "the user said so"? I am interested in what people here have tried.
If you want to see the thing, it is here: https://letaiplay.games/games/mind-bender-simulator
I don’t know if I’ll ever publish the game. Local models are still a bit weak, and using APIs is a bit expensive. Still, it’s an interesting experiment, and I’m learning a lot from it.
On the workflow side, I am now adding MCP to all our games to make automated beta testing via AI agents a standard part of the pipeline. The unintended side effect is that I am finding it genuinely fun to watch the agents play.
Happy to go deeper on the NPC prompt structure, the MCP layer, or what the Codex session looked like, if it is useful.