Building a Long-Term AI DM Exposed Serious LLM Architecture Problems
I'm working on what started as an AI Dungeon Master project for D&D 5e. It has gradually turned into a much larger LLM architecture problem, and I need advice from people who understand long-term agent systems better than I do.
What I'm trying to build is NOT:
- a single giant prompt
- a chatbot persona
- an “Act as a DM” setup
- a lightweight RPG assistant
What I'm trying to build is effectively a persistent, AI-operated campaign runtime.
Core goals:
- long-term campaign continuity
- stable world-state tracking
- rules-as-written prioritization
- modular architecture
- procedural NPC generation
- autonomous companions/players
- persistent memory
- scalable extensibility
- external persistence and reconstruction
Current architecture direction:
- governance layer
- operational doctrine
- dependency structure
- reconstruction system
- anti-drift systems
- modular file governance
- external persistence to Obsidian
- layered retrieval hierarchy
One major realization: ChatGPT itself cannot reliably function as the memory layer once system complexity increases.
So now I’m attempting to externalize cognition into structured documents and retrieval systems.
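To make "externalized memory" concrete, here is a minimal Python sketch of one possible approach: an append-only event log on disk, with world state rebuilt by replaying it. This is an illustration under assumptions, not something I have built. The event types (`set_hp`, `move`, `flag`), the field names, and the file layout are all hypothetical.

```python
import json
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one campaign event as a JSON line. The log, not the chat, is the source of truth."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def rebuild_state(log_path: Path) -> dict:
    """Reconstruct world state from scratch by replaying every logged event in order."""
    state = {"hp": {}, "locations": {}, "flags": set()}
    if not log_path.exists():
        return state
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            ev = json.loads(line)
            kind = ev["type"]  # hypothetical event types, for illustration only
            if kind == "set_hp":
                state["hp"][ev["who"]] = ev["value"]
            elif kind == "move":
                state["locations"][ev["who"]] = ev["to"]
            elif kind == "flag":
                state["flags"].add(ev["name"])
    return state
```

The point of this design is that a fresh chat session can be handed a state rebuilt from the file, so nothing depends on what the model happens to remember.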
The rough architecture I’m exploring is:
LAYER 1 — “Book Smart” System
- Core D&D 5e rules intelligence.
- PDFs uploaded into ChatGPT Projects.
- Project instructions written specifically to make the model consult those PDFs.
- Sourcebooks/modules/campaigns treated as PRIMARY AUTHORITY.
- AI must prioritize RAW before any inference or improvisation.
- AI should retrieve rules instead of hallucinating or relying on latent memory.
The goal is for the uploaded sourcebooks to become the backbone cognition layer.
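The "retrieve instead of hallucinate" behavior in Layer 1 could be sketched roughly like this in Python. This is a toy keyword-overlap lookup, not a real RAG pipeline and not how ChatGPT Projects retrieves files. The `Passage` type, the `min_overlap` threshold, and the fallback message are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical label for where the text came from
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped; very short words are dropped."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if len(w) > 2}


def retrieve_rule(query: str, passages: List[Passage], min_overlap: int = 2) -> Optional[Passage]:
    """Return the passage sharing the most words with the query, or None if the match is too weak."""
    q = tokenize(query)
    best, best_score = None, 0
    for p in passages:
        score = len(q & tokenize(p.text))
        if score > best_score:
            best, best_score = p, score
    return best if best_score >= min_overlap else None


def build_rules_context(query: str, passages: List[Passage]) -> str:
    """Produce text to place in the prompt: a cited passage, or an explicit 'not found' marker."""
    hit = retrieve_rule(query, passages)
    if hit is None:
        return "NO RULE FOUND. Say so and ask the table; do not invent a ruling."
    return f"[{hit.source}] {hit.text}"
```

The part that matters is the `None` branch: when nothing matches well enough, the prompt says so explicitly instead of leaving the model free to improvise a rule.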
LAYER 2 — “Table Smart” System
- Community-derived 5e operational knowledge from 2014–2024 ONLY.
- No 5.5e content.
- Table heuristics.
- Encounter balancing realities.
- DM wisdom.
- Emergent gameplay patterns.
- Unofficial but battle-tested practices.
Basically: “what experienced tables actually discovered after a decade of play.”
LAYER 3 — Persona Runtime System
- DM personalities.
- Player personalities.
- Autonomous companions.
- Behavioral sliders.
- Dynamic personality synthesis instead of static presets.
- Companions function like independent players rather than puppets.
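One way to picture "behavioral sliders" and "dynamic personality synthesis" is as numeric traits that get blended and then turned into prompt text. The Python sketch below is purely illustrative. The trait names (`caution`, `humor`, `loyalty`), the 0 to 1 scale, and the low/moderate/high cutoffs are my assumptions, not an established scheme.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Persona:
    # Hypothetical sliders, each expected to be in the range 0.0 to 1.0.
    caution: float
    humor: float
    loyalty: float


def blend(a: Persona, b: Persona, weight: float) -> Persona:
    """Linearly interpolate two personas; weight=0 returns a's values, weight=1 returns b's."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    mixed = {
        f.name: (1 - weight) * getattr(a, f.name) + weight * getattr(b, f.name)
        for f in fields(Persona)
    }
    return Persona(**mixed)


def describe(p: Persona) -> str:
    """Turn slider values into a short text line that could be placed in a prompt."""
    def level(x: float) -> str:
        return "low" if x < 0.34 else "moderate" if x < 0.67 else "high"
    return ", ".join(f"{f.name}: {level(getattr(p, f.name))}" for f in fields(p))
```

Keeping the sliders as stored data means a companion's personality can be saved to disk and regenerated into the prompt each session, rather than living only in the conversation.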
LAYER 4 — Creativity Engine
- Attempts to compensate for creative flattening and safety homogenization in ChatGPT.
- Should allow tonal flexibility, experimental campaign structures, emergent storytelling styles, unconventional worldbuilding, etc.
- Goal is preventing the model from collapsing into generic assistant outputs.
The major issues I keep hitting:
- memory drift
- instruction degradation
- retrieval instability
- continuity collapse
- context poisoning
- overlapping systems
- document retrieval failure
- abstraction creep
- the model reverting to “generic helpful assistant”
- giant prompts becoming unstable
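Several of these issues (instruction degradation, giant prompts, context poisoning) seem related to letting one conversation grow without bound. A possible countermeasure is to rebuild a small prompt from stored data on every turn. The Python sketch below assumes a simple character budget as a stand-in for a token limit. The section labels and the function name are hypothetical.

```python
from typing import List


def assemble_prompt(core_rules: str, state_summary: str,
                    recent_events: List[str], budget_chars: int = 2000) -> str:
    """Build a fresh prompt each turn: fixed instructions first, then state, then as many
    of the most recent events as fit in the budget (oldest events are dropped first)."""
    header = f"{core_rules}\n\nWORLD STATE:\n{state_summary}\n\nRECENT EVENTS:\n"
    remaining = budget_chars - len(header)
    if remaining < 0:
        raise ValueError("fixed sections alone exceed the budget")
    kept = []
    for ev in reversed(recent_events):  # walk from newest to oldest
        line = f"- {ev}\n"
        if len(line) > remaining:
            break
        kept.append(line)
        remaining -= len(line)
    # Restore chronological order for the events that fit.
    return header + "".join(reversed(kept))
```

Because the core instructions are re-sent in full at the top of every prompt, they cannot be pushed out or diluted by a long chat history, which is the failure this sketch is meant to avoid.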
At this point I’m trying to figure out:
- Is ChatGPT fundamentally the wrong tool for this?
- Is this actually an agent/orchestration problem?
- Would local models + RAG + vector DBs make more sense?
- Is there a standard architecture pattern for persistent simulation systems?
- Am I accidentally rebuilding existing tooling badly?
- At what point does this require actual software engineering rather than advanced prompting?
I’m not a programmer yet, but I’m willing to learn if necessary.
What I’m looking for:
- architectural guidance
- framework recommendations
- retrieval/memory advice
- orchestration patterns
- persistence approaches
- anti-drift strategies
- long-context management
- agent system design advice
The D&D side is almost secondary now. The project has become a stress test for long-term LLM continuity and modular cognition systems.