u/rtchau

I've been running a persistent local agent for about 2 months - hundreds of sessions, mix of local models (llama.cpp/vLLM/LM Studio) and paid (Claude). One of the things that has been driving me nuts with OpenClaw and Hermes is the way memory/context starts to act up past a certain point. The messiest issues come down to what the memory system gets wrong:

Problem 1: Stale memories that look confident

After a few weeks, my agent accurately remembered how my setup was configured - as of 3 weeks ago. The retrieval score was high, there was no signal that the memory was wrong... it just injected it and confidently talked about hardware I'd already replaced. I had to grind the point home that this particular hardware fact was no longer relevant.

I was using a very capable LLM under the agent (Claude Sonnet 4.6) and asked it to start curating its memory a little more carefully (I figured feeding it its own dog food and telling it when things didn't make sense might make for a novel learning approach). After a few rounds of frustration/brainstorming/epiphany, we landed on a contradiction detector: if a newer episode covers the same ground (cosine sim ≥ 0.75, >1 day newer), the injected context leads with [POSSIBLY OUTDATED - N weeks later: ...] and surfaces the newer summary instead. The agent knows it might be wrong, not just that it remembers something.
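To make the rule concrete, here's a minimal sketch of that check, not the library's actual code. The 0.75 threshold and the 1-day gap come from the description above; the `Episode` record, the function names, and the choice to put the newer summary inside the tag are assumptions for illustration only.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence

SIM_THRESHOLD = 0.75             # from the post
MIN_AGE_GAP = timedelta(days=1)  # from the post


@dataclass
class Episode:
    """Hypothetical record; the real library's schema may differ."""
    summary: str
    embedding: Sequence[float]
    created_at: datetime


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_superseding(hit: Episode, episodes: List[Episode]) -> Optional[Episode]:
    """Newest episode that is >1 day newer than `hit` and covers the same ground."""
    candidates = [
        ep for ep in episodes
        if ep.created_at - hit.created_at > MIN_AGE_GAP
        and cosine(ep.embedding, hit.embedding) >= SIM_THRESHOLD
    ]
    return max(candidates, key=lambda ep: ep.created_at, default=None)


def _plural(n: int, unit: str) -> str:
    return f"{n} {unit}" + ("" if n == 1 else "s")


def render_context(hit: Episode, episodes: List[Episode]) -> str:
    """Return the text to inject, flagging the hit if something newer replaces it."""
    newer = find_superseding(hit, episodes)
    if newer is None:
        return hit.summary
    days = (newer.created_at - hit.created_at).days
    gap = _plural(days // 7, "week") if days >= 7 else _plural(days, "day")
    # Assumed layout: the tag carries the newer summary in place of the stale one.
    return f"[POSSIBLY OUTDATED - {gap} later: {newer.summary}]"
```

The point of doing this at injection time rather than at write time is that the stale episode stays in the store untouched; it only gets flagged when it would otherwise be handed to the model as if it were current.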

Problem 2: Roleplay/fiction bleed

I do both technical work and creative sessions with the same agent. BGE cosine similarity doesn't care whether two sessions are about "debugging a network config" or "assembling the Nine Heretics of Uzúd'Bog for a marketing/networking seminar" - it'll return the fiction one if the similarity score is higher. Fix was essentially a 50+ keyword heuristic filter (pure string matching, so the cost doesn't grow with the size of the store; runs before any embeddings) that keeps anecdotal/fictional sessions out of factual recall. Seems like an obvious problem to have but I haven't seen it handled in any other library.
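Roughly what a pre-embedding filter like this can look like, as a sketch only. The marker list below is made up for the example (the real one is 50+ keywords and isn't reproduced here), and `looks_fictional`, `factual_candidates`, and the `min_hits` cutoff are hypothetical names and values, not the library's API.

```python
import re
from typing import Iterable, List

# Hypothetical markers for illustration; not the library's actual keyword list.
FICTION_MARKERS = (
    "roleplay", "role-play", "in character", "dungeon master",
    "npc", "fanfic", "once upon a time", "worldbuilding",
)

# One precompiled alternation with word boundaries, so "npc" only matches
# as a whole word and never inside a longer token.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(m) for m in FICTION_MARKERS) + r")\b"
)


def looks_fictional(text: str, min_hits: int = 2) -> bool:
    """True if at least `min_hits` distinct markers appear (assumed cutoff)."""
    distinct = set(_PATTERN.findall(text.lower()))
    return len(distinct) >= min_hits


def factual_candidates(sessions: Iterable[str]) -> List[str]:
    """Drop sessions flagged as fiction before anything gets embedded or scored."""
    return [s for s in sessions if not looks_fictional(s)]
```

Requiring two distinct markers instead of one is a guess at how to keep a single stray word from hiding a technical session; the right cutoff depends on the keyword list.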

Problem 3: Retrieval on every turn

Full embedding lookup every turn is wasteful - most turns don't need episodic context, unless you're deliberately prompting the agent to backtrack to an earlier topic in the session. Fix is a two-tier store: numpy hot path (<5ms) for cosine search over cached summary embeddings; SQLite (for now) cold path only triggered above a similarity threshold. For zero added turn latency, fire the retrieval lookup after the previous turn ends (background thread), cache it, drain it before the next API call. Works cleanly in Hermes and OpenClaw, haven't tested any other agents.
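The fire-then-drain part can be sketched with nothing but the standard library. This is an illustration of the pattern, not the library's implementation: `Prefetcher`, `fire`, `drain`, and the 50 ms drain timeout are all assumptions, and `lookup` is a stand-in for whatever does the actual two-tier search.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class Prefetcher:
    """Runs one memory lookup in the background between turns (hypothetical helper)."""

    def __init__(self, lookup: Callable[[str], Optional[str]]):
        self._lookup = lookup                      # stand-in for the real search
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None

    def fire(self, last_turn_text: str) -> None:
        """Call right after a turn ends; returns immediately."""
        self._pending = self._pool.submit(self._lookup, last_turn_text)

    def drain(self, timeout: float = 0.05) -> Optional[str]:
        """Call just before the next API request.

        Returns the cached context if the lookup finished, otherwise None.
        Waits at most `timeout` seconds (assumed value), so a slow lookup
        costs the turn its context instead of adding latency.
        """
        pending, self._pending = self._pending, None
        if pending is None:
            return None
        try:
            return pending.result(timeout=timeout)
        except Exception:
            # Timed out or the lookup raised: carry on without injected context.
            return None
```

In an agent loop that would be `prefetcher.fire(user_text)` once the reply has been sent, and `context = prefetcher.drain()` right before building the next request. Each `drain()` clears the slot, so a result is never injected twice.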

The context bloat was particularly infuriating... verbosity = $200 Anthropic credit gone in 24hrs. Compression = horrible recall, and tons of confabulation from smaller models ("why yes, I DO recall that day, it was a warm Tuesday in spring....")

The library: https://github.com/f00stx/episodic-memory

I use it specifically for Hermes, but it should be usable with any agent layer that has plugin functionality (like OpenClaw).

$ pip install git+https://github.com/f00stx/episodic-memory

from episodic_memory import RecallEngine

engine = RecallEngine(store_path="~/.my_agent/memory")
result = engine.query("what GPU setup did we land on?")
if result:  # skip everything below when the query comes back empty
    print(result.context_injection())  # inject into system prompt
    # checked inside the guard so an empty result is never dereferenced
    if result.is_superseded:
        print(f"⚠️ Superseded {result.supersession_age_gap_str} later")

No external services - SQLite only (considering adding Postgres and MySQL support for team setups). Embeddings handled by BGE-small-en-v1.5 by default (133MB - I'm using BGE-large locally, but small should be fine). Docker REST service included for multi-agent setups.

Curious whether others have hit the contradiction detection problem specifically. Mem0 and LangChain memory don't address it as far as I can tell - happy to be corrected. I've also taken Honcho and Hindsight for a spin and they didn't seem to help much.

DISCLAIMER: As always, back up your sessions before trying a new memory store.

I built an episodic, 2-tier memory for long-running local AI agents - temporal contradiction detection, fiction/roleplay filter, no vector DB required.