u/talatt

I've been running multi-hour OpenClaw sessions and the context window fill-up is my main pain. Native compaction kicks in late (around the threshold) and it's all-or-nothing — once it summarizes, older detail is gone.

What I wanted instead: compress *gradually*, every turn, but keep the last few turns completely raw so the agent doesn't lose the thread it's mid-way through.

I ended up writing a Plugin SDK hook on before_prompt_build that does this — folds older turns into a compressed episodic view, keeps the trailing turns verbatim. On a long session it cut the re-sent context by roughly 80% without the agent losing track of earlier turns.
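To make the idea concrete, here is a minimal Python sketch of the split. It is not the actual hook code: `Turn`, `split_turns`, `build_context`, and the `compress` callback are hypothetical names for illustration, and the real `before_prompt_build` wiring in the OpenClaw Plugin SDK is not shown.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def split_turns(turns, keep_raw=4):
    """Return (older, recent); recent holds the last keep_raw turns untouched."""
    if keep_raw <= 0:
        return list(turns), []
    return list(turns[:-keep_raw]), list(turns[-keep_raw:])


def build_context(turns, compress, keep_raw=4):
    """Fold older turns through compress() and append recent turns verbatim."""
    older, recent = split_turns(turns, keep_raw)
    parts = []
    if older:
        # compress is any callable that maps a list of turns to a summary string
        parts.append("[episodic summary]\n" + compress(older))
    parts.extend(f"{t.role}: {t.text}" for t in recent)
    return "\n".join(parts)
```

With fewer than `keep_raw` turns nothing is compressed, so short sessions pass through unchanged.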

Two questions for people running long sessions:

Do you rely on native compaction, or roll your own context management?
Has anyone found the right "keep N turns raw" number? I'm defaulting to 4 but it feels workload-dependent.

(If useful, the hook is here, with an MIT core: https://github.com/compresh/compresh-mcp. Mostly, though, I'm curious how others are handling this.)

I've been running long sessions with terminal AI agents (specifically OpenClaw) and something occurred to me: we all know the cost of context.

So I developed Compresh - a hook that sits before the model call, keeping the last few rounds raw and converting the old rounds into a compressed "partitioned memory" view. (So nothing important gets lost in the middle of the conversation.)

How it actually works:
- Old rounds → compressed summary (LexRank + modality jump: code blocks, tool output, terminal dumps folded)
- Last N rounds → kept raw (optimum: 4)
- It runs locally as a Python MCP server; the OpenClaw plugin just calls it.
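
A rough Python sketch of the "fold, then rank sentences" step, for anyone who wants to see the shape of it. This is a simplification and not the code in compresh-mcp: it scores sentences by summed cosine similarity (degree centrality) instead of running full LexRank, it only folds fenced code blocks, and `fold_code` and `summarize` are hypothetical names.

```python
import re
from collections import Counter
from math import sqrt

# Matches a fenced block: three backticks, anything, three backticks.
FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)


def fold_code(text):
    """Replace each fenced block with a one-line placeholder noting its size."""
    def repl(match):
        lines = match.group(0).count("\n") - 1  # lines between the two fences
        return f"[code block folded: {max(lines, 0)} lines]"
    return FENCE.sub(repl, text)


def _vec(sentence):
    return Counter(re.findall(r"[a-z0-9_]+", sentence.lower()))


def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text, max_sentences=3):
    """Keep the sentences most similar to the rest, in their original order."""
    folded = fold_code(text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", folded) if s.strip()]
    vecs = [_vec(s) for s in sentences]
    scores = [
        sum(_cosine(v, w) for j, w in enumerate(vecs) if j != i)
        for i, v in enumerate(vecs)
    ]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Folding before ranking keeps long dumps from dominating the similarity scores, which is the reason for doing the two steps in that order here.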

Actual numbers from my own test conversations:

- 183 out of 187 rounds were compressed.
- Approximately 99,000 characters were trimmed from the context sent for a single round.
- The agent was still correctly answering follow-up questions regarding round 12.
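
As a sanity check on those figures, the arithmetic can be worked through in a few lines of Python. The round counts come from the list above; `round_stats` is a hypothetical helper, and the 4-characters-per-token ratio is only a common rule of thumb for English text, not something measured here.

```python
def round_stats(total_rounds, compressed_rounds):
    """Return (rounds left raw, share of rounds that were compressed)."""
    raw = total_rounds - compressed_rounds
    return raw, compressed_rounds / total_rounds


raw, share = round_stats(187, 183)
print(f"{raw} rounds kept raw, {share:.1%} of rounds compressed")
# prints: 4 rounds kept raw, 97.9% of rounds compressed

# Rough token estimate. ASSUMPTION: about 4 characters per token.
chars_saved = 99_000
approx_tokens = chars_saved // 4
print(f"~{approx_tokens} tokens saved under that assumption")
# prints: ~24750 tokens saved under that assumption
```

The 4 rounds left raw line up with the default of keeping the last 4 rounds verbatim.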

It consists of two parts, both of which can be installed today:
- compresh-mcp (Python, compression engine): pip/pipx
- @compresh/openclaw-hook (OpenClaw plugin): npm

The compression core (tulbase) is MIT licensed. There is a paid server-side development layer, but native compression (+ Injection Protection) works without it.

GitHub: https://github.com/compresh/compresh-mcp

I'd be happy to answer questions about the approach. I must say the "keep the last N rounds raw" part made a much bigger difference than I expected.

How do you keep long sessions from eating the whole context window?

I built an open-source hook that compresses an AI agent's chat history — ~60% fewer input tokens on long sessions