
AI memory might not belong inside the context window at all. A 64-number side state beat a method 600x its size on every benchmark the authors ran.
every frontier lab is solving long-context memory the same way: make the window bigger. 1M tokens, 2M, 10M. more compute, more memory, more brute force. but there's a known issue called "context rot": even with huge windows, models lose track of information somewhere in the middle. a bigger window isn't actually fixing it.
a team from Nanyang Tech, Fudan, and Shanghai Jiao Tong asked something different. what if memory was never supposed to live inside the context at all?
they built δ-mem. delta-mem. an 8x8 matrix (literally 64 numbers) sitting OUTSIDE the model. it stores associations from the conversation as the model goes. base weights never touched. when the model is wrong about what comes next, the matrix updates to reduce that error. that's where the name comes from, delta means difference.
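to make the "update to reduce the error" idea concrete, here's a minimal sketch of a classic delta-rule update on an 8x8 matrix. this is NOT the paper's code. the names `delta_update`, `key`, `value`, and `lr` are made up for illustration, and the random vectors stand in for whatever the real method derives from the model.

```python
import numpy as np

def delta_update(M, key, value, lr=0.5):
    """One delta-rule step: nudge M so that M @ key moves toward value.

    All names here are hypothetical; this is the textbook delta rule,
    not the authors' implementation.
    """
    key = key / np.linalg.norm(key)      # unit-length key keeps the step size stable
    error = value - M @ key              # the "delta": target minus current readout
    return M + lr * np.outer(error, key)

def read(M, key):
    """Read the value currently associated with a key."""
    return M @ (key / np.linalg.norm(key))

rng = np.random.default_rng(0)
M = np.zeros((8, 8))                     # the whole memory state: 64 numbers
key = rng.normal(size=8)                 # stand-in for a key vector (assumption)
value = rng.normal(size=8)               # stand-in for the target vector (assumption)

err_before = np.linalg.norm(value - read(M, key))
M = delta_update(M, key, value)
err_after = np.linalg.norm(value - read(M, key))

print(M.size, err_after < err_before)
```

with a unit-length key, each step shrinks the readout error for that key by a factor of `1 - lr`, so repeating the update keeps pulling the stored association toward the target.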
the experiment:
- long conversation with the model
- delete the entire chat
- ask questions about what just got deleted
results:
- baseline model: 0.08% (basically zero, the info is gone)
- with δ-mem: 6.48%
memory was sitting in 64 numbers the whole time. not in the context.
the 64 numbers are just the memory state. δ-mem as a whole adds 4.87M params (0.12% of the base model). MLP Memory, a competing approach, needs 3 BILLION params to do roughly the same job. δ-mem is roughly 600x smaller and beats it on every benchmark they ran.
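quick sanity check on those size numbers, using only the figures quoted above. the variable names are mine, and the base model size is just what the 0.12% figure implies, not something stated in this post.

```python
delta_mem_params = 4.87e6     # δ-mem, as quoted above
mlp_memory_params = 3e9       # MLP Memory, as quoted above
share_of_base = 0.0012        # 0.12% of the base model, as quoted above

size_ratio = mlp_memory_params / delta_mem_params
implied_base_params = delta_mem_params / share_of_base  # implied, not stated in the post

print(f"{size_ratio:.0f}x")                    # 616x
print(f"{implied_base_params / 1e9:.2f}B")     # 4.06B
```

so "600x" is a round-down of about 616x, and the 0.12% figure implies a base model of around 4B params.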
i wrote a longer take on what δ-mem does mechanically and why this might be a bigger deal than the absolute numbers suggest, if anyone wants to go deeper: https://ninzaverse.beehiiv.com/p/research-rethinking-how-ai-memory-actually-works