u/Disastrous_Abies8659

Hi r/OpenSourceAI,

I'm working on a research prototype called TreeMemory: an external hierarchical memory system designed to address one of the biggest pain points in current RAG and long-term memory setups, namely context contamination.

Instead of throwing all facts into one flat pool, TreeMemory organizes knowledge into semantic branches. This keeps retrieval clean and updates highly localized.

Simple example:

"Michelin" tires → artifacts/vehicles/car_tires
"Michelin" stars → culture/food/restaurants
"Python" code → artifacts/computing/python_code
"Python" snake → living/reptiles/python_snake

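To make the branch idea concrete, here is a minimal sketch of keyword-based (lexical) routing into branch paths. It is not the TreeMemory code from the repo: the class `ToyTreeMemory`, its methods, and the cue-word lists are all hypothetical names chosen for illustration, and the scoring rule (count of overlapping cue words) is an assumption.

```python
import re
from collections import defaultdict


class ToyTreeMemory:
    """Toy hierarchical memory (illustrative only, not the repo's implementation)."""

    def __init__(self):
        self.cues = {}                    # branch path -> set of lowercase cue words
        self.facts = defaultdict(list)    # branch path -> facts stored on that branch

    def add_branch(self, path, cue_words):
        self.cues[path] = {w.lower() for w in cue_words}

    def add_fact(self, path, fact):
        if path not in self.cues:
            raise KeyError(f"unknown branch: {path}")
        self.facts[path].append(fact)

    def route(self, query):
        """Return the branch with the most cue-word overlap, or None if nothing matches."""
        tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
        best_path, best_score = None, 0
        for path, cue_words in self.cues.items():
            score = len(tokens & cue_words)
            if score > best_score:
                best_path, best_score = path, score
        return best_path

    def retrieve(self, query):
        """Return only the facts on the routed branch, so other branches cannot leak in."""
        path = self.route(query)
        return [] if path is None else list(self.facts[path])


memory = ToyTreeMemory()
memory.add_branch("artifacts/vehicles/car_tires", ["tire", "tires", "car"])
memory.add_branch("culture/food/restaurants", ["star", "stars", "restaurant"])
memory.add_fact("artifacts/vehicles/car_tires", "Michelin manufactures car tires.")
memory.add_fact("culture/food/restaurants", "The Michelin Guide awards stars to restaurants.")

print(memory.retrieve("How many Michelin stars does this restaurant have?"))
```

Because "Michelin" alone is not a cue word for either branch, the surrounding words decide the route, and an update to one branch never touches the other.
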
Benchmark Results (google/flan-t5-small)

LoRA vs TreeMemory comparison:

Strategy	Accuracy
No Context	0.031
Flat Context	0.625
Gated Tree Context	0.906
LoRA Only	0.094
LoRA + Gated Tree	0.938

Natural Query Benchmark:

Strategy	Top-1 Accuracy	Context Contamination ↓
Flat Retrieval	0.746	0.818
Gated Hybrid Tree	0.797	0.131
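
For readers wondering how the two columns could be computed, here is one possible scoring sketch. The post does not define "context contamination", so the definition below (mean fraction of retrieved items that come from a branch other than the gold branch) is an assumption, and the function names are hypothetical rather than taken from the repo.

```python
def top1_accuracy(predicted_branches, gold_branches):
    """Fraction of queries whose top-ranked branch equals the gold branch."""
    correct = sum(p == g for p, g in zip(predicted_branches, gold_branches))
    return correct / len(gold_branches)


def contamination_rate(retrieved_branches_per_query, gold_branches):
    """Mean fraction of retrieved items that come from a non-gold branch.

    ASSUMED definition; a query with no retrieved items counts as 0.0.
    """
    rates = []
    for retrieved, gold in zip(retrieved_branches_per_query, gold_branches):
        if not retrieved:
            rates.append(0.0)
            continue
        off_branch = sum(branch != gold for branch in retrieved)
        rates.append(off_branch / len(retrieved))
    return sum(rates) / len(rates)


# Two toy queries: the first retrieval mixes in one off-branch item, the second is clean.
gold = ["culture/food/restaurants", "artifacts/vehicles/car_tires"]
retrieved = [
    ["culture/food/restaurants", "artifacts/vehicles/car_tires"],
    ["artifacts/vehicles/car_tires"],
]
print(contamination_rate(retrieved, gold))
```

Both functions expect non-empty inputs of equal length. Under this definition, a flat retriever that returns every lexical match for "Michelin" would score high on contamination, while a router that returns a single branch scores zero whenever it picks the right one.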

Main Takeaway: LoRA by itself performed surprisingly poorly as a factual memory store in this test (0.094 accuracy, barely above the 0.031 no-context baseline). TreeMemory alone gave a very strong boost (0.906 vs 0.625 for flat context), and combining both approaches achieved the best result (0.938).

This suggests that LoRA and hierarchical external memory are complementary: LoRA for style/behavior, TreeMemory for clean, updatable factual knowledge.

Caveats:

Synthetic + semi-synthetic dataset
Small model (flan-t5-small)
Early prototype (currently lexical routing)
LoRA baseline is simple (not heavily tuned)

Repo + 1-click Colab demos:
https://github.com/g1g4b1t/tree-memory

I'm looking for honest feedback from the community:

Is the LoRA comparison fair as a first baseline?
What stronger baselines would you like to see?
Next step: embeddings + LLM reranker or something else?
What would make this kind of memory benchmark more convincing?

Would love to hear your thoughts!