I built a tool that converts technical PDFs into RAG-ready knowledge bases (Obsidian, AnythingLLM, LangChain)

Tired of cleaning PDFs manually before feeding them into RAG pipelines. Built a tool to automate it.

Upload PDF → get clean Markdown, heading-aware chunks, and an Obsidian vault with backlinks.

Each chunk knows where it sits in the document:

```json
{
  "heading_path": "Chapter 3 > Functions",
  "tokens": 487,
  "has_code": true
}
```

Also has a CLI for batch processing and direct export to AnythingLLM.
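For anyone wondering what "heading-aware" means in practice, here is a rough Python sketch of the idea. It is not the tool's actual code: the function name `chunk_markdown` is made up for illustration, the token count is approximated by whitespace splitting (a real pipeline would likely use a proper tokenizer), and it assumes the Markdown uses `#`-style headings.

```python
import re

# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code fence marker, built this way to keep the example readable


def chunk_markdown(text):
    """Split Markdown into one chunk per heading section (illustrative only)."""
    chunks = []
    path = []   # stack of (level, title) for the headings currently in scope
    buf = []    # lines collected for the current section
    in_fence = False

    def flush():
        body = "\n".join(buf).strip()
        if body:
            chunks.append({
                "heading_path": " > ".join(title for _, title in path),
                "tokens": len(body.split()),  # assumption: crude word count
                "has_code": FENCE in body,
                "text": body,
            })
        buf.clear()

    for line in text.splitlines():
        # Track fenced code so '# comment' lines inside it are not headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            # Drop headings at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            buf.append(line)
    flush()
    return chunks
```

Feeding it a document with `# Chapter 3` followed by `## Functions` gives the second section a `heading_path` of `Chapter 3 > Functions`, which is the same shape as the JSON above. Keeping a stack of headings is what lets sibling sections replace each other while still inheriting the parent chapter.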

reddit.com
u/Existing_Chard_7535 — 4 days ago
▲ 2 · r/documentAutomation · +1 crossposts

I built a tool that turns technical PDFs into RAG-ready chunks and Obsidian vaults


u/Existing_Chard_7535 — 5 days ago