Starting from PDFs, what's the first step?
I want to build a PKM from a collection of a few thousand PDFs, but most strategies involve working with markdown from the beginning. So what are the best strategies for converting PDFs into markdown? My docs are mostly academic journal articles, but I also have some full-length books, memoirs, biographies, etc.
I found a tool called openkb, which uses a VLM to summarise the texts and build wikilinks. But it seems very brittle, and it doesn't store the full text. Other approaches, such as OCR with Tesseract, seem to struggle hard with footnotes, endnotes, and other formatting issues.
So does anybody here have experience starting from PDFs when setting out to build a PKM? I'd love to hear what works for you.