u/bravelogitex

▲ 6 r/codex

/goal is great

I use codex on opencode and sometimes it decides not to implement a plan in one go; it stops after a couple of seconds of thinking. So I just hop over to codex by itself, tell it to write its own goal prompt to achieve x, and then run /goal {prompt}. It keeps going without a hitch.

It's an experimental feature, so you have to enable it first. Hope you aren't missing out!

reddit.com
u/bravelogitex — 1 day ago
▲ 1 r/Rag

How to get the bounding boxes of table columns in PDFs

I made a post recently on how to extract tables reliably from PDFs, but got no clear answers from commenters. I found the camelot Python library works best, but it sometimes merges columns because it can't tell them apart. It has a columns parameter I can pass in with the x-coordinates of the column boundaries to guide it.

Has anyone done this before, and what solution worked well? There are OCR models that return bounding boxes for words, but after some searching I couldn't find one that does columns.
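One approach that might work without an OCR model, sketched under assumptions: take the word bounding boxes that pdfplumber already extracts, project them onto the x-axis, and treat any vertical band of whitespace that no word crosses as a column boundary. The function names and the `min_gap` threshold below are hypothetical; only the output format (one comma-separated string of x-coordinates per table area, as camelot's stream `columns` option expects) comes from camelot itself.

```python
def column_separators(spans, min_gap=5.0):
    """Return x-coordinates of vertical whitespace gaps between word extents.

    spans: iterable of (x0, x1) horizontal extents of the words inside the table.
    min_gap: hypothetical threshold; the narrowest gap (in PDF points) that
             counts as a column boundary rather than ordinary word spacing.
    """
    merged = []  # union of word extents, kept as [start, end] intervals
    for x0, x1 in sorted(spans):
        if merged and x0 <= merged[-1][1]:
            # This word overlaps the current interval, so extend it.
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])

    separators = []
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        if next_start - prev_end >= min_gap:
            # Put the boundary in the middle of the empty vertical band.
            separators.append(round((prev_end + next_start) / 2, 2))
    return separators


def camelot_columns_arg(separators):
    """Format separators as camelot's `columns` list (one string per table area)."""
    return [",".join(str(x) for x in separators)]
```

With pdfplumber, `spans` could come from `[(w["x0"], w["x1"]) for w in page.extract_words()]`, ideally after cropping the page to the table body with `page.crop(bbox)`, and the result could then go into something like `camelot.read_pdf("spec.pdf", flavor="stream", columns=camelot_columns_arg(seps))`. Two caveats: a header cell that spans several columns will bridge the gaps underneath it, so it's safer to measure only the body rows, and both libraries measure x from the page's left edge, which should line up in most PDFs but is worth double-checking on yours.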

u/bravelogitex — 2 days ago
▲ 6 r/Rag

How to parse tables from PDFs with 100% accuracy?

I've tried a lot over the past 2 weeks but can't find a simple solution. I basically have PDFs with 100-row tables and want to extract the tables into CSVs. I tried paid online services like Extend, Reducto, Landing, and Gemini; none are 100% accurate since they are OCR models.

I get accurate text extraction if I use Python PDF libraries like pdfplumber/camelot. The problem is that PDFs don't have a standard way of representing tables, so columns in the output are sometimes merged or split incorrectly (e.g. 2 columns get merged into one). I tried adjusting some parameters, but it either over-merges or under-merges columns.

How do I use these Python libraries properly for this? It's a PITA to solve, and I'm surprised it's not easier.
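If the column boundaries are known (measured by hand, or detected from whitespace gaps), one workaround for the merged/split columns described above is to skip the library's column guessing entirely: take the word boxes pdfplumber extracts, group them into rows by their `top` coordinate, and drop each word into whichever column its horizontal centre falls in. This is a minimal sketch assuming each word is a dict with `text`, `x0`, `x1` and `top` keys (the shape pdfplumber's `extract_words()` returns); the function names and the `y_tolerance` default are hypothetical.

```python
import bisect
import csv
import io


def words_to_rows(words, separators, y_tolerance=3.0):
    """Assign words to a grid using fixed, ascending column separators."""
    # Group words into rows: a word joins the current row if its top edge is
    # within y_tolerance of the row's first word, otherwise it starts a new row.
    rows = []
    for w in sorted(words, key=lambda w: w["top"]):
        if rows and w["top"] - rows[-1][0]["top"] <= y_tolerance:
            rows[-1].append(w)
        else:
            rows.append([w])

    table = []
    for row in rows:
        cells = [[] for _ in range(len(separators) + 1)]
        for w in sorted(row, key=lambda w: w["x0"]):  # left-to-right within a row
            centre = (w["x0"] + w["x1"]) / 2
            # Column index = number of separators to the left of the word's centre.
            cells[bisect.bisect_left(separators, centre)].append(w["text"])
        table.append([" ".join(cell) for cell in cells])
    return table


def rows_to_csv(rows):
    """Serialise the grid to CSV text (csv.writer uses \\r\\n line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Because the columns are fixed up front, nothing can over- or under-merge them; the trade-off is that a cell whose text wraps onto a second line comes out as an extra row, so a fuller version would need some rule for folding those rows back in. pdfplumber's own `table_settings` with `"vertical_strategy": "explicit"` and `"explicit_vertical_lines"` is another way to pin the columns to known x positions.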

u/bravelogitex — 3 days ago

I just did some testing across various providers and wanted to share my use case: construction spec tables, 100 rows max, passed in as PNGs, and my #1 requirement was maximum accuracy (100% is ideal, since mistakes can be costly).

Here's what I used, ranked from best to worst:

  1. Extend - I used their playground, which was easy to play around with, and it quickly hit 100% with minimal configuration. That was a surprise, because they seemed similar to Reducto (see below).
  2. Gemini - easy to work with; all I needed to pass in was the image as base64 and a prompt. 100% accurate for fewer than 50 rows, but a couple of errors started occurring above 50 rows.
  3. Reducto - basically Extend, but only 66% accurate. Results were pretty bad, yikes.
  4. Mistral OCR - I used it on just 1 PNG, and it didn't return the bottom couple of rows for some reason. I stopped using it, since missing rows were unacceptable.
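
For reference, the Gemini setup from item 2 (a base64-encoded image plus a prompt) maps onto a single generateContent request. This is a minimal sketch that only builds the JSON body, assuming the REST API's `inline_data` image part; the model name and prompt are placeholders chosen for illustration, not necessarily what was used here.

```python
import base64

# Assumed model name; substitute whichever Gemini model you actually call.
MODEL = "gemini-2.5-flash"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_gemini_request(png_bytes, prompt):
    """Build a generateContent body with one text part and one inline PNG part."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": "image/png",
                    # The API expects the raw image bytes base64-encoded as a string.
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }},
            ]
        }]
    }
```

Sending it would look something like `requests.post(ENDPOINT, headers={"x-goog-api-key": API_KEY}, json=build_gemini_request(png, "Return this table as CSV."))`, with the model's reply expected under `candidates[0].content.parts[0].text` in the response JSON.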
u/bravelogitex — 21 days ago