
Has anyone used QWEN3-VL for OCR and information extraction on old documents?
Hi 👋 I recently tried the QWEN3-VL-30B API to read text and return the required information from old typewritten documents, as a test before I download the model and run it locally.
On a paragraph-format document it was very accurate. However, on a document that mixes paragraphs and tables, it hallucinated and mixed up text from different rows, which gave wrong outputs. (I attached the sample page below.)
I am deciding between three options: 1) Should I move to another version that is not a VL model? The problem is that I need multi-modal input for this project. 2) Should I try harness engineering? (So far I have only used prompt-based approaches.) If so, what would be the best way to do it? 3) Or should I move to a totally different model?
My constraints are:
a) I need a FREE model that I can download to my PC and run locally.
b) I need multi-modal input: image/PDF plus text (the prompt).
c) I will buy a physical GPU, probably with 24GB of VRAM or a little more, but not a super fancy one.
Any insight would be much appreciated! Thanks!
-----------sample page--------