u/InevitableDistinct11

We’re experimenting with a local document verification pipeline using OCR + a small language model (Qwen2.5 1.5B via Ollama), and we’re hitting an interesting issue around consistency validation.

Current pipeline:

PDF/Image

→ OCR extraction

→ cleaned extracted text

→ Qwen2.5 1.5B

→ verification / normalization layer

The OCR itself is working surprisingly well. We’re getting reasonably clean extracted text even from noisy multilingual scans.

The problem starts in the verification stage.

Examples of what we want the SLM to reliably do:

- normalize names

- normalize dates/currency formats

- compare entities across multiple extracted sections

- detect mismatches/inconsistencies

- avoid hallucinating missing values

- maintain deterministic output structure

Example input:

PAN:

Name: Rahul S Shah

DOB: 12/04/1996

Salary Slip:

Employee Name: Rahul Shah

Net Salary: INR 1,20,000

Bank Statement:

Account Holder: Rahul S. Shah

Salary Credits: 120000

Problems we’re seeing:

- inconsistent reasoning between runs

- occasional hallucinated fields

- weak cross-document comparison

- poor long-context consistency

- model sometimes treats semantically identical values as different

- unstable formatting/output

It feels like the model lacks “document context awareness” and structural understanding of what kind of records it is processing.

Questions:

Is this mainly a prompting/context-engineering problem?
Should we move from raw OCR dumps → structured extraction first?
Are smaller models fundamentally weak at entity consistency tasks?
Would rule-engine + SLM hybrid systems work better here?
Should we chunk documents by semantic sections before prompting?
Has anyone had success with constrained decoding / JSON schema enforcement for deterministic verification workflows?
Are there open-source models that perform better specifically for structured document validation/reconciliation tasks?

We’re intentionally keeping everything local/offline, so cloud APIs are not preferred.

Would really appreciate insights from anyone working on:

- document intelligence

- OCR pipelines

- local LLM systems

- entity resolution

- structured extraction

- verification engines

- long-context consistency

Especially interested in architectural lessons learned rather than model benchmarks.

Need a help