u/Gailenstorm

Finetuned Qwen3.5-4B: NuExtract3 released, open-weight 4B VLM for Markdown, OCR and structured extraction

Disclaimer: I work for Numind, the company behind this open-weight model

We just released a 4B model based on Qwen3.5-4B, under Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoices, multi-page documents, and other visually structured inputs.

Many thanks to the Alibaba Qwen team behind this wonderful model, it was fairly good at OCR from the get-go and the finetuning process was a bliss since most libraries integrate their architecture fast.

Try it, we have a huggingface space that is completely free (you don't even have to sign-up): https://huggingface.co/spaces/numind/NuExtract3

If you ever used NuMarkdown, NuExtract3 is the successor.

There are some examples to guide you. Feel free to re-use this model for any task.

https://preview.redd.it/hrw7bc6m6o2h1.png?width=1080&format=png&auto=webp&s=c2d80fd4404b2157d87e7dda976adcdc6a36b5bf

https://preview.redd.it/oqls4xnm6o2h1.png?width=1080&format=png&auto=webp&s=494a6c7792fa6d6c0ab221259b61e0a2673e3131

A few things it is designed for:

converting document images to Markdown
extracting structured data from documents using a target json template
handling tables, forms, and layout-heavy pages
working with both text and visual document inputs
serving as a local/open-weight alternative for document extraction pipelines

It was trained on a node of 8xH100 for 3 days to train on as much context as we could, so it should perform fairly well even on long document. For Markdown, we'd still recommend going page by page for the best results and inference speed, since you can parallelize better this way.

It's very easy to self-host, since we provide fairly extensive documentation, Safetensors, GGUF and MLX weights. With as little as 4GB of VRAM, you should be good to go. We provide multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6...) so you should be able to run it anywhere.

We mostly tried vLLM, SGLang, llama.cpp.

We have a blog post and a pretty decent model card:

I'm currently writing a paper on this model so I'll post it as soon as it's accepted. It's not yet on Arxiv yet as it has been submitted in a peer-review journal/conference.

I'll try to answer as many questions as possible if you have any. We would really appreciate feedback from the community.

We also have a discord if you're interested
https://discord.com/invite/3tsEtJNCDe

reddit.com

u/Gailenstorm — 24 hours ago

▲ 15 r/Qwen_AI

Artificial Analysis: Qwen3.7 Max - Intelligence, Performance & Price Analysis

Pretty good results overall, scored 56.6 on their index, an almost 5 points increase over Qwen3.6 max

And the least hallucinations of all the models they tested

Can't wait for the open-weight versions

https://xcancel.com/ArtificialAnlys/status/2057374452883788196#m: "The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding. CritPt +9.7 p.p (3.7% to 13.4%), HLE +9.2 p.p (28.9% to 38.1%), TerminalBench Hard +6.9 p.p (43.9% to 50.8%) and GDPval-AA +42 Elo (1504 to 1546). Scores on other benchmarks in the Intelligence Index are flat compared to Qwen3.6 Max Preview"

artificialanalysis.ai

u/Gailenstorm — 24 hours ago

▲ 3 r/huggingface+1 crossposts

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Disclaimer: I work for Numind, the company behind this open-weight model

Try it, we have a huggingface space that is completely free (you don't even have to sign-up): https://huggingface.co/spaces/numind/NuExtract3

If you ever used NuMarkdown, NuExtract3 is the successor.

There are some examples to guide you. Feel free to re-use this model for any task.

https://preview.redd.it/vig784ikyn2h1.png?width=1672&format=png&auto=webp&s=86c342680736ade78d3e42374e360dbf312f8f39

https://preview.redd.it/68c43zjjyn2h1.png?width=1758&format=png&auto=webp&s=b4848940d96fc1070a64279a4e7adf0abdff4aaa

A few things it is designed for:

converting document images to Markdown
extracting structured data from documents using a target json template
handling tables, forms, and layout-heavy pages
working with both text and visual document inputs
serving as a local/open-weight alternative for document extraction pipelines

We mostly tried vLLM, SGLang, llama.cpp.

I'm currently writing a paper on this model so I'll post it as soon as it's accepted. It's not yet on Arxiv yet as it has been submitted in a peer-review journal/conference.

I'll try to answer as many questions as possible if you have any. We would really appreciate feedback from the community.

reddit.com

u/Gailenstorm — 1 day ago

▲ 28 r/MachineLearning

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Disclaimer: I work for Numind, the company behind this open-weight model

Try it, we have a huggingface space that is completely free (you don't even have to sign-up): https://huggingface.co/spaces/numind/NuExtract3

If you ever used NuMarkdown, NuExtract3 is the successor.

There are some examples to guide you. Feel free to re-use this model for any task.

https://preview.redd.it/pm2xbooyxn2h1.png?width=1672&format=png&auto=webp&s=1a8a7b262190c8325159496dae98c3d2dfab493c

https://preview.redd.it/b5z7ylfzxn2h1.png?width=1758&format=png&auto=webp&s=a07b3abd6e5065c2635de047bdf154357f903e4c

A few things it is designed for:

converting document images to Markdown
extracting structured data from documents using a target json template
handling tables, forms, and layout-heavy pages
working with both text and visual document inputs
serving as a local/open-weight alternative for document extraction pipelines

We mostly tried vLLM, SGLang, llama.cpp.

We have a blog post and a pretty decent model card:

I'm currently writing a paper on this model so I'll post it as soon as it's accepted. It's not yet on Arxiv yet as it has been submitted in a peer-review journal/conference.

I'll try to answer as many questions as possible if you have any. We would really appreciate feedback from the community.

We also have a discord if you're interested
https://discord.com/invite/3tsEtJNCDe

reddit.com

u/Gailenstorm — 1 day ago

Finetuned Qwen3.5-4B: NuExtract3 released, open-weight 4B VLM for Markdown, OCR and structured extraction

Artificial Analysis: Qwen3.7 Max - Intelligence, Performance &amp; Price Analysis

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Artificial Analysis: Qwen3.7 Max - Intelligence, Performance & Price Analysis