r/Paperlessngx

I built a Paperless-ngx companion for AI metadata and owner assignment — looking for workflow feedback

I have been working on Archivista AI, a self-hosted companion that reads Paperless OCR text and writes back a title, tags, correspondent, document type, date, language, custom fields, and optionally an owner.

The part I most wanted to improve was setup: connect an existing Paperless instance in a browser, choose Ollama or a hosted/OpenAI-compatible provider, then inspect history and manually re-run documents from the UI. It supports OpenAI Flex and OpenAI/Anthropic batch processing for lower-cost, asynchronous workflows.

It runs as one Docker container with SQLite for processing history and retries. The published image is `ghcr.io/arturict/archivista-ai:1.1.0`.

Repo: https://github.com/arturict/archivista-ai (MIT)

Privacy boundary: local Ollama/OpenAI-compatible endpoints keep classification on your network; choosing a hosted provider sends the OCR content needed for classification to that provider.

I am looking for Paperless-specific feedback rather than stars: should generated values be limited to existing tags/correspondents/types, and what would make optional owner assignment feel safe enough for a household installation?
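
To make the first question concrete, here is a minimal sketch of the "existing values only" option. It is an illustration, not Archivista AI's actual code: the function name `restrict_to_existing` and the case-insensitive matching rule are assumptions made for this example.

```python
def restrict_to_existing(suggested, existing):
    """Split model-suggested names into known and unknown ones.

    Hypothetical helper: matching ignores case and surrounding whitespace,
    and accepted names are returned in the spelling Paperless already uses.
    """
    canonical = {name.casefold().strip(): name for name in existing}
    accepted, rejected = [], []
    for raw in suggested:
        key = raw.casefold().strip()
        if key in canonical:
            # Avoid writing the same tag twice if the model repeats itself.
            if canonical[key] not in accepted:
                accepted.append(canonical[key])
        else:
            # Unknown values could be dropped or queued for manual review.
            rejected.append(raw)
    return accepted, rejected


accepted, rejected = restrict_to_existing(
    ["invoice", " Tax ", "Crypto", "Invoice"],
    ["Invoice", "Tax", "Insurance"],
)
```

In this sketch the rejected list is kept rather than discarded, so a UI could show it as "suggested new tags" instead of creating them silently.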

Disclosure: I am the author. AI coding tools assisted with parts of implementation, review, documentation, and testing, and the app itself uses the configured model for classification.

u/its_artur1 — 3 days ago

▲ 5 r/Paperlessngx

How can I use Paperless-ngx to build a personal RAG system?

The title says it all. I have already set up Paperless-ngx and was planning to set up paperless-ai to do OCR and RAG, but the project's README says it is no longer maintained. What should I do? Is it worth installing, or should I wait for the official implementation?

u/OrdinaryFact21 — 6 days ago

▲ 3 r/Paperlessngx

consolidate tags

Does anyone have any ideas on how to use AI to consolidate similar existing tags in an automated fashion?

u/Numerous_Platypus — 6 days ago

▲ 23 r/Paperlessngx

Waiting for the 3.0 release to set up LLM-based OCR?

I have used base ngx for about a year and recently started to get interested in better OCR, plus potentially a chat/tagging bot with Paperless-GPT. Then I realized that v3.0 is about to be released. Should I just wait for that?

Another question for people who have tried the v3.0 beta: I don't have hardware powerful enough to run reasoning models, but I have enough for a lightweight OCR model (like qwen3.5-0.8b or minicpm). So can I run the OCR model locally but use cloud AI providers (like the GPT5 API) for the tagging/chat bot?

u/YYM7 — 9 days ago

▲ 11 r/Paperlessngx

AI without redoing OCR in Paperless with paperless-GPT or paperless-IQ?

Hi,

I have been using Paperless-ngx for a long time; I actually started with paperless-ng before committing to paperless-ngx.

I really love it, but I wouldn't mind adding a little AI to auto-tag some of my documents so it's easier for me. I want to stay local with Ollama.

I don't want to (and can't) use my GPU for AI for this purpose, so I wanted to use my large CPU instead. The CPU can handle the text part with small models like qwen3, but even though it is capable of running vision models, it struggles with them and that can impact the server.

All (or 95%+) of my PDFs have already been OCR-processed correctly, and the content in Paperless-ngx is usually quite good. So I don't see why I need to reprocess them in the LLM.

Is there a way to run only the auto-tagging on the text from the OCR'd PDFs, without the vision LLM, in Paperless-GPT?
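
To illustrate what I mean by text-only tagging, here is a rough sketch of the idea, not how Paperless-GPT works internally. It assumes the OCR text is read from the `content` field that the Paperless-ngx REST API returns for a document, and that the request body is posted to Ollama's `/api/generate` endpoint. The function names, the prompt wording, and the 4000-character cutoff are made up for this example.

```python
import json


def build_tag_prompt(ocr_text, allowed_tags, max_chars=4000):
    """Build a text-only tagging prompt from OCR text Paperless already stores."""
    # Truncate so a small CPU-hosted model is not handed a huge document.
    snippet = ocr_text[:max_chars]
    tag_list = ", ".join(sorted(allowed_tags))
    return (
        "Pick the tags that fit this document. "
        f"Choose only from: {tag_list}. "
        'Answer as JSON like {"tags": ["..."]}.\n\n'
        f"Document text:\n{snippet}"
    )


def build_ollama_request(model, prompt):
    """Serialize a non-streaming request body for Ollama's /api/generate."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    )


prompt = build_tag_prompt("Invoice 42 from ACME", {"Tax", "Invoice"}, max_chars=10)
body = build_ollama_request("qwen3", prompt)
```

No image or PDF is sent anywhere in this flow; only the stored text goes to the model, which is why a CPU-only setup should be enough.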

Otherwise, is my best option to wait or to use the v3 beta? I also found a post from a month ago about https://github.com/knows-cloud/paperless-iq
That seems interesting, but before starting and testing a new container, I wanted to know if there is a parameter or option I missed in paperless-gpt.

Thank you!

u/Particular-Shame9995 — 10 days ago

▲ 2 r/Paperlessngx

How does batch scanning affect auto-filling in Paperless-ngx?

Does Paperless-ngx apply its matching algorithms to documents only at the moment of ingestion, or can I scan all my documents first and benefit from the auto-filling features later? Any insights or tips would be greatly appreciated!

u/-dAtA-TRoN- — 12 days ago