Built an invoice-scanning service for our accounting team in one afternoon with Claude — sharing the architecture in case it helps someone else
Our AR team was hand-keying ~25 invoices a week into a spreadsheet. I had Claude build us a Python service that watches a network folder, extracts invoice data from any PDF dropped in (vendor, dates, totals, line items, addresses), and appends a row to a shared Excel register. Total chat-to-deployed time: about half a day, including all the deploy headaches.
The architecture, for anyone who wants to replicate this:
- Python service on our Windows file server, registered with NSSM. Auto-starts with the host.
- watchdog library polls the SMB share for new PDFs. Each new file goes through a pipeline.
- Two-tier extraction: per-vendor regex templates first (free, instant, deterministic), then Azure AI Document Intelligence "prebuilt-invoice" model as a universal fallback. Azure handles OCR for scanned PDFs natively, so the same flow works whether AR drops a digital PDF or our MFP scans one from paper.
- SQLite on the local disk is the source of truth. The shared .xlsx is a curated view that gets appended to on each batch. Delete the .xlsx and it'll repopulate fresh from the next batch — handy for resetting.
- Failed extractions go to a `Failed\` folder with a sibling `.error.txt` explaining why.
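For anyone who wants a starting point, here is a minimal sketch of the two-tier flow and the failure routing described above. It is not the actual service code: `VENDOR_TEMPLATES`, `extract_with_templates`, `process_invoice`, the "Acme Supply" patterns, and the exact `.error.txt` naming are all made up for illustration. The Azure call is passed in as a plain `fallback` callable, and `text` is assumed to be whatever text layer you already pulled out of the PDF.

```python
import re
import shutil
from pathlib import Path
from typing import Callable, Optional

# Hypothetical per-vendor templates: vendor name -> one regex per field.
VENDOR_TEMPLATES = {
    "Acme Supply": {
        "invoice_number": re.compile(r"Invoice\s*#\s*(\S+)"),
        "total": re.compile(r"Total\s+Due:?\s*\$?([\d,]+\.\d{2})"),
    },
}


def extract_with_templates(text: str) -> Optional[dict]:
    """Tier 1: return fields only if a vendor template matches every field."""
    for vendor, patterns in VENDOR_TEMPLATES.items():
        if vendor.lower() not in text.lower():
            continue
        fields = {"vendor": vendor}
        for name, pattern in patterns.items():
            match = pattern.search(text)
            if match is None:
                break  # partial match: leave this file to the fallback
            fields[name] = match.group(1)
        else:
            return fields  # every pattern matched
    return None


def process_invoice(
    pdf_path: Path,
    text: str,
    fallback: Callable[[Path], dict],
    failed_dir: Path,
) -> Optional[dict]:
    """Try templates, then the fallback; on any error, quarantine the PDF."""
    try:
        return extract_with_templates(text) or fallback(pdf_path)
    except Exception as exc:
        failed_dir.mkdir(parents=True, exist_ok=True)
        target = failed_dir / pdf_path.name
        shutil.move(str(pdf_path), str(target))
        # Sibling note next to the moved file, e.g. a.pdf -> a.error.txt
        # (the naming scheme here is an assumption).
        target.with_suffix(".error.txt").write_text(
            f"{type(exc).__name__}: {exc}\n", encoding="utf-8"
        )
        return None
```

Keeping the fallback as an injected callable means the template tier can be tested without touching Azure, and a file only costs a page of quota when no template fully matches.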
Cost reality check: Azure DI free tier covers 500 pages/month. At our volume (~25 invoices/week, mostly 1-2 pages) that's well under the cap. Paid tier is roughly $0.01–$0.05 per page. Cheap enough that I don't think about it.
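The arithmetic behind that, as a quick sanity check you can rerun with your own volume. The 500-page cap and the per-page range are the numbers quoted above; the weeks-per-month conversion and the `monthly_pages` helper are just for illustration.

```python
WEEKS_PER_MONTH = 52 / 12      # about 4.33
FREE_TIER_PAGES = 500          # free-tier cap quoted above
PAID_RATE_HIGH = 0.05          # upper end of the rough per-page range above


def monthly_pages(invoices_per_week: float, pages_per_invoice: float) -> float:
    """Estimate pages sent to the extraction service per month."""
    return invoices_per_week * pages_per_invoice * WEEKS_PER_MONTH


low = monthly_pages(25, 1)     # every invoice is 1 page
high = monthly_pages(25, 2)    # every invoice is 2 pages

print(f"{low:.0f} to {high:.0f} pages/month")                      # 108 to 217 pages/month
print(f"worst case if it were all paid: ${high * PAID_RATE_HIGH:.2f}/month")  # $10.83/month
```

So even if every invoice were two pages, usage sits at under half the free cap.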
Gotchas I ran into so others don't have to:
- Azure returns addresses as structured objects, not strings. If you naively `str()` them you get the raw Python dict repr in your spreadsheet. Format them manually from `street_address`/`city`/`state`/`postal_code`.
- On Windows Server, PowerShell 7's `Restart-Service` can throw "Cannot open service" against NSSM-wrapped services for no good reason. Use `nssm restart <name>` instead.
- Python 3.14 is so new that some package wheels aren't published for it yet. Stick with 3.12 for production.
- Tracking "what's new this batch" is way simpler than maintaining a watermark in DB. Just snapshot `MAX(invoice_id)` before and after the batch, and only project that range to the spreadsheet.
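On the address gotcha, a minimal sketch of the manual formatting. `format_address` is a hypothetical helper, and the example object is a stand-in built with `SimpleNamespace`, not a real SDK value. It reads the four fields named above by attribute first and falls back to mapping-style access, since the exact shape of the address object depends on which SDK version you're on.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any

ADDRESS_FIELDS = ("street_address", "city", "state", "postal_code")


def format_address(address: Any) -> str:
    """Join the non-empty address parts into one comma-separated string."""
    if address is None:
        return ""
    parts = []
    for name in ADDRESS_FIELDS:
        # Attribute access first; plain dicts are handled via .get().
        value = getattr(address, name, None)
        if value is None and isinstance(address, Mapping):
            value = address.get(name)
        if value:
            parts.append(str(value).strip())
    return ", ".join(parts)


# Stand-in for a structured address value (shape assumed for illustration).
example = SimpleNamespace(
    street_address="123 Main St",
    city="Springfield",
    state="IL",
    postal_code="62701",
)
print(format_address(example))  # 123 Main St, Springfield, IL, 62701
```

Missing parts are skipped rather than rendered as `None`, so a partially recognized address still produces a clean cell.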
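And the `MAX(invoice_id)` snapshot trick, sketched against an in-memory SQLite database. The table layout, `max_invoice_id`, and `run_batch` are assumptions for illustration, not the real schema. It also assumes a single writer and an auto-incrementing `invoice_id`, which is what makes "everything above the old max" equal "everything from this batch".

```python
import sqlite3


def max_invoice_id(conn: sqlite3.Connection) -> int:
    """Highest invoice_id currently stored, or 0 for an empty table."""
    row = conn.execute("SELECT COALESCE(MAX(invoice_id), 0) FROM invoices").fetchone()
    return row[0]


def run_batch(conn: sqlite3.Connection, invoices: list) -> list:
    """Insert a batch and return only the rows that batch added."""
    before = max_invoice_id(conn)           # snapshot before the batch
    with conn:                              # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO invoices (vendor, total) VALUES (?, ?)", invoices
        )
    after = max_invoice_id(conn)            # snapshot after the batch
    return conn.execute(
        "SELECT invoice_id, vendor, total FROM invoices "
        "WHERE invoice_id > ? AND invoice_id <= ? ORDER BY invoice_id",
        (before, after),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "invoice_id INTEGER PRIMARY KEY AUTOINCREMENT, vendor TEXT, total REAL)"
)

run_batch(conn, [("Acme", 100.0)])
new_rows = run_batch(conn, [("Globex", 250.0), ("Initech", 75.5)])
print(new_rows)  # [(2, 'Globex', 250.0), (3, 'Initech', 75.5)]
```

The rows returned by `run_batch` are what you'd append to the spreadsheet; nothing about "last exported id" needs to be stored between runs.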
Things I'd add if/when I have time: vendor templates for our top 5 recurring vendors (cuts Azure cost to zero for those), a daily canary PDF for monitoring, and a dedicated low-privilege service account in place of LocalSystem.
Happy to answer questions about any specific piece. The whole thing is ~1,500 lines of Python plus a deploy script.