u/Ok_Insurance_919 — reddlx

OCR Buddy, my fully-local OCR extension, is closing in on 1000 users a week in. Here is what shipped from the Reddit community's feedback

I shipped OCR Buddy about a week ago and it is almost at 1000 users. I did not expect that pace, so mostly I want to say thanks, because most of what shipped since launch came from the Reddit community's feedback after I posted early versions here.

What it does: select any region on a page and pull out the text, code, or math, fully on-device. Nothing leaves the browser. It is built around faithful recognition, so it does not invent text that was not there.

What is new since launch:

- Page to Markdown: export a whole page as clean Markdown, copy or download

- Viewport and full-page capture: OCR the visible area or the entire scrolling page, with the tiles merged back together (handling seams and repeated sticky headers)

- Coloured text on light backgrounds now reads correctly, which was broken before

- A few smaller fixes around restricted pages and first-run model download
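
For anyone curious what "merging tiles" can look like, here is a rough text-level sketch, not the extension's actual code. It assumes each scroll capture has already been OCR'd into a list of lines, that a sticky header shows up as identical leading lines in every tile, and that a seam shows up as lines repeated at the end of one tile and the start of the next. The names `mergeTiles`, `overlapLength`, and `stickyHeaderLength` are made up for illustration, and exact string matching is a simplification.

```typescript
// Hypothetical text-level merge of OCR output from vertically scrolled tiles.
// Each tile is the list of recognised lines, top to bottom.
type TileLines = string[];

// Longest k such that the last k lines of `prev` equal the first k lines of `next`.
function overlapLength(prev: string[], next: string[]): number {
  const max = Math.min(prev.length, next.length);
  for (let k = max; k > 0; k--) {
    let same = true;
    for (let i = 0; i < k; i++) {
      if (prev[prev.length - k + i] !== next[i]) {
        same = false;
        break;
      }
    }
    if (same) return k;
  }
  return 0;
}

// Number of leading lines shared by every tile (assumed to be a sticky header).
function stickyHeaderLength(tiles: TileLines[]): number {
  if (tiles.length < 2) return 0;
  let n = 0;
  while (tiles.every((t) => n < t.length && t[n] === tiles[0][n])) n++;
  return n;
}

function mergeTiles(tiles: TileLines[]): string[] {
  if (tiles.length === 0) return [];
  const header = stickyHeaderLength(tiles);
  const merged = [...tiles[0]]; // the header is kept once, from the first tile
  for (const tile of tiles.slice(1)) {
    const body = tile.slice(header); // drop the repeated sticky header
    const k = overlapLength(merged, body); // lines duplicated across the seam
    merged.push(...body.slice(k));
  }
  return merged;
}

const page = mergeTiles([
  ["Nav", "a", "b", "c"],
  ["Nav", "c", "d", "e"],
  ["Nav", "e", "f"],
]);
console.log(page.join(" ")); // Nav a b c d e f
```

A real merge would more likely work on pixels or line bounding boxes, since OCR output for the same line can differ slightly between tiles.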

If you have more feedback, I would love to hear it and will try to implement it, provided it is doable and fits the nature of the extension. (A new version, 2.5.6, is also coming soon with some small fixes.)

If you want to try it or look at the code (it is open source, MIT):

- Site: https://www.ocr-buddy.com/

- GitHub: https://github.com/Fanfulla/OCR-buddy

Happy to answer anything, especially on the in-browser side; that was the hardest part to get right.

u/Ok_Insurance_919 — 10 days ago

▲ 0 r/foss+1 crossposts

I built a Chrome extension that does OCR 100% on-device — code, formulas and tables, nothing leaves your machine

I kept needing to grab text off screenshots (code from a paused video, a formula in a PDF, a table from a dashboard), and every tool either uploaded my image to a server or used a big AI model that confidently invented text that wasn't there.

So I built OCR Buddy. You drag-select any region of the screen and it reads it locally. No server, no account, no telemetry; models are bundled in the extension and run on your device (WebGPU, WASM fallback). Three modes: plain text/code, formula → LaTeX, and table → Markdown.

The design bet is "faithful over fluent": classic detection + recognition instead of a generative model, so when the image is unclear it shows low-confidence or blank instead of inventing a sentence. The source crop always sits next to the result so you can check it.

Free and MIT. I'm the author, happy to answer anything.

Site: https://www.ocr-buddy.com/ · Code: github.com/Fanfulla/ocr-buddy

u/Ok_Insurance_919 — 18 days ago

▲ 80 r/DigitalEscapeTools+2 crossposts

I built a Chrome extension that does OCR 100% on-device — code, formulas and tables, nothing leaves your machine

I kept needing to grab text off screenshots (code from a paused video, a formula in a PDF, a table from a dashboard), and every tool either uploaded my image to a server around the globe or used a big AI model that confidently invented text that wasn't there and burnt a lot of tokens.

Free and MIT. I'm the author, happy to answer anything.

Site: https://www.ocr-buddy.com/ · Code: github.com/Fanfulla/ocr-buddy

u/Ok_Insurance_919 — 9 days ago

▲ 0 r/LaTeX

Screenshot a formula → get LaTeX back, entirely offline (open-source browser extension)

I made a free Chrome extension that turns a screenshotted equation into LaTeX, running fully on your machine: no upload, no account.

How it handles the obvious trust problem with formula OCR: the predicted LaTeX is rendered with KaTeX right next to the source crop, so you can eyeball the match before copying. If it can't render the output cleanly, it abstains and just shows you the image rather than handing you wrong LaTeX. The model is pix2text-mfr running locally on ONNX Runtime Web.
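
The render-or-abstain step can be sketched roughly like this. It is a minimal illustration, not the extension's code: `previewOrAbstain`, `Renderer`, and `Preview` are hypothetical names, and the renderer is passed in so the snippet stays self-contained. With KaTeX, the renderer would presumably be something like `(s) => katex.renderToString(s, { throwOnError: true })`, which throws on LaTeX it cannot parse.

```typescript
// A renderer turns LaTeX into HTML and throws if it cannot parse the input.
type Renderer = (latex: string) => string;

type Preview =
  | { kind: "rendered"; latex: string; html: string }
  | { kind: "abstained"; reason: string };

// Return a rendered preview, or abstain so the UI can show only the source crop.
function previewOrAbstain(latex: string, render: Renderer): Preview {
  const trimmed = latex.trim();
  if (trimmed === "") {
    return { kind: "abstained", reason: "empty prediction" };
  }
  try {
    return { kind: "rendered", latex: trimmed, html: render(trimmed) };
  } catch (err) {
    // A parse failure means the prediction is not trustworthy enough to hand over.
    const reason = err instanceof Error ? err.message : String(err);
    return { kind: "abstained", reason };
  }
}

// Stand-in renderer for the example: rejects unbalanced braces.
const fakeRender: Renderer = (s) => {
  let depth = 0;
  for (const ch of s) {
    if (ch === "{") depth++;
    if (ch === "}") depth--;
    if (depth < 0) throw new Error("unexpected }");
  }
  if (depth !== 0) throw new Error("unbalanced braces");
  return `<span>${s}</span>`;
};

console.log(previewOrAbstain("\\frac{a}{b}", fakeRender).kind); // rendered
console.log(previewOrAbstain("\\frac{a}{b", fakeRender).kind); // abstained
```

Note that a clean render only proves the LaTeX is well-formed, not that it matches the image, which is why the side-by-side check still matters.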

Honest limit: it's a small local model — solid on clean and moderately complex formulas, can struggle on dense low-res ones. The render-beside-crop check is exactly there for that. It also does plain text/code and tables → Markdown if you need them.

MIT, free. Repo: github.com/Fanfulla/ocr-buddy

u/Ok_Insurance_919 — 18 days ago

▲ 0 r/coolgithubprojects

[TypeScript] OCR Buddy — faithful, 100% local OCR in the browser (code, formulas → LaTeX, tables → Markdown)

A Chrome extension (Manifest V3) that does OCR entirely on-device — no server, no telemetry, models bundled in the extension. Drag-select a region and get the text back: prose/code, single formulas converted to LaTeX, and tables converted to Markdown.

The core idea is anti-hallucination by architecture: it uses classic detection + CTC recognition (PP-OCRv5 on ONNX Runtime Web) rather than a generative VLM, so on ambiguous pixels it falls back to a blank or low-confidence result instead of inventing text. The captured crop is always shown beside the result for verification.
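
To make the "blank instead of invented" behaviour concrete, here is a minimal sketch of greedy CTC decoding with a confidence gate. It is illustrative only and not taken from the repo: it assumes the recogniser outputs per-time-step class probabilities with class 0 as the CTC blank, and the function names and the 0.5 threshold are hypothetical.

```typescript
interface Decoded {
  text: string;
  confidence: number; // mean probability of the emitted characters
}

// Greedy CTC decode: take the argmax per time step, collapse repeats, drop blanks.
// probs[t][c] is the probability of class c at time step t.
function ctcGreedyDecode(probs: number[][], alphabet: string[], blank = 0): Decoded {
  let text = "";
  let prev = blank;
  let sum = 0;
  let count = 0;
  for (const step of probs) {
    let best = 0;
    for (let c = 1; c < step.length; c++) {
      if (step[c] > step[best]) best = c;
    }
    // Emit only when the class is not blank and not a repeat of the previous step.
    if (best !== blank && best !== prev) {
      text += alphabet[best];
      sum += step[best];
      count++;
    }
    prev = best;
  }
  return { text, confidence: count > 0 ? sum / count : 0 };
}

// Return the text only if the decoder was confident enough, otherwise a blank.
function faithfulText(probs: number[][], alphabet: string[], minConfidence = 0.5): string {
  const { text, confidence } = ctcGreedyDecode(probs, alphabet);
  return confidence >= minConfidence ? text : "";
}

const alphabet = ["", "a", "b"]; // index 0 is the blank placeholder
const clear = [
  [0.1, 0.8, 0.1],
  [0.1, 0.8, 0.1],
  [0.9, 0.05, 0.05],
  [0.1, 0.8, 0.1],
  [0.1, 0.1, 0.8],
];
const murky = [
  [0.3, 0.4, 0.3],
  [0.4, 0.3, 0.3],
];
console.log(JSON.stringify(faithfulText(clear, alphabet))); // "aab"
console.log(JSON.stringify(faithfulText(murky, alphabet))); // ""
```

The point of the design is that a CTC head can only emit characters it has per-step evidence for, so uncertainty shows up as low probabilities that can be thresholded, rather than as a fluent guess.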

Stack: Vite + CRXJS, ONNX Runtime Web (WebGPU + multi-threaded WASM fallback), KaTeX, highlight.js. MIT licensed.
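
As a rough sketch of how a WebGPU-first setup with a WASM fallback can be planned (an assumption about the approach, not code from the repo): `planBackend`, `RuntimeCaps`, and the thread cap of 4 are hypothetical. In a browser the inputs would come from `"gpu" in navigator`, `globalThis.crossOriginIsolated`, and `navigator.hardwareConcurrency`, and the result would feed `ort.env.wasm.numThreads` and the `executionProviders` option of `ort.InferenceSession.create`.

```typescript
interface RuntimeCaps {
  hasWebGpu: boolean; // e.g. "gpu" in navigator
  crossOriginIsolated: boolean; // required for SharedArrayBuffer, hence WASM threads
  hardwareConcurrency: number; // e.g. navigator.hardwareConcurrency
}

interface BackendPlan {
  executionProviders: string[]; // tried in order by ONNX Runtime Web
  wasmThreads: number;
}

function planBackend(caps: RuntimeCaps, maxThreads = 4): BackendPlan {
  // Prefer WebGPU when present, always keep WASM as the fallback.
  const executionProviders = caps.hasWebGpu ? ["webgpu", "wasm"] : ["wasm"];
  // Without cross-origin isolation there is no SharedArrayBuffer, so stay single-threaded.
  const wasmThreads = caps.crossOriginIsolated
    ? Math.max(1, Math.min(maxThreads, caps.hardwareConcurrency))
    : 1;
  return { executionProviders, wasmThreads };
}

const plan = planBackend({ hasWebGpu: false, crossOriginIsolated: true, hardwareConcurrency: 8 });
console.log(plan.executionProviders.join(","), plan.wasmThreads); // wasm 4
```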

Repo: github.com/Fanfulla/ocr-buddy

u/Ok_Insurance_919 — 18 days ago

▲ 27 r/alternativeto+5 crossposts

I built a Chrome extension that does OCR 100% on-device — code, formulas and tables, nothing leaves your machine

I kept needing to grab text off screenshots — code from a paused video, a formula in a PDF, a table from a dashboard — and every tool either uploaded my image to a server or used a big AI model that confidently invented text that wasn't there.

So I built OCR Buddy. You drag-select any region of the screen and it reads it locally. No server, no account, no telemetry — models are bundled in the extension and run on your device (WebGPU, WASM fallback). Three modes: plain text/code, formula → LaTeX, and table → Markdown.

Free and MIT. I'm the author — happy to answer anything.

Site: https://www.ocr-buddy.com/ · Code: github.com/Fanfulla/ocr-buddy

u/Ok_Insurance_919 — 1 day ago

▲ 1 r/chrome_extensions

OCR Buddy | A brand new drag-select OCR that runs entirely in the browser, no uploads (MV3, open source) - Privacy First

Sharing an extension I built and use daily. Drag-select any region of a page and it pulls the text out locally — code, prose, formulas (→ LaTeX), and tables (→ Markdown).

A few things that were genuinely tricky in Manifest V3, in case anyone's building something similar:

- The OCR engine can't live in the service worker (ephemeral, no DOM), so it runs in a long-lived offscreen document made cross-origin isolated for SharedArrayBuffer + WebGPU.

- Capture uses chrome.tabs.captureVisibleTab instead of grabbing a <video> frame: frame-grabbing taints the canvas on cross-origin video, so OCR-ing code off a paused YouTube video would fail. captureVisibleTab returns clean composited pixels.

- Models are bundled, not fetched at runtime, so it works fully offline.
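
One detail that tends to bite with captureVisibleTab: the drag-selection is measured in CSS pixels, while the screenshot is typically in device pixels. Below is a minimal sketch of the mapping, assuming the screenshot is scaled by `devicePixelRatio`. This is an illustration rather than the extension's code, and `cssRectToImageRect` is a hypothetical helper.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Map a selection in CSS pixels to a crop rectangle in screenshot pixels,
// rounding outwards and clamping to the image bounds.
function cssRectToImageRect(sel: Rect, dpr: number, imageWidth: number, imageHeight: number): Rect {
  const left = clamp(Math.floor(sel.x * dpr), 0, imageWidth);
  const top = clamp(Math.floor(sel.y * dpr), 0, imageHeight);
  const right = clamp(Math.ceil((sel.x + sel.width) * dpr), 0, imageWidth);
  const bottom = clamp(Math.ceil((sel.y + sel.height) * dpr), 0, imageHeight);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// A 100x50 CSS-pixel selection on a 2x display, screenshot of 2560x1440.
const crop = cssRectToImageRect({ x: 10, y: 20, width: 100, height: 50 }, 2, 2560, 1440);
console.log(crop); // { x: 20, y: 40, width: 200, height: 100 }
```

The resulting rectangle would then be used as the source rect when drawing the decoded screenshot onto a canvas for OCR.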

Free, MIT, no telemetry. I'm the dev — feedback welcome.

Chrome Web Store: chromewebstore.google.com/detail/ocr-buddy/hfbghdhendbnblgnjgkfmpgokiiddlhj · Code: github.com/Fanfulla/ocr-buddy

u/Ok_Insurance_919 — 18 days ago