r/rust
Show: harumi, a pure Rust PDF library with automatic CJK font subsetting and zero C dependencies
The Rust PDF ecosystem has a gap: lopdf is low-level with no font subsetting, printpdf only creates new PDFs, and anything with real CJK support tends to pull in C/C++ bindings that break WASM and Lambda deployments.
I built harumi to fill that gap.
What it does:
- Edits existing PDFs (append-only; original structure is preserved)
- Embeds CJK fonts with automatic subsetting: you call one function, and it handles the full pipeline at `save()` time (glyph collection, TTF subsetting, GID remapping, ToUnicode CMap generation, and the CIDFont object graph)
- Invisible text layers for searchable/OCR PDFs (Tesseract/hOCR coordinate converters included)
- Merge, split, rotate, reorder pages
- JPEG/PNG embedding (PNG transparency via SMask)
- In-memory output via `save_to_bytes()`
- WASM compatible: pure Rust, zero C deps
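To make the `save()`-time pipeline more concrete, here is a minimal std-only sketch of three of its steps: glyph collection, GID remapping, and ToUnicode CMap generation. It is not harumi's code. It assumes Identity-H encoding (the 2-byte codes in the content stream equal the new GIDs) and uses a hypothetical toy `cmap` table with made-up GIDs in place of the real font's char-to-GID mapping. The function names are hypothetical, and the actual TTF subsetting (rewriting the font tables, which the post says allsorts handles) is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumption: Identity-H encoding, so content-stream codes are the new GIDs.

/// Glyph collection: run every char through the font's cmap (char -> original GID).
/// Chars the font cannot map fall back to GID 0 (.notdef).
fn collect_glyphs(texts: &[&str], cmap: &BTreeMap<char, u16>) -> BTreeSet<u16> {
    texts
        .iter()
        .flat_map(|t| t.chars())
        .map(|c| cmap.get(&c).copied().unwrap_or(0))
        .collect()
}

/// GID remapping: used glyphs get consecutive new GIDs; .notdef stays at GID 0.
/// Returns (old -> new map, order) where order[new_gid] == old_gid.
fn remap_gids(used: &BTreeSet<u16>) -> (BTreeMap<u16, u16>, Vec<u16>) {
    let mut order = vec![0u16];
    let mut old_to_new = BTreeMap::from([(0u16, 0u16)]);
    for &old in used.iter().filter(|&&g| g != 0) {
        old_to_new.insert(old, order.len() as u16);
        order.push(old);
    }
    (old_to_new, order)
}

/// Encode text as a PDF hex string of 2-byte codes (the new GIDs).
fn encode_identity_h(text: &str, cmap: &BTreeMap<char, u16>, old_to_new: &BTreeMap<u16, u16>) -> String {
    let hex: String = text
        .chars()
        .map(|c| format!("{:04X}", old_to_new[&cmap.get(&c).copied().unwrap_or(0)]))
        .collect();
    format!("<{hex}>")
}

/// ToUnicode CMap: maps each new GID (CID) back to its text as UTF-16BE hex,
/// so copy/paste and search work. Each bfchar block holds at most 100 entries.
fn to_unicode_cmap(cid_to_char: &BTreeMap<u16, char>) -> String {
    let mut s = String::from(
        "/CIDInit /ProcSet findresource begin\n12 dict begin\nbegincmap\n\
         /CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n\
         /CMapName /Adobe-Identity-UCS def\n/CMapType 2 def\n\
         1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n",
    );
    let entries: Vec<(&u16, &char)> = cid_to_char.iter().collect();
    for chunk in entries.chunks(100) {
        s.push_str(&format!("{} beginbfchar\n", chunk.len()));
        for (cid, ch) in chunk {
            // Chars outside the BMP become a UTF-16 surrogate pair (8 hex digits).
            let mut buf = [0u16; 2];
            let utf16: String = ch.encode_utf16(&mut buf).iter().map(|u| format!("{u:04X}")).collect();
            s.push_str(&format!("<{cid:04X}> <{utf16}>\n"));
        }
        s.push_str("endbfchar\n");
    }
    s.push_str("endcmap\nCMapName currentdict /CMap defineresource pop\nend\nend\n");
    s
}

fn main() {
    // Toy cmap standing in for the real font's char -> GID table (values are made up).
    let cmap = BTreeMap::from([('日', 1234u16), ('本', 5678), ('A', 36)]);
    let text = "日本A";
    let used = collect_glyphs(&[text], &cmap);
    let (old_to_new, order) = remap_gids(&used);
    // A subsetter would keep exactly these original glyphs, in this order.
    println!("subset glyph order: {order:?}");
    println!("content stream string: {}", encode_identity_h(text, &cmap, &old_to_new));
    let cid_to_char: BTreeMap<u16, char> = text.chars().map(|c| (old_to_new[&cmap[&c]], c)).collect();
    print!("{}", to_unicode_cmap(&cid_to_char));
}
```

Because the remapped GIDs double as CIDs in this sketch, the CIDFontType2 dictionary in the font object graph could simply use `/CIDToGIDMap /Identity`; the ToUnicode stream is what lets viewers extract and search the text.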
Quick example:
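```rust
use harumi::Document; // import path assumed; check the crate docs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open an existing (e.g. scanned) PDF; edits are appended, not rewritten
    let mut doc = Document::from_file("scanned.pdf")?;
    // Register the CJK font; subsetting happens later, at save() time
    let font = doc.embed_font(include_bytes!("NotoSansCJK-Regular.ttf"))?;
    // Page 1: invisible, searchable text at [x, y] with a 12 pt size (units assumed to be PDF points)
    doc.page(1)?.add_invisible_text(
        "This text will be searchable",
        font,
        [72.0, 700.0],
        12.0,
    )?;
    // Glyph collection, subsetting, and CMap generation run here
    doc.save("searchable.pdf")?;
    Ok(())
}
```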
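The example places text in PDF user space ([72.0, 700.0], presumably points measured from the bottom-left corner), while Tesseract's hOCR output reports word boxes in image pixels measured from the top-left. The post says harumi includes hOCR converters but doesn't show their API, so the following is an independent, hypothetical std-only sketch: parse an hOCR `bbox`, scale by 72/DPI, flip the y axis, and emit an invisible text run using PDF text rendering mode 3 (`3 Tr`, neither fill nor stroke). Assumptions: the baseline sits at the bottom of the box, the box height approximates the font size, and `/F1` and `<0001>` are placeholder font resource name and encoded text. Whether harumi emits exactly these operators is not stated in the post.

```rust
/// A word box from an hOCR `title` attribute, in image pixels (origin top-left).
#[derive(Debug, PartialEq)]
struct BBox {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Parse the `bbox x0 y0 x1 y1` part of a title like "bbox 10 20 110 60; x_wconf 93".
fn parse_hocr_bbox(title: &str) -> Option<BBox> {
    let part = title.split(';').map(str::trim).find(|p| p.starts_with("bbox "))?;
    let v = part["bbox ".len()..]
        .split_whitespace()
        .map(|n| n.parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    if v.len() != 4 {
        return None;
    }
    Some(BBox { x0: v[0], y0: v[1], x1: v[2], y1: v[3] })
}

/// Convert a pixel box to PDF user space (points, origin bottom-left).
/// Returns ([x, y] of the baseline start, font size); assumes the baseline is the
/// bottom edge of the box and the box height approximates the font size.
fn bbox_to_pdf(b: &BBox, dpi: f64, page_height_pt: f64) -> ([f64; 2], f64) {
    let s = 72.0 / dpi; // pixels -> points
    ([b.x0 * s, page_height_pt - b.y1 * s], (b.y1 - b.y0) * s)
}

/// Content-stream fragment for invisible text: rendering mode 3 = neither fill nor stroke.
fn invisible_text_ops(font_res: &str, size: f64, pos: [f64; 2], hex_text: &str) -> String {
    format!("BT /{font_res} {size:.2} Tf 3 Tr 1 0 0 1 {:.2} {:.2} Tm {hex_text} Tj ET", pos[0], pos[1])
}

fn main() {
    // A word box as Tesseract might emit it for a 300 dpi US Letter scan (792 pt tall).
    let b = parse_hocr_bbox("bbox 300 450 900 500; x_wconf 96").expect("valid bbox");
    let (pos, size) = bbox_to_pdf(&b, 300.0, 792.0);
    // "<0001>" is a placeholder for the Identity-H encoded word.
    println!("{}", invisible_text_ops("F1", size, pos, "<0001>"));
}
```

A real OCR layer often also stretches each word horizontally (for example with the `Tz` operator) so the invisible text's selection box lines up with the word in the scanned image.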
Dependencies: lopdf (PDF object graph), allsorts (TTF subsetting), ttf-parser (font metadata). No C, no bindgen.
Verified with Noto Sans CJK for Japanese, Simplified Chinese, Traditional Chinese, and Korean.
GitHub: https://github.com/kent-tokyo/harumi
crates.io: https://crates.io/crates/harumi
Happy to answer questions about the font pipeline or the CJK subsetting approach specifically.
u/kent_tokyo — 1 day ago