u/kent_tokyo

r/rust

Show: harumi, a pure Rust PDF library with automatic CJK font subsetting and zero C dependencies

The Rust PDF ecosystem has a gap: lopdf is low-level with no font subsetting, printpdf only creates new PDFs, and anything with real CJK support tends to pull in C/C++ bindings that break WASM and Lambda deployments.

I built harumi to fill that gap.

What it does:

  • Edits existing PDFs (append-only; original structure is preserved)
  • Embeds CJK fonts with automatic subsetting: you call one function, and the full pipeline runs at save() time (glyph collection, TTF subsetting, GID remapping, ToUnicode CMap generation, and the CIDFont object graph)
  • Invisible text layers for searchable/OCR PDFs (Tesseract/hOCR coordinate converters included)
  • Merge, split, rotate, reorder pages
  • JPEG/PNG embedding (PNG transparency via SMask)
  • In-memory output via save_to_bytes()
  • WASM compatible — pure Rust, zero C deps
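
To give a feel for two of the pipeline steps listed above (GID remapping and ToUnicode CMap generation), here is a minimal std-only sketch. It is not harumi's actual code: the names build_gid_remap and to_unicode_bfchar are hypothetical, and it assumes an Identity-H style setup where the character code written to the content stream equals the new glyph ID.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: maps each original glyph ID used in the document
/// to a compact new glyph ID. GID 0 (.notdef) always stays at 0.
fn build_gid_remap(used_gids: &[u16]) -> BTreeMap<u16, u16> {
    // Collect the distinct non-zero glyph IDs in ascending order.
    let mut sorted: Vec<u16> = used_gids.iter().copied().filter(|&g| g != 0).collect();
    sorted.sort_unstable();
    sorted.dedup();

    let mut remap = BTreeMap::new();
    remap.insert(0, 0);
    for (i, old_gid) in sorted.into_iter().enumerate() {
        // New IDs are assigned densely, starting at 1.
        remap.insert(old_gid, (i + 1) as u16);
    }
    remap
}

/// Hypothetical helper: formats one bfchar entry of a ToUnicode CMap,
/// "<code> <UTF-16BE text>", so PDF viewers can map the code back to text.
fn to_unicode_bfchar(new_gid: u16, ch: char) -> String {
    // Characters outside the BMP need a surrogate pair (two UTF-16 units).
    let mut buf = [0u16; 2];
    let units = ch.encode_utf16(&mut buf);
    let hex: String = units.iter().map(|u| format!("{:04X}", u)).collect();
    format!("<{:04X}> <{}>", new_gid, hex)
}
```

The dense renumbering is what lets the subset font drop every unused glyph, and the bfchar entries are what keep copy/paste and search working once the original glyph IDs are gone.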

Quick example:

let mut doc = Document::from_file("scanned.pdf")?;
let font = doc.embed_font(include_bytes!("NotoSansCJK-Regular.ttf"))?;
doc.page(1)?.add_invisible_text(
    "This text will be searchable",
    font,
    [72.0, 700.0],
    12.0,
)?;
doc.save("searchable.pdf")?;
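
The position in the example above looks like PDF user-space points with the origin at the bottom-left of the page, while OCR output such as hOCR reports pixel boxes with the origin at the top-left. A rough sketch of that conversion follows. The PixelBox type and the pixel_box_to_pdf_origin function are hypothetical and are not harumi's built-in converters; the sketch also assumes the scan DPI is known and uses the bottom edge of the box as an approximate baseline.

```rust
/// Hypothetical OCR bounding box in pixels, origin at the top-left of the
/// image (the layout used by hOCR "bbox x0 y0 x1 y1").
#[derive(Debug, Clone, Copy)]
struct PixelBox {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Converts the bottom-left corner of a pixel box into PDF points
/// (1 pt = 1/72 inch), flipping the y axis so the origin is bottom-left.
fn pixel_box_to_pdf_origin(b: PixelBox, dpi: f64, page_height_pt: f64) -> [f64; 2] {
    let scale = 72.0 / dpi; // pixels -> points
    let x = b.x0 * scale;
    // y1 is the bottom edge in image space, so subtract it from the page height.
    let y = page_height_pt - b.y1 * scale;
    [x, y]
}

/// Uses the box height in points as a rough font-size estimate.
fn approx_font_size(b: PixelBox, dpi: f64) -> f64 {
    (b.y1 - b.y0) * 72.0 / dpi
}
```

For a 300 DPI scan of a US Letter page (792 pt tall), a box whose bottom edge is at pixel row 150 lands 36 pt below the top of the page, that is, at y = 756.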

Dependencies: lopdf (PDF object graph), allsorts (TTF subsetting), ttf-parser (font metadata). No C, no bindgen.

Verified with Noto Sans CJK for Japanese, Simplified Chinese, Traditional Chinese, and Korean.

GitHub: https://github.com/kent-tokyo/harumi
crates.io: https://crates.io/crates/harumi

Happy to answer questions about the font pipeline or the CJK subsetting approach specifically.

u/kent_tokyo — 1 day ago