
i built a browser TTS app for my korean and japanese reading practice — no api key, all local
i started reading korean and japanese articles during morning walks and wanted something to read them aloud. tried a few TTS sites — most wanted an account, some wanted a credit card, and all of them shipped my text to a server somewhere. felt silly for a use case that's basically "play this paragraph out loud."
so i made a single-page web app that runs the whole thing in the browser. you paste text (or drop a .txt / .docx), pick a voice, hit speak. your text never leaves the page. the model is supertonic 3, which is small enough to run on a phone GPU via WebGPU; when WebGPU isn't available, the app falls back to WebAssembly.
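for anyone curious what a WebGPU-then-WASM fallback can look like, here is a minimal sketch. it is not the repo's actual code: `pickBackend` and the injected `nav` parameter are hypothetical names made up for illustration. the only browser API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no usable GPU adapter exists.

```javascript
// Hypothetical helper (not from the repo): choose an execution backend.
// `nav` is injectable so the function can be exercised outside a browser.
async function pickBackend(nav = globalThis.navigator) {
  // navigator.gpu is only defined where the browser exposes WebGPU.
  if (nav && nav.gpu) {
    try {
      // requestAdapter() resolves to null when no suitable adapter is found.
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch (err) {
      // Any failure here is treated as "no WebGPU"; fall through to WASM.
    }
  }
  return "wasm";
}
```

if the app loads the model through ONNX Runtime Web (an assumption, the post only says the weights are ONNX), the returned string could be passed along as `executionProviders: [backend]` when creating the inference session.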
what's in it:
- 3 UI languages — english, korean, japanese — with preset sample text for each
- 6 voices, all usable with any of the 32 supertonic language tags
- output is a 44.1kHz mono WAV you can download
- first load streams ~380MB of ONNX weights from hugging face's CDN, then caches in the browser
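on the WAV output: wrapping raw samples in a WAV container is just a 44-byte header plus the sample data. the sketch below shows one way to do it, assuming the model hands back Float32 samples in [-1, 1] and that the file is 16-bit PCM. the post only states 44.1kHz mono, so the bit depth and the `encodeWav` name are assumptions for illustration.

```javascript
// Hypothetical helper: encode Float32 samples in [-1, 1] as a 16-bit PCM mono WAV.
function encodeWav(samples, sampleRate = 44100) {
  const bytesPerSample = 2; // 16-bit PCM (assumed bit depth)
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return buffer;
}
```

in a browser, the download link can then be built with `URL.createObjectURL(new Blob([encodeWav(samples)], { type: "audio/wav" }))`.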
honest limits: it's a thin app on top of supertone's open-source model. no fine-tuning, no voice cloning, no real-time streaming: you press play, it generates, then it plays. WebGPU works great in chrome/edge; on firefox/safari it falls back to WASM and longer paragraphs get slow. and the first load is genuinely a coffee break (~380MB). i couldn't see a way around that without giving up the "fully local" property.
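on the "download once, then cached" part: one common way to get that behaviour is the browser Cache API. the sketch below is an assumption about the mechanism, not a description of what the repo does (it could just as well use IndexedDB or plain HTTP caching). `fetchWithCache`, the cache name, and the injectable `cacheStorage` / `fetchFn` options are hypothetical.

```javascript
// Hypothetical helper: return a file's bytes, downloading only on a cache miss.
// `cacheStorage` and `fetchFn` default to the browser globals and are
// injectable so the logic can be tested outside a browser.
async function fetchWithCache(url, options = {}) {
  const {
    cacheName = "model-weights-v1", // made-up cache name
    cacheStorage = globalThis.caches,
    fetchFn = globalThis.fetch,
  } = options;

  const cache = await cacheStorage.open(cacheName);

  // Cache hit: serve the stored response without touching the network.
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer();

  // Cache miss: download, store a copy, then return the bytes.
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`download failed with status ${response.status}`);
  }
  // A response body can be read only once, so the cache gets a clone.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

with something like this, the second visit reads the weights from local storage, which is what makes the ~380MB cost a one-time thing as long as the browser doesn't evict the cache.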
repo: https://github.com/cskwork/supertonic-tts
if you read in a language that isn't english/korean/japanese, adding a UI tab is one entry in LANGS in app/main.js — happy to take a PR.
posted via my own reddit mcp — https://github.com/cskwork/reddit-mcp