Follow-up to my earlier posts on omnivoice-triton (NAR, 3.4× speedup) and qwen3-tts-triton (AR, with kernel-fusion drift mitigation). The libraries themselves are unchanged; this update is about the deployment surface.

ComfyUI is increasingly used as a node-graph runtime for AV pipelines (image → video → lipsync). I kept getting asked how to slot Triton-fused TTS into those graphs without a separate gRPC service. So I shipped both as official Comfy Registry nodes.

What ships

ComfyUI-Qwen3-TTS-Triton v0.2.0

Qwen3TTSCustomVoice, Qwen3TTSVoiceClone
7 inference modes covering Triton kernel fusion + TurboQuant KV cache combinations

ComfyUI-Omnivoice-Triton v0.1.0

OmnivoiceTTSAuto, OmnivoiceTTSVoiceClone, OmnivoiceTTSVoiceDesign
6 inference modes (Base, Triton, Triton+Sage, Faster, Hybrid, Hybrid+Sage)
Streamlit A/B dashboard still bundled in the lib

Why it’s a meaningful packaging step

Inference modes are exposed as ComfyUI node parameters, so ablations in production-shaped graphs need no code changes
Per-task nodes (Auto / Voice Clone / Voice Design) keep the ComfyUI graph readable instead of a 30-input monolith
Workflow JSONs included; reproducible across machines
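
For anyone who hasn't written a ComfyUI node: here's a minimal sketch of how a mode dropdown can be exposed as a node parameter. This is not the source of either package. The class name `ExampleTTSNode`, the input names `text` and `mode`, and the string return value are all hypothetical; only the mode names come from the Omnivoice list above. It relies on the standard ComfyUI custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`).

```python
# Hypothetical sketch, NOT the actual ComfyUI-Omnivoice-Triton code.
# Mode names are taken from the post; everything else is illustrative.
MODES = ["Base", "Triton", "Triton+Sage", "Faster", "Hybrid", "Hybrid+Sage"]


class ExampleTTSNode:
    """Toy node that exposes the inference mode as a dropdown input."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": "Hello"}),
                # A list in the type slot renders as a dropdown in the graph UI.
                "mode": (MODES, {"default": "Triton"}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"          # name of the method ComfyUI calls
    CATEGORY = "audio/tts"    # hypothetical menu location

    def run(self, text, mode):
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        # A real node would dispatch to the selected backend and return audio;
        # this placeholder only echoes the choice so the sketch stays runnable.
        return (f"[{mode}] {text}",)


# ComfyUI discovers nodes through this module-level mapping.
NODE_CLASS_MAPPINGS = {"ExampleTTSNode": ExampleTTSNode}
```

With this shape, switching modes for an ablation is a dropdown change (or a one-field edit in the workflow JSON) rather than a code change.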

Numbers preserved from the lib release

Omnivoice: 572 ms → 168 ms (~3.4×), Speaker Similarity 0.99 (RTX 5090)
Qwen3-TTS: identical kernels to the standalone PyPI release

What I’d still love feedback on

Real-world latency numbers from A100/H100/Ada under graph-based serving (vs. direct Python loop)
If you're integrating these into a streaming serving stack (Triton Inference Server, vLLM-style schedulers), I'd value engineering input on chunked-output behavior
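
If you want to send latency numbers, a small harness like the one below would make them comparable across setups. It's a sketch under assumptions: `measure_latency_ms` and `speedup` are hypothetical helper names, and `fn` stands for whatever you're timing (a direct Python call into the lib, or a function that queues a graph and waits for it to finish). On GPU, pass `torch.cuda.synchronize` as `sync` so the timer doesn't stop before the kernels do.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=3, runs=10, sync=None):
    """Time fn() over several runs and return summary stats in milliseconds.

    sync is an optional zero-argument callable (e.g. torch.cuda.synchronize)
    invoked before and after each timed call.
    """
    # Warm-up runs absorb one-time costs such as compilation and autotuning.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "runs": runs,
    }


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms
```

Usage: `measure_latency_ms(lambda: my_tts_call(text), sync=torch.cuda.synchronize)`, where `my_tts_call` is a placeholder for your own entry point. Reporting the median alongside min/max, plus GPU model and mode, is enough for me to line results up against the RTX 5090 figures; for reference, `speedup(572, 168)` is about 3.4.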

u/DamageSea2135

Links