u/easwee

▲ 3 r/soniox+1 crossposts

Real time speech-to-text at ultra-low latency

soniox-rt-v4 transcribing in real time. This is unedited and at actual speed (see the linked source video from Gawne). Partial results stream in as speech arrives, before chunks are finalized with confidence. A high volume of tokens is not a bottleneck for Soniox.
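
For anyone wondering how a client can render partial results before they are finalized, here is a minimal sketch. It assumes each server message carries a list of tokens shaped like `{"text": ..., "is_final": ...}`; those field names and the `LiveTranscript` helper are illustrative assumptions, not taken from this post, so check the Soniox docs for the real response schema.

```python
from dataclasses import dataclass


@dataclass
class LiveTranscript:
    """Keeps finalized text and the latest provisional (partial) text apart."""

    final_text: str = ""
    partial_text: str = ""

    def update(self, tokens):
        # Assumed token shape: {"text": str, "is_final": bool}.
        partial = []
        for tok in tokens:
            if tok.get("is_final"):
                # Finalized tokens are appended once and never revised.
                self.final_text += tok["text"]
            else:
                # Non-final tokens are provisional; collect only the newest set.
                partial.append(tok["text"])
        # Each message replaces the previous provisional tail.
        self.partial_text = "".join(partial)
        return self.final_text + self.partial_text


if __name__ == "__main__":
    transcript = LiveTranscript()
    print(transcript.update([{"text": "Hel", "is_final": False}]))
    print(transcript.update([
        {"text": "Hello", "is_final": True},
        {"text": " wor", "is_final": False},
    ]))
```

The design choice is that the provisional tail is thrown away and rebuilt on every message, so a later correction from the model never leaves stale text on screen.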

Video source: https://youtu.be/dncb_5CXE7o?si=tTy2vnGJ71YYg_yy&t=41

u/easwee — 15 hours ago
▲ 3 r/soniox+2 crossposts

Speech-to-speech translation demo: STT + translation + TTS over one API, full code on GitHub

New post on the Soniox blog walking through a reference demo for real-time speech-to-speech translation across 60+ languages.

The whole loop is built on Soniox APIs:

  • Real-time STT (stt-rt-v4): streaming transcription with language ID, mid-sentence language switching, and accurate alphanumerics
  • Real-time translation: not a separate endpoint, just a translation field on the STT config. Tokens stream mid-sentence, each tagged as original or translation, over the same WebSocket
  • Real-time TTS: starts generating audio before the sentence is finished, so the voiceover keeps up with the speaker
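
Since original and translated tokens arrive interleaved on one WebSocket, a client has to separate them before display or before handing the translation to TTS. A rough sketch of that step, assuming each token has a `text` field and a `translation_status` field set to `"original"` or `"translation"` (the field names and the `split_streams` helper are assumptions for illustration; the GitHub source linked from the blog post is the authoritative version):

```python
def split_streams(tokens):
    """Separate interleaved tokens into (original_text, translated_text)."""
    original, translation = [], []
    for tok in tokens:
        # Assumed tag field; tokens without it are treated as original speech.
        status = tok.get("translation_status", "original")
        if status == "translation":
            translation.append(tok["text"])
        else:
            original.append(tok["text"])
    return "".join(original), "".join(translation)


if __name__ == "__main__":
    message_tokens = [
        {"text": "Hello", "translation_status": "original"},
        {"text": "Hola", "translation_status": "translation"},
        {"text": " there", "translation_status": "original"},
    ]
    spoken, translated = split_streams(message_tokens)
    print("original:   ", spoken)
    print("translation:", translated)
```

In a speech-to-speech loop, only the translated string would be forwarded to the TTS side, while the original string can feed a live caption.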

Full Python + JS source is on GitHub, linked in the post. Good read if you're building voice agents, meeting tools, support call translation, or anything where subtitles alone break the UX.

soniox.com
u/easwee — 6 days ago
▲ 3 r/soniox

Soniox vs Deepgram comparison

Code-switching remains one of the harder problems in speech-to-text, especially on real-world audio. We are often asked how Soniox compares to Deepgram, so here’s a comparison of Soniox stt-rt-v4 and Deepgram nova-3 on multilingual speech transcription where speakers overlap.

Both providers have speaker diarization turned on, and context keyterms are passed to both to improve recognition of the foreign names "İclal" and "Mahya".

u/easwee — 7 days ago
▲ 9 r/VocalSynthesis+3 crossposts

Developers using Pipecat can now use a single speech stack supporting the same 60+ languages for both sides of the conversation (STT + TTS) when using Soniox, which makes building multilingual voice agents even easier.

u/No_Use8389 — 17 days ago
▲ 5 r/soniox+1 crossposts

Major step forward for Soniox: we launched Soniox Text-to-Speech, already available through the Soniox API.

Soniox TTS is built for the hardest parts of speech generation:

• Native-speaker-quality speech in 60+ languages
• Hallucination-free speech generation
• Alphanumerics such as numbers, IDs, and addresses spoken correctly
• Correct pronunciation for names and foreign words
• Ultra-low-latency streaming for real-time voice applications

Read more:
https://soniox.com/blog/soniox-text-to-speech
https://soniox.com/docs/tts/models

u/easwee — 24 days ago