
Inter-1 does streaming: real-time social signal detection from live video, audio & text
Hi – Filip from Interhuman AI here 👋
Last month we launched Inter-1, our multimodal model for detecting social signals from video, audio, and text. Today we’re making it work with video streams.
We just released the Inter-1 Streaming API: a WebSocket endpoint that runs the full Inter-1 stack (12 social signals, structured rationales, engagement, and conversation quality) on live video while the conversation is unfolding.
You stream WebM chunks in and get back regular updates with the detected signals.
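To make the "chunks in, updates out" flow concrete, here is a rough Python sketch of a client loop. Everything specific in it is an assumption for illustration: the shape of the update message (a JSON object with a "signals" list of named entries), the helper names, and the idea that the server closes the socket when the session ends. The endpoint URL and authentication are deliberately left out. Check the API docs for the real schema.

```python
import asyncio
import json
from typing import AsyncIterable, Awaitable, Callable, Iterable


def parse_signals(raw: str) -> list[str]:
    """Pull signal names out of one update message.

    ASSUMPTION: an update is a JSON object with a "signals" list whose
    entries each have a "name" field. The real schema may differ.
    """
    message = json.loads(raw)
    return [signal["name"] for signal in message.get("signals", [])]


async def send_chunks(
    send: Callable[[bytes], Awaitable[None]],
    chunks: Iterable[bytes],
) -> int:
    """Send each WebM chunk as one binary message; return how many were sent."""
    sent = 0
    for chunk in chunks:
        await send(chunk)
        sent += 1
    return sent


async def collect_updates(messages: AsyncIterable[str]) -> list[list[str]]:
    """Read updates until the connection's message stream ends."""
    updates = []
    async for raw in messages:
        updates.append(parse_signals(raw))
    return updates


async def run_session(ws, chunks: Iterable[bytes]):
    """Send and receive concurrently, so updates arrive mid-stream.

    `ws` is any object with an async `send(bytes)` method that can also be
    iterated with `async for` to yield incoming messages (hypothetical
    interface, chosen to resemble common WebSocket client libraries).
    """
    return await asyncio.gather(send_chunks(ws.send, chunks), collect_updates(ws))
```

Sending and receiving run side by side because updates are meant to arrive while the conversation is still going, not after the upload finishes. A connection object from a client library such as `websockets` should fit the `ws` shape used here, but treat that as something to verify against the library you actually use.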
The model runs on sliding 8-second windows with a processing ratio below 1.0 (processing a window takes less time than the footage it covers), so it’s fast enough to power live coaching prompts, in-call overlays, and adaptive UI. It’s not meant to be a full voice agent on its own; it’s the behavioral signal layer you plug in under whatever interaction system you’re building.
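For anyone who wants the windowing arithmetic spelled out, here is a small Python sketch. The 8-second window comes from the post. The stride between windows is not stated anywhere, so the 4-second value in the example is purely an assumption, and "processing ratio" is read here as processing time divided by the duration of the media processed.

```python
WINDOW_SECONDS = 8.0  # window length stated in the post


def processing_ratio(processing_seconds: float, media_seconds: float) -> float:
    """Processing time divided by media duration; below 1.0 is faster than real time."""
    return processing_seconds / media_seconds


def window_bounds(
    stream_seconds: float,
    stride_seconds: float,  # ASSUMPTION: the actual stride is not documented here
    window_seconds: float = WINDOW_SECONDS,
) -> list[tuple[float, float]]:
    """Return (start, end) times of every full window that fits in the stream."""
    if stride_seconds <= 0:
        raise ValueError("stride_seconds must be positive")
    bounds = []
    start = 0.0
    while start + window_seconds <= stream_seconds:
        bounds.append((start, start + window_seconds))
        start += stride_seconds
    return bounds


# Example: 6 s of compute for one 8 s window gives a ratio of 0.75.
ratio = processing_ratio(6.0, WINDOW_SECONDS)

# Example: a 20 s stream with an assumed 4 s stride yields four windows:
# (0, 8), (4, 12), (8, 16), (12, 20).
windows = window_bounds(20.0, stride_seconds=4.0)
```

One thing to keep in mind when building on this: with overlapping windows, a new window starts every stride, so the practical budget for staying live depends on the stride as well as on the ratio.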
If you’re working on sales/CS tooling, interview coaching, training, or live feedback products and want to experiment with real-time social intelligence, it might be worth looking into.
Happy to answer questions or brainstorm use cases in the comments.