u/Sardzoski

We built a model that scores pitch delivery, not just the script. Here's what it caught in a real pitch

Following up on our Inter-1 Streaming work (some of you may have seen our earlier post on the hallucination bug we found). This time it's a product demo rather than a research writeup.

The core idea: transcript-based pitch scoring can't tell the difference between a confident claim and a hedged one, because the words on the page can be identical. "We're growing 40% month over month" reads the same whether you believe it or not.

We built a demo that streams video to Inter-1 in real time and scores delivery signals (confidence, hesitation, energy) alongside a content score, each signal tied to the exact moment it happened.

I tested it on my own pitch. Content scored 87. The delivery analysis caught a hesitation right on the traction number and put confidence at 50, which pulled the overall score down to 80.

u/Sardzoski — 5 days ago

▲ 0 r/EntrepreneurRideAlong

My pitch scored 87 on content and 80 overall. The 7 points I lost taught me more than the 87 I got right

Quick story from building my own product. I recorded a pitch to test a demo we'd been working on. The script scored well, an 87. Clean hook, real traction number, clear ask.

Then I saw the delivery score. There was a hesitation right on the traction number and my confidence read at 50. The ask trailed off at the end too. Overall score: 80.

That gap is basically the whole insight behind what we built. Investors don't fund the doc; they fund the person in the room. No pitch deck feedback catches that, because it's not in the transcript.

We built a tool that scores both halves side by side using Inter-1, our AI model for reading delivery signals from video. If you've ever felt like your pitch reads better than it lands, this might show you why.


u/Sardzoski — 5 days ago

▲ 5 r/SideProject

Built a demo that scores how you deliver a pitch, not just what you write

I recorded a pitch to test this. Read as a transcript, it scored an 87. Good hook, real traction number, clear ask.

Then I watched the delivery score. A hesitation landed right on my traction number, confidence dropped to 50, and the overall came out to 80. The gap between those two numbers is the whole reason I built this.

It's called The Pitch Practice, built on top of Inter-1 Streaming (the model my team builds at Interhuman). You record a pitch to your webcam, and it scores your delivery in real time: confidence, hesitation, energy, each tied to the exact moment it happened, alongside a content score from the transcript.

Stack: WebSocket streaming to Inter-1 and an event-driven timeline on the frontend, so scores update while you're still talking.
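For anyone curious what an event-driven timeline could look like, here is a minimal sketch in Python. It is not the actual frontend code or the Inter-1 message format: the `SignalEvent` fields, the 0-100 scale, and the `Timeline` class are all assumptions made up for illustration, and apart from the confidence score of 50 the values are toy data.

```python
from dataclasses import dataclass, field


@dataclass
class SignalEvent:
    # Hypothetical event shape; the real Inter-1 messages are not shown in this post.
    t_seconds: float  # moment in the recording the signal refers to
    signal: str       # e.g. "confidence", "hesitation", "energy"
    score: float      # assumed 0-100 scale


@dataclass
class Timeline:
    events: list = field(default_factory=list)

    def on_event(self, event: SignalEvent) -> None:
        """Handle one incoming event and keep the timeline ordered by time."""
        self.events.append(event)
        # Events may arrive out of order, so re-sort on every insert.
        self.events.sort(key=lambda e: e.t_seconds)

    def latest(self) -> dict:
        """Most recent score per signal, by timestamp rather than arrival order."""
        current = {}
        for e in self.events:  # sorted, so later timestamps overwrite earlier ones
            current[e.signal] = e.score
        return current

    def between(self, start: float, end: float) -> list:
        """Events whose timestamp falls in [start, end)."""
        return [e for e in self.events if start <= e.t_seconds < end]


# Toy events, deliberately delivered out of order.
timeline = Timeline()
timeline.on_event(SignalEvent(12.5, "confidence", 50.0))
timeline.on_event(SignalEvent(3.0, "energy", 72.0))
timeline.on_event(SignalEvent(12.0, "hesitation", 80.0))
```

Calling `timeline.between(10, 13)` returns the hesitation and confidence events, which is the kind of lookup needed to tie a signal to the exact moment it happened.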

Would love feedback, especially from anyone who's pitched investors and knows where the script and the delivery tend to split.


u/Sardzoski — 5 days ago

▲ 367 r/aicuriosity+3 crossposts

We chased a hallucinated quote through 30k training records, 4,600 transcripts, and our own system prompt. Turned out to be two separate bugs

Some of our customers noticed Inter-1 (our omni-modal social-signal model) would occasionally "hear" a quote that didn't exist. Feed it a video with zero audio and ask what was said, and it would sometimes report: "Yeah, Friday at five." Verbatim. Same line, every time.

We assumed it had to be baked into the training data somewhere, so we went looking everywhere:

30,960 training records with datetime mentions → zero hits on the phrase
4,603 video transcripts → zero hits
~800 inference probes, 584 storage objects → zero hits
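A search like the one above only means something if formatting differences can't hide a match. The following is a minimal sketch of a normalized phrase search, not the tooling actually used: `normalize`, `count_hits`, and the sample records are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def count_hits(records, phrase: str) -> int:
    """Count records containing the phrase as whole words after normalization."""
    needle = f" {normalize(phrase)} "
    # Padding both sides with spaces avoids matching inside a longer word.
    return sum(1 for r in records if needle in f" {normalize(r)} ")


# Toy records for illustration only, not real training data.
sample_records = [
    "Yeah,   FRIDAY at five!",
    "See you Friday at 5pm",
    "yeah friday at fivefold",
]
hits = count_hits(sample_records, "Yeah, Friday at five.")
```

Here only the first toy record counts as a hit. The second uses "5pm" rather than "five", and the third is rejected by the whole-word check.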

Turns out the phrase was sitting in our own system prompt — a worked example we'd written to show the model the expected output format, buried in a version our GEPA prompt-optimizer had shipped.

But that only explained where the words came from, not why the model would say them over total silence. So we ran two ablations in our internal eval harness:

Swap the word, keep the model: changed the prompt's example to "Tuesday at noon." Fabrication rate went up (37%→50%), and the invented quote tracked the swap exactly — Friday→Tuesday.
Swap the model, keep the prompt: ran the same byte-identical prompt through larger variants and an earlier checkpoint of our own model. They barely fabricated (0–2%). Only the further-post-trained Inter-1 confabulated at ~12%.
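The two ablations measure two different things: how often the model invents any speech over silence, and how often the invented speech echoes the prompt's example. Here is a minimal sketch of how those two rates could be computed. It is not the internal eval harness, and the function name, input shape, and toy outputs are all assumptions.

```python
def fabrication_stats(outputs, planted_phrase: str) -> dict:
    """Summarize probes run on silent clips.

    outputs: the quote the model reported for each probe; an empty or
             whitespace-only string means it correctly reported no speech.
    planted_phrase: the example phrase sitting in the system prompt.
    """
    total = len(outputs)
    fabricated = [o for o in outputs if o.strip()]
    echoed = [o for o in fabricated if planted_phrase.lower() in o.lower()]
    return {
        # Share of silent probes where any dialogue was invented.
        "fabrication_rate": len(fabricated) / total if total else 0.0,
        # Share of invented quotes that repeat the planted phrase.
        "echo_rate": len(echoed) / len(fabricated) if fabricated else 0.0,
    }


# Toy outputs for illustration only, not real eval results.
toy_outputs = ["", "", "Yeah, Tuesday at noon.", "", "", "Hello there.", "", ""]
stats = fabrication_stats(toy_outputs, "Tuesday at noon")
```

Keeping the two rates separate matters because removing the prompt example can push the echo rate to zero while leaving the fabrication rate above zero.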

So it's not one bug, it's two stacked priors: the prompt supplied the script, but post-training is what gave the model the compulsion to recite something rather than report silence. Deleting the prompt example stops that one sentence — it doesn't stop the model from inventing different dialogue instead.

We think this is a textual/in-context variant of the audio-visual "Clever Hans effect" that's been documented for vision priors (model writes "thud" over a silent skateboard wipeout) — except ours shows the same reflex gets worded by whatever's nearest in the context window, which a vision-only diagnostic wouldn't catch.

Full writeup with the fabrication-rate forest plot and log data: https://www.interhuman.ai/blog/goblin-yeah-friday-at-five

u/Sardzoski — 11 days ago

▲ 7 r/aicuriosity+2 crossposts

Inter-1 does streaming: real-time social signal detection from live video, audio & text

Hi – Filip from Interhuman AI here 👋

Last month we launched Inter-1, our multimodal model for detecting social signals from video, audio, and text. Today we’re making it work with video streams.

We just released the Inter-1 Streaming API: a WebSocket endpoint that runs the full Inter-1 stack (12 social signals, structured rationales, engagement, and conversation quality) on live video while the conversation is unfolding.

You stream WebM chunks in, and get back regular updates with detected signals.

The model runs in sliding 8s windows with a sub-1.0 processing ratio, so it’s fast enough to power live coaching prompts, in-call overlays, and adaptive UI. It’s not meant to be a full voice agent on its own; it’s the behavioral signal layer you plug in under whatever interaction system you’re building.
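To make the windowing concrete, here is a minimal sketch of sliding windows and a processing ratio. Only the 8s window length comes from the post. The 2s stride, the function names, and the timing numbers are assumptions, and "processing ratio" is read here as processing time divided by the duration of media processed, which may not match the exact definition used for Inter-1.

```python
def sliding_windows(duration_s: float, window_s: float = 8.0, stride_s: float = 2.0):
    """Yield (start, end) for every full window that fits in the stream so far."""
    start = 0.0
    while start + window_s <= duration_s:
        yield (start, start + window_s)
        start += stride_s  # the stride value is an assumption, not a documented setting


def processing_ratio(total_processing_s: float, media_s: float) -> float:
    """Processing time spent per second of streamed media.

    A value below 1.0 means the processing keeps pace with a live stream.
    """
    return total_processing_s / media_s


# Toy numbers for illustration only.
windows = list(sliding_windows(14.0))
ratio = processing_ratio(total_processing_s=10.5, media_s=14.0)
```

With a 14-second stream this yields four overlapping windows, and a toy total of 10.5 seconds of processing gives a ratio of 0.75, under the 1.0 threshold.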

If you’re working on sales/CS tooling, interview coaching, training, or live feedback products and want to experiment with real-time social intelligence, it might be worth looking into.

Happy to answer questions or brainstorm use cases in the comments.

interhuman.ai

u/Sardzoski — 2 months ago
