u/Overall_Challenge_66

How are you tying conversation data back to product events?

Hitting a wall trying to make sense of our conversation data and curious how others are dealing with it.

We're at maybe 45-50k convos a month on our AI agent. Got Langfuse for traces, Mixpanel for product events, transcripts dumped in Postgres. Each tool is fine on its own. Tying them together is where everything falls apart.

Most of the questions our PMs want to answer need data from all three. Stuff like: which users who hit the pricing prompt last week ended up paying, broken down by what the agent actually said to them? Right now that's a data eng ticket, and it takes two weeks. Half the time the answer is irrelevant by the time it lands.
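The SQL for that PM question is simple once everything lives in one place. The hard part is a shared key. Below is a minimal sketch of the three-way join. It assumes traces, transcripts, and product events can all be keyed on the same `user_id` / `conversation_id`, which is the real assumption here. The table names (`traces`, `transcripts`, `events`), the `agent_label` column (a per-conversation tag derived from the transcript), and the `payment_completed` event name are all hypothetical. sqlite3 stands in for whatever warehouse the data would actually land in.

```python
import sqlite3

# Hypothetical schema: prompt hits from the trace store, one label per
# conversation derived from the transcript, and raw product events.
SCHEMA = """
CREATE TABLE traces (conversation_id TEXT, user_id TEXT, prompt_name TEXT, ts INTEGER);
CREATE TABLE transcripts (conversation_id TEXT, agent_label TEXT);
CREATE TABLE events (user_id TEXT, event_name TEXT, ts INTEGER);
"""

QUERY = """
SELECT tr.agent_label,
       COUNT(DISTINCT t.user_id) AS users_hit_prompt,
       COUNT(DISTINCT e.user_id) AS users_paid
FROM traces t
JOIN transcripts tr ON tr.conversation_id = t.conversation_id
LEFT JOIN events e
       ON e.user_id = t.user_id
      AND e.event_name = 'payment_completed'
      AND e.ts >= t.ts              -- only count payments after the prompt
WHERE t.prompt_name = 'pricing'     -- add a ts window here for "last week"
GROUP BY tr.agent_label
ORDER BY tr.agent_label
"""


def pricing_conversion_by_agent_label(conn):
    """Per agent-response label: users who hit the pricing prompt, and how many later paid."""
    return conn.execute(QUERY).fetchall()


def build_demo_db():
    """In-memory DB with a few made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO traces VALUES (?, ?, ?, ?)", [
        ("c1", "u1", "pricing", 10),
        ("c2", "u2", "pricing", 20),
        ("c3", "u3", "pricing", 30),
        ("c4", "u4", "onboarding", 5),
    ])
    conn.executemany("INSERT INTO transcripts VALUES (?, ?)", [
        ("c1", "offered_discount"),
        ("c2", "offered_discount"),
        ("c3", "quoted_list_price"),
        ("c4", "quoted_list_price"),
    ])
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        ("u1", "payment_completed", 15),  # paid after the prompt: counts
        ("u2", "payment_completed", 5),   # paid before the prompt: ignored
        ("u3", "page_view", 40),
        ("u4", "payment_completed", 50),  # never hit the pricing prompt
    ])
    return conn


print(pricing_conversion_by_agent_label(build_demo_db()))
# [('offered_discount', 2, 1), ('quoted_list_price', 1, 0)]
```

If the IDs don't line up across the three tools, no amount of query work fixes it. Getting the same user and conversation identifiers emitted everywhere is most of that "quarter of work".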

I keep thinking we'll just build something internal to bridge it, but every time I scope it out it's a quarter of work, and I don't trust that we'll get it right.

Is anyone running this kind of thing on something off-the-shelf, or is everyone gluing things together with their own internal tooling?

reddit.com

Working setups for catching regressions in conversation data at scale?

Anyone got a working setup for spotting regressions in conversation data at scale? We're around 50k convos/month and manual review just isn't an option anymore.

Stuff we've tried that kinda works but not really:

We embed segments, cluster them weekly, and look for clusters where the outcome correlation looks off. That sometimes catches real stuff, but signal/noise gets bad on small clusters, and we spent a couple of weeks tuning parameters that didn't really move the needle.
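Small clusters can be kept from flooding the flagged list. A common way is to compare each cluster's bad-outcome rate to the overall rate with a binomial z-score, and skip clusters below a minimum size. This is a sketch under a few assumptions:

- Cluster assignments already exist, from whatever embedding and clustering step is in use.
- Each conversation has a binary outcome.
- `min_size` and `z_threshold` are placeholder values, not tuned recommendations.
- For simplicity, the overall rate includes the cluster being tested.

```python
import math
from collections import defaultdict


def flag_outlier_clusters(cluster_ids, outcomes, min_size=30, z_threshold=3.0):
    """Flag clusters whose bad-outcome rate is far from the overall rate.

    cluster_ids: cluster assignment per conversation (from any clustering step)
    outcomes:    1 for a bad outcome (e.g. no conversion), 0 otherwise
    Returns (cluster_id, size, rate, z) tuples, sorted by cluster_id.
    """
    overall = sum(outcomes) / len(outcomes)
    stats = defaultdict(lambda: [0, 0])  # cluster_id -> [size, bad_count]
    for cid, bad in zip(cluster_ids, outcomes):
        stats[cid][0] += 1
        stats[cid][1] += bad

    flagged = []
    for cid, (size, bad_count) in sorted(stats.items()):
        if size < min_size:
            continue  # too small: the rate is mostly noise
        # Binomial standard error if this cluster behaved like everything else.
        se = math.sqrt(overall * (1 - overall) / size)
        if se == 0:
            continue  # overall rate is 0 or 1: nothing to compare against
        rate = bad_count / size
        z = (rate - overall) / se
        if abs(z) >= z_threshold:
            flagged.append((cid, size, rate, z))
    return flagged


# Made-up data: a big "normal" cluster, one bad cluster, one tiny all-bad cluster.
ids = [0] * 1000 + [1] * 100 + [2] * 5
outcomes = [1] * 110 + [0] * 890 + [1] * 40 + [0] * 60 + [1] * 5
print([cid for cid, *_ in flag_outlier_clusters(ids, outcomes)])  # [1]
```

The size cutoff is the blunt version. A softer alternative is to shrink each small cluster's rate toward the overall rate before scoring it, so tiny clusters are damped rather than dropped.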

We also tried running LLM-as-judge over a 5% random sample. Decent results, but the cost climbs fast at 2k+ labels a week. Gemini Flash is OK on the obvious stuff and Claude on the ambiguous ones, but it's still enough money that someone in finance asked about it.
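Splitting obvious and ambiguous cases between two models amounts to a model cascade: run the cheap judge on everything and escalate only low-confidence verdicts. The sketch below makes three assumptions:

- The judges are passed in as plain functions. These are hypothetical stand-ins for the real API calls.
- The confidence score is self-reported. LLM confidence is often poorly calibrated, so the threshold would need checking against human labels.
- Per-call costs are in whatever unit you track.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Verdict:
    label: str
    confidence: float  # assumed to be in 0..1, as reported by the judge


def cascade_label(
    items: Iterable[str],
    cheap_judge: Callable[[str], Verdict],
    strong_judge: Callable[[str], Verdict],
    cheap_cost: float,
    strong_cost: float,
    threshold: float = 0.8,
) -> Tuple[List[str], float, int]:
    """Label items with the cheap judge, escalating low-confidence ones to the strong judge."""
    labels: List[str] = []
    total_cost = 0.0
    escalated = 0
    for item in items:
        verdict = cheap_judge(item)
        total_cost += cheap_cost
        if verdict.confidence < threshold:
            verdict = strong_judge(item)  # ambiguous: pay for the stronger model
            total_cost += strong_cost
            escalated += 1
        labels.append(verdict.label)
    return labels, total_cost, escalated


# Usage with stand-in judges (illustrative only; real ones would call the model APIs).
def fake_cheap(text: str) -> Verdict:
    return Verdict("ok", 0.5 if "?" in text else 0.95)


def fake_strong(text: str) -> Verdict:
    return Verdict("needs_review", 0.9)


labels, cost, n_escalated = cascade_label(
    ["thanks", "wait, what?", "bye"], fake_cheap, fake_strong, cheap_cost=1, strong_cost=10
)
# labels == ["ok", "needs_review", "ok"], cost == 13.0, n_escalated == 1
```

Total cost is `N * cheap_cost + escalated * strong_cost`. The escalation rate is therefore the number worth tracking, alongside how often the cheap judge is confidently wrong.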

The hybrid (cluster first, label only centroids, propagate to members) is cheaper but falls apart when clusters aren't internally consistent, which honestly seems to be most of them.
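The hybrid can be gated on cluster cohesion. A cluster is only propagated when its members sit close to the centroid, and in that case one representative (the member closest to the centroid) goes to the judge. Loose clusters go back to per-item labeling. This sketch assumes embeddings are plain float lists and uses mean cosine similarity to the centroid as the cohesion measure. `min_cohesion=0.9` is a placeholder; a sensible value depends on the embedding model.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def plan_propagation(clusters, min_cohesion=0.9):
    """Decide which clusters are safe to label via a single representative.

    clusters: {cluster_id: [(item_id, embedding), ...]}
    Returns (representatives, individual):
      representatives: {cluster_id: item_id to send to the judge; its label is copied to all members}
      individual:      item_ids from loose clusters that should be labeled one by one
    """
    representatives, individual = {}, []
    for cid, members in clusters.items():
        dim = len(members[0][1])
        centroid = [sum(vec[i] for _, vec in members) / len(members) for i in range(dim)]
        sims = [(_cosine(vec, centroid), item_id) for item_id, vec in members]
        cohesion = sum(s for s, _ in sims) / len(sims)
        if cohesion >= min_cohesion:
            representatives[cid] = max(sims)[1]  # member closest to the centroid
        else:
            individual.extend(item_id for item_id, _ in members)
    return representatives, individual


clusters = {
    "A": [("a1", [1.0, 0.0]), ("a2", [0.9, 0.1]), ("a3", [1.0, 0.05])],  # tight
    "B": [("b1", [1.0, 0.0]), ("b2", [0.0, 1.0])],                       # loose
}
print(plan_propagation(clusters))  # ({'A': 'a3'}, ['b1', 'b2'])
```

If most clusters fail the cohesion check, the cost saving disappears. That in itself is a useful signal that the clustering granularity doesn't match the labels being asked for.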

The hardest part is getting PMs to trust the output. They keep dropping back to reading transcripts manually because they don't believe the automated signal. Anyone gotten past that?

Doing VoC at AI-conversation scale: what's the cadence at other PM teams?

Anyone else trying to do VoC at AI-conversation scale and feeling stuck?

We're at around 50k AI agent conversations a month and our process is basically the same one we had at 500 touches a quarter.

So now we're doing things like randomly sampling 1% and reading them, which is fine for vibes but useless for anything statistical. We also tried LLM auto-tagging by topic. The categories look clean, but they never tell you why a specific customer in the "pricing question" bucket didn't convert.

The other option is asking the data team for cohort cuts. Two-week SLA, and by the time the answer arrives the cohort is usually long gone.

We end up either getting loud about the 10 transcripts we actually read, or staring at sentiment dashboards trying to find a signal that isn't really there.

How are other PMs at AI-native companies running this loop? Curious about the cadence and where you're pulling the inputs from.

r/voiceagents (+1 crosspost)

CX reporting on AI voice agents at 50K+ calls/month: what's working?

Looking for advice from CX leaders running AI voice agents at real volume.

We're a fintech, and our AI voice agent now handles around 100k inbound calls a month: billing questions, account issues, the usual mix. It's been a win on cost and response time. The problem now is reporting up to leadership.

Every Monday our exec team wants to know things like why customer sentiment shifted last week, what's driving the spike in cancellation requests, and whether certain account types are calling more often than others. Real voice-of-customer questions.

Today we answer those by pulling a sample of 50 calls, listening to them, categorizing by hand, and writing up the patterns we noticed. It takes the team most of the week. The answer is always partial because we're sampling 50 out of 100,000.
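A confidence interval puts a number on how partial a 50-call sample is. It applies to any share read off the sample, such as the fraction of calls in a given category. Below is a sketch of the Wilson score interval, a standard choice for proportions that behaves better than the plain normal approximation at small sample sizes. With 50 calls out of 100,000, the finite-population correction is negligible, so it is left out.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 25 of 50 sampled calls fall in a category: the true share is plausibly 37% to 63%.
low, high = wilson_interval(25, 50)
print(f"{low:.2f} to {high:.2f}")  # 0.37 to 0.63
```

At 500 sampled calls, the same 50% reading narrows to about plus or minus 4.4 points. The width shrinks with the square root of n, so halving the interval takes roughly four times the sample.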

How are other CX teams in this position handling reporting? Is anyone happy with what they're doing today, or is everyone stuck on sampling and guessing?

u/Overall_Challenge_66 — 2 days ago
r/Agent_AI (+1 crosspost)

How are teams handling prompt QA at scale?

Curious how teams are handling prompt QA once volume gets high.

We’re at ~40k conversations/month and currently have PMs manually reading transcripts to figure out:

  • what broke
  • where users get frustrated
  • which prompt/workflow changes helped or hurt

The annoying part is the review workload scales almost linearly with conversation volume.

We ship a lot of prompt updates every month, so keeping quality high is becoming a real bottleneck.

I keep feeling there has to be a better way than “read more transcripts.”

Are people actually using automated systems to surface issues/regressions in production?
Like:

  • “this flow started failing more after version X”
  • “users in this branch churn more”
  • “these conversations became longer after the prompt change”
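A simple starting point for the first kind of signal ("this flow started failing more after version X") is a two-proportion z-test on failure counts before and after a prompt version. The same test can compare two branches for the churn case. This sketch assumes each conversation can be tagged with a flow, a prompt version, and a binary failed/not-failed outcome. `flag_regressions` is a hypothetical helper, and the counts in the usage lines are made up. When many flows are scanned at once, a multiple-comparison correction is needed; Bonferroni is used here because it is the simplest.

```python
import math


def two_proportion_z(fail_before: int, n_before: int, fail_after: int, n_after: int):
    """Two-sided two-proportion z-test; positive z means the failure rate went up."""
    p_before = fail_before / n_before
    p_after = fail_after / n_after
    pooled = (fail_before + fail_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to test
    z = (p_after - p_before) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def flag_regressions(flows, alpha=0.01):
    """flows: {flow_name: (fail_before, n_before, fail_after, n_after)}.

    Returns flows whose failure rate rose significantly after the change,
    using a Bonferroni-corrected threshold across all flows tested.
    """
    cutoff = alpha / max(len(flows), 1)
    flagged = []
    for name, counts in sorted(flows.items()):
        z, p = two_proportion_z(*counts)
        if z > 0 and p < cutoff:
            flagged.append((name, round(z, 2), p))
    return flagged


flows = {
    "refund": (120, 4000, 200, 4000),   # 3.0% -> 5.0% failures
    "billing": (100, 4000, 104, 4000),  # 2.5% -> 2.6% failures
}
print([name for name, *_ in flag_regressions(flows)])  # ['refund']
```

Conversation length is continuous rather than pass/fail, so the third example would need a different test (for instance Mann-Whitney U) rather than this one.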

Honestly, not looking for vendor pitches; I’m more interested in what’s genuinely working in production.

u/Overall_Challenge_66 — 2 days ago