u/DeshMamba

after 2.5 years running ~1k calls a day, here's the voice ai stack i'd actually pick today. llm, stt, tts, the whole thing

this is one i've been wanting to write for a while. every time someone asks "what should i use for llm/stt/tts" the honest answer is "depends what you're optimizing for" which is genuinely not useful to anyone trying to ship.

i've been running voice ai across a few hundred businesses, ~1k calls/day for 2.5 years. here's how i'd actually pick the stack today if i was starting from zero. fair warning that the space moves fast, so what's true in may is probably not true in august.

llm:

gpt-4.1 mini is my default for most voice agent loops right now. cheap enough, smart enough, low enough latency that the model basically disappears into the loop. its instruction following on long system prompts is what keeps me from migrating off.

gpt-4o mini still works. slightly faster, slightly worse at multi-turn context. fine for short flows.

groq is the fastest inference layer i've tested by a real margin. first-token latency feels unreal when you hear it. the catch is the open models running on it (llama, qwen) follow instructions less reliably than the openai stack on the exact same prompt. great for narrow agents. less great when the conversation gets messy.

people overthink this layer tbh. unless your agent is doing real reasoning, the gap between 4.1 mini and llama 3.3 on groq is mostly perceived latency, not capability. so pick speed unless you really need the reasoning.

stt:

deepgram is still my default. nova-3 handles accents well, streaming latency is competitive, and the tooling is mature.

openai's whisper is top tier on accuracy but the streaming endpoints lag deepgram. fine for post-call. i wouldn't put it in the live loop yet.

groq whisper is the fastest whisper deployment i've used. if you don't need deepgram's full streaming protocol, groq whisper is genuinely underrated.

stt is mostly a solved problem at this point. the real bugs aren't in transcription quality, they're in how your platform's streaming protocol talks to your turn-taking model. that's where the gnarly debugging happens.

tts:

this is where the most perception lives. nobody complains the llm sounds bad. they complain the voice sounds weird. so this is the layer i'd actually spend the most tuning time on.

elevenlabs flash 2.5 is the safe pick. voices sound right out of the box. the cost gets steep at scale, especially on enterprise tier, but it works.

cartesia sonic 3 is my favorite for price-to-quality right now. fast, voices are solid, cheaper per minute than 11labs. has some lingering edge cases on numbers and acronyms but it's closing.

rime arcana is the most "human" sounding model i've heard in production. great for inbound where you really don't want the caller to feel like they're talking to a robot. it's a tick slower than cartesia or 11labs flash though.

sarvam is the only serious option for indian languages right now (hindi/tamil/telugu). for non-indian languages it's not worth the swap.

starting from zero today i'd go 4.1/4o mini + deepgram nova-3 + cartesia sonic 3. swap in groq for narrow high-frequency agents. swap in elevenlabs flash 2.5 if the budget is there and the brand voice matters. swap in rime if "doesn't sound like a robot" is the top requirement.
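if it helps to see that decision as code, here's a minimal sketch of the default plus the three swaps. the function name, the flag names, and the rule that "most human" beats "brand voice" when both are set are my assumptions, not anything from a real sdk. the strings are just labels for the picks above, not api model ids.

```python
def pick_stack(narrow_agent=False, brand_voice_budget=False, most_human=False):
    """Return the stack described above as plain labels (not API model ids)."""
    # Default pick: 4.1 mini + deepgram nova-3 + cartesia sonic 3.
    stack = {
        "llm": "gpt-4.1 mini",
        "stt": "deepgram nova-3",
        "tts": "cartesia sonic 3",
    }
    if narrow_agent:
        # Narrow, high-frequency agents: trade instruction following for speed.
        stack["llm"] = "open model on groq (e.g. llama 3.3)"
    if brand_voice_budget:
        stack["tts"] = "elevenlabs flash 2.5"
    if most_human:
        # Assumption: this requirement overrides the brand-voice swap.
        stack["tts"] = "rime arcana"
    return stack


if __name__ == "__main__":
    print(pick_stack())
    print(pick_stack(narrow_agent=True, most_human=True))
```

the point of writing it down like this is that each layer swaps independently, so you can change the tts without touching the llm or stt choice.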

none of this would have been the right answer 6 months ago. ask me again in october.

reddit.com
u/DeshMamba — 21 hours ago
▲ 2 r/Agentic_Marketing+1 crossposts

the 5 things we ended up building into our voice ai platform after 2.5 years of trying to fake them. and the boring outcomes that made the work worth it

ok so 2.5 years into voice ai and 18 months building a voice ai platform and i'm gonna admit something. like half the features we ship now are things we spent way too long trying NOT to build. here's 5 of them and what happened when we finally caved.

  1. native integrations, not zapier.

for like 8 months we told everyone "just use zapier or n8n to connect your crm." agencies hated it. tbh i don't blame them.

ended up building native for the 8 things agencies actually used. highlevel, hubspot, twilio, slack, gmail, cal.com, google sheets, notion. plus webhooks for the long tail. tickets about "leads aren't syncing" dropped maybe 70%. nobody mentions integrations anymore which is kind of the goal.

  2. live call transfer to a real human.

we resisted this one for a long time. felt like admitting the ai failed. built these elaborate escalation queues that routed unclear calls to "human review later." clients ignored them. what they actually wanted was for the agent to qualify and then hand the phone to whoever could close, right then.

so we built real time transfer to any phone number, mid call, with context handed over. roofing client booked 3 jobs in one afternoon that would've just been voicemails. that's when we got it.

  3. rag knowledge base instead of monster system prompts.

early version had system prompts the length of a short novel. clients would change a price in one place and forget to update the prompt. agent would quote stale numbers on calls. like actually told a customer the wrong price for an emergency service call once.

shipped a rag layer that agencies update like a notion doc. agent reads from it on every call. "the agent said something incorrect" tickets basically went to zero.

  4. multilanguage without switching agents.

we had separate spanish and english agents for a while. agencies hated managing two. and callers in bilingual markets would start in english, switch to spanish mid sentence, agent would just give up.

built language detection at the audio layer so one agent handles a caller code switching mid call. a dental clinic in texas saw their bilingual no show rate drop noticeably. nobody actually requested this feature. they just stopped complaining about something they couldn't quite name.

  5. multi client dashboard with sub accounts.

ok this one is the most embarrassing in hindsight. for 6 months agencies were managing 10+ clients by logging in and out of separate accounts. like physically signing out, signing into the next one. brutal. we just hadn't built the dashboard yet.

shipped a single agency view with per client analytics, white labeled to the agency's brand. one agency went from 4 clients on the platform to 18 in 4ish months. nothing about the product got more powerful. per client overhead dropped from like 30 min/week to 5. you only find this stuff when you watch a real person use it for a week.
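for the rag point in the list above, here's a rough sketch of the idea: facts live in an editable list and get pulled per question, instead of being baked into one giant system prompt. this is not how our platform does it. the entries, prices, and function names are made up for illustration, and real retrieval would normally use embeddings rather than the word-overlap scoring used here.

```python
import re

# Hypothetical knowledge base entries. In practice these would live in an
# editable doc the agency updates, not in the system prompt.
KNOWLEDGE_BASE = [
    "emergency service call: $189 flat fee, available 24/7",
    "standard service call: $89, weekdays 8am-5pm",
    "service area: within 25 miles of downtown",
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, entries, top_k=1):
    """Rank entries by how many tokens they share with the question."""
    question_tokens = tokenize(question)
    ranked = sorted(
        entries,
        key=lambda entry: len(question_tokens & tokenize(entry)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, entries):
    """Put only the retrieved facts in front of the model for this turn."""
    facts = "\n".join(retrieve(question, entries))
    return f"answer using only these facts:\n{facts}\n\ncaller asked: {question}"


if __name__ == "__main__":
    print(build_prompt("how much is an emergency service call", KNOWLEDGE_BASE))
```

the design point is that a price change is one edit to one entry, and every later call picks it up, so there's no second copy in a prompt to forget about.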

every single one of these we wished we'd built ~2 quarters earlier than we did. probably a few more on that list we still haven't gotten around to.

how are you implementing voice ai into your agency?

u/DeshMamba — 2 days ago

watching ~35 agencies pitch voice ai to their clients for the past 12 months. here are the 5 pitches that closed and the 5 that didn't

ok so this is one i keep thinking about and haven't seen anyone talk about it.

watching agencies pitch voice ai to their clients over the past year, you start seeing the pattern fast. like the same 5 pitches close and the same 5 die in proposal. it's almost depressing how predictable it is.

5 that closed:

  1. anchored to revenue per client. not "save time" or "increase efficiency." literally "you bill this client $X/mo, this captures 30% more of the inbound that's currently going to voicemail, here's what that adds back." the closer the math is to the client's actual P&L, the faster they say yes.
  2. demoed the agent live on THEIR phone. not a slack screenshot. not a youtube demo. an actual call. client picked up the phone, agent answered, used the client's real script. when the client hears their own intake flow being run by an ai in real time they get it. before that they're polite. after that they're sold.
  3. bundled it with a service the client already paid for. agencies who said "add voice ai for $400 on top of your retainer" got hosed. agencies who said "we're rebuilding your missed call flow, voice ai is part of it" closed without resistance.
  4. came with a 30 day "we run it for you" intro. takes the implementation fear off the table completely. client doesn't need to learn anything in month 1. agency handles the prompts, the voice, the testing. once it's stable they hand the dashboard over.
  5. used the client's actual script as the demo agent. the second the prospect hears their own greeting come out of an ai, the energy in the room changes. agencies who wrote a generic demo agent had to fight to get to that moment. agencies who built the personalized one for the meeting didn't.
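the revenue math in pitch #1 is simple enough to sketch. every input here is a placeholder you'd swap for the client's real numbers. the 30% capture rate comes from the example above, while the close rate and job value are assumptions i made up for the demo.

```python
def recovered_revenue(missed_calls, capture_rate, close_rate, avg_job_value):
    """Estimate monthly revenue added back by answering calls that hit voicemail.

    missed_calls:  calls per month currently going to voicemail
    capture_rate:  share of those calls the agent now handles (0..1)
    close_rate:    share of handled calls that turn into paid jobs (0..1)
    avg_job_value: average revenue per job
    """
    captured_calls = missed_calls * capture_rate
    return round(captured_calls * close_rate * avg_job_value, 2)


if __name__ == "__main__":
    # Hypothetical client: 100 missed calls/month, 30% captured,
    # 25% of those close, $400 average job.
    print(recovered_revenue(100, 0.30, 0.25, 400))
```

with those placeholder inputs it works out to 30 captured calls, 7.5 jobs, and $3,000 a month. the closer each input is to something the client already tracks, the less of the number they can argue with.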

now the 5 that didn't close:

  1. led with the tech. "powered by gpt and elevenlabs." every single one of these died in proposal. clients don't buy an llm. they buy a thing that picks up the phone.
  2. quoted setup fees over $5k upfront. especially for sub $10k/mo clients. math just doesn't work for them. agencies who folded setup into month 1 or made it free with a 3 month commit closed way more.
  3. pitched "ai receptionist" without naming the actual job. "ai receptionist" gives the client nothing to picture. "ai for after hours intake when your team is asleep" or "ai for handling the 80% of leads that ghost your appointment setter" gives them a job to staff.
  4. showed competitor logos in the deck. felt premium to the agency. felt like a vendor lineup to the client. side by side comparison decks especially. the prospect immediately starts shopping.
  5. tried to sell it as an add on. anything pitched as "in addition to" what you already do gets cut first in the budget review. anything pitched as "instead of" your current setup gets defended.

tbh the wild part is these were basically the same product. same tech, same price range, same use cases. the only thing that changed was how the pitch was put together.

curious which side of this the agency owners here are on. lmk in the comments.

u/DeshMamba — 2 days ago

spent ~10 years in paid media before i landed on the other side of the table. the channel is cooked, and here's where i think it's actually going

This is a longer post than usual but i think a lot of people in here are quietly feeling this and not saying it.

i ran paid media for clients for about a decade. mostly meta, google, some tiktok later on. across that run i managed something north of $60m in ad spend for a couple hundred clients. agencies, dtc brands, b2b, the whole spread. i'm not saying that to flex, i'm saying it because what i'm about to write only makes sense if you know i wasn't watching from the sidelines.

paid media is cooked. it's not collapsing tomorrow but the math has gotten brutal:

cpms keep climbing while ios14 made attribution mostly vibes. half my clients in 2019 ran campaigns where we could literally see the customer journey, now we report on "modeled conversions" and pretend it's the same thing. then google started rewriting search results themselves and organic traffic to client sites dropped 30-60% across the board. now meta's testing ai-generated ad creative directly inside the platform, which means the "creative + targeting + landing page" stack the entire agency world was built on is getting eaten from inside the platform.

so the question every marketer i talk to is asking is, where does the actual edge move next.

after sitting in the paid media seat for a decade and now spending the last couple years on the other side (i work on a voice ai platform now), here's the shift i'm watching happen in real time:

  1. the new edge isn't getting them to click. it's owning what happens when they call. paid media used to end at the form fill. now most of the buying decision happens after the form fill, in a phone call or sms thread or chat. agencies that only own the "before the call" part are getting squeezed. the ones moving into the conversation layer are the ones still growing margins.
  2. white-label voice ai is quietly becoming the new "we'll run your ads" service. agencies who got beaten up by margin compression on paid retainers are bolting voice ai onto their offering. they pay a platform fee to a white-label provider, sell the agent setup to their clients for $500-2500/mo, and pocket the spread. the unit economics are insane compared to running ad campaigns. nobody on linkedin is writing about this because the agencies doing it well are too busy onboarding clients to make content.
  3. the conversation layer is where attribution gets clean again. a phone call you can fully transcribe, intent-classify, score, and tie back to the click that generated it. the actual sales conversation is the truest signal of intent. paid media trained marketers have a huge advantage here because we already think in funnels, cost per qualified lead, ltv. we just have to extend that thinking past the lead form.
  4. tbh the people who'll win in marketing over the next 5 years aren't the ones learning prompt engineering. they're the ones who already understand consumer psychology, funnel math, and channel economics, and apply that thinking to the conversation layer. the prompt is the new ad copy. the call outcome is the new conversion event. if you can think in those terms you're already ahead.
  5. the failing pattern i keep seeing: marketers treating voice ai as "another tactic to layer on" instead of recognizing it's the next channel. it's not a feature inside your stack. it's the part of the funnel that nobody had real visibility into before, that now you can actually control and improve. treating it like an add-on is the same mistake people made with paid social in 2013.
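to make "the call outcome is the new conversion event" concrete, here's a small sketch of a call record tied back to the click, rolled up into cost per qualified call by campaign. the field names, the 0.7 score cutoff, and the sample numbers are all assumptions for illustration, not a real platform schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    click_id: str   # hypothetical id carried over from the ad click
    campaign: str   # campaign that generated the click
    intent: str     # e.g. "book", "price_check", "wrong_number"
    score: float    # 0..1 lead score produced from the transcript
    booked: bool    # whether the call ended in a booking


def cost_per_qualified_call(calls, spend_by_campaign, min_score=0.7):
    """Divide campaign spend by the number of calls scoring at or above min_score."""
    qualified = defaultdict(int)
    for call in calls:
        if call.score >= min_score:
            qualified[call.campaign] += 1
    # Campaigns with zero qualified calls are left out instead of dividing by zero.
    return {
        campaign: round(spend / qualified[campaign], 2)
        for campaign, spend in spend_by_campaign.items()
        if qualified[campaign] > 0
    }


if __name__ == "__main__":
    calls = [
        CallRecord("c1", "search", "book", 0.9, True),
        CallRecord("c2", "search", "price_check", 0.4, False),
        CallRecord("c3", "social", "book", 0.8, False),
    ]
    print(cost_per_qualified_call(calls, {"search": 300.0, "social": 200.0}))
```

it's the same cost-per-qualified-lead thinking paid media people already do, just with the qualification signal coming from the conversation instead of the form fill.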

zooming out, i don't think paid media is going to die. it's just going to be one piece of a wider conversation-first stack. the ad gets them to call or chat. the conversation closes them. the marketers who can think across that whole arc are going to be way more valuable than the ones who only know the click-to-form-fill part.

ngl this is the most clarifying career shift i've made and i was extremely skeptical of voice ai when i started. coming from paid media, where everything is measurable, i thought ai-driven calls were going to be a black box. they're actually more measurable than ads at this point, because every call is a transcript with structured outputs.

curious what other paid media / marketing folks are seeing. is anyone here actually moving into the conversation layer, or is your agency still running the same retainer model from 2019? what's holding the industry back from making this shift faster?

u/DeshMamba — 7 days ago
▲ 1 r/SMMA

the new agency upsell isn't an seo retainer anymore, it's voice ai. ~12 months watching this happen, here's what i'm seeing

ok hot take but i don't think it's that hot anymore.

the marketing services world has been quietly cracking for like 3 years. seo got commoditized once google started rewriting answers itself. paid ads keep getting more expensive while attribution gets worse. content is basically free now, anyone with chatgpt can publish 20 blog posts a week. the entire "we'll grow your traffic" pitch is harder to sell every quarter.

so agencies are scrambling for the next thing to bolt onto a retainer. and from where i sit (i help run a platform that powers voice ai agents for a bunch of agencies and msps), the answer most of them are landing on is voice.

some things i'm seeing on the white-label / reseller side:

  1. the smart agencies stopped trying to invent it themselves. 12 months ago every agency owner with a vapi account was "building their own voice ai." by month 6 they realized telephony, latency, compliance, integrations, and call ops are not weekend projects. now they white-label a platform and focus on what they're actually good at, which is selling and onboarding clients.
  2. the pricing gap is wild and people aren't talking about it. a real white-label voice ai platform runs an agency around $1k/mo + ~10 cents a minute. agencies are billing their clients $500-2500/mo per deployment. so an agency with 10-15 clients on it is doing $5k-30k/mo in margin off one tool. that's better economics than any seo retainer i've ever seen, and the work is way less hands-on once it's set up.
  3. per-client cost collapses at scale. one agency platform fee of ~$1k. at 13 clients that's $77/client. at 50 clients it's $20/client. the platform is basically free at scale. this is why the agencies who go all in early are about to eat the ones still selling $1500 seo packages.
  4. the failing playbook: agencies trying to sell voice ai the same way they sold seo. monthly retainer, vague deliverables, "we'll improve your inbound." doesn't work. clients want a specific outcome (book more appointments, qualify leads, answer after-hours). the agencies winning are pitching outcomes and ROI math, not "ai-powered solutions."
  5. the segments moving fastest aren't the obvious ones. i thought it'd be marketing agencies first. it's actually msps, voip resellers, and bpo shops. they already have the trust + integration into their clients' phone systems, so adding a voice ai layer is a natural upsell. marketing agencies are catching up but they're slower because they don't usually own the phone number.
  6. the "ai receptionist" framing is a trojan horse. clients buy "an ai answering service" and 6 months later they're using it for outbound, qualification, win-back calls, internal IVR replacement. the receptionist is the wedge, not the destination. agencies that understand this have already upsold their clients 2-3x.
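the unit economics in points 2 and 3 are easy to check yourself. this sketch uses the ~$1k platform fee and ~10 cents a minute from above. the function names and the minutes-per-client input are my assumptions, so plug in your own usage.

```python
def per_client_platform_cost(platform_fee, clients):
    """Flat platform fee spread evenly across clients."""
    return platform_fee / clients


def monthly_margin(clients, price_per_client, platform_fee=1000.0,
                   minutes_per_client=0.0, cost_per_minute=0.10):
    """Client billing minus the platform fee and per-minute usage."""
    revenue = clients * price_per_client
    usage_cost = clients * minutes_per_client * cost_per_minute
    return revenue - platform_fee - usage_cost


if __name__ == "__main__":
    for n in (13, 50):
        print(n, "clients:", round(per_client_platform_cost(1000, n)), "per client")
    # Hypothetical low end: 10 clients at $500/mo, usage ignored.
    print(monthly_margin(10, 500))
```

at 13 clients the fee rounds to $77 each, and at 50 it's $20, same as the numbers in point 3. the margin function is there because per-minute usage scales with call volume while the platform fee doesn't, so the two are worth tracking separately.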

zooming out, i think we're watching the same shift that happened when agencies stopped just running ads and started "owning the funnel" in 2015. the new line is owning the conversation. whoever owns the phone call owns the client relationship. agencies that move into the conversation layer in the next 12 months are going to look like the ones who got into facebook ads in 2013. the ones who wait are going to be selling commodity services to clients who already have a voice ai stack and don't need them anymore.

tbh i don't think this is even controversial anymore inside the industry. it just hasn't shown up in the public discourse yet because the agencies actually doing it are too busy printing money to write linkedin posts about it.

curious what other folks are seeing. what's your agency doing about this, ignoring it, building, or reselling?

u/DeshMamba — 7 days ago
▲ 2 r/SaaS

watched ~40 voice ai microsaas wrappers launch this year. here's who's making money and who's already dead

ok this is gonna sound cynical but it's also kinda the truth from inside the space.

i work on a voice ai platform that powers a bunch of agencies and resellers. because of that, i get to see every "ai receptionist for dentists" and "ai voice agent for law firms" microsaas that launches. probably tracked 40+ of these in the last 12 months. some hit, most are zombies, a few are already shut down.

a few patterns i can't unsee:

  1. the wrapper-of-a-wrapper crowd is getting wiped out. if your microsaas is literally "vapi but with a niche landing page," you have maybe 6 months before vapi/retell/elevenlabs ship the niche feature themselves or your customers realize they can just go direct. the providers are not your friends, they're your future competitors. building on top of them with no real moat is a "lifestyle business with a 3 month shelf life" play.
  2. the ones actually making money sold integration and ops, not "ai." nobody pays $400/mo for an llm and a tts voice. they pay for an ai agent that's wired into their crm, their calendar, their sms, their payment processor, and someone who picks up when it breaks at 9pm. the microsaas making real revenue look more like a service business with software margins than software with service margins. people don't say this out loud because it sounds unsexy.
  3. niche depth is the only moat. "voice ai for plumbers" is a real microsaas if the prompts know what a hydro jet costs and the agent can ask the right 6 diagnostic questions before booking. "voice ai for service businesses" is a graveyard. the wrappers winning right now spent 3 months in their niche learning the actual workflow before writing a line of code. the ones dying picked their niche off a list of "high-value verticals."
  4. churn in this category is brutal and nobody's talking about it. a friend running a niche voice ai wrapper had ~60% logo churn in the first 90 days. why? clients buy on the demo, then the agent confidently invents a price or transfers to the wrong number once, and they cancel. voice ai churn isn't like saas churn. one bad call is enough. the founders who survive are the ones obsessed with eval pipelines and guardrails, not feature velocity.
  5. the durable niches aren't sexy. cleaning companies, dental front desks, dispatch for hvac, hoa management, towing, locksmiths. high call volume, predictable workflows, owner-operators who hate the phone. the "ai sdr for series b saas" wrappers are dying because their buyers can build it in a weekend.
  6. tbh the real microsaas opportunity in voice ai isn't building another wrapper. it's building tools FOR the wrappers. eval frameworks, call analytics, voice cloning ops, compliance/recording tools, intent classification. the picks-and-shovels layer has like 3 serious players and a hundred wrappers that need them. nobody asked for my opinion but if i was starting from zero today i'd build for the builders, not for the end customer.
  7. one last thing. the wrappers making $20-50k mrr quietly aren't on twitter. they're not posting build-in-public threads. they're not at indiehackers meetups. they're running 8 clients in a single vertical, sleeping fine, and not telling anyone. the loudest voice ai microsaas accounts are usually the ones still at $0 mrr trying to build an audience first.
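on the churn point in #4 (agent confidently invents a price, client cancels), here's a minimal sketch of the kind of guardrail check an eval pipeline might run over agent turns. the price sheet, function names, and regex are illustrative assumptions. a real check would also need to handle spoken-out numbers like "one eighty nine".

```python
import re

# Hypothetical approved price sheet for one client.
ALLOWED_PRICES = {89, 189}


def quoted_prices(agent_text):
    """Pull dollar amounts like $189 or $1,200 out of an agent turn."""
    matches = re.findall(r"\$(\d[\d,]*)", agent_text)
    return [int(match.replace(",", "")) for match in matches]


def invented_prices(agent_text, allowed):
    """Return any quoted price that is not on the approved sheet."""
    return [price for price in quoted_prices(agent_text) if price not in allowed]


if __name__ == "__main__":
    turn = "the emergency visit is $150 and the standard visit is $89."
    print(invented_prices(turn, ALLOWED_PRICES))
```

run something like this over every transcript and flag any call where the list comes back non-empty. it's crude, but it catches the exact failure that makes clients cancel after one bad call.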

curious what others in here have seen launching anything in voice/ai-agent space. who's actually making money, who quietly shut down, what would you not build today.

u/DeshMamba — 7 days ago

watching cleaning companies adopt ai phone agents for ~10 months now, here's what's actually happening

ok so quick context, i work on a voice ai platform and a lot of our clients are cleaning companies. residential, commercial, post-construction, the whole spread. we're running enough call volume across them that i can see some patterns now.

if you run a cleaning business you already know the math. you miss a call, you lose the job. most quotes go to whoever picks up first. an audit one of our clients ran showed north of 40% of their inbound calls were going to voicemail because the owner was on a job or it was after 6pm. that's literally money on the floor.

here's what i'm seeing now that ai phone agents are getting actually usable for cleaning:

  1. the after-hours window is where the easy money is. most cleaning leads call between 5-9pm (people are home from work, thinking about saturday). owners are wiped, not picking up. an ai agent that just collects name, address, type of clean, square footage, and preferred date during off-hours is booking real jobs while you sleep. one client more than doubled their weekend-booked jobs within the first 2 months just from after-hours capture.
  2. the killer feature isn't booking, it's qualification. cleaners get a TON of tire kickers. "how much for a 4 bedroom" with no follow-through. agents that ask the right 4-5 questions (frequency, type, pets, square footage, urgency) filter out the people who aren't ready to book and only push real leads to the owner. owners we work with are spending way less time on dead-end calls.
  3. recurring client management is where it gets interesting. rescheduling and confirmation calls are 60-70% of inbound for established cleaning companies. ai handles these almost perfectly because the conversations are predictable ("can you push my tuesday to thursday"). some of our cleaning clients have basically given the ai a customer list and let it manage the whole reschedule/confirm flow.
  4. the pricing math actually works for small operators. owners assumed this would be enterprise-only. in reality you can run a voice ai phone line for maybe $200-400/mo all-in (platform + minutes) and if it books even 1-2 extra jobs a month it's already paid for itself. cleaning has a high per-job value so the ROI shows up faster than basically any other industry i've seen.
  5. the failing implementations all sound the same. cleaners try to make the ai do everything immediately. complaints, custom quotes for weird jobs, billing disputes. that's where it falls over. the ones who win start narrow (just after-hours new lead intake) and expand once they see what works. tbh boring but it's the difference between "this thing is amazing" and "this thing sucks" 30 days in.
  6. the bigger trend nobody's saying out loud: cleaning is one of the few service businesses where the owner's time is the actual bottleneck on growth. you can't add jobs without adding admin work. voice ai is the first piece of tech that actually reduces owner phone time instead of adding to it. crms, scheduling software, all of that adds work. an ai phone agent removes it.
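
rough sketch of the break-even math from point 4, if you want to plug in your own numbers. the $200-400/mo range is from above. the $250 average job value is a made-up placeholder, not a real benchmark, and this compares cost against job revenue, not profit.

```python
import math


def jobs_to_break_even(monthly_cost: float, avg_job_value: float) -> int:
    """Smallest whole number of extra booked jobs whose revenue covers the monthly cost."""
    if avg_job_value <= 0:
        raise ValueError("avg_job_value must be positive")
    return math.ceil(monthly_cost / avg_job_value)


# avg_job_value=250 is a hypothetical placeholder; swap in your own average ticket.
for monthly_cost in (200, 400):
    jobs = jobs_to_break_even(monthly_cost, avg_job_value=250)
    print(f"${monthly_cost}/mo is covered by {jobs} extra job(s) a month")
```

if your average ticket is lower than that, the number of jobs goes up, so run it with your real numbers before trusting the "1-2 jobs" rule of thumb.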

if you're a cleaning owner thinking about this, the move isn't to find the cheapest tool. it's to start with one specific job (after-hours new leads) and get that locked in before adding anything else. the cleaners using this well aren't tech people. they just picked one painful problem and let the ai handle that one thing.

curious if any cleaning owners here are already running something like this. what's working, what's broken, what would you not do again?

u/DeshMamba — 7 days ago

The new agency upsell isn't an SEO retainer anymore. It's voice AI. ~ 12 months watching this happen. Here's what I'm seeing.

ok hot take but i don't think it's that hot anymore.

the marketing services world has been quietly cracking for like 3 years. seo got commoditized once google started rewriting answers itself. paid ads keep getting more expensive while attribution gets worse. content is basically free now, anyone with chatgpt can publish 20 blog posts a week. the entire "we'll grow your traffic" pitch is harder to sell every quarter.

so agencies are scrambling for the next thing to bolt onto a retainer. and from where i sit (i help run a platform that powers voice ai agents for a bunch of agencies and msps), the answer most of them are landing on is voice.

some things i'm seeing on the white-label / reseller side:

  1. the smart agencies stopped trying to invent it themselves. 12 months ago every agency owner with a vapi account was "building their own voice ai." by month 6 they realized telephony, latency, compliance, integrations, and call ops are not weekend projects. now they white-label a platform and focus on what they're actually good at, which is selling and onboarding clients.
  2. the pricing gap is wild and people aren't talking about it. a real white-label voice ai platform runs an agency around $1k/mo + ~10 cents a minute. agencies are billing their clients $500-2500/mo per deployment. so an agency with 10-15 clients on it is doing roughly $5k-30k/mo off one tool before the platform fee and minutes come out. that's better economics than any seo retainer i've ever seen, and the work is way less hands-on once it's set up.
  3. per-client cost collapses at scale. one agency platform fee of ~$1k. at 13 clients that's $77/client. at 50 clients it's $20/client. the platform is basically free at scale. this is why the agencies who go all in early are about to eat the ones still selling $1500 seo packages.
  4. the failing playbook: agencies trying to sell voice ai the same way they sold seo. monthly retainer, vague deliverables, "we'll improve your inbound." doesn't work. clients want a specific outcome (book more appointments, qualify leads, answer after-hours). the agencies winning are pitching outcomes and ROI math, not "ai-powered solutions."
  5. the segments moving fastest aren't the obvious ones. i thought it'd be marketing agencies first. it's actually msps, voip resellers, and bpo shops. they already have the trust + integration into their clients' phone systems, so adding a voice ai layer is a natural upsell. marketing agencies are catching up but they're slower because they don't usually own the phone number.
  6. the "ai receptionist" framing is a trojan horse. clients buy "an ai answering service" and 6 months later they're using it for outbound, qualification, win-back calls, internal IVR replacement. the receptionist is the wedge, not the destination. agencies that understand this have already upsold their clients 2-3x.
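
the math in points 2 and 3 as a quick sketch, so you can swap in your own numbers. the ~$1k platform fee and ~10 cents a minute are from above. `minutes_per_client` is a hypothetical input i'm adding for illustration, since per-client call volume isn't something i gave a number for.

```python
def platform_cost_per_client(platform_fee: float, clients: int) -> float:
    """Flat platform fee spread evenly across clients."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return platform_fee / clients


def monthly_margin(
    clients: int,
    price_per_client: float,
    platform_fee: float = 1000.0,   # ~$1k/mo from the post
    per_minute: float = 0.10,       # ~10 cents a minute from the post
    minutes_per_client: float = 0,  # hypothetical, set this to your real usage
) -> float:
    """Client billing minus the platform fee and usage cost."""
    revenue = clients * price_per_client
    usage_cost = clients * minutes_per_client * per_minute
    return revenue - platform_fee - usage_cost


print(round(platform_cost_per_client(1000, 13)))  # 77
print(round(platform_cost_per_client(1000, 50)))  # 20
print(monthly_margin(10, 500, minutes_per_client=500))
```

the low end of the range gets thinner once the fee and minutes are subtracted, which is why the per-client cost dropping with scale matters so much.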

zooming out, i think we're watching the same shift that happened when agencies stopped just running ads and started "owning the funnel" in 2015. the new line is owning the conversation. whoever owns the phone call owns the client relationship. agencies that move into the conversation layer in the next 12 months are going to look like the ones who got into facebook ads in 2013. the ones who wait are going to be selling commodity services to clients who already have a voice ai stack and don't need them anymore.

tbh i don't think this is even controversial anymore inside the industry. it just hasn't shown up in the public discourse yet because the agencies actually doing it are too busy printing money to write linkedin posts about it.

curious what your agency is doing about this: ignoring it, building, or reselling?

u/DeshMamba — 7 days ago

2.5 years building voice AI and ~1k calls a day later, here's what i'd tell past me

so this is gonna be more of a brain dump than a structured post.

i've been building voice AI agents for about two and a half years. what we ship is running a little over 1,000 calls a day right now. mostly inbound receptionist and qualification, some outbound follow-ups.

i see a lot of "is voice AI ready yet" and "how do i build this" posts in here so figured i'd dump what i actually learned. not what the docs say. the stuff that only shows up after you've shipped a few hundred thousand calls.

  1. latency is the entire game. the model can be smarter, the prompt can be better, none of it matters if there's a 1.2 second pause before the agent responds. callers will either hang up or talk over it. anything under ~700ms feels human. anything over a second feels like a robot reading a script. probably 60% of our engineering time goes here, not into the LLM layer.
  2. interruption handling matters more than script quality. a "smart" agent that can't be cut off feels worse than a basic agent that yields the second you start talking. barge-in detection is the most underrated part of the stack. nobody talks about it because it's boring.
  3. voice selection is doing more work than your prompt. same exact prompt, different TTS voice, completely different outcomes. we've tested this dozens of times. the voice is probably 60% of perceived intelligence. people will rate a dumb agent with a warm voice higher than a smart agent with a clinical one.
  4. hallucinations on phone calls hit different than in chat. on chat you can scroll back and correct it, the user has time to notice. on a call, the agent confidently quotes a wrong price or invents an appointment slot and the call is over. trust is gone. guardrails on pricing, availability, and policy are the most important code we write and they're the least glamorous.
  5. the call almost never fails. the handoff does. AI handles the conversation fine. then it transfers to a human and the human gets half the data, or it writes to the CRM and the fields don't map, or it sends the calendar invite to the wrong timezone. the voice agent is maybe 30% of the actual product. the rest is integration plumbing that nobody puts in their demo video.
  6. people are way more chill with AI than i expected, but only if you tell them. agents that open with "hi, i'm an AI assistant for [business], how can i help" outperform agents that try to pass as human. tbh i thought it'd be the opposite when we started. the "trick them" play feels clever for a week and then you start losing calls because someone caught on.
  7. volume reveals everything demos hide. the first 100 calls feel like magic. at 1,000 a day you find out about people calling from inside a moving truck, kids screaming in the background, three way calls, an entire call in Spanglish, an old phone with a 300ms transmission delay. you cannot prompt your way out of these. you have to engineer for the chaos.
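
to make point 4 a bit more concrete, here's a minimal sketch of a pricing guardrail. this is not our production code. the price table, the service names, and the fallback line are all hypothetical, and it assumes prices show up in the reply as a dollar sign followed by digits.

```python
import re
from typing import Dict

# Hypothetical source of truth. In a real setup this would come from the CRM or booking system.
PRICE_TABLE: Dict[str, int] = {"standard visit": 180, "deep clean": 320}

FALLBACK = "let me have someone confirm the exact price and get right back to you."


def guard_price_quote(reply: str, service: str, price_table: Dict[str, int]) -> str:
    """Return the reply unchanged unless it quotes a dollar amount that
    does not match the known price for the service."""
    quoted = [int(m.replace(",", "")) for m in re.findall(r"\$(\d[\d,]*)", reply)]
    if not quoted:
        return reply  # no price mentioned, nothing to check
    allowed = price_table.get(service)
    if allowed is None or any(q != allowed for q in quoted):
        return FALLBACK  # never let an unverified number reach the caller
    return reply


print(guard_price_quote("a deep clean is $320.", "deep clean", PRICE_TABLE))
print(guard_price_quote("a deep clean is $250.", "deep clean", PRICE_TABLE))
```

the design choice is that the check runs on the text right before it goes to TTS, so a wrong number gets swapped for a safe line instead of being spoken.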

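happy to get into any of these if anyone's curious. also kind of want to know what others running real volume have found, lowkey feel like this sub doesn't talk about the ops side enough.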

u/DeshMamba — 7 days ago

Built my own voice AI platform after Vapi burned me. Wrote up everything I learned shopping for one.

Ok so my background is paid media, mostly lead gen. For years I'd watch the same thing happen with every client. We'd run ads, generate solid leads, hand them off, and the client would call like half of them. The other half just sat in the CRM dying. From the paid media side that's brutal bc you're literally paying to fill a pipeline nobody works.

So in 2024 I started messing around with voice agents to call the leads automatically. Started with Vapi. Spent way more than I should've figuring out what Vapi is good at and what it isn't. Then it kinda hit me that I was going to be duct-taping Vapi + n8n + GHL + Twilio + a CRM together forever, and any client of mine who wanted the same setup would be on the same hook. Felt more like a science project than a business lmao.

So I ended up just building my own platform bc nothing on the market actually solves what an agency needs. Workflow builder, conversations unibox, native CRM integrations, all in one place. Won't pitch it here, just context for why I have opinions.

Anyway. Stuff I wish someone had told me when I was shopping:

That "$0.05/min" number on every homepage is kinda a lie. Once you stack TTS + STT + LLM + telephony + platform fee, real cost is more like $0.15-$0.30/min depending on the voice. Nobody walks you through that math on the demo. You gotta ask, and tbh most sales teams don't have a clean answer ready.
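
Here's a minimal sketch of that math so you can fill it in during a demo. Every per-minute number below is a made-up placeholder, not any vendor's real price. The only thing taken from above is that the headline rate is one line item out of five.

```python
from dataclasses import dataclass


@dataclass
class PerMinuteCosts:
    """All values are dollars per call minute."""
    platform: float
    stt: float
    llm: float
    tts: float
    telephony: float

    def total(self) -> float:
        return self.platform + self.stt + self.llm + self.tts + self.telephony

    def monthly(self, minutes: int) -> float:
        return self.total() * minutes


# Placeholder numbers for illustration only. Ask the vendor for each line.
costs = PerMinuteCosts(platform=0.05, stt=0.01, llm=0.02, tts=0.05, telephony=0.02)
print(f"headline: ${costs.platform:.2f}/min, all-in: ${costs.total():.2f}/min")
print(f"at 10,000 minutes: ${costs.monthly(10_000):,.2f}/mo")
```

If a sales team can't fill in all five fields, that tells you about as much as the numbers would.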

Latency only looks good when the caller cooperates. The 700ms they show you is a perfectly worded customer handing the agent a script. Real callers interrupt and mumble and change their mind halfway through a sentence. Most platforms can't keep up with that.

White-label is mostly marketing language. A lot of these platforms call themselves white-label when really they just put your logo in the corner. The actual test: can your client log in, click around the dashboard, look at the URL, open an email notif, and never figure out who's actually powering it. Most fail that test.

Anyway I wrote all of it up in a free doc. Side-by-side pricing at 100+ concurrent calls, latency from real deployments, white-label audit, and which platforms a non-technical agency owner can actually deploy without needing a dev. Link in comments.

Not gated, no email signup, just the doc.

Two things I'd do before signing with anyone, even if you skip the guide:

Ask them what your pricing looks like at month 6 call volume. The economics break at scale and they will not bring it up themselves.

Run a trial before committing. Anyone who won't let you do that is telling you something tbh.

Ask me anything specific in the comments if you're mid-shopping rn.

u/DeshMamba — 8 days ago

the 7 things an AI receptionist actually needs to do well in 2026, and most still don't do 4 of them

ok the AI receptionist space has gotten really noisy in the last 18 months. every vendor's landing page sounds identical. natural voice, books appointments, 24/7 coverage, you know the script. but when you actually run one of these in a real business you find out pretty fast that most platforms fall over on the same handful of things, and the things they fall over on are usually not what the marketing site is hyping.

been watching deployments across a bunch of verticals (HVAC, dental, legal, cleaning, a few others) for a while now. here's what i've actually seen matter.

1. sub-second response latency

this is the biggest reason callers hang up on AI bots imo. there's a UX rule called the Doherty Threshold (from a 1982 IBM paper) that puts the limit for a system feeling responsive at about 400ms. past that it starts to feel laggy, and over 1 second reads as broken. on a phone call it's brutal. a 2 second pause after the caller stops talking and they assume they got disconnected.

the weird thing is most platforms benchmark voice quality but not end-to-end latency. you can have the most human-sounding voice and still lose calls bc the response time is 1.8 seconds.

easy way to test: call the demo, finish a sentence, count Mississippis. if you can get to "one Mississippi two" before it speaks, it's too slow.
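
if you'd rather measure than count, here's a rough sketch. it assumes you can pull two timestamps per turn from a call recording or log (when the caller stopped talking, when the agent's audio started), both in milliseconds. the 400ms and 1 second cutoffs are the ones from above, and the sample numbers are made up.

```python
from typing import List, Tuple


def response_gaps_ms(turns: List[Tuple[int, int]]) -> List[int]:
    """Each turn is (caller_stopped_ms, agent_started_ms). Returns the gap per turn."""
    return [agent_started - caller_stopped for caller_stopped, agent_started in turns]


def rate_gap(gap_ms: int) -> str:
    """Bucket a response gap using the thresholds from the post."""
    if gap_ms <= 400:
        return "fine"
    if gap_ms <= 1000:
        return "laggy"
    return "broken"


# Made-up timestamps from a hypothetical test call.
turns = [(1000, 1300), (5000, 5800), (11000, 12800)]
for gap in response_gaps_ms(turns):
    print(gap, rate_gap(gap))
```

run it over a whole call, not one turn. the worst gap is usually the one the caller remembers.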

2. real interruption handling

humans interrupt each other constantly on the phone. conversation analysis research out of Stanford has put interruption frequency at every 12-15 seconds in natural phone conversation. a good AI receptionist needs to stop talking the second the caller starts, and pick up where the caller actually went, not where the agent was reading from a script.

a lot of platforms either keep talking over the caller (terrible) or stop dead and ask the caller to "please repeat that from the beginning" (also terrible). both kill calls.
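
a stripped-down sketch of what "stop and pick up where the caller went" can look like. this is an illustration, not how any specific platform does it. `AgentTurn` and `context_after_barge_in` are hypothetical names, and it assumes something upstream (voice activity detection) tells you the caller started talking.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AgentTurn:
    sentences: List[str]                              # what the agent planned to say
    spoken: List[str] = field(default_factory=list)   # what actually went out
    interrupted: bool = False

    def speak_next(self) -> Optional[str]:
        """Hand the next sentence to TTS, or None if done or cut off."""
        if self.interrupted or len(self.spoken) == len(self.sentences):
            return None
        sentence = self.sentences[len(self.spoken)]
        self.spoken.append(sentence)
        return sentence

    def barge_in(self) -> List[str]:
        """Caller started talking: stop and return the unsaid remainder."""
        self.interrupted = True
        return self.sentences[len(self.spoken):]


def context_after_barge_in(turn: AgentTurn, caller_text: str) -> Dict[str, object]:
    """What the next LLM call should see: only what was really said, plus the caller's words."""
    return {"agent_said": list(turn.spoken), "caller_said": caller_text}


turn = AgentTurn(["we have openings tuesday.", "mornings are best.", "want me to book one?"])
turn.speak_next()
dropped = turn.barge_in()
print(context_after_barge_in(turn, "actually do you have anything thursday?"))
```

the part that matters is the context: the unsaid sentences get dropped instead of replayed, so the agent answers the thursday question and doesn't go back to its script.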

3. writes directly to your scheduling system

there's a Harvard / InsideSales study floating around that says leads contacted within 5 minutes are around 21x more likely to convert than at 30 minutes. but most AI receptionists "book" appointments by creating a CRM task for a human to action later. by the time someone actually looks at that task the caller's already on the phone with your competitor.

when the bot finishes the call, ask yourself: does it write directly to Google Calendar / Calendly / Jobber / HouseCall Pro / whatever you use, or does it just generate a follow-up task? if it's the second one you're basically paying for a fancier voicemail.

4. SMS recovery on dropped or abandoned calls

call abandonment in inbound business phone systems usually sits around 10-15% per ICMI's contact center benchmarks, and for AI receptionists specifically i've seen it run higher in the first 60-90 days bc people are still figuring out how to talk to one.

when a call drops at like 70-80% completion, a decent platform sends an SMS with a booking link and a "wanna finish this real quick" follow up. most platforms just lose the lead.

barely anyone talks about this feature and it's one of the bigger ROI moves on the list.
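
the decision logic is small enough to sketch. this assumes the platform tracks how many steps of the booking flow a call got through. the `DroppedCall` fields, the 70% cutoff as a default, and the example.com link are placeholders i picked for illustration, and actually sending the text is left to whatever SMS provider you use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DroppedCall:
    caller_number: str
    steps_done: int
    steps_total: int
    booked: bool


def recovery_sms(call: DroppedCall, booking_link: str, min_progress: float = 0.7) -> Optional[str]:
    """Return the text to send, or None if this call shouldn't get a recovery SMS."""
    if call.booked or call.steps_total <= 0:
        return None
    if call.steps_done / call.steps_total < min_progress:
        return None  # caller bailed early, a booking link probably isn't the right follow-up
    return f"looks like we got cut off. wanna finish this real quick? {booking_link}"


call = DroppedCall(caller_number="+15555550100", steps_done=4, steps_total=5, booked=False)
print(recovery_sms(call, "https://example.com/book"))
```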

5. handles regional accents and noisy environments

ASR (the speech recognition layer) is not equal across accents. published research from MIT and Stanford has shown error rates 2-3x higher for Southern US, Boston, Scottish, Indian English, and a bunch of others vs general american english. in production this looks like the bot saying "i didn't catch that, can you repeat?" three times in a 90 second call. caller hangs up.

worth asking any vendor what ASR they use under the hood. Deepgram, AssemblyAI, Whisper, Google Speech all perform pretty differently, and most platforms don't tune for the markets your customers actually live in.

6. vertical-specific qualification flows

generic "book an appointment" flows don't really work for most service businesses. a plumber needs to triage emergency vs scheduled work first. a dental practice needs to know if it's a new patient or a recall or an emergency. a law firm needs practice area and conflict-check info. a roofer needs to separate storm/insurance jobs from retail.

most platforms ship a generic template and tell you to "customize it." in practice that means weeks of prompt engineering, and most operators don't have that kind of time. ask any vendor for a real call recording from an actual customer deployment in your vertical. not a demo. an actual production call.

7. structured data extraction into your CRM/operations stack

at the end of every call the bot should be outputting structured data into whatever you're running on the backend. as fields, not as a transcript dump. things like caller name, callback number, what they wanted, how urgent, address, preferred time.

a lot of platforms quietly skip this. they give you the transcript and assume someone will read it. but if your CSR or tech has to read 4 minutes of transcript to figure out what the caller needed, you didn't save any time, you just moved the work around.
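
for reference, "fields, not a transcript dump" can be as simple as the sketch below. the field list is the one from above. the class name, the sample values, and the `missing_fields` helper are hypothetical, and how the fields get extracted from the call is out of scope here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CallRecord:
    caller_name: str
    callback_number: str
    reason: str          # what they wanted
    urgency: str
    address: str
    preferred_time: str


def missing_fields(record: CallRecord) -> List[str]:
    """Names of fields the call did not capture, so a human knows what to chase."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Made-up example call.
record = CallRecord(
    caller_name="Sam Lee",
    callback_number="+15555550100",
    reason="AC not cooling",
    urgency="same day",
    address="",
    preferred_time="after 3pm",
)
print(json.dumps(asdict(record), indent=2))  # the payload you'd map onto CRM fields
print(missing_fields(record))
```

a CSR can scan that in five seconds, and the missing-fields list tells them exactly what to ask on the callback.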

honestly curious what other folks have run into in actual production. especially anyone deploying for the trickier verticals (legal, dental, multi-location franchises). the space still feels pretty early and right now you basically have to grill every vendor before you sign anything.

u/DeshMamba — 8 days ago

If you're running a new agency, voice AI is one of the cleanest second-services to bolt on rn (up to $2,397 per referral if you'd rather just refer it out)

I'm posting this bc i wish someone had told me this when i was running my first agency.

quick context. i run a voice AI platform called Wave Runner AI. agencies use it to offer ai phone receptionists to their clients. it picks up inbound calls 24/7, qualifies, books appointments, and recovers missed calls.

reason i'm posting here. most newer agencies are stuck running one service (ads or seo or web design usually) and trying to scale by selling more of it. that ceiling hits fast around $10-15k mrr bc your delivery time caps out.

voice AI is one of the only second-services where the math actually works for a small operator. agencies charge clients somewhere in the $1.5-3k/mo range for it, the delivery cost is pretty minimal (a few hundred a month plus some setup time month one), and once it's running it doesn't really eat into your week. that's how agencies break the ceiling without hiring 3 people.

if you'd rather just refer it out instead of building the service yourself:

→ up to $2,397 per referral that converts
→ recurring revshare available
→ you keep your client relationship and we sit underneath your delivery

mostly relevant if your clients are local service businesses (HVAC, dental, legal, cleaning, contractors, real estate, that kind of thing). if you're pure ecom this won't move the needle for you bc your clients aren't really fielding phone calls anyway.

if any of this is useful or you wanna see how it actually works, drop a comment.

u/DeshMamba — 8 days ago

Helping cleaning businesses set up AI answering service. What actually works and what's oversold tbh.

Ok quick context bc relevant. I run paid media for service businesses and ended up building a voice AI platform after watching too many of my clients' leads die in CRMs bc nobody called them back fast enough. Cleaning is one of the verticals I keep seeing get specifically burned on this so figured I'd share what I'm seeing work.

Heads up first: cleaning is kinda harder than other home service verticals to automate. Most AI receptionist platforms are built for generic "answer the phone and book an appointment" stuff. But cleaning has way more pricing logic. Move-out cleans price different from recurring. Carpets, deep cleans, post-construction, all have their own time and pricing rules. If the bot can't quote based on what the caller actually asks for, you're either underquoting jobs or scaring people off with bad numbers.

Anyway, stuff that actually works for cleaning specifically:

The agent has to qualify before it quotes. Bed and bath count, sqft if it's commercial, type of clean, pets, frequency. Get those answers first, then quote. Otherwise you'll have someone asking "how much for a deep clean" and the bot saying $400 when it's a 4,500 sqft house and should've been like $750. That's a real money leak.
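
Here's a minimal sketch of "qualify first, then quote". The pricing rule (a base price plus a rate per 100 sqft) and every number in it are hypothetical, picked only so the 4,500 sqft deep clean lands at the $750 from the example above. It also asks for sqft on every job, which is a simplification.

```python
from typing import Dict

# Hypothetical pricing rules. Real ones should come from your CRM, not a hardcoded dict.
PRICING: Dict[str, Dict[str, int]] = {
    "deep": {"base": 300, "per_100_sqft": 10},
    "recurring": {"base": 90, "per_100_sqft": 4},
}

REQUIRED = ("clean_type", "sqft", "bedrooms", "bathrooms", "pets", "frequency")


def next_step(answers: Dict[str, object]) -> str:
    """Ask for the first missing field. Only quote once everything is answered."""
    for name in REQUIRED:
        if name not in answers:
            return f"ask:{name}"
    rule = PRICING.get(str(answers["clean_type"]))
    if rule is None:
        return "handoff"  # unknown job type, don't guess a number
    price = rule["base"] + int(answers["sqft"]) * rule["per_100_sqft"] // 100
    return f"quote:${price}"


print(next_step({"clean_type": "deep"}))  # still missing sqft, so it asks instead of quoting
print(next_step({"clean_type": "deep", "sqft": 4500, "bedrooms": 5,
                 "bathrooms": 3, "pets": True, "frequency": "one-time"}))
```

The point is the ordering: the bot can't produce a number until the required answers are in, so the "$400 for a deep clean" answer never happens.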

Live pricing pulled from your CRM, not a static list. The second you change your rates, a static price list is wrong. And your rates change way more than you think they do.

Books directly into your calendar AND sends the SMS confirmation. Anything that just "creates a follow up task" is kinda broken tbh. Like 60% of platforms I've looked at do exactly that and the booking gets lost in someone's inbox.

Handles the call drop. Like 1 in 5 calls drop before booking is finalized. You want SMS recovery built in so when someone drops at 80% through, they get a text with the booking link to finish. Most platforms don't have this and the customer just calls a competitor rn.

Now stuff that's oversold:

Voice quality. Honestly almost every platform sounds fine now. The "ours sounds more human" pitch is mostly marketing. Way more important is the response latency. A 2 second pause before the bot talks makes people hang up regardless of how human it sounds.

24/7 coverage. Cleaning calls cluster heavy 8am-11am and 4pm-7pm. The bigger win is catching the 9:15am call when your line is busy bc you're already on with another customer. After hours coverage sounds appealing but most of your missed revenue is happening during business hours when you can't pick up fast enough imo.

"Sounds completely human." Customers figure out it's a bot like 30 seconds in. They don't actually care, as long as the bot answers their questions and books the appointment. The platforms that try the hardest to "fool" callers are the ones that sound the weirdest btw.

Stuff to actually ask any vendor before paying:

When it books, does it write directly to my calendar or does it create a task for someone to follow up?

Can I listen to a recording of an actual cleaning business deployment, not just a generic demo?

If they can't answer those cleanly, walk away tbh.

Ask away if you're shopping rn or got burned by a bad setup already.

u/DeshMamba — 10 days ago
