u/tarunyadav9761

I made a small ElevenLabs cost/workflow calculator for long-form TTS users

I’ve been thinking a lot about the hidden cost of long-form TTS workflows.

Short clips are easy. But once you’re doing YouTube narration, course audio, audiobook chapters, training material, etc., the expensive part is not always the final export. It’s all the retakes:

  • one paragraph sounds off
  • pacing feels too clean
  • you regenerate a section
  • then compare 3-4 takes
  • then export again
  • then the credits slowly disappear

So I made a small calculator/workflow finder for people comparing cloud TTS usage with local/offline draft generation:

https://www.murmurtts.com/tools/elevenlabs-alternative-calculator

It estimates rough monthly voice costs, where credits get burned, and whether a local Mac workflow makes sense for drafts/retakes before using ElevenLabs for final output.
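
For anyone who wants to sanity-check the idea by hand, here is a rough sketch of this kind of estimate. It is not the calculator's actual formula. The function name, the one-credit-per-character rate, and the price per 1,000 credits are all placeholder assumptions, so swap in the numbers from your own plan.

```python
def estimate_monthly_cost(final_chars, retake_fraction, takes_per_retake,
                          credits_per_char=1.0, price_per_1k_credits=0.30):
    """Rough monthly estimate of credits and cost, including retakes.

    All rates are hypothetical placeholders, not real ElevenLabs pricing.
    final_chars: characters in the finished audio per month
    retake_fraction: share of the script that gets regenerated (0.0 to 1.0)
    takes_per_retake: extra takes generated for each regenerated part
    """
    final_credits = final_chars * credits_per_char
    retake_credits = final_chars * retake_fraction * takes_per_retake * credits_per_char
    total_credits = final_credits + retake_credits
    return {
        "final_credits": final_credits,
        "retake_credits": retake_credits,
        "total_credits": total_credits,
        # Share of all credits that went to takes you did not keep.
        "retake_share": retake_credits / total_credits if total_credits else 0.0,
        "estimated_cost": total_credits / 1000 * price_per_1k_credits,
    }

estimate = estimate_monthly_cost(final_chars=200_000, retake_fraction=0.25, takes_per_retake=3)
print(estimate)
```

With these made-up inputs, retakes account for roughly 43% of the credits, which is the kind of hidden cost the calculator is meant to surface.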

I built it because I’m working on a local Mac TTS app, so full disclosure there. But the calculator is free and might be useful even if you stay fully on ElevenLabs.

Curious how other people here handle long-form iteration: do you regenerate tiny sections, full chunks, or use a separate draft pipeline before final voice?

u/tarunyadav9761 — 17 hours ago
▲ 26 r/GenAiApps+1 crossposts

[macOS] [$49 → 10× Lifetime Free] LoopMaker offline AI music generator with unlimited generations. [Giveaway: Lifetime Promo Codes]

Hey r/GenAiApps,

I’m Tarun, solo indie dev behind LoopMaker.

It’s a native macOS app that generates music from text prompts entirely on your own Mac. No cloud, no subscription, no credits, no queue.

You can make lo-fi beats, cinematic background music, ambient loops, electronic tracks, hip-hop, jazz, instrumentals, or vocals with lyrics. The app runs locally and exports normal audio files you can use in video editors, DAWs, games, podcasts, etc.

Why I built it:

Most AI music tools are subscription + credit based. That works for some people, but if you generate a lot, it starts feeling like every idea has a meter attached to it. I wanted something closer to buying a tool once and using it as much as you want.

What LoopMaker does:

  • Generates music from text prompts
  • Runs offline on Mac
  • No accounts, no cloud queue, no monthly limits
  • Unlimited generations
  • WAV / M4A export
  • Six presets: lo-fi, cinematic, ambient, electronic, hip-hop, jazz
  • Vocals + lyrics support
  • Full commercial use for generated tracks

Normal price is $49 one-time, but I’m giving away 10 lifetime licenses here.

How to enter:

Just upvote and comment with what you’d use LoopMaker for: YouTube background music, game loops, podcast intros, short films, beats, anything.

I’ll pick 10 people and DM the license codes.

Link: https://tarun-yadav.com/loopmaker

Happy to answer questions about the app, local AI music generation, or building native AI apps for Mac.

u/tarunyadav9761 — 9 days ago

The one thing missing from every Whisper transcription app on Mac (so I built it) [Giveaway: Lifetime Promo Codes]

I know, I know: "another Whisper app?" Hear me out.

Over the past 2 years, I've tried basically every voice-to-text option on Mac:

What I tested:

  • MacWhisper - Great for batch file transcription. Speaker diarization is solid. It costs around $89.05, which is a lot of money for many people.
  • Superwhisper - Best UX of the bunch. But it uses a subscription model (~$84/year) and has no auto-insert into apps.
  • VoiceInk - Open source, great value at $25. Lacks some polish.
  • Wispr Flow - Amazing AI post-processing, but cloud-based (privacy concern) and pricey, with a monthly subscription.
  • Apple Dictation - Free and built-in, but accuracy is frustrating and it only works in the focused text field.

The gap I kept hitting:

I wanted to dictate into Notion, Obsidian, Slack, email, Cursor, basically anywhere. Most apps either:

  • Only transcribe files (MacWhisper)
  • Copy to clipboard and you have to paste (most others)
  • Require you to be in a text field already (Apple Dictation)

None of them auto-inserted text directly into whatever app I was using via Accessibility APIs.

So I built EchoText

  • Auto-inserts text into any app: not the clipboard, actual text insertion
  • Menu bar app with global hotkey
  • One-time purchase, not subscription
  • 100% on-device, works offline, no account needed
  • Also does file transcription + meeting recording (system audio)
  • Multi-language support
  • Supports Parakeet v2 for 100-200x faster transcription

Honest limitations:

  • No speaker diarization yet (MacWhisper has this)
  • No AI summarization (Wispr Flow does this better; I'm considering it for a future update)
  • macOS only (no iOS companion yet)

Who it's for:

  • People who want Superwhisper-level convenience but don't want another subscription
  • Writers/note-takers who work across multiple apps
  • Privacy-conscious users who want on-device processing

Launch deal:

  • Giving away 10 FREE lifetime licenses to people who comment and upvote on this post. I'll pick winners randomly in 48 hours.

Happy to answer any questions or take feedback. And yes, I know the market is crowded, but I genuinely think this feature fills a real gap at this price point.

u/tarunyadav9761 — 11 days ago
▲ 0 r/audiobooks+1 crossposts

Would prompt-designed voices help for placeholder game dialogue?

I’m building a Voice Design feature in Murmur, a local Mac TTS app.

The idea is to describe a character voice instead of recording temp VO or cloning a real person.

Examples:

  • tired noir detective, low gravelly voice, slow pacing
  • cheerful shopkeeper, bright and playful
  • smooth villain, quiet and controlled
  • elderly fantasy narrator, calm and theatrical

You generate a short preview, adjust the direction, then save the voice for reuse.

I’m thinking mostly about early game prototypes: quest drafts, NPC dialogue, mood tests, trailer scratch audio.

For game devs, would this help during prototyping, or do you prefer simple temp recordings until casting?

u/tarunyadav9761 — 1 day ago
▲ 4 r/aicuriosity+3 crossposts

I built a prompt-to-voice design workflow for a local Mac voice app

I’ve been working on Voice Design inside Murmur, my local AI voice app for Apple Silicon Macs.

The idea is simple: instead of choosing from preset voices or cloning a reference sample, you describe the voice you want.

For example:

“A warm, gravelly middle-aged narrator with slow pacing, suitable for long-form audiobooks.”

or

“A punchy young product-ad voice with crisp diction and bright energy.”

The workflow is:

describe voice -> generate preview -> adjust prompt -> save the voice -> reuse it in future scripts

What surprised me is that this feels different from normal voice cloning. Cloning starts from a person. Voice design starts from a role, character, or production need.

Curious what people think: is prompt-based voice design useful, or do most creators still prefer picking presets / cloning a real voice?

u/tarunyadav9761 — 13 days ago
▲ 9 r/ebooks

I’ve been thinking a lot about the gap between ebooks and audiobooks.

There are so many books I want to read that either:

  • don’t have an audiobook version
  • have an audiobook in a different edition/translation
  • are niche nonfiction or academic books
  • are public domain / older titles with bad audio options
  • are personal drafts or documents I just want to listen to

For me, audio is not really a replacement for reading. It is more like a second way to get through books during walks, chores, travel, or low-energy days.

The annoying part is that normal text-to-speech works okay for a paragraph, but it gets messy for long books. You need chapters, consistent narration, maybe different voices for dialogue, clean exports, and a way to come back to the project later.

So I’ve been experimenting with a local Mac workflow where an ebook or chapter can become an audio project:

EPUB / text → chapters or sections → narrator voice → optional character voices → timeline edits → export audio
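
To make the "chapters or sections" step concrete, here is a minimal sketch that splits plain text at chapter headings. It assumes the book has already been extracted to text (unpacking an actual EPUB is not covered), and the heading pattern and function name are hypothetical choices for illustration, not how any particular app does it.

```python
import re

# Hypothetical heading pattern: lines like "Chapter 1" or "CHAPTER 12: Title".
CHAPTER_HEADING = re.compile(r"^chapter\s+\d+.*$", re.IGNORECASE | re.MULTILINE)

def split_into_chapters(text):
    """Split plain text into (title, body) pairs at chapter headings.

    Any text before the first heading (front matter) is dropped.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    if not matches:
        # No headings found: treat the whole text as one section.
        return [("Untitled", text.strip())]
    chapters = []
    for i, match in enumerate(matches):
        start = match.end()
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group().strip(), text[start:end].strip()))
    return chapters

sample = "Chapter 1\nIt was a dark night.\n\nChapter 2: Morning\nThe sun came up."
chapters = split_into_chapters(sample)
for title, body in chapters:
    print(title, "->", len(body), "characters")
```

Real books are messier than this (numbered parts, untitled sections, headings in other languages), so a pattern like this is only a starting point.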

I’m not talking about piracy or sharing book files. More about personal use with books you own, DRM-free ebooks, public domain books, or your own writing.

Curious how people here feel about this.

Would you use AI/TTS audio for ebooks that don’t have audiobook versions, or does generated narration still feel too unnatural for long listening?

u/tarunyadav9761 — 20 days ago

I’m curious how small YouTube creators here are handling voiceovers, especially for scripted videos.

Most AI voice tools are easy when you only need one short clip.

But once the video gets longer, the annoying part becomes the workflow:

  • splitting the script into sections
  • regenerating one bad sentence without redoing the whole voiceover
  • keeping the same voice consistent across the full video
  • handling narrator + character / guest voices
  • adding pauses or reactions naturally
  • matching the voiceover timing to the edit
  • adding music/SFX underneath
  • exporting clean audio without juggling 20 different files

For people making faceless videos, explainers, tutorials, commentary, or story videos:

Do you usually generate the full voiceover in one go, or do you generate it line-by-line and assemble it later?

And would a script-to-audio timeline workflow actually help, or is the simple “paste text → generate voice → export” flow still better for small channels?

u/tarunyadav9761 — 20 days ago

I’m curious how people here are handling long AI voiceover workflows.

Most TTS tools are fine when you need one short clip.

But once you’re making an actual 8–15 minute video, the annoying part is usually not just the voice quality. It’s the whole workflow around it:

  • splitting the script into usable sections
  • regenerating one bad sentence without redoing everything
  • keeping the same voice consistent across the full video
  • handling multiple speakers or narrator/character parts
  • adding pauses, reactions, or emphasis
  • matching voiceover timing to visuals
  • adding music/SFX under narration
  • exporting clean audio for editing
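
To show what "regenerating one bad sentence without redoing everything" can look like in practice, here is a minimal sketch that caches audio per script block and only re-renders blocks whose text changed. The `synthesize` callback is a stand-in for whatever TTS call you use, and every name here is hypothetical.

```python
import hashlib

def block_key(voice, text):
    """Stable cache key for one script block rendered with one voice."""
    return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()

def render_script(blocks, voice, synthesize, cache):
    """Render each block, calling synthesize only for new or edited blocks.

    synthesize is a placeholder for a real TTS call (hypothetical).
    cache maps block keys to previously generated audio.
    """
    audio_clips = []
    for text in blocks:
        key = block_key(voice, text)
        if key not in cache:
            cache[key] = synthesize(voice, text)
        audio_clips.append(cache[key])
    return audio_clips

calls = []

def fake_synthesize(voice, text):
    # Records each call instead of producing real audio.
    calls.append(text)
    return f"<audio:{voice}:{text}>"

cache = {}
script = ["Welcome back.", "Today we look at pacing.", "Thanks for watching."]
render_script(script, "narrator", fake_synthesize, cache)

# Edit one sentence and render again: only that block is regenerated.
script[1] = "Today we look at pacing and pauses."
render_script(script, "narrator", fake_synthesize, cache)
print(len(calls))  # 4: three initial renders plus one regenerated sentence
```

One trade-off of keying on the text: asking for a fresh take of an unchanged sentence means deleting its cache entry first.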

Right now, do you usually generate one full voiceover, or do you generate line-by-line and assemble it later?

Also curious: would a script-to-timeline workflow actually help for AI YouTube creators, or do most people prefer the simple text box → generate → export flow?

Not posting a link or asking for channel critique here. Just trying to understand what workflow people are actually using for AI/faceless channels.

u/tarunyadav9761 — 20 days ago
▲ 15 r/TextToSpeech+7 crossposts

I’ve been working on Murmur, a local text-to-speech app for Apple Silicon Macs.

The new feature I’m building is called Projects / Story Studio, and it solves a problem I kept running into:

TTS tools are fine for one-off clips, but messy for actual audio projects.

If you’re making a podcast segment, audiobook chapter, course lesson, ad, or game dialogue, you usually need multiple speakers, multiple takes, pauses, reactions, music, edits, exports, and a way to come back to the project later.

So I built a project-based workflow:

Write a script → assign voices → generate dialogue → edit clips on a timeline → add music/SFX → export final audio.

It supports things like:

  • multiple scripts inside one project
  • Host / Guest / Narrator / Character speakers
  • inline tags like [pause], [laugh], [chuckle]
  • per-block regeneration
  • timeline editing with waveforms
  • media lane for music and SFX
  • ripple editing and gap tools
  • WAV/M4A export
  • transcript and stem export
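
As an illustration of the inline tag idea, here is a minimal sketch of splitting a script line into spoken text and tag segments. It is only an example under my own assumptions and not Murmur's actual implementation. The tag names come from the list above, and the function name and segment format are hypothetical.

```python
import re

# Tag names taken from the examples above; a real parser may accept more.
TAG_PATTERN = re.compile(r"\[(pause|laugh|chuckle)\]")

def parse_inline_tags(line):
    """Split one script line into ("text", ...) and ("tag", ...) segments."""
    segments = []
    position = 0
    for match in TAG_PATTERN.finditer(line):
        # Spoken text between the previous tag (or line start) and this tag.
        text = line[position:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("tag", match.group(1)))
        position = match.end()
    # Whatever is left after the last tag.
    tail = line[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

segments = parse_inline_tags("Well, that was unexpected. [pause] Anyway [laugh] let's move on.")
print(segments)
```

Each text segment can then go to the voice model while each tag becomes a silence or a reaction clip on the timeline.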

Everything runs locally on Mac, so long scripts and voice samples do not need to be uploaded to a cloud service.

I’m still polishing the workflow and would love feedback from Mac users, especially people who make podcasts, audiobooks, courses, YouTube narration, or game dialogue.

u/tarunyadav9761 — 11 days ago
▲ 66 r/elearning+12 crossposts

I use text-to-speech way more than I expected.

Articles, long docs, random PDFs, drafts I want to hear out loud, sometimes full chapters from books I’m working through. The annoying part was that every decent TTS tool I tried either had a subscription, credits, or sent the text to the cloud.

So I built a Mac app for myself around local TTS models.

It’s called Murmur. Runs on Apple Silicon, works offline after model download, and does the usual stuff like paste text, import files, batch jobs, export MP3/WAV, etc.

The main reason I built it was pretty simple: I didn’t want to think “is this text worth spending credits on?” every time I wanted to generate audio.

Not pretending it replaces professional audiobook narration or voice acting. It’s more for boring useful stuff: listening to articles, reviewing drafts, generating narration, converting long text to audio.

Would love feedback from Mac users. Especially on what would make this feel like a daily utility instead of a niche AI app.

Link: https://murmurtts.com/

u/tarunyadav9761 — 17 hours ago

I’ve tried a bunch of local AI stuff on Apple Silicon, and honestly TTS has been one of the most practical use cases.

Not the flashiest, but very useful.

The reason is that the output has an obvious job: turn text into audio I can listen to while walking, cooking, commuting, etc. I don’t need it to be magical. I need it to be fast, private, and cheap to run a lot.

My current split is roughly:

  • Kokoro for fast long-form narration
  • Qwen-style models for multilingual stuff
  • Fish Speech when I want more expressive output
  • Other dialogue models for experiments

The boring workflow is the one I keep coming back to: dump in long text, generate audio, listen later.

I ended up building a Mac GUI around this called Murmur. Disclosure, it’s my app. It basically wraps local TTS models into a native workflow so I don’t have to keep managing separate scripts and model folders.

Runs on Apple Silicon, no cloud after model downloads.

Link: https://murmurtts.com/

Curious if anyone here is running TTS models directly with MLX. I’d love to know what models people are getting good results from on M1/M2/M3/M4.

u/tarunyadav9761 — 26 days ago