u/HandsOnArch

What would you build if realtime avatars were cheap enough for SaaS?

I’ve been working on a realtime avatar system that runs fully as SaaS.

The user device does almost nothing. No local processing, no GPU requirement, no “works only on good laptops” problem.

The main trick is that the expensive video work happens upfront in preprocessing. At runtime, the server cost is almost negligible compared to normal realtime video streaming.

That means you can embed a photorealistic avatar into a website, app, onboarding flow, tutor, companion, sales demo, support flow or whatever, without worrying that every minute destroys your margin.

I’m mostly curious to hear from other founders:

If realtime avatars became cheap enough to use like a normal SaaS component, what kind of product or use case could you build with it?

Demo: https://avatar.letkimdoit.com

reddit.com
u/HandsOnArch — 8 days ago

What would you build if photorealistic live avatars were cheap enough?

It feels like almost every app idea has already been vibe-coded in some form.

So I started working on something that could enable a different kind of AI product:

a photorealistic live-avatar layer for AI apps.

Most of the expensive visual work happens upfront during preprocessing, so runtime stays lightweight and cheap, even for B2C apps where per-minute costs matter a lot.

Could be useful for AI tutors, onboarding agents, interview trainers, product guides, companions, interactive landing pages, and more.

Demo here:

https://avatar.letkimdoit.com

What ideas would you build with something like this?

Any feedback is highly appreciated!

u/HandsOnArch — 11 days ago

Live avatars are finally cheap enough to embed in websites

I’m building a SaaS that lets founders add photorealistic live avatars to their own websites.

The main difference from tools like HeyGen:

Most of the expensive video work happens upfront during preprocessing.

At runtime, the avatar is lightweight: no GPU-heavy live video generation, low server load, and smooth playback even on weaker devices.

The tradeoff is that lip sync is not perfect Hollywood quality. But for many SaaS use cases like landing pages, onboarding, product demos, tutors, companions or support agents, the cost structure finally becomes realistic.

Curious what other builders think:

Would you rather integrate this as a simple website widget, or as an API where you control the AI/audio and only use the avatar layer?

Demo:

https://avatar.letkimdoit.com

u/HandsOnArch — 11 days ago

I am trying to reason through something in current architecture planning, and I am curious if others see the same shift.

Team boundaries have always been expensive. That is not new.

What feels new is the ratio.

If AI agents make implementation, refactoring, test creation, documentation and API adjustments much faster, then human handoffs become much more expensive relative to the actual implementation work.

Another team. Another backlog. Another priority. Another sprint. Alignment. Review. Integration.

The technical boundary is not necessarily the problem.

An API boundary can be useful.
A service boundary can be useful.
A module boundary can be useful.

They give humans and agents smaller contexts, clearer contracts and safer change boundaries.

The problem starts when a technical boundary automatically creates a human handoff.

A normal product feature often touches UI, API, validation, data model, behavior and tests. If that automatically turns into multiple teams, multiple sprints and multiple integration points, it feels increasingly disproportionate in an AI-driven development world.

And I do not think this only makes delivery slower. It can also make architecture worse.

Conceptually related changes get split apart. Context gets lost. Decisions are translated into handoffs. Architecture becomes a negotiated compromise between ownership boundaries instead of a coherent change to the system.

This does not mean “back to the monolith”.

It also does not mean every team should build its own little platform.

My current intuition is more like:

Larger domain change spaces per team.
Shared platforms for recurring complexity.
Technical boundaries without human handoffs.
Automated gates instead of alignment as the default process.
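
To make the last point less abstract, here is a minimal sketch of what “automated gates instead of alignment” could look like. Everything in it is hypothetical: the `Change` type, the two gates (tests and contract compatibility) and the function names are made up for illustration and not taken from any real tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    """One feature-sized change (hypothetical model for illustration)."""
    touched: frozenset[str]        # technical boundaries it crosses, e.g. {"ui", "api"}
    tests_pass: bool               # result of the automated test suite
    contracts_compatible: bool     # e.g. no breaking API or schema change


Gate = Callable[[Change], bool]

# Assumed gates; a real setup would have its own list.
GATES: dict[str, Gate] = {
    "tests": lambda c: c.tests_pass,
    "contracts": lambda c: c.contracts_compatible,
}


def failed_gates(change: Change) -> list[str]:
    """Names of all gates the change does not pass."""
    return [name for name, gate in GATES.items() if not gate(change)]


def needs_human_handoff(change: Change) -> bool:
    # Crossing several technical boundaries is not a reason for a handoff
    # on its own. Only a failed gate is.
    return bool(failed_gates(change))


feature = Change(frozenset({"ui", "api", "data-model"}), True, True)
breaking = Change(frozenset({"api"}), True, False)
print(needs_human_handoff(feature), failed_gates(breaking))
```

The design choice is that `touched` never enters the decision: a change spanning UI, API and data model goes through as long as the gates are green, and people only get pulled in when a gate fails.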

In large systems, some coordination will always remain. Product decisions, platform strategy, governance and release constraints do not disappear.

But I wonder if the default should shift much harder toward teams owning broader domain and technical scope, including more services and more end-to-end responsibility.

Put differently:

The human as a bottleneck in software development has to learn to step aside.

Not away from responsibility.
But away from unnecessary handoff points.

How would you design large software systems if you assume that AI agents will be heavily involved in implementation, refactoring, tests and documentation?

Would you cut team, service and platform boundaries differently than before?

u/HandsOnArch — 17 days ago

I’m working on a side project around photorealistic live avatars.

Most realtime avatar tools are impressive, but the runtime cost still feels too high for many small apps, B2C products, or side projects.

So I’m testing a different approach:

Instead of generating avatar video in realtime, most of the expensive work happens once during preprocessing.

One portrait photo → reusable photorealistic live avatar → runtime cost is mostly audio/LLM instead of video generation.
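
To make the cost argument concrete, here is a rough sketch of the amortization math. The dollar figures are placeholders I picked so the arithmetic is easy to follow, not measurements from this project or from any other tool, and `CostModel` and `break_even_minutes` are just illustrative names.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    one_time_usd: float     # paid once per avatar (preprocessing)
    per_minute_usd: float   # paid for every minute of live conversation

    def total(self, minutes: float) -> float:
        """Total cost after the given number of conversation minutes."""
        return self.one_time_usd + self.per_minute_usd * minutes


def break_even_minutes(preprocessed: CostModel, realtime: CostModel) -> float:
    """Minutes of usage after which the preprocessed approach is cheaper overall."""
    saving_per_minute = realtime.per_minute_usd - preprocessed.per_minute_usd
    if saving_per_minute <= 0:
        raise ValueError("preprocessing never pays off with these numbers")
    extra_upfront = preprocessed.one_time_usd - realtime.one_time_usd
    return extra_upfront / saving_per_minute


# Placeholder numbers, NOT real prices.
preprocessed = CostModel(one_time_usd=1.80, per_minute_usd=0.01)
realtime = CostModel(one_time_usd=0.0, per_minute_usd=0.10)

print(round(break_even_minutes(preprocessed, realtime), 2))
print(round(preprocessed.total(100), 2), round(realtime.total(100), 2))
```

With any numbers of this shape, the one-time cost is spread over every minute the avatar is ever used, so the longer an avatar lives, the closer its cost per minute gets to the pure audio/LLM cost.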

Current trade-offs:

- preprocessing takes around 15 minutes

- lip sync is not perfect yet

- this is still early

The upside:

- very low runtime cost

- latency is basically audio latency

- only one portrait photo needed

- builders could embed this into their own apps later

I’m mainly building this as infrastructure for other builders.

Possible use cases: onboarding assistants, tutors, product demos, interview practice, support flows, AI characters, companions, etc.

I’d love honest feedback:

Does this feel useful for builders?

I added demo avatars here for anyone curious to see the current state: https://avatar.letkimdoit.com/

Thanks for any thoughts or feedback!

u/HandsOnArch — 18 days ago
▲ 0 r/TestMyApp+1 crossposts

Hi all!

I’m preparing to launch a photorealistic live avatar product on Product Hunt soon.

The attached video is a real realtime excerpt and fully representative of the current quality.

Avatar creation and the conversation are both one-shot. No cherry-picked reruns.

The important part is this: the avatar is created once in a preprocessing step, and after that you can speak with it live in realtime. Beyond the normal audio latency, the additional avatar latency is negligible.

What I think is interesting about this approach:

• one portrait photo is enough

• avatar creation is fully automated

• the pipeline is reproducible and scalable

• most of the cost happens once during preprocessing

• runtime cost afterwards is extremely low

• that makes new B2C use cases much more realistic

Why I care about that:

If photorealistic live avatars stay expensive at runtime, they remain limited to a few premium use cases.

If they become cheap enough to run at scale, they can become a real building block for consumer products like interview practice, tutors, companions, language training, onboarding flows, support experiences, and other interactive AI products.

Current status:

• the pipeline is already product-ready

• frontend and launch polish still need some work

• lip sync is not always perfect yet, as you can see in the video

• one-time avatar generation currently takes around 15 to 20 minutes

• my goal is to bring that below 10 minutes for the one-time preprocessing step

Would love honest feedback from builders before launch. Thanks in advance!

u/HandsOnArch — 21 days ago

I’m working on a small interview-prep tool where you can practice with a live AI interviewer instead of just using text or voice.

It’s not finished yet, but there is already a demo people can try. It already runs quite smoothly on decent hardware.

I’d really love feedback on the core use case: do you think something like this would actually help with interview preparation?

Personal note: I’m not a native English speaker myself (probably not that obvious here thanks to translation tools), but for me this kind of live practice would be especially helpful for things like getting used to the pressure of answering out loud and preparing for interviews in another language.

The idea is simple: paste a job description, optionally add CV context, and then do a mock interview with a live AI interviewer that talks back in real time.

Would this be useful to you, or does normal chatbot/voice practice already solve the problem well enough?

I hope this kind of feedback request is okay here. I’m mainly trying to understand whether the use case is actually helpful. Thanks a lot!

Demo: https://avatar.letkimdoit.com/interview

u/HandsOnArch — 23 days ago

Hi everyone,

I’ve been experimenting with real-time AI avatars recently, and ran into a problem pretty quickly: everything out there (like HeyGen) is insanely expensive to run in real time.

That’s maybe fine for some enterprise use cases, but for anything consumer-facing it basically kills the idea before it even starts.

So I started building my own pipeline to see if I could get the cost down far enough to make these kinds of use cases viable. At some point I had automated the whole thing so much that it started to feel like its own standalone project, not just part of the original idea.

What also made this interesting to me is that it feels like a lot of the traditional “unfair advantage” of being a software engineer for consumer apps has shifted recently. So instead of just building another app, I got more interested in creating something that could expand what others are able to build. If real-time avatars become cheap enough, it potentially unlocks a whole new set of use cases that just weren’t practical before.

I now have a rough alpha you can try here:

avatar.letkimdoit.com

What’s different:

- Runtime cost is minimal compared to existing solutions

- You only need a single portrait photo to generate a live avatar

The idea is to shift most of the cost into preprocessing, so running the avatar later is cheap enough for real apps.

Since everything is based on a single image, you can generate the same person in different scenes or contexts, which opens up some interesting new use cases.

Current limitations:

- Avatar generation takes ~20 minutes right now (target is closer to 10)

- Lip sync isn’t as good as the big players’ yet

- Emotions / expressions are still missing

- Some bugs, especially occasional desync at the start

Where I’m unsure: I’m trying to figure out where this actually fits best.

My initial thoughts were things like:

Website onboarding or assistants. But maybe it fits better in simple consumer apps where high pricing doesn’t work, or even in lightweight “AI experience” apps?

I’m also currently debating whether this makes more sense as a standalone product, or if I should focus on building specific vertical use cases on top of it (or just drop it altogether?).

u/HandsOnArch — 29 days ago