u/AlexeyUniOne

Mistakes most startups make when sending OTP emails

Hey founders, Alexey here, CTO @ UniOne, a transactional email service for startups.
My team just wrote a guide this week and figured I'd share the meat of it, since this comes up in every founder DM I get.

Here is the pattern I see often: a founder builds their auth flow and sends OTPs through the same SMTP they use for marketing and promo newsletters. Everything works fine for the first users. Then suddenly it breaks: password resets start hitting spam folders and nobody on the team can figure out why.

3 crucial things most founders miss when they ship:

1 - Shared sending streams quietly destroy auth deliverability. When your promo email gets flagged as spam by even a small number of recipients, that reputation hit drags down everything else on the same stream, including the critical transactional emails your users actually need to log in. Gmail and Microsoft see one sender domain with one complaint rate, and they act accordingly.

The fix isn't expensive - just use a dedicated subdomain like "auth.yourdomain.com" for transactional traffic, set up separate DKIM keys, and ideally use a separate IP pool if your provider supports it. The whole thing takes about 1 hour (or even less) of DNS configuration and saves you from a category of incidents that's genuinely hard to debug once it starts happening.
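To make the DNS part concrete, here is a rough sketch of the records a dedicated subdomain usually ends up with. The helper name, the "otp1" selector and all record values are placeholders I made up for illustration; your ESP gives you the real values, and some providers use CNAME records for DKIM instead of TXT.

```python
# Hypothetical helper: lists the DNS records a dedicated auth subdomain
# typically needs. All values are placeholders, not real provider settings.

def auth_dns_records(domain: str, sub: str = "auth", selector: str = "otp1") -> list[dict]:
    host = f"{sub}.{domain}"  # e.g. auth.yourdomain.com
    return [
        # SPF: authorizes your ESP to send mail for this subdomain
        {"type": "TXT", "name": host,
         "value": "v=spf1 include:<your-esp-spf-host> -all"},
        # DKIM: a separate key (own selector) just for transactional traffic
        {"type": "TXT", "name": f"{selector}._domainkey.{host}",
         "value": "v=DKIM1; k=rsa; p=<public-key-from-your-esp>"},
        # DMARC: policy for the subdomain; start relaxed, tighten later
        {"type": "TXT", "name": f"_dmarc.{host}",
         "value": "v=DMARC1; p=none"},
    ]


if __name__ == "__main__":
    for record in auth_dns_records("yourdomain.com"):
        print(record["type"], record["name"], record["value"])
```

Keeping the DKIM selector separate from the one on your marketing stream is what lets you rotate or revoke keys for one stream without touching the other.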

2 - Retries without idempotency turn into duplicate codes, which turn into spam complaints. If your backend retries on a network timeout and the OTP gets delivered twice, users get confused, panic and mark it as spam.

We added idempotency keys to our API specifically because we kept seeing this pattern - same key means same code, no duplicate send, no panicked user. Worth implementing on your side regardless of which ESP you use.
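If you want to do this on your side, a minimal sketch could look like the one below. It assumes you have a stable request ID per OTP attempt and that your provider accepts some kind of idempotency key; `OtpSender`, `send_fn` and the in-memory dicts are all hypothetical stand-ins (in production you'd use Redis or your DB with a TTL).

```python
import hashlib
import secrets


def idempotency_key(user_id: str, request_id: str) -> str:
    # Same user + same OTP request always produces the same key
    return hashlib.sha256(f"otp:{user_id}:{request_id}".encode()).hexdigest()


class OtpSender:
    def __init__(self, send_fn):
        # send_fn(email, code, key) is a stand-in for your ESP client call
        self._send = send_fn
        self._codes = {}         # key -> code, stored BEFORE the first attempt
        self._confirmed = set()  # keys whose send call returned successfully

    def send_otp(self, user_id: str, request_id: str, email: str) -> str:
        key = idempotency_key(user_id, request_id)
        if key not in self._codes:
            self._codes[key] = f"{secrets.randbelow(10**6):06d}"
        code = self._codes[key]
        if key not in self._confirmed:
            # On a retry after a timeout we pass the same key and code,
            # so a provider that dedupes by key will not send twice.
            self._send(email, code, key)
            self._confirmed.add(key)
        return code
```

The important bit is that the code is generated and stored before the first network call, so a retry after a timeout reuses it instead of minting a new one.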

3 - Webhooks beat polling for anything OTP-related. You need to know within seconds whether the code was accepted, deferred, bounced, or marked as spam, and polling stats every five minutes is just too slow for an auth flow where users abandon in under a minute. Set up webhooks, store the job_id next to your OTP request in your own database, and when a user complains "I never got the code," you can pull up the exact event timeline in 10 seconds.
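Roughly what that looks like in code, as a sketch only: the payload fields (`job_id`, `status`, `timestamp`) and the `OtpLog` class are assumptions for illustration, not any provider's actual callback format, and the dict stands in for your database table.

```python
from dataclasses import dataclass, field


@dataclass
class OtpRequest:
    user_id: str
    job_id: str
    events: list = field(default_factory=list)  # (timestamp, status) tuples


class OtpLog:
    def __init__(self):
        self._by_job = {}  # job_id -> OtpRequest; a DB table in real life

    def record_send(self, user_id: str, job_id: str) -> None:
        # Call this right after the send API hands you a job_id
        self._by_job[job_id] = OtpRequest(user_id, job_id)

    def handle_webhook(self, payload: dict) -> bool:
        # Hypothetical payload: {"job_id": ..., "status": ..., "timestamp": ISO 8601}
        req = self._by_job.get(payload.get("job_id"))
        if req is None:
            return False  # unknown job: log it and investigate separately
        req.events.append((payload["timestamp"], payload["status"]))
        return True

    def timeline(self, user_id: str) -> list:
        # Everything that happened to this user's OTP emails, oldest first
        events = [
            (ts, req.job_id, status)
            for req in self._by_job.values() if req.user_id == user_id
            for ts, status in req.events
        ]
        return sorted(events)
```

With this in place, the "I never got the code" ticket becomes a single `timeline(user_id)` lookup instead of a dig through provider dashboards.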

The full write-up with API + SMTP code examples, DNS setup, and the complete sending flow is on our blog. Let me know in the comments if you want me to share the direct link to the guide.

Happy to answer specific questions in the comments.

reddit.com
u/AlexeyUniOne — 2 days ago

A common but underrated failure mode in email infrastructure is when your ESP's API returns 202 Accepted, your dashboard shows the message as "delivered," and the email silently never arrives.

By the time users start asking why they didn't get the password reset, hours have already passed.

Statusfield wrote about this exact pattern in their April 14 piece on SendGrid outages: the API may return 202 Accepted while the email silently fails to deliver, and the failure only becomes apparent hours later when users report missing emails. The root cause is async architecture - the endpoint that accepts your request and the system that actually processes the message are decoupled, and most providers conflate "we received your call" with "we queued your email."

We recently added a dedicated "accepted" status as a separate webhook event in UniOne specifically to close this gap.

The 200 OK on the API call has always been there - that just confirms we received the request. The new "accepted" webhook signals that we've accepted the email itself for sending, which is a different commitment. From there the lifecycle moves through "sent" once it leaves our infrastructure, then "delivered" when the receiving server confirms acceptance, and so on through opens and clicks.

Each transition is observable through webhooks, but also directly inside the UniOne dashboard or via CSV export through the API. You can trace the full status history of any individual message, which makes debugging the exact case we're talking about much easier. An "accepted" event with no "sent" event after it is immediately visible, and you find the gap at the step it actually happened, not three hours later when a user complains.
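If you collect those events yourself, spotting the gap is a few lines. This is a sketch under my own assumptions: the `history` shape and the two-minute threshold are made up for the example, and only the status names ("accepted", "sent") come from the lifecycle described above.

```python
from datetime import datetime, timedelta


def stalled_after_accept(history: dict, now: datetime,
                         max_gap: timedelta = timedelta(minutes=2)) -> list:
    """Return job_ids that were accepted but never sent within max_gap.

    history maps job_id -> list of (status, datetime) tuples, e.g. built
    from webhook events. The threshold is an arbitrary example value.
    """
    stalled = []
    for job_id, events in history.items():
        seen = {status: ts for status, ts in events}
        accepted_at = seen.get("accepted")
        if accepted_at is None or "sent" in seen:
            continue  # not accepted yet, or already moved on
        if now - accepted_at > max_gap:
            stalled.append(job_id)
    return stalled
```

Run it on a schedule and alert on a non-empty result, and you hear about the stuck message from your own monitoring rather than from a user.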

If you're running production email at any volume, this is one of the few cases where the plumbing pays for itself within a quarter.

Happy to share the docs for the full callback format if you're interested. Just leave a comment below.

u/AlexeyUniOne — 17 days ago