u/blameitonthenight34

a Series C founder DMed me last week asking how our 3 person team does QA. here's the story

It started with a flopped demo. we had an investor call, small startup, me and two other guys. spent two weeks prepping the product and the day of the call, we open the app and the main flow is broken, like completely broken. something had changed in a build two days earlier and none of us caught it.

Call went okay but we could tell it rattled them. We didn't get the meeting.

After that i became obsessed. not with hiring a QA person, we couldn't afford one. obsessed with why we kept missing things.

turns out we were testing the same happy paths every time, we were human, we'd check the stuff we built that week, nobody was checking the stuff from 3 months ago that we assumed was fine.

i spent a weekend rebuilding how we do it, not writing a checklist, actually rethinking what needed to happen after every single build without us touching it.

took a few iterations. couple of weeks of adjusting, then it just worked.

We haven't had a regression slip through in 7 months, we ship faster now than before we had any of this set up.

I posted about it here a while back mostly venting. someone from a Series C company DMed me and said their QA team of 6 was dealing with problems our 3 person team had solved.

reddit.com
u/blameitonthenight34 — 1 day ago

We almost killed Drizz three months in. One Slack message saved it.

I should preface this by saying I am not a natural at handling doubt. My cofounder will tell you I went through approximately four existential crises before we had a single paying customer.

The third one nearly ended the company.

We were three months in. The vision AI approach was working technically but adoption was slower than we expected. The QA leads we were talking to liked the demo. They'd say things like "this is exactly the problem" and "we've been looking for something like this." And then they'd go quiet for two weeks and come back with "we need to think about budget."

I started convincing myself we'd built the right solution for the wrong buyer. Maybe QA leads didn't have purchasing power. Maybe we needed to go upmarket. Maybe the whole vision-based approach was interesting as a demo but not compelling enough to replace what people already had. I wrote a three page document arguing we should pivot to developer tooling.

My cofounder read it and didn't say anything for a full day.

Then a Slack message came in from one of our early users. A QA engineer at a Series B company. Not a lead, not a decision maker. Just someone on the team who'd been using the free version.

He said "I just want you to know I used Drizz to catch a payment flow regression that would have shipped on Friday. My manager has no idea. I just wanted someone to know."

That was it. No ask. No feature request. Just a guy telling us the thing worked when it mattered.

I closed the pivot document and never reopened it.

We went back to every quiet lead and asked a different question. Not "are you interested" but "what would it take for this to be a no brainer." The answers were actually useful. Pricing structure, SSO, a specific integration. Concrete things we could build.

Four of them converted within six weeks.

I think about that Slack message a lot. We almost made a major strategic decision based on pipeline anxiety and a slow month. The product was working. We just couldn't see it from where we were standing.

If you're building something and the demos go well but nothing is closing, talk to the people actually using it before you change anything. They know something you don't.

Still building Drizz. Still occasionally spiraling. Link in first comment if you want to try it.

u/blameitonthenight34 — 3 days ago
▲ 6 r/softwaretesting+1 crossposts

AI didn't give developers their time back.

from my experience I work more not less

close tickets faster but somehow the ticket count just keeps up, the time I saved didn't go back to me it just got absorbed into the next thing on the list

I know some people who genuinely clocked out earlier after adopting AI tools and their managers didn't notice or care as long as the work was done

is anyone actually working less or did the bar just quietly move for everyone

u/blameitonthenight34 — 9 days ago

The founder with 80k followers and no customers.

I know a founder with 80k followers on LinkedIn

every post gets hundreds of likes and comments from people saying this is so insightful and needed to hear this today, reposts from other founders, the algorithm loves him

he has eleven paying customers

and before you say it, yes the product exists, yes it solves a real problem, yes the pricing is reasonable, yes he's been at it for over a year

the followers are real, the engagement is real, the revenue is not

he's not doing anything wrong exactly, the content is good, the consistency is there, he built an audience the way everyone told him to build an audience

he just built it full of the wrong people

motivational content attracts people who like motivational content, thought leadership attracts other thought leaders, posts about the journey of building a startup attract other people building startups who are too busy building their own thing to buy yours

somewhere in those 80k followers are probably 200 people who have the exact problem his product solves and are actively looking for a solution right now

he has no idea who they are

that's the real problem nobody talks about, not the follower count, not the content quality, not the posting frequency. the complete absence of any signal about which people in that audience actually have intent

likes are not intent, comments are not intent, follows are definitely not intent

intent is someone engaging with your content because something you said described their exact situation and they're already in the market looking for a way out of it

those people are in your audience right now. mixed in with everyone else, invisible unless you have a way to find them.

Has anyone solved this issue?

u/blameitonthenight34 — 9 days ago

Company raised 650k, Laid Off Half the QA Team Two Weeks Later and I'm probably Next

It finally happened. The company I work for has just raised 650k, yet two weeks ago they hit us with a wave of layoffs because of AI. I lead 2 teams of QA engineers, 2 on each team, and now I am down to just 1 dedicated tester, 2 if you include me because now I have no choice. For context, there are 5 or 6 developers per team, and no BAs, so analysis falls on the developers as well.

We had a meeting recently to discuss how to move forward. Developers are now expected to cover testing and automation with the help of AI, and I am supposed to help oversee and establish governance on this, as if I don't already have my hands full trying to catch up with deliverables.

I think it's only a matter of time before my role gets absorbed by the dev leads and they let me go as well. I need to save myself and start looking for opportunities out there, and I am seriously considering moving out of QA entirely.

Sorry to post another sob story about QA jobs getting replaced by AI. We already have enough of these in this subreddit. I just needed to vent.

u/blameitonthenight34 — 10 days ago

here's the hiring sequence i've watched play out at probably a dozen startups:

year one: founder does the books in a spreadsheet. it's fine. they tell themselves they'll clean it up later.

year two: the spreadsheet is a disaster. they hire an accountant to file taxes once a year. feel like they've solved it.

year three: investor asks for monthly financials. founder realizes they don't actually know their burn rate, their gross margin, or what's in accounts receivable. they panic-hire a fractional CFO.

the fractional CFO spends the first three months not doing CFO work. they're cleaning up two years of bad bookkeeping because none of the foundational work was done right. you're paying CFO rates for bookkeeper work because you skipped the bookkeeper.

the roles are actually distinct and the order matters:

bookkeeper keeps the records clean in real time. categorizes transactions, reconciles accounts, makes sure the data going in is accurate. this is not glamorous. this is also the foundation everything else depends on.

an accountant interprets the records. taxes, compliance, financial statements, strategic advice on structure. they need clean books to do this well. if your books are a mess your accountant is spending half their time being a bookkeeper and billing you accordingly.

CFO makes decisions with the financial data. fundraising strategy, runway modeling, unit economics, investor relations. they need accurate books and competent accounting under them. a CFO with no clean data is just an expensive person with opinions.

the controversial part: most early stage startups do not need an accountant yet. they definitely don't need a CFO. they need someone to keep their books clean for $300-500 a month so that when they do need an accountant, the accountant can actually do accountant work.

the irony is the tool that's helped most isn't the impressive-sounding one. it's the boring one that just makes sure the data going in is correct.

the reason founders skip the bookkeeper: it feels too small. too administrative. not strategic enough. so they either do it themselves badly or jump straight to hiring someone impressive-sounding who ends up doing the unglamorous work anyway at five times the cost.

your books are either clean or they're not. no amount of CFO credibility fixes data that was never recorded correctly in the first place.

u/blameitonthenight34 — 17 days ago

Going to say something that will annoy a lot of people in this thread.

Most early stage founders asking “how do I get more leads” are asking the wrong question. Not because leads don’t matter. Because the question assumes the problem is volume when it’s almost always specificity.

I’ve watched teams run Apollo sequences to 3,000 contacts and get 4 calls. Then manually identify 40 companies that had a very specific trigger (new funding, a recent hire in a relevant role, a job post signaling a pain point they’re actively trying to solve) and get 11 calls from those 40. Less volume, better conversations, higher close rate, and they actually learned something about who buys and why.

The “do things that don’t scale” advice is right but most people implement it wrong. It doesn’t mean do cold outreach manually instead of with a tool. It means get so specific about who you’re targeting and why right now that you couldn’t automate it even if you wanted to.

If your conversion is rough, the issue is almost never the channel. It’s that you’re reaching people who don’t have the problem badly enough, right now, to act on it. No sequence fixes that.

The 40-50 leads you have: do you actually know why the ones that didn’t convert didn’t? Not assumptions. Did you talk to them? Because the pattern in those conversations is worth more than your next 500 Apollo contacts.

I work on GTM tooling at my company and the teams we see actually convert early leads are obsessive about signal (what triggered this person to be reachable right now), not just identity. Most outbound ignores timing entirely and wonders why response rates are 2%.

Activity feels like momentum. Most early GTM is just that: activity.

u/blameitonthenight34 — 17 days ago

Been going deep on AI QA tools for the past few months. Most of them sound impressive until you try to use them daily.

Here’s what I’ve seen real results with:

**BotGauge:** Generates test cases directly from product specs or user stories. Handles UI and API tests and auto-updates when the UI changes. Fast to set up, which matters when your backlog is already full.

**QA Wolf:** Managed service where their team builds and maintains the test suite for you. Good if you want to fully offload QA, but the timeline to get up and running is slow and you’re dependent on their team for any changes.

**Rainforest QA:** No-code automated testing with a mix of manual and automated. Lower barrier to entry but you hit ceilings fast with complex flows.

**Testim:** AI-assisted automation with solid CI integration. Works well for web apps but anything non-trivial still needs scripting knowledge. Not truly no-code once you get into edge cases.

**Mabl:** Self-healing tests and visual regression coverage. Reliable at scale but pricing climbs quickly once you’re running tests across multiple environments.

**Drizz:** Runs your actual app flows across real device profiles and OS combinations, catches regressions between builds, and gives you screen recordings so you can see exactly what broke and where. The device coverage angle is what sets it apart: most of the tools above test your app in controlled conditions.

What are you guys using day to day and what made a difference?

u/blameitonthenight34 — 17 days ago

I’ve shipped software with 90%+ test coverage that had embarrassing production failures. I’ve shipped software with 40% coverage that ran cleanly for years. The correlation is weak and everyone quietly knows this but coverage is in the metrics dashboard so we keep reporting it.

The problem is what coverage actually measures. It measures whether your tests execute your code. Not whether your code works for users. Those are completely different things and treating one as a proxy for the other is where the false confidence comes from.
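
A minimal sketch of that gap, in Python. The function and test are hypothetical, not from any real codebase: the single test executes every line of `apply_discount`, so a line coverage tool would report 100%, yet an input the author never considered still produces a wrong result.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount (hypothetical example)."""
    discount = price_cents * percent // 100
    return price_cents - discount


def test_apply_discount() -> None:
    # This one assertion executes every line of apply_discount,
    # so line coverage for the function is 100%.
    assert apply_discount(1000, 10) == 900


test_apply_discount()

# Coverage never asked about inputs the author did not think of.
# A 150% discount yields a negative price, and no test fails.
print(apply_discount(1000, 150))
```

The suite is green and the number on the dashboard is perfect. The negative price only shows up when something outside the author’s assumptions supplies the input.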

You can have a fully covered login flow that fails on iOS 18 with a third party keyboard installed. Every test passes. The user can’t log in. Coverage: 94%. Production experience: broken for a meaningful percentage of users who happen to have SwiftKey.

The tests aren’t lying. They’re telling you exactly what they were asked to tell you, that the code behaves the way the person who wrote the tests expected it to behave. Which is also the person who wrote the code. Who already had the same assumptions baked in.

Real reliability comes from testing things you didn’t think to test. That requires either users finding it the hard way or something that doesn’t share your assumptions doing the checking. Neither of those is a coverage metric.

The teams with the best quality I’ve worked with obsess over what isn’t covered and why. Not the percentage.

u/blameitonthenight34 — 17 days ago

found a critical bug two hours before a release last month. payment flow, affects a specific device + OS combo, maybe 15% of users. caught it in time. nothing shipped broken.

the response was not "great job." it was "why didn't we know about this sooner."

i've been thinking about that reaction for weeks. the bug existing wasn't the problem — the bug being found was the problem. finding it meant someone's code had an issue. finding it late meant someone's process had a gap. the discovery itself is the uncomfortable part, not the underlying reality that was always there.

QA is the only discipline where success looks like failure to everyone adjacent to it. a green release means nothing was found, which feels good. a held release means something was found, which feels bad. but one of those is actually better than the other and it's not the one that feels good.

i don't have a fix for this. it's a culture thing and culture is slow. i've just started framing every catch as "we saved ourselves from this" rather than "we found a problem" and it lands slightly better.

slightly.

u/blameitonthenight34 — 18 days ago

maintaining cross-platform mobile tests was becoming a full-time job inside a full-time job. every time the designer moved a button or we pushed a UI update, half the locators would break. iOS and Android needed two separate sets of everything. the suite was green maybe 60% of the time and the team had quietly stopped trusting it.

i’d been looking at ways to either fix the Appium setup or replace it entirely when someone mentioned Drizz. the pitch sounded almost too simple: throw out the DOM tree completely, use computer vision instead, write tests in plain English.

i was skeptical. “plain English testing” is one of those phrases that usually means “works great in the demo, falls apart immediately in real usage.”

it didn’t fall apart.

the way it actually works is you describe the interaction the way you’d describe it to a human. “tap the checkout button.” “enter the email address.” “scroll down until you see the confirm screen.” the system visually analyzes the rendered pixels on the device and interacts with it exactly the way a user would, without caring about what’s underneath the UI at all.

the immediate practical win for us was that UI changes stopped breaking tests. if the button moves, the system finds it visually. it’s not looking for an accessibility ID that no longer maps to anything. it’s looking at the screen the same way a person would.

what i didn’t expect was how much faster it became to write new tests. describing flows in plain English takes a fraction of the time that scripting locators does, especially across two platforms. we’ve added more test coverage in the last two months than we did in the previous year.

the one real caveat: vague instructions on a complex screen will confuse it. “tap the button” on a screen with six buttons is going to give you trouble. you have to be specific, the same way you’d need to be specific if you were explaining the test to a junior QA person. once you internalize that, it becomes very natural.

we’re not fully off Appium yet for everything, but Drizz now covers all our critical user flows and i haven’t touched a locator in eight weeks.

u/blameitonthenight34 — 18 days ago

we had a solid unit test suite. CI was green. we felt good about releases.

then we started running drizz flows before every release and the first month alone caught three things our entire test suite had never flagged.

a checkout button that was visually there but unreachable on certain Samsung devices because the keyboard was overlapping it. our unit tests had no idea because they were testing logic, not actual screen state.

a dark mode loading screen where the text was white on a white background. had been like that for months. nobody on the team uses dark mode as their default so it just sat there invisible.

a regression in our onboarding flow that only appeared on mid-range android devices with aggressive memory management. worked perfectly on every device we owned. broke consistently on the devices our actual users had.

none of those are hard bugs. they’re all embarrassingly simple. but they’re also the exact class of bug that unit tests structurally cannot catch because they test your code, not your user’s experience.
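
the dark mode one is a good example of a check that looks at screen state rather than logic. a rough sketch in python, assuming you can sample the text color and background color from a screenshot (the sampling is not shown and the function names are made up for illustration). the math is the WCAG 2.x contrast ratio formula, where 4.5:1 is the AA minimum for normal text.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Ratio from 1.0 (identical colors) up to 21.0 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# white text on a white background: ratio 1.0, far below the 4.5 minimum
print(contrast_ratio(WHITE, WHITE) >= 4.5)
# white text on a black background passes easily
print(contrast_ratio(WHITE, BLACK) >= 4.5)
```

a unit test on the loading logic would never see this, because nothing in the logic is wrong. only the rendered pixels are.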

what makes drizz different for us is the screen recording output. you can’t look at a 20-second video of your checkout flow silently failing and scroll past it. the output feels real in a way that a pass/fail dashboard never did.

if you’re new here and evaluating whether to set it up, the answer for us was yes about 3 months in. start with your two or three most critical user flows. you’ll find something within the first week that should not be in production.

u/blameitonthenight34 — 18 days ago

every 1-star review is a free user research session that someone else’s customer paid for. you get the exact words frustrated people use, the exact moment something broke, and the emotional state they were in when they sat down to write a public complaint.

my main competitor has a filter feature users keep calling broken in reviews. it’s not broken. it just resets when you background the app. users don’t know that. they just know they spent 3 minutes setting up filters and the next morning they’re gone.

we built our version of that feature last quarter and state persistence was a non-negotiable requirement from day one. wrote three lines of marketing copy around it. “filters that actually stick.” that copy came directly from their reviews, not from any research we ran ourselves.

the other pattern i watch for: bugs that survive multiple releases. same crash, different update, different users, same underlying problem. if something makes it through three release cycles it’s either genuinely hard to fix or nobody senior enough has personally hit it yet.
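
a rough sketch of how that pattern could be tracked, in python. everything here is an assumption for illustration: the reviews are hand-tagged with an app version and a short issue label, and the sample data is invented.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def recurring_issues(
    reports: Iterable[Tuple[str, str]], min_releases: int = 3
) -> List[str]:
    """Return issue labels that show up in at least `min_releases` distinct releases."""
    releases_by_issue = defaultdict(set)
    for release, issue in reports:
        releases_by_issue[issue].add(release)
    return sorted(
        issue
        for issue, releases in releases_by_issue.items()
        if len(releases) >= min_releases
    )


# hypothetical, hand-tagged review data: (app version, issue label)
reports = [
    ("4.1", "filters reset"),
    ("4.2", "filters reset"),
    ("4.2", "login loop"),
    ("4.3", "filters reset"),
    ("4.3", "crash on export"),
]

print(recurring_issues(reports))
```

counting distinct releases rather than raw mentions matters here: ten angry reviews on one version is a bad release, the same complaint across three versions is a problem nobody is fixing.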

we run flows through drizz before every release now partly because of this. watching your app from the outside through someone else’s frustration changes how you think about what “working” means. your own internal testing is too warm. everyone’s being generous because they built the thing.

competitor reviews, weekly, 30 minutes. highest signal free research i’ve found.

u/blameitonthenight34 — 18 days ago

100% pass rate the morning we deployed. i checked twice because i was feeling good about the release.

by 2pm we had 40 support tickets.

the tests weren’t wrong. they were just testing in conditions that don’t reflect how anyone actually uses the app. everything ran on a pixel emulator, wifi, english locale, keyboard closed. clean little controlled environment.

the bug was a keyboard overlapping the confirm button on the checkout screen. on samsung devices. samsung’s One UI does keyboard insets slightly differently and our layout wasn’t compensating. we had three samsung users on the team. none of them caught it because when you’re testing your own app you muscle memory through it and you’re not looking for subtle layout shifts.
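
the check that would have caught it is plain rectangle geometry. a minimal sketch in python, assuming you can get the on-screen frames of the button and the keyboard in the same coordinate space (pixels, y growing downward). the `Rect` type and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share any area."""
    return (
        a.left < b.right
        and b.left < a.right
        and a.top < b.bottom
        and b.top < a.bottom
    )


def is_tappable(button: Rect, keyboard: Optional[Rect]) -> bool:
    """A button counts as tappable only if no visible keyboard covers part of it."""
    return keyboard is None or not overlaps(button, keyboard)


confirm_button = Rect(left=90, top=1900, right=990, bottom=2050)
short_keyboard = Rect(left=0, top=2100, right=1080, bottom=2340)
tall_keyboard = Rect(left=0, top=1800, right=1080, bottom=2340)

print(is_tappable(confirm_button, None))            # keyboard closed
print(is_tappable(confirm_button, short_keyboard))  # keyboard ends below the button
print(is_tappable(confirm_button, tall_keyboard))   # keyboard covers the button
```

with the keyboard closed, which is how our tests ran, the check passes on every device. it only fails once the keyboard is open and tall enough, which is exactly the state real users were in.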

what messed with me wasn’t the bug. it was how confident i was that morning. i had looked at that green dashboard and genuinely felt good about shipping.

passing tests just means the things you thought to test are working. it says nothing about the things you didn’t think to test.

we run flows across device profiles now before any release. started using drizz for this, it catches the breaks on hardware we didn’t test on, stuff that emulators miss.
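
to show how narrow the original setup was, here is a small python sketch that expands the four conditions into a test matrix. the specific device, network, and locale values are made-up examples, and this is not how any particular tool defines its profiles.

```python
from itertools import product

# hypothetical values for each condition the original run held constant
devices = ["pixel emulator", "samsung mid-range", "older iphone"]
networks = ["wifi", "cellular"]
locales = ["en", "de"]
keyboards = ["closed", "open"]

matrix = [
    {"device": d, "network": n, "locale": loc, "keyboard": k}
    for d, n, loc, k in product(devices, networks, locales, keyboards)
]

# the one configuration the green dashboard actually covered
original_run = {
    "device": "pixel emulator",
    "network": "wifi",
    "locale": "en",
    "keyboard": "closed",
}

print(len(matrix))
print(original_run in matrix)
```

even with this tiny list, the original run covered 1 of 24 combinations. the samsung plus keyboard-open rows are where the checkout bug lived.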

u/blameitonthenight34 — 18 days ago

we had a gesture-based interaction that kept showing up in positive reviews. felt intuitive, people mentioned it specifically. we were proud of it.

the bug report came from a power user on his commute. he said it worked perfectly sitting down but became unreliable when he was standing on a train. we thought he was describing a network issue.

he wasn’t. he was describing how he held his phone.

standing on a train you grip it lower, thumb reaches further across the screen, hits a slightly different touch region. our gesture detection had a blind spot exactly there because we’d calibrated it sitting at a desk, holding the phone the way you hold it when you’re testing, which is not how people hold it when they’re actually using it.

the users who loved the feature were the ones whose physical usage happened to match our testing conditions. everyone else was having a subtly worse experience and most of them just thought they were bad at it.

we run flows across device profiles and orientations now before any release. use drizz for this specifically. it doesn’t replicate a commute but it catches the obvious stuff we were shipping blind.

u/blameitonthenight34 — 18 days ago

I’ve seen this happen at every company I’ve worked at: an accessibility issue gets filed, it’s legit, but it gets pushed to the backlog because “there’s something more urgent.” It repeats for months.

Then, one of two things happens: either a user calls us out publicly, or we hit a compliance audit. Suddenly, it’s all hands on deck, and it’s a total PR nightmare.

The public callout version is the absolute worst because you’re trying to fix it while the company’s reputation is under fire.

We finally stopped the bleeding by making accessibility part of the “Definition of Done.” It’s not a separate ticket anymore; if the PR doesn’t pass basic navigation/screen reader checks, it doesn’t get merged.

We’ve started automating our flow checks with accessibility settings enabled, because a screenshot won’t tell you if a button is physically there but TalkBack can’t focus it.
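
A minimal sketch of what a merge-blocking check along these lines might look like, in Python. The `Node` structure and its fields are hypothetical stand-ins for whatever your platform’s accessibility tree dump gives you. It is a static approximation, not a replacement for running the flow with a screen reader turned on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified view of one element in a UI tree dump."""
    name: str
    clickable: bool = False
    accessibility_focusable: bool = False
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def accessibility_violations(root: Node) -> List[str]:
    """Walk the tree and report clickable elements a screen reader user cannot use."""
    problems: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.clickable:
            if not node.accessibility_focusable:
                problems.append(f"{node.name}: clickable but not focusable by a screen reader")
            if not node.label.strip():
                problems.append(f"{node.name}: clickable but has no accessibility label")
        stack.extend(node.children)
    return problems


screen = Node("checkout", children=[
    Node("pay_button", clickable=True, accessibility_focusable=True, label="Pay now"),
    Node("promo_button", clickable=True, accessibility_focusable=False, label="Apply promo"),
    Node("close_icon", clickable=True, accessibility_focusable=True, label=""),
])

for problem in accessibility_violations(screen):
    print(problem)
```

In a pipeline, a non-empty result would fail the build, which is what turns accessibility from a backlog ticket into part of the Definition of Done.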

u/blameitonthenight34 — 18 days ago

I’ve shipped bugs that the simulator would never have caught, and at this point, I’ve stopped being surprised by it. The simulator is a useful tool for iteration, but it is not a test environment.

The list of things it fails to replicate is huge: it has no real memory pressure, so you don’t catch background kills. It has no actual GPU constraints, so frame drops on older hardware don’t show up. Touch events, camera access, NFC, cellular conditions: all of it is absent.

The one that bit us hardest was a scroll performance issue on the original iPhone SE. It was buttery smooth on the simulator, but janky on the device because the simulator was using our Mac’s GPU. We didn’t catch it until we were two weeks from release. I realized this while using Drizz.

I’ve realized that I can’t treat the simulator as a substitute for real hardware, especially for the older, slower devices our users actually own. The median user is not on an iPhone 15.

u/blameitonthenight34 — 18 days ago

We had a minor error state in our checkout flow that kept getting pushed to the bottom of the backlog for eight months. It wasn’t a crash and didn’t lose data, so it felt “low severity.”

What we never connected was the support load. 15-20 tickets a week. The support team had a canned response for it, so it was “contained.” But when I actually did the math: 4 hours of support time every single week, for 8 months, on a bug that took a dev half a day to fix.

It cost us over 130 hours of labor because we were only measuring “severity” instead of “ongoing cost.”
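
The arithmetic, as a short Python sketch. The weekly support time and the duration come from the numbers above. The weeks-per-month average and the 8-hour workday behind "half a day" are assumptions.

```python
SUPPORT_HOURS_PER_WEEK = 4   # from the support team's ticket load
MONTHS_OPEN = 8              # how long the bug sat in the backlog
WEEKS_PER_MONTH = 52 / 12    # assumption: average weeks in a month
FIX_HOURS = 4                # assumption: half of an 8-hour dev day

support_hours = SUPPORT_HOURS_PER_WEEK * MONTHS_OPEN * WEEKS_PER_MONTH
cost_ratio = support_hours / FIX_HOURS

print(f"support time spent: {support_hours:.0f} hours")
print(f"waiting cost vs. fix cost: {cost_ratio:.0f}x")
```

That comes out to roughly 139 support hours against a 4-hour fix, about 35 times the cost of just fixing it. Severity alone never surfaces that number. Hours per week multiplied by weeks open does.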

It’s been a massive blind spot for us. Does anyone else have a process for connecting support tickets to dev prioritization? We’ve started doing a monthly review of support’s biggest time-sinks.

u/blameitonthenight34 — 18 days ago

We had a checkout flow where the cart would occasionally just reset to empty. No crash report, no exception, no error message. Just a clean state like the user never added anything.

After a lot of digging, we realized it was just how the OS manages memory. If a user backgrounds our app to check a confirmation email or a bank app, the OS can kill our process. When they return, our restoration logic either failed silently or just gave up.

It wasn’t a technical failure in the code, it was a logic failure in how we assumed the app would stay alive forever.
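
A language-neutral sketch of the fix, written in Python for brevity. It assumes the cart is saved to local storage whenever the app is backgrounded and restored on launch. The file layout and function names are hypothetical. The point is that a failed restore raises an error instead of quietly returning an empty cart.

```python
import json
import tempfile
from pathlib import Path
from typing import List


class CartRestoreError(Exception):
    """Raised when saved cart state exists but cannot be read."""


def save_cart(path: Path, items: List[str]) -> None:
    # Write to a temp file and rename it, so a process kill mid-write
    # cannot leave a half-written cart behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"items": items}))
    tmp.replace(path)


def restore_cart(path: Path) -> List[str]:
    if not path.exists():
        return []  # nothing was ever saved, so an empty cart is correct
    try:
        return json.loads(path.read_text())["items"]
    except (ValueError, KeyError, TypeError, OSError) as exc:
        # Saved state exists but is unreadable: report it, do not hide it.
        raise CartRestoreError(f"could not restore cart from {path}") from exc


with tempfile.TemporaryDirectory() as tmp_dir:
    cart_file = Path(tmp_dir) / "cart.json"
    save_cart(cart_file, ["sku-123", "sku-456"])  # app goes to background
    print(restore_cart(cart_file))                # process restarted later
```

The distinction between "no saved state" and "saved state that failed to load" is what was missing for us. The first is a legitimately empty cart. The second is a bug that should show up in crash reporting.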

u/blameitonthenight34 — 18 days ago