
r/AITestingtooldrizz

We migrated to microservices 18 months ago
I need to write this down while the frustration is fresh.
In 2024 our CTO decided we needed to move from a monolith to microservices. The reason was "scalability." We had 14 engineers and maybe 2,000 concurrent users at peak. Nobody asked whether we had a scaling problem. We just had a monolith that felt old-fashioned. The migration took 8 months. We split the monolith into 11 services. Each one got its own repo, its own CI pipeline, its own deployment, its own monitoring. We added Kafka for async messaging between services. We added an API gateway. We added a service mesh. We added distributed tracing because debugging a request that crossed 4 services was impossible without it.
After the migration, our deploy time went from 6 minutes to 45 minutes because we had to coordinate 11 independent releases. Our incident rate tripled because network failures between services created cascading timeouts. Our engineering velocity dropped because every feature now required changes to 3 or 4 services instead of one directory in a monolith.
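A rough way to see why the incident rate climbs once a request has to cross several services: the per-hop success rates multiply. This is a back-of-the-envelope sketch, assuming failures are independent and using a hypothetical 99.9% success rate per network hop. Neither assumption comes from the post.

```python
def chain_success_rate(per_hop_success: float, hops: int) -> float:
    """Probability that a request succeeds when it must pass through
    `hops` services in sequence, assuming independent failures."""
    if not 0.0 <= per_hop_success <= 1.0:
        raise ValueError("per_hop_success must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_success ** hops


# Hypothetical per-hop success rate, chosen only for illustration.
PER_HOP_SUCCESS = 0.999

for hops in (1, 4, 11):
    failure_rate = 1 - chain_success_rate(PER_HOP_SUCCESS, hops)
    print(f"{hops:>2} hops: {failure_rate:.3%} of requests fail")
```

The same call made inside one process has no network hop to fail, which is the part of the chain this models.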
The CTO left 6 months after the migration was complete. New leadership brought in a consulting firm to audit the architecture. They spent 6 weeks and $90K. Their recommendation, delivered in a 60-page PDF, was to consolidate back into 3 services. Not a monolith. But close.
We're now spending Q3 merging services that we spent Q1 and Q2 of last year splitting apart. The total cost of this round trip, including engineering hours, consulting fees, and infrastructure, is somewhere north of $600K. For a company with 2,000 concurrent users. The next time someone says "microservices" in a planning meeting I'm going to show them this thread.
Our entire CI pipeline passes in 4 minutes lol
I inherited a codebase with 1,100 tests and every build is green. Deployment pipeline runs in 4 minutes flat.
Last month I started actually reading the tests. Not running them. Reading them. I wanted to understand what they covered. About 340 of them assert that a function "does not throw." That's it. The function can return garbage, return null, return a completely wrong value. But it doesn't throw, so the test passes. Another 200 are snapshot tests that nobody has reviewed since 2024. The snapshots were auto-updated after a major refactor and committed as a batch. They're testing that the current output matches the current output. If the output changed tomorrow, someone would update the snapshot and move on.
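To make the difference concrete, here is a minimal sketch of a "does not throw" check next to a check that pins down the result. The `apply_discount` and `broken_discount` functions are hypothetical stand-ins, not code from the codebase described above.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: price after a percent discount."""
    return round(price * (1 - percent / 100), 2)


def broken_discount(price: float, percent: float):
    """Hypothetical regression: runs without error but returns nothing useful."""
    return None


def check_does_not_throw(fn) -> bool:
    # Weak check: passes for any return value, including None.
    fn(100.0, 20.0)
    return True


def check_returns_expected_value(fn) -> bool:
    # Stronger check: compares the result against a known-correct value.
    return fn(100.0, 20.0) == 80.0


for fn in (apply_discount, broken_discount):
    print(
        fn.__name__,
        "| does-not-throw:", check_does_not_throw(fn),
        "| value check:", check_returns_expected_value(fn),
    )
```

The broken version sails through the first check and fails the second, which is the gap between a green build and a tested system.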
I flagged this to my lead. He already knew. He said the pipeline speed is a feature. Product org gets nervous when builds take more than 10 minutes. The last engineer who added thorough integration tests pushed the build to 22 minutes and got asked to "optimize" it. He deleted his tests instead.
So now I sit in standups listening to people say "all tests pass" like it means something. I've started adding real assertions to the tests I touch, but I'm doing it slowly so the build time creeps up gradually. Like boiling a frog, except the frog is a deployment pipeline and I'm the one turning up the heat. I don't think we've caught a real regression from this test suite in over a year. The dashboard says 99.8% pass rate. I think the real number, if these tests tested anything meaningful, would be closer to 70%. The scariest thing is that nobody disagrees with me when I say this privately. They just don't want to be the person who makes the build slow.
Our senior engineer left. His code has no tests, no docs, and 47 comments that just say "trust me." We've been reverse-engineering it for 3 months.
He was here for 5 years. He built the core transaction engine. He's the reason the system works. He's also the reason nobody else understands how it works.
When he gave his notice, our CTO asked him to do a knowledge transfer. He scheduled three one-hour sessions. In the first session, he opened the main file (4,200 lines), scrolled through it, and said "this is pretty self-explanatory." In the second session, he explained the caching layer by drawing a diagram on a whiteboard that he immediately erased because "you'll figure it out once you look at the code." The third session was cancelled because he took a PTO day.
After he left, three of us were assigned to own his code. We started reading it.
The variable names are single letters. Not in small utility functions. In the main business logic. p, t, x, q. One function takes 11 parameters, all single letters. There's a comment above it that says "don't refactor this. the order matters." We tested it. The order does matter. We don't know why.
There are 47 comments in the codebase that say "trust me" or some variation. "Trust me, this needs to be here." "I know this looks wrong. Trust me." "Don't remove this sleep(200). Trust me." We removed the sleep once. Production went down for 40 minutes.
We've been reverse-engineering this system for 3 months. We have a shared document that's 22 pages long called "What We Think This Does." Page 1 starts with "We are not confident in any of this."
He was a brilliant engineer. I mean that sincerely. The system handles millions of transactions and has never had a data integrity issue. He built something that works perfectly and is completely incomprehensible to anyone who didn't build it. I don't know if that's a success or a failure. It might be both.
I don’t think people realize how fast AI is changing junior level jobs.
The scary part about AI isn’t that it can outperform experts, it’s that it’s becoming good enough to replace beginners.
Junior coding, copywriting, support roles, research assistance, basic design work.
The ladder people used to climb into careers feels like it’s quietly disappearing.
I deleted 40,000 lines of dead code this quarter.
I joined in January and by February I had mapped out the codebase and found that roughly 30% of it was unreachable. Feature flags that were never turned on. Entire API endpoints that nothing calls. A complete notification system that was deprecated two years ago and replaced, but the original code was never removed. It still has its own database table with 4 million rows that gets backed up nightly. I started cleaning it up. Small PRs. Each one removed a specific dead path, with evidence that nothing referenced it. I checked analytics, traced the dependency graph, confirmed there were zero callers. Every PR was reviewed and approved.
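The "zero callers" check can be partly automated. Below is a minimal sketch using Python's standard `ast` module to list functions that are defined but never referenced by name in the same source. The sample module and its function names are hypothetical, and this is not the tooling used in the post.

```python
import ast


def unreferenced_functions(source: str) -> set[str]:
    """Return names of functions defined in `source` that are never
    referenced by name anywhere in that same source."""
    tree = ast.parse(source)
    defined: set[str] = set()
    referenced: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)       # plain references like foo(...)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)     # attribute references like obj.foo(...)
    return defined - referenced


# Hypothetical module used only to exercise the function.
SAMPLE = '''
def send_email(user):
    return f"mail to {user}"

def legacy_notify(user):
    return send_email(user)

def handle_signup(user):
    return send_email(user)
'''

print(sorted(unreferenced_functions(SAMPLE)))
```

A static pass like this only produces candidates. Entry points, reflection, and callers in other repos all show up as "unreferenced," which is why the analytics check described above still matters before deleting anything.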
In three months I removed 40,000 lines across 87 PRs. The codebase was measurably easier to navigate. Onboarding a new engineer took less time because there were fewer wrong turns. IDE search returned relevant results instead of burying them under deprecated code.
Then my VP called a meeting. He wanted me to stop. His word was "destabilizing." He said two things. First, every deletion is a risk even if the code is dead, because "you never know what's connected under the surface." Second, the PRs were triggering too many review cycles and pulling attention from feature work. I pushed back with the data. He said "I appreciate the initiative but I need you focused on the roadmap."
The codebase now has 40,000 fewer lines of dead code and roughly 60,000 more lines of dead code that I identified but am not allowed to remove. I documented all of it. When the next engineer joins and asks "what does this code do," the answer for a third of the codebase is "nothing, but don't touch it."
The contractor billing $250/hour works 12 hours a week. He's more productive than every full-time engineer on the team
We hired a contractor to help with a backend migration. His rate was $250/hour. He billed roughly 12 hours a week. The CFO flagged it during a review: "Why are we paying someone $13K a month for part-time work?"
I pulled the numbers. In his first month he closed 34 tickets. The average full-time engineer on the same team closed 19. He didn't attend standup. He wasn't in Slack during business hours. He never went to retro or sprint planning. He just read the tickets, shipped code, and left.
His code quality was consistently above average. Fewer revision requests than anyone except our most senior IC. He never introduced a regression in 6 months of work.
At one point someone asked him to join the daily standup "for visibility." He said no. He said he'd rather spend that 15 minutes writing code and that he'd send an async update at the end of his working block instead.
Management eventually decided not to renew his contract. The reason given was "cultural misalignment." The real reason was that he made a room full of full-time engineers with equity and benefits look slow by comparison, and it was uncomfortable for everyone involved.
He now works for our competitor. Same rate. Same hours. Their loss metrics went up after he left. Ours went down.
I made ViewBuddy, an iPhone app for finding what to watch through friends
I built ViewBuddy because movie/show discovery still feels oddly disconnected from the people you actually watch things with.
The app is for:
- seeing what friends are watching, rating, and reviewing
- comparing taste before you trust a recommendation
- building watchlists and playlists
- browsing as a guest before deciding whether to create an account
I'm the developer, and I'm looking for practical feedback from people who try a lot of mobile apps:
Is the purpose clear in the first minute?
Does the feed feel useful before you have a lot of friends on it?
What would make you invite one friend?
Does anything feel confusing, slow, or unnecessary?
App Store: https://apps.apple.com/us/app/viewbuddy-rate-review/id6759533775
No pressure to be nice. Specific criticism is much more useful than generic encouragement.
The best debugging session of my career was caused by a timezone bug that only existed for one hour per year.
I'm going to tell you about a bug that took 3 months to diagnose and the fix was changing a single character. We had a billing system that occasionally double-charged users. Not a lot of users. And not consistently. Support would get a handful of tickets, we'd refund them, log it, and move on. It happened maybe once a month.
I was assigned to find the root cause. I spent two weeks staring at logs. The pattern made no sense. Different users, different payment methods, different amounts. The only commonality was the timing. Every incident happened between 1 AM and 3 AM UTC. But not every night. Random nights.
I built a dashboard tracking every charge event. After a month of data I noticed something. The double-charges only happened on nights when our batch reconciliation job ran during a daylight saving time transition in a timezone that wasn't even ours. We stored timestamps in UTC but one microservice, a legacy service nobody owned, used America/New_York for its internal scheduling. During the spring-forward transition, the service would skip an hour and re-run the reconciliation. During the fall-back transition, it ran the same hour twice.
The double-charges were happening during the fall-back. The reconciliation ran, charged users who owed a balance, then "woke up" again in the repeated hour and ran the same batch because the guard clause checked the wall-clock time, not a monotonic counter.
The fix was changing < to <= in a timestamp comparison so the second run would see the charges from the first run. One character. Three months of investigation. Twelve customer refunds. A post-mortem that was 4,000 words long for a one-character diff. The next person who tells me timezone bugs are "edge cases" is getting this story read to them at full volume.
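For anyone who hasn't been bitten by this yet, here is a minimal sketch of why a wall-clock guard can't tell the two runs apart on a fall-back night. It uses Python's standard `zoneinfo` and the 2024 US transition date. The variable names are hypothetical and this is not the billing service's code.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

# Two distinct UTC instants, one hour apart, on the 2024 fall-back night.
first_run = datetime(2024, 11, 3, 5, 30, tzinfo=timezone.utc)
second_run = first_run + timedelta(hours=1)


def wall_clock(instant: datetime) -> str:
    """Local wall-clock time with the UTC offset dropped, which is what a
    naive scheduler or guard clause ends up comparing."""
    return instant.astimezone(NEW_YORK).strftime("%Y-%m-%d %H:%M")


# Both instants land on the same local wall-clock reading.
print("wall clock, first run: ", wall_clock(first_run))
print("wall clock, second run:", wall_clock(second_run))
print("same wall clock:", wall_clock(first_run) == wall_clock(second_run))

# The UTC instants stay distinct, and `fold` marks the repeated hour.
print("same UTC instant:", first_run == second_run)
print("fold on second run:", second_run.astimezone(NEW_YORK).fold)
```

Any guard that compares or dedupes on the local reading sees one moment where there were two. Comparing UTC instants, or keeping the offset, keeps them apart.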
Test automation has a dirty secret: most "automated" test suites require more human hours than manual testing
I realize this sounds like an exaggeration. It's not.
I spent the last 18 months talking to mobile engineering teams about their testing workflows. Here's what I found over and over. The team adopts Appium or Espresso or XCUITest. They write 100-200 automated tests. They feel good about their coverage. Then 3 months pass.
The UI changes. Selectors break. Tests start failing for reasons that have nothing to do with bugs. QA engineers spend 20+ hours a week repairing tests. Developers stop trusting the test suite because it cries wolf too often. New tests don't get written because all the bandwidth goes to maintenance.
One QA lead told me she spent more hours maintaining automated tests than she had previously spent doing manual testing. Let that sink in. The automation made her job harder, not easier.
The root cause is the same everywhere. Tests are coupled to code-level identifiers that change whenever the UI changes. The abstraction is wrong. You're not testing "does the user see a login button." You're testing "does element with ID btn-login-primary exist in the DOM." Those are different questions and they diverge every time a designer touches the interface.
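Here is a toy model of those two questions, assuming a screen is just a list of elements with an ID, a role, and a visible label. It is not Appium, Espresso, or XCUITest API, and a role-and-label lookup is a simpler stand-in for illustration, not the vision approach. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    element_id: str   # code-level identifier, changes with refactors
    role: str         # what the element is, e.g. "button"
    label: str        # what the user actually reads on screen


def find_by_id(screen: list[Element], element_id: str) -> Optional[Element]:
    """Question two: does an element with this exact ID exist?"""
    return next((e for e in screen if e.element_id == element_id), None)


def find_by_role_and_label(
    screen: list[Element], role: str, label: str
) -> Optional[Element]:
    """Question one: does the user see a control of this kind with this text?"""
    wanted = label.lower()
    return next(
        (e for e in screen if e.role == role and e.label.lower() == wanted),
        None,
    )


# Same login button before and after a hypothetical redesign renames its ID.
before = [Element("btn-login-primary", "button", "Log in")]
after = [Element("auth-cta", "button", "Log in")]

for name, screen in (("before", before), ("after", after)):
    print(
        name,
        "| by id:", find_by_id(screen, "btn-login-primary") is not None,
        "| by role and label:",
        find_by_role_and_label(screen, "button", "log in") is not None,
    )
```

The ID lookup breaks on the rename even though nothing changed for the user. The label lookup only breaks when what the user sees changes.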
We built a tool that tests the first question instead of the second. Vision AI looks at the screen and finds elements the way a human does. The tests are written in plain English. When the UI changes, the test still passes because it was never coupled to the code.
14 paying companies are using it in production right now. Launched on Product Hunt today. But the bigger discussion I want to have is this: why did we accept for so long that "automated testing" means "maintaining a fragile test suite?" Was there ever a version of selector-based testing that actually scaled without drowning the team?
Genuinely curious what this sub thinks.
Unicorns were paying us before we even had a landing page. Today we finally have one. Live on PH.
This is kind of embarrassing to admit but here goes. For the first 8 months of our company, we didn't have a website. But we had paying customers.
Let me explain how that happened because it still doesn't make complete sense to me.
My cofounders and I quit our jobs in mid 2024 to build Drizz, a testing tool for mobile apps. We'd all worked at companies where the QA team spent more time fixing broken tests than finding actual bugs. The tests broke every release because they were built on selectors and element IDs that changed whenever a developer touched the UI. Everyone in mobile engineering knows this pain. Nobody had fixed it.
We decided to use vision AI. Instead of pointing tests at code level identifiers, our agent looks at the screen and finds elements by seeing them. Same way a human tester would. You write "tap the login button" and it finds the login button visually. When the designer moves it or recolors it, the test still works.
We built a rough version. Then instead of making a landing page, I did something that in hindsight was either smart or stupid. I started DMing QA leads directly. No pitch deck. No one pager. Just "can I show you something for 15 minutes on a screen share?"
Some of those calls went terribly. The prototype would crash. I'd scramble to restart it while making small talk about their testing setup. One time the app crashed three times in a 20-minute call. The QA lead laughed and said "well at least I can see it works when it works."
He signed up. A unicorn in India. Then two more through referrals. Then an enterprise deal.
By the time we raised our seed round, we had real customers spending 15+ hours a week on the platform. Our investor deck had revenue charts and customer quotes. We finally built a website a few months ago.
Today we're launching on Product Hunt. It still feels strange because for a year, our entire go to market was one on one conversations. No content marketing or ads. Just showing people the product and asking if it fixed their problem.
I'm not saying that's the right way to do it. It clearly doesn't scale. But I think there's something useful in being so early that you don't have a website to hide behind. Every conversation is just you, your product, and someone who has the problem you're trying to solve. You can't fake that.
If you work on mobile apps and testing is a pain point, I'd love to hear what your setup looks like. And if you want to try Drizz, the link is in my first comment.
I built a product used by 40,000 people. My revenue is $0. I cannot figure out how to charge without losing everyone
This is the most humiliating thing I've admitted publicly. I have 40,000 monthly active users. I have zero revenue. I've tried to charge three times and lost 60% to 80% of my users each time.
First attempt: $9/month. Lost 73% overnight. Rolled it back within 48 hours.
Second attempt: freemium with premium features. Only 1.2% converted. The revenue didn't cover my infrastructure costs.
Third attempt: usage-based pricing. Better conversion at 4%, but power users (the ones paying) were also the ones most likely to churn because they found alternatives.
My problem is structural. I built something that solves a small pain for many people. Not a large pain for a few. A small pain isn't worth $9/month to anyone individually. But it's worth enough to use for free.
I've been told to pivot to enterprise. To charge based on team size. To find the subset of users who would pay more. Every suggestion requires me to build a different product than the one 40,000 people actually use.
VCs won't touch me because I can't show revenue. Potential acquirers lowball me because the users are "unmonetized." My friends who launched worse products with worse metrics but $5K MRR are raising rounds I can't.
Traction without revenue is a prison. You built something people want but not enough to pay for. If anyone's been here and found the exit, I need to hear it.
My manager asked ChatGPT whether to promote me. It said no. He showed me the screenshot.
Mid year review. I walk in expecting the usual conversation. My manager turns his laptop around. There's a ChatGPT window. He'd pasted my self review, my peer feedback, and my OKR scores into it and asked: "Should this employee be promoted?"
The answer was no. "Meets expectations but lacks evidence of cross functional leadership impact."
He read it to me out loud. Like it was a diagnosis.
I asked if he agreed with it. He said "I mean, it makes some good points." This is a man who has watched me debug production at midnight and talk a panicking client off a ledge. He's outsourcing his opinion of me to autocomplete.
I asked what HIS take was, separate from the AI. Long pause. "I think you're ready but I need to build the case." He'd been using ChatGPT to build the case against me because building the case for me required actual effort.
I got promoted the next cycle. After I went over his head. Not because the AI changed its mind. Because his boss still forms opinions the old fashioned way.
Somewhere in corporate America right now, your career is being discussed by a language model that has never met you. Sleep well.
I write code slower than AI now. I also catch bugs the AI doesn't. My manager only sees the first part
Since January my team started measuring output by PRs per week. Not by choice. A VP installed a tool that tracks commit frequency, PRs merged, and lines changed. Everyone pretends it doesn't affect behavior. It affects everything.
I'm a senior engineer. I read the codebase before I write. I think about edge cases. I review other people's work carefully. I catch things that would break in production but pass every test in staging. Last quarter I caught a race condition in someone else's AI-generated PR that would have corrupted user data for about 800 accounts.
My PR count is the lowest on the team. The dashboard shows this in red. Literally red. My manager mentioned it in our 1:1 last month. "Your output has been lower this quarter." I pointed to the race condition I caught. He said "right, but that doesn't show up in the metrics."
The engineer with the highest PR count ships fast and breaks things regularly. He generates code with Claude, submits it, and moves on. His reviews are superficial. His bugs get caught by people like me. Then he fixes them in another PR, which counts as more output.
He got promoted last cycle. I didn't. His dashboard is green. Mine is red. The metrics are correct. The metrics are also useless.
the entire concept of a dedicated QA team is probably going to be obsolete within 5 years and most people in the industry are not ready to talk about it
I want to be careful here because i am not saying quality goes away. quality matters more than ever. i am saying the organizational model of a separate team responsible for finding bugs after developers write code is fundamentally broken and the industry is slowly figuring that out.
the best engineering teams I have worked with or studied do not have a traditional QA handoff. quality is embedded. developers write tests as part of the work. the pipeline catches regressions automatically. the definition of done includes quality criteria from the start.
the reason dedicated QA teams exist in most companies is because historically it was hard to shift that responsibility earlier and automate it reliably. those barriers are eroding. when you can write a test in plain english and run it on real devices in a CI pipeline without a specialist maintaining it, the case for a separate quality gate staffed by humans gets weaker.
I think the QA role is going to bifurcate. some people will move into quality engineering embedded in product teams. others will specialize in security testing, accessibility, performance, the things that genuinely need deep expertise. the middle, the people doing manual regression and maintaining automation scripts, that middle is going to shrink considerably.
nobody in the industry wants to say this out loud because it is uncomfortable. but the trajectory seems pretty clear.
writing tests in code was never the right abstraction for most of what QA teams actually do and the industry is only now starting to admit it
hear me out because i know this is going to ruffle some feathers.
when automated testing took off, the people building the tooling were engineers. so naturally the interface they built was code. you write scripts. you define locators. you structure your tests like software because the people designing the tools thought in software.
but the people doing most of the testing were not software engineers and are not software engineers. they understand user behavior, edge cases, what a real person would do when something goes wrong. that knowledge is valuable and deeply human. but to translate it into automation they had to first learn to think like a programmer, learn xpath or css selectors or whatever the framework du jour was, and maintain that knowledge as the tooling evolved.
we took people whose value was in understanding user experience and made them learn infrastructure. and then we were surprised when test maintenance became a bigger burden than the testing itself.
the whole industry quietly assumed that if your QA team could not write code they were not serious professionals. that assumption filtered hiring, shaped tooling decisions, and pushed teams toward complexity that did not actually serve the goal of shipping better software.
i think we are finally in a moment where people are questioning that assumption and it is long overdue. the goal was always working software, not sophisticated test scripts.
A "vibe coder" joined our team 3 months ago. I just mass-reverted 40 of his PRs.
He was a product designer who learned to code with ChatGPT. Management loved it, and everyone was like, "See? AI is making everyone a developer!" He shipped fast, multiple PRs a day, and features appeared out of nowhere. Everyone was impressed.
Then the bug reports started, the kind that wake you up at 2am.
I reviewed his code. Every file was AI output pasted in with zero error handling. One API endpoint accepted any JSON payload and wrote it directly to the database. I wish I was exaggerating.
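For contrast, this is roughly the check that endpoint was missing: validate the parsed payload against an explicit allow-list before anything touches the database. A minimal standard-library sketch. The field names and schema are hypothetical, since the real payload isn't described here.

```python
import json
from typing import Any

# Hypothetical schema: the only fields allowed, with their expected types.
ALLOWED_FIELDS = {"email": str, "display_name": str, "age": int}


def validate_payload(payload: Any) -> dict:
    """Return a cleaned copy of `payload`, or raise ValueError if it does
    not match the expected shape."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    unknown = set(payload) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    clean = {}
    for field, expected_type in ALLOWED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {field}")
        clean[field] = value
    return clean


good = json.loads('{"email": "a@example.com", "display_name": "A", "age": 30}')
bad = json.loads('{"email": "a@example.com", "display_name": "A", "age": 30, "is_admin": true}')

print(validate_payload(good))
try:
    validate_payload(bad)
except ValueError as err:
    print("rejected:", err)
```

Only the dict returned by `validate_payload` would be handed to the database layer. The raw request body never would.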
He didn't know what he didn't know. It was as simple as that. The code worked on the happy path because ChatGPT is good at happy paths, but it collapsed the moment a real user did something unexpected.
I spent a week untangling the damage, reverted around 40 PRs, and rewrote 3 services from scratch.
The response from management was, "Can't we just pair him with a senior dev?"
And I was like, sure, now your free developer costs one senior engineer's full-time attention. That's not a productivity gain, that's a tax on your best people.
AI makes it easy to write code, but it does not make it easy to write software. Those are different things, and the gap between them is where your production incidents live.
we have collectively decided that 15% test flakiness is acceptable and that decision is costing the industry billions
think about what a 15% flakiness rate actually means in practice across an industry.
every day, millions of CI pipeline runs are failing on tests that do not represent real bugs. engineers are spending time investigating failures that are noise. builds are being blocked by phantom problems. reruns are consuming compute. and most importantly, teams are gradually losing trust in their own safety nets.
there is a number i keep coming back to. Google published research years ago saying that 16% of their tests showed flakiness and it consumed roughly 2% of total engineering time across the organization. google has tens of thousands of engineers. do the math on what 2% of that looks like in salary alone.
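here is that math as a quick sketch. only the 2% figure comes from the paragraph above. the headcount and the cost per engineer are hypothetical placeholders, so swap in your own numbers.

```python
def annual_flakiness_cost(
    engineers: int, cost_per_engineer: int, share_percent: int
) -> float:
    """Yearly cost of the share of engineering time lost to flaky tests."""
    return engineers * cost_per_engineer * share_percent / 100


# Hypothetical inputs for illustration only.
ENGINEERS = 30_000             # stand-in for "tens of thousands of engineers"
COST_PER_ENGINEER = 200_000    # assumed fully loaded annual cost in USD

# The one figure taken from the post.
SHARE_PERCENT = 2

cost = annual_flakiness_cost(ENGINEERS, COST_PER_ENGINEER, SHARE_PERCENT)
print(f"${cost:,.0f} per year")
```

under those assumptions it comes to nine figures a year, before counting compute or delayed releases.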
now scale that across the entire industry. every company running mobile CI pipelines with meaningful flakiness rates. the accumulated cost in wasted compute, wasted engineer hours, delayed releases, and bugs that slip through because people stopped trusting the results is genuinely staggering.
and the wild part is that most teams have normalized it. 15% flakiness gets described as a known issue or a quirk of the environment. it is treated like weather. something you work around rather than something you fix.
we made peace with a problem that is actively expensive and i am not sure when or why that happened.
QA is treated as a cost center because QA teams taught companies to treat them that way
this one is going to sting a little but i think it is worth saying.
for years the narrative in QA has been about proving value, justifying headcount, showing ROI on testing investment. and the way teams have typically done that is by measuring things like bugs found, test cases written, coverage percentages. vanity metrics that look good in a spreadsheet but do not actually connect to business outcomes.
leadership looks at QA and sees a team that finds bugs and slows down releases. they do not see a team that protects revenue, reduces churn, prevents the kind of production incidents that make headlines. that is a positioning problem and it belongs to QA leadership.
the teams i have seen get real investment and real respect are the ones that stopped speaking the language of testing and started speaking the language of risk and revenue. a bug in checkout that affects 3% of users on Samsung devices is not a QA metric. it is a revenue number. frame it that way and suddenly the conversation changes.
QA has a perception problem that better tooling will not fix. it is a communication and positioning problem that has been there for a long time.