u/Same_Technology_6491

We migrated to microservices 18 months ago

I need to write this down while the frustration is fresh.

In 2024 our CTO decided we needed to move from a monolith to microservices. The reason was "scalability." We had 14 engineers and maybe 2,000 concurrent users at peak. Nobody asked whether we had a scaling problem. We just had a monolith that felt old-fashioned. The migration took 8 months. We split the monolith into 11 services. Each one got its own repo, its own CI pipeline, its own deployment, its own monitoring. We added Kafka for async messaging between services. We added an API gateway. We added a service mesh. We added distributed tracing because debugging a request that crossed 4 services was impossible without it.

After the migration, our deploy time went from 6 minutes to 45 minutes because we had to coordinate 11 independent releases. Our incident rate tripled because network failures between services created cascading timeouts. Our engineering velocity dropped because every feature now required changes to 3 or 4 services instead of one directory in a monolith.

The CTO left 6 months after the migration was complete. New leadership brought in a consulting firm to audit the architecture. They spent 6 weeks and $90K. Their recommendation, delivered in a 60-page PDF, was to consolidate back into 3 services. Not a monolith. But close.

We're now spending Q3 merging services that we spent Q1 and Q2 of last year splitting apart. The total cost of this round trip, including engineering hours, consulting fees, and infrastructure, is somewhere north of $600K. For a company with 2,000 concurrent users. The next time someone says "microservices" in a planning meeting I'm going to show them this thread.

reddit.com
u/Same_Technology_6491 — 18 hours ago

I don’t think people realize how fast AI is changing junior level jobs.

The scary part about AI isn’t that it can outperform experts, it’s that it’s becoming good enough to replace beginners.

Junior coding, copywriting, support roles, research assistants, basic design work.

The ladder people used to climb into careers feels like it’s quietly disappearing.

reddit.com

Job descriptions for QA engineers in 2026 feel like they were written 5 years ago, and the gap between what those descriptions say and what the role actually requires gets wider every 3 or 4 months.

What's actually happening is that the role is splitting. One side is writing test infrastructure, building automation frameworks, working inside CI/CD pipelines, and understanding distributed systems, which is closer to a software engineer. The other side is becoming more strategic and closer to product: focused on risk assessment, defining what needs to be tested and why, and understanding user behavior and where it diverges from how the system was designed.

Both are legitimate and valuable, but they require completely different skills, and almost no company is hiring for them as separate roles yet. They are still writing one job description that asks for both and then wondering why the person they hired is strong in one area and struggling in the other. The industry will catch up eventually, but right now there are a lot of QA engineers doing two jobs under one title and not getting paid properly for either.

reddit.com
u/Same_Technology_6491 — 30 days ago

Somehow migrating our frontend to web components was the right call for the product and has made our test suite nearly unusable at the same time.

Piercing the shadow DOM to interact with elements that are completely visible on screen is one of the more astonishing things I do regularly now. I can see the button, the user can see the button, but clicking it takes three layers of shadow root traversal and still fails intermittently in CI for reasons I cannot consistently reproduce. We have built helper functions on top of helper functions to handle this, and the test code is now more complex than the application code it is supposed to be validating. That is not a sustainable place to be.
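For illustration, here is a minimal sketch of the kind of traversal helper described above. It is not the actual helper from this codebase: `NodeLike` and `deepFind` are hypothetical names, and the tree is modelled with plain objects so the sketch runs without a browser. In a real page the same recursion would walk `Element.children` and `Element.shadowRoot`, and it would only work for open shadow roots, since a closed root exposes `shadowRoot` as `null`.

```typescript
// Minimal structural stand-ins for Element and ShadowRoot (assumption:
// only the fields needed for traversal are modelled).
interface NodeLike {
  id?: string;
  children: NodeLike[];
  shadowRoot?: { children: NodeLike[] } | null;
}

// Depth-first search that descends into shadow roots as well as
// ordinary light-DOM children.
function deepFind(
  root: NodeLike,
  predicate: (n: NodeLike) => boolean
): NodeLike | null {
  if (predicate(root)) return root;
  const next = [...(root.shadowRoot?.children ?? []), ...root.children];
  for (const child of next) {
    const hit = deepFind(child, predicate);
    if (hit) return hit;
  }
  return null;
}

// Hypothetical tree: a button sitting under three nested shadow roots.
const button: NodeLike = { id: "pay", children: [] };
const level3: NodeLike = { children: [], shadowRoot: { children: [button] } };
const level2: NodeLike = { children: [], shadowRoot: { children: [level3] } };
const level1: NodeLike = { children: [], shadowRoot: { children: [level2] } };
const page: NodeLike = { children: [level1] };

const found = deepFind(page, (n) => n.id === "pay");
console.log(found === button); // true
```

Even in this toy form, the helper has to know how components are nested internally, which is exactly the coupling that makes the test code grow faster than the application code.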

The deeper problem is that DOM-based testing was already showing its age before web components made it worse. The assumption that the structure of the HTML is a reliable proxy for what the user experiences has always been shaky, and modern frontend architecture is making it shakier every year.

Not sure if the answer is better tooling, a different testing philosophy entirely, or just accepting that certain categories of UI complexity are going to keep breaking selector-based approaches no matter how clever the helper functions get.

u/Same_Technology_6491 — 1 month ago

So the deployment went out Thursday evening and CI went red right before cutoff: a critical checkout test was failing, and the release got paused. I spent Friday night and most of today pulling logs and re-running the suite locally trying to reproduce it. It passed every single time on my machine but failed every single time in the pipeline, and it took four hours of this before I finally found it.

A frontend dev had added a promotional banner earlier that week and adjusted a z-index on a wrapper div. The button was completely visible and worked perfectly; any human could look at the screen and click it without a second thought. But our Playwright script was targeting a selector that, according to the headless browser, was now technically obscured in the DOM hierarchy.
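A simplified model of why a z-index change can flip a test from green to red: automation tools typically check which element would actually receive the click at the target point, and a box stacked higher wins. The sketch below is an assumption-laden toy, not Playwright's real implementation; `Box`, `topmostAt`, `receivesClick`, and the coordinates are all made up for illustration.

```typescript
interface Box {
  name: string;
  x: number;
  y: number;
  width: number;
  height: number;
  zIndex: number;
}

function contains(b: Box, px: number, py: number): boolean {
  return px >= b.x && px < b.x + b.width && py >= b.y && py < b.y + b.height;
}

// The box that would receive a click at (px, py): highest zIndex wins,
// and on a tie the box later in the list wins.
function topmostAt(boxes: Box[], px: number, py: number): Box | null {
  let top: Box | null = null;
  for (const b of boxes) {
    if (contains(b, px, py) && (top === null || b.zIndex >= top.zIndex)) {
      top = b;
    }
  }
  return top;
}

// True when a click at the centre of `target` would land on `target`.
function receivesClick(boxes: Box[], target: Box): boolean {
  const cx = target.x + target.width / 2;
  const cy = target.y + target.height / 2;
  return topmostAt(boxes, cx, cy) === target;
}

// Hypothetical layout: a wrapper that overlaps the button's centre.
const checkoutButton: Box = { name: "checkout-button", x: 100, y: 400, width: 200, height: 50, zIndex: 1 };
const bannerBefore: Box = { name: "promo-wrapper", x: 0, y: 0, width: 800, height: 450, zIndex: 0 };
const bannerAfter: Box = { ...bannerBefore, zIndex: 10 };

console.log(receivesClick([bannerBefore, checkoutButton], checkoutButton)); // true
console.log(receivesClick([bannerAfter, checkoutButton], checkoutButton)); // false
```

In this model the button itself is untouched between the two runs; only the stacking order of something overlapping it changed. That matches how a purely visual tweak can fail a check that has nothing to do with the feature under test.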

The feature was not broken and neither was the app. The only thing broken was our script's ability to read the HTML underneath a UI that was working fine.

I lost a Friday night and a Saturday morning to a CSS tweak, and I keep thinking about how we are not actually testing whether the user can complete checkout but whether our selectors can navigate the DOM. Those are two completely different things that our entire test suite has confused for the same thing.

reddit.com
u/Same_Technology_6491 — 1 month ago

I worked at a fintech before drizz, and the payment bugs that made it to production were always the same category, but they were not the obvious ones. Most teams check that the happy path works (card goes in, payment succeeds, user sees confirmation) and move on, but that is maybe 20% of what can actually go wrong.

User loses network halfway through a transaction and taps pay again, not knowing the first one already went through. Device locks during 3DS verification and the session times out; user thinks payment failed, tries again, same problem. Keyboard pops up and covers the confirm button on a specific Android screen size; user cannot complete the purchase and never tells you why they dropped off.
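The double-tap case is commonly handled with an idempotency key: the client sends the same key on every retry of one purchase attempt, and the server returns the original result instead of charging again. A minimal in-memory sketch follows, assuming a hypothetical `PaymentService` class; a real system would persist keys in durable storage and also handle requests that are still in flight.

```typescript
interface ChargeResult {
  chargeId: string;
  amountCents: number;
}

// Hypothetical service: remembers completed charges by idempotency key.
class PaymentService {
  private completed = new Map<string, ChargeResult>();
  private nextId = 1;

  charge(idempotencyKey: string, amountCents: number): ChargeResult {
    // A retry with a key we have already seen returns the stored result
    // and does not create a second charge.
    const previous = this.completed.get(idempotencyKey);
    if (previous) return previous;

    const result: ChargeResult = {
      chargeId: `ch_${this.nextId++}`,
      amountCents,
    };
    this.completed.set(idempotencyKey, result);
    return result;
  }

  chargeCount(): number {
    return this.completed.size;
  }
}

// The user taps pay, loses network before seeing the response, taps again.
const service = new PaymentService();
const firstTap = service.charge("order-42-attempt-1", 1999);
const secondTap = service.charge("order-42-attempt-1", 1999);

console.log(firstTap.chargeId === secondTap.chargeId); // true
console.log(service.chargeCount()); // 1
```

A test for this scenario has to simulate the dropped response and the retry on purpose, because a suite that only runs the clean path never sends the same request twice.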

The device stuff is where it gets really specific. Budget Android phones with 4GB RAM will sometimes drop the payment screen from memory mid-flow because the OS is aggressively clearing background processes. Certain Android 12 builds had issues with payment SDKs that nobody caught until users hit them at scale.

We once traced a bug that only appeared on the 28th of every month to a timezone offset in how a billing cycle was calculated. It took three weeks to find because nobody thought to test on that specific date. None of this shows up in a standard automated suite, because the suite runs in clean, controlled conditions and real payments do not happen in clean, controlled conditions.
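To show how a timezone offset can move a billing date, here is a small sketch. It does not reconstruct the actual bug, whose details are not given; `dayOfMonthAt`, `isBillingDay`, and the chosen instant and offset are assumptions for illustration. The point is that the same instant falls on different calendar days depending on the offset applied, and near a month boundary the day can wrap to the 1st.

```typescript
// Day of month for an instant, as seen at a fixed UTC offset in minutes.
function dayOfMonthAt(instantMs: number, offsetMinutes: number): number {
  const shifted = new Date(instantMs + offsetMinutes * 60_000);
  return shifted.getUTCDate();
}

// Hypothetical billing check: does this instant fall on the billing day?
function isBillingDay(
  instantMs: number,
  billingDay: number,
  offsetMinutes: number
): boolean {
  return dayOfMonthAt(instantMs, offsetMinutes) === billingDay;
}

// 2025-02-28 22:30 UTC. February 2025 has 28 days.
const instant = Date.UTC(2025, 1, 28, 22, 30);

// Evaluated in UTC the instant is on the 28th.
console.log(dayOfMonthAt(instant, 0)); // 28

// Evaluated at UTC+05:30 it is already 04:00 on March 1st.
console.log(dayOfMonthAt(instant, 330)); // 1

// A job that bills "on the 28th" gives different answers for the same instant.
console.log(isBillingDay(instant, 28, 0)); // true
console.log(isBillingDay(instant, 28, 330)); // false
```

Catching this class of bug in a suite usually means injecting the clock and the offset as parameters, as above, so tests can pin specific dates instead of running against whatever day CI happens to execute on.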

reddit.com
u/Same_Technology_6491 — 1 month ago
▲ 38 r/AI_Agents+1 crossposts

We signed our first enterprise client eight months in. We were confident, the team was excited, and we celebrated. Then the actual work started.

Enterprise means compliance reviews, security audits, procurement processes, legal redlines on contracts that took three months to close, a dedicated Slack channel where requests came in at all hours, custom feature asks that were reasonable individually and impossible collectively, and an onboarding process that consumed two of our five engineers for six weeks.

We built the product for fast-moving mobile teams that wanted to get started in minutes. Enterprise wanted everything we didn't have yet: SSO, audit logs, custom data retention, on-premise deployment options, SLAs with penalty clauses, and a named customer success contact, which at our size meant a founder on every call.

Revenue looked great on paper, but underneath it was ugly. Velocity dropped, the rest of our pipeline stalled because we had no bandwidth, and two smaller customers churned because response times slowed down and we didn't notice fast enough.

It took us four months to stabilize. We learned more about where drizz actually needed to be in that period than in the six months before it. I wouldn't change it, but I would have gone in with completely different expectations if I'd known what was coming.

Edit: yes, our product is an AI agent, and I'm writing this so other founders think carefully before signing any client.

reddit.com
u/Same_Technology_6491 — 1 month ago