How do you actually triage 1000+ regression tests without losing your mind?
I run a large Playwright regression suite (~1000 tests, TypeScript). I've invested a lot in making the suite itself solid — strict data-testid selectors only, no flimsy CSS/XPath locators, and I've built custom tooling with Claude Code (AI-assisted skills) that runs the tests and auto-generates detailed reports on Confluence so the whole team can review results without touching the codebase.
So the test infrastructure is pretty tight. My problem isn't writing or maintaining tests — it's what happens after they finish.
Every execution gives me a wall of results and I spend a lot of time figuring out what's actually going on. For each failure I have to determine: is this a flaky test? A defect in the test itself? An actual product bug? A one-time environment issue (slow load, timeout, and so on)?
I end up re-running tests manually just to check if the failure reproduces. When it does, I still go back and forth with manual QA or product to confirm whether it's a known behavior or a real bug. That loop alone eats hours.
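To make the re-run loop concrete, here is a minimal sketch of the first-pass bucketing I have in mind. It assumes each test's attempts (the first run plus any re-runs) are available as a list. The `Attempt` shape, the bucket names, and `bucketTest` are all hypothetical and not part of Playwright. It only separates "did not reproduce" from "reproduced every time"; it cannot tell a test defect from a product bug.

```typescript
// Hypothetical shape: one entry per attempt of a single test
// (the first run plus any re-runs). Not a Playwright type.
type AttemptStatus = "passed" | "failed" | "timedOut";

interface Attempt {
  status: AttemptStatus;
  errorMessage?: string;
}

// Hypothetical bucket names for a first triage pass.
type Bucket = "passed" | "flaky" | "likely-environment" | "needs-human-review";

function bucketTest(attempts: Attempt[]): Bucket {
  if (attempts.length === 0) {
    throw new Error("bucketTest needs at least one attempt");
  }
  const failures = attempts.filter((a) => a.status !== "passed");
  if (failures.length === 0) return "passed";
  // Mixed outcomes: the failure did not reproduce on every attempt.
  if (failures.length < attempts.length) return "flaky";
  // Every attempt failed. If all of them were timeouts, suspect the
  // environment first (an assumption, not a rule).
  if (failures.every((a) => a.status === "timedOut")) return "likely-environment";
  // Reproducible non-timeout failure: test defect or product bug,
  // which still needs a person to decide.
  return "needs-human-review";
}
```

Only the `needs-human-review` bucket would go to manual QA or product, which is the part of the loop that currently eats the most time.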
Everything runs locally for now — no CI/CD yet, no historical data on pass/fail trends. Just me going through the Confluence report after each run trying to make sense of it.
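Even without CI, a rough history could be kept locally. The sketch below assumes I append one record per test per run to some local store (for example a JSON Lines file) and then summarize it. `RunRecord`, `summarizeHistory`, and `intermittentTests` are hypothetical names, and the outcome values are my own labels rather than anything a tool emits.

```typescript
// Hypothetical record: one per test per run, kept in a local store.
interface RunRecord {
  runId: string;
  testId: string;
  outcome: "passed" | "failed" | "flaky";
}

interface TestStats {
  runs: number;
  failed: number;
  flaky: number;
  failureRate: number; // failed / runs
}

// Aggregate per-test counts across all recorded runs.
function summarizeHistory(records: RunRecord[]): Map<string, TestStats> {
  const stats = new Map<string, TestStats>();
  for (const r of records) {
    const s = stats.get(r.testId) ?? { runs: 0, failed: 0, flaky: 0, failureRate: 0 };
    s.runs += 1;
    if (r.outcome === "failed") s.failed += 1;
    if (r.outcome === "flaky") s.flaky += 1;
    s.failureRate = s.failed / s.runs;
    stats.set(r.testId, s);
  }
  return stats;
}

// Tests that have misbehaved at least once but do not fail on every run.
function intermittentTests(stats: Map<string, TestStats>): string[] {
  const ids: string[] = [];
  stats.forEach((s, testId) => {
    if (s.failed + s.flaky > 0 && s.failed < s.runs) ids.push(testId);
  });
  return ids.sort();
}
```

A test that fails on every recorded run is excluded from `intermittentTests` on purpose: that pattern points at a real defect or bug rather than flakiness.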
For those of you dealing with large Playwright suites:
- How do you classify failures efficiently? Do you have a system for separating flaky from real, even without CI history?
- How do you handle flaky tests — retries, quarantine, tagging? Playwright has built-in retries but I'm not sure how people actually use them in practice at this scale.
- When you suspect a real bug, what's your process before escalating? Do you just file it and move on, or do you verify manually first?
- Any techniques or workflows that helped with triage specifically? Even just how you organize your review process would be useful.
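For reference, these are the built-in Playwright settings I'm asking about, as a minimal config sketch. The retry count, the `@quarantine` tag name, and the output path are assumptions for illustration, not what my suite uses today.

```typescript
// playwright.config.ts (sketch; values are illustrative assumptions)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Re-run a failing test up to 2 more times; a test that passes on a
  // retry is reported as "flaky" instead of failed.
  retries: 2,
  // Skip tests whose title contains the (hypothetical) @quarantine tag.
  grepInvert: /@quarantine/,
  // Keep the console list and also write machine-readable results.
  reporter: [['list'], ['json', { outputFile: 'results.json' }]],
  use: {
    // Record a trace only when a test is retried, to keep runs fast.
    trace: 'on-first-retry',
  },
});
```

What I don't know is whether people at this scale trust the "flaky" label from retries on its own, or pair it with a quarantine tag like the one above.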
Would love to hear how other teams deal with this because right now it feels like I'm doing most of it in my head and it doesn't scale.