u/Ok-Classroom-2377

How do you deal with people who keep giving unwanted advice?

I don’t mean well-intentioned help once in a while. I’m talking about people who constantly jump in with advice I didn’t ask for, often without having any proper knowledge of the subject.

How do you deal with them??

reddit.com
u/Ok-Classroom-2377 — 4 days ago

Anyone else get that “you forgot something” feeling… and turn out to be right most of the time?

Whenever I’m heading out, I get this random gut feeling that I’m missing something. The weird part is—it’s right like 90% of the time.

It’s not like I’m consciously remembering what I forgot, just a vague sense that something’s off.

Is this just my brain picking up on patterns, or am I just paranoid lol?

u/Ok-Classroom-2377 — 9 days ago
▲ 9 r/sre

Not talking about production outages, but the smaller CI/CD failures that block engineers for a while: IAM / permission issues, GitHub Actions / pipeline failures, Docker / build problems

The pattern I keep seeing: failure blocks work -> someone spends 1–3 hours debugging -> fix is found -> things move on

Then a similar issue shows up later and the cycle repeats.

Individually these aren’t major incidents, but over time they add up and feel like a steady source of toil.

From an SRE perspective, I’m curious how teams think about this:

- Do you track these kinds of failures or treat them as background noise?

- Are there systems in place to capture and reuse fixes (runbooks, automation, policy checks)?

- At what point do you consider recurring CI/CD failures worth addressing as a reliability problem instead of just handling them reactively?
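
To make the tracking question concrete, here is a rough sketch of what a minimal log of these failures could look like. Everything in it is an assumption for illustration: the `Failure` record, the category names, the sample hours, and the `min_count` threshold of 3 are all made up, not something from a real team or tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Failure:
    category: str      # hypothetical labels, e.g. "iam", "github-actions", "docker"
    hours_lost: float  # time someone spent debugging this one


def summarize(failures):
    """Return {category: (count, total_hours)} for a list of Failure records."""
    totals = defaultdict(lambda: [0, 0.0])
    for f in failures:
        totals[f.category][0] += 1
        totals[f.category][1] += f.hours_lost
    return {cat: (n, hours) for cat, (n, hours) in totals.items()}


def recurring(failures, min_count=3):
    """Categories seen at least min_count times, biggest time sink first."""
    hits = [(cat, n, h) for cat, (n, h) in summarize(failures).items() if n >= min_count]
    return sorted(hits, key=lambda t: t[2], reverse=True)


# Made-up sample data, only to show the shape of the output.
SAMPLE = [
    Failure("iam", 2.0),
    Failure("docker", 1.0),
    Failure("iam", 3.0),
    Failure("iam", 1.5),
    Failure("github-actions", 1.0),
]

if __name__ == "__main__":
    for cat, count, hours in recurring(SAMPLE):
        print(f"{cat}: {count} failures, {hours} hours lost")
```

Even something this small would turn "background noise" into a list of categories that keep coming back, which is roughly the point where a runbook or automated check starts to pay off.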

Feels like they sit in a gray area — not quite incidents, but not harmless either.

u/Ok-Classroom-2377 — 19 days ago

Trying to get a realistic sense of this from people running teams. Not talking about production outages — more the day-to-day CI/CD failures that block work for a while: AWS permission issues, GitHub Actions breaking, Docker builds failing for unclear reasons

The pattern I keep seeing: something fails → someone digs through logs for 1–3 hours → fix it → move on

…and then a similar issue shows up again later

I’m starting to wonder how much this actually costs in terms of team velocity, but I haven’t seen many teams track it properly.

Curious:

Do you track how often these failures happen or how long they take to fix?

When you fix one, does that knowledge actually get captured anywhere useful?

Or is it mostly “figure it out again next time”?

Feels like a lot of time gets lost here, but not sure how common that is.

u/Ok-Classroom-2377 — 19 days ago

As an engineering manager I'm trying to get a clearer picture of the real cost of deployment failures on team velocity.

Not production outages — I mean the CI/CD failures that block the team for 1-3 hours while someone figures out what went wrong. AWS permission errors, GitHub Actions failures, Docker build issues.

A few questions:

- Do you track how often these happen and how long they take to fix?

- When your team fixes a deployment failure, does that fix get documented anywhere? Or does it live in someone's head?

- Have you ever tried to calculate what this costs your team per month in lost engineering time?
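
For the last question, the back-of-the-envelope math could look something like the sketch below. The function name and all the inputs (10 failures a month, 2 hours each, a $75 hourly rate) are placeholder assumptions, not real data; only the 1-3 hour range comes from what I described above.

```python
def monthly_cost(failures_per_month, avg_hours_per_failure, hourly_rate, engineers_blocked=1):
    """Rough estimate of engineering time and money lost to CI/CD failures per month.

    Returns (hours_lost, cost). All inputs are estimates supplied by the caller.
    """
    hours_lost = failures_per_month * avg_hours_per_failure * engineers_blocked
    return hours_lost, hours_lost * hourly_rate


if __name__ == "__main__":
    # Placeholder numbers: 10 failures/month, 2 hours each (middle of the 1-3 hour range).
    hours, cost = monthly_cost(10, 2.0, 75.0)
    print(f"{hours} hours lost, about ${cost:.0f} per month")
```

It ignores context-switching and anyone else waiting on the pipeline, so the real number is probably higher; `engineers_blocked` is a crude way to account for the latter.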

I'm working on understanding this problem better. Happy to share what I find with anyone who responds.

u/Ok-Classroom-2377 — 19 days ago