r/sre
What do you consider a “bad” page-worthy alert?
I’ve been reviewing alert quality lately and noticed a few patterns that seem to create noise:
- alerts with no owner
- alerts with no runbook
- symptom alerts that self-resolve
- CPU/memory alerts that are not tied to user impact
- duplicate paging from app + infra layers
- short “for” (pending) windows on bursty workloads
- vague alert descriptions with no action path
For SRE teams here, what makes an alert page-worthy in your environment?
Do you use a checklist or rubric before an alert is allowed to page someone?
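To make the question concrete, here is a rough sketch of the kind of pre-paging check I have in mind, written in Python. Everything in it is an assumption for illustration: the `AlertRule` fields, the `paging_blockers` helper, and the 300-second threshold are hypothetical and not taken from any real alerting tool. It only covers the per-rule items from the list above; duplicate paging across app and infra layers would need a comparison across rules, and "vague" descriptions are reduced here to "empty", since vagueness needs a human reviewer.

```python
from dataclasses import dataclass

# Hypothetical threshold: shortest "for" window tolerated on a bursty workload.
MIN_FOR_SECONDS_BURSTY = 300


@dataclass
class AlertRule:
    """Illustrative alert metadata; field names are assumptions, not a real schema."""
    name: str
    owner: str = ""
    runbook_url: str = ""
    description: str = ""
    for_seconds: int = 0
    bursty_workload: bool = False
    tied_to_user_impact: bool = False
    often_self_resolves: bool = False


def paging_blockers(rule: AlertRule) -> list[str]:
    """Return the reasons this rule should not page; an empty list means it passes."""
    blockers = []
    if not rule.owner.strip():
        blockers.append("no owner")
    if not rule.runbook_url.strip():
        blockers.append("no runbook")
    if rule.often_self_resolves:
        blockers.append("self-resolves")
    if not rule.tied_to_user_impact:
        blockers.append("not tied to user impact")
    if rule.bursty_workload and rule.for_seconds < MIN_FOR_SECONDS_BURSTY:
        blockers.append("short 'for' window on bursty workload")
    if not rule.description.strip():
        blockers.append("no description or action path")
    return blockers


# Example: a bare CPU alert with a 60-second window on a bursty service.
cpu = AlertRule(name="HighCPU", for_seconds=60, bursty_workload=True)
print(paging_blockers(cpu))
```

A rule that returns any blockers would be routed to a ticket or dashboard instead of a pager. Whether each item should be a hard blocker or just a warning is exactly the part I'm unsure about.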
u/Software_Sennin — 23 hours ago