r/OReilly_Learning

Google released two early-release chapters from the SRE Book 2nd Edition this week.

>One is the new "AI for SRE" chapter. It's on O'Reilly publication behind a paywall, but a free trial works. Read it last night, sharing the takeaways for anyone who doesn't to read the full thing.

The condensed version:

AI is not a human replacement. The book is firm on this. We still need humans for the high-stakes calls and to maintain the AI itself.
Don't give AI full access on day one. Build trust the way you would with a junior engineer. Let it suggest fixes first, fix small issues next, only then expand its scope.
If the agent can take an action, it must have a rollback. If there is no undo path, the access should not be granted. This is the line I think most teams shipping agents are skipping right now.
When the agent fails or gives a bad suggestion, flag it. The chapter leans on the same principle as good postmortem culture, more feedback and more context means better future execution.
During incidents, the time-saver is not the fix, it is the searching. The chapter frames the agent as the thing that finds the right answer fast across tabs, runbooks, and prior incidents, instead of the thing that pushes the fix.
Dashboards tell you something is broken. AI is positioned as the layer that tells you why, by reading the tickets and the user feedback that the dashboards do not capture.
The framing that stuck with me most: AI does not reduce SRE workload, it raises the reliability ceiling. Cheaper reliability does not mean less work, it means higher reliability demanded across more services. Jevon's paradox applied to ops.

What I would add as a practitioner: the 5-level maturity model they propose is useful, but the gating criteria between levels is where the real engineering lives. "Agent suggested 50 fixes, 47 were good" sounds great until you ask which 3 were wrong and what they would have broken. Most teams I see skipping straight to autonomous remediation are not doing that work.

Worth a read if you are scoping AI in operations in the next year.

(Disclosure: I run Sherlocks, which builds in this space. This is not a pitch for it.)

r/OReilly_Learning

Managing AI tools on corporate machines, what are the best practices?

What’s Your Most Controversial IT Opinion?