r/OReilly_Learning

▲ 1 r/OReilly_Learning+1 crossposts

Managing AI tools on corporate machines, what are the best practices?

We're rolling out Claude Code to our dev team, and our sysadmin team is unsure how to manage and monitor it.

Questions for other sysadmins:
- Do you allow Claude Code on corporate machines?
- How do you monitor what it does?
- Do you have policies around what it can/can't do?
- Can you block it from accessing certain networks or APIs?
- How do you handle updates/versioning?

It feels like AI tools are growing faster than our ability to manage them. We can monitor browser activity, API calls, and file transfers, but Claude Code just runs and we have no visibility into what it does.

Has your org figured this out? What's your approach?

Any advice would be helpful.

reddit.com
u/OReilly_Learning — 2 days ago
▲ 254 r/OReilly_Learning+1 crossposts

What’s Your Most Controversial IT Opinion?

Fellow sysadmins, what’s your biggest unpopular IT opinion? Not the usual “users should reboot first” stuff, but the things you’ve learned after a few years in the trenches that you probably wouldn’t say too loudly in a meeting.

u/OReilly_Learning — 10 days ago
▲ 210 r/OReilly_Learning+2 crossposts

Google released two early-release chapters from the SRE Book 2nd Edition this week.

One is the new "AI for SRE" chapter. It's on O'Reilly behind a paywall, but a free trial works. I read it last night and am sharing the takeaways for anyone who doesn't want to read the full thing.

The condensed version:

  1. AI is not a human replacement. The book is firm on this. We still need humans for the high-stakes calls and to maintain the AI itself.
  2. Don't give AI full access on day one. Build trust the way you would with a junior engineer. Let it suggest fixes first, fix small issues next, only then expand its scope.
  3. If the agent can take an action, it must have a rollback. If there is no undo path, the access should not be granted. This is the line I think most teams shipping agents are skipping right now.
  4. When the agent fails or gives a bad suggestion, flag it. The chapter leans on the same principle as good postmortem culture: more feedback and more context mean better future execution.
  5. During incidents, the time-saver is not the fix, it is the searching. The chapter frames the agent as the thing that finds the right answer fast across tabs, runbooks, and prior incidents, instead of the thing that pushes the fix.
  6. Dashboards tell you something is broken. AI is positioned as the layer that tells you why, by reading the tickets and the user feedback that the dashboards do not capture.
  7. The framing that stuck with me most: AI does not reduce SRE workload, it raises the reliability ceiling. Cheaper reliability does not mean less work, it means higher reliability demanded across more services. The Jevons paradox applied to ops.
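Points 2 and 3 can be sketched as a small gate in front of every agent action. This is only an illustration, not anything from the chapter: the `Trust` levels, the `Action` type, and the `execute` function are hypothetical names, and a real system would also need auditing and human approval flows.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class Trust(IntEnum):
    # Hypothetical trust ladder, loosely following point 2.
    SUGGEST_ONLY = 0   # agent may only propose fixes
    SMALL_FIXES = 1    # agent may apply low-risk fixes
    BROAD_SCOPE = 2    # agent may apply higher-risk fixes


@dataclass
class Action:
    name: str
    required_trust: Trust
    apply: Callable[[], None]
    rollback: Optional[Callable[[], None]] = None  # the undo path from point 3


def execute(action: Action, agent_trust: Trust) -> str:
    """Apply the action only if trust is sufficient and an undo path exists."""
    # Below the required level, the agent only surfaces a suggestion for a human.
    if agent_trust == Trust.SUGGEST_ONLY or agent_trust < action.required_trust:
        return "suggested"
    # No rollback means no access, regardless of trust level.
    if action.rollback is None:
        return "denied: no rollback path"
    try:
        action.apply()
    except Exception:
        # Undo whatever the failed attempt may have changed.
        action.rollback()
        return "rolled back"
    return "applied"
```

The design choice worth noting is that the rollback check is independent of the trust level, so even a fully trusted agent cannot run an action that has no undo.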

What I would add as a practitioner: the 5-level maturity model they propose is useful, but the gating criteria between levels are where the real engineering lives. "Agent suggested 50 fixes, 47 were good" sounds great until you ask which 3 were wrong and what they would have broken. Most teams I see skipping straight to autonomous remediation are not doing that work.
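To make the "which 3 were wrong" point concrete, here is a rough sketch of a promotion gate that looks at the worst miss as well as the hit rate. The `blast_radius` score, the thresholds, and the function name are my own assumptions for illustration, not criteria from the chapter.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Suggestion:
    correct: bool
    # Hypothetical 0-3 score of what the fix would have broken if applied wrongly.
    blast_radius: int


def ready_to_promote(
    history: Sequence[Suggestion],
    min_accuracy: float = 0.9,       # assumed threshold
    max_bad_blast_radius: int = 1,   # assumed threshold
) -> bool:
    """Promote only if accuracy is high AND no wrong fix was high-impact."""
    if not history:
        return False
    accuracy = sum(s.correct for s in history) / len(history)
    worst_miss = max((s.blast_radius for s in history if not s.correct), default=0)
    return accuracy >= min_accuracy and worst_miss <= max_bad_blast_radius


# 47 good suggestions and 3 bad ones that would have caused a major outage.
risky = [Suggestion(True, 1)] * 47 + [Suggestion(False, 3)] * 3
# Same hit rate, but the misses were low-impact.
benign = [Suggestion(True, 1)] * 47 + [Suggestion(False, 1)] * 3

print(ready_to_promote(risky))   # False: 94% accuracy, but the misses were severe
print(ready_to_promote(benign))  # True
```

Both histories have the same 47/50 hit rate, and only the severity of the misses separates them, which is the part a plain accuracy number hides.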

Worth a read if you are scoping AI in operations in the next year.

(Disclosure: I run Sherlocks, which builds in this space. This is not a pitch for it.)

u/OReilly_Learning — 12 days ago