u/therealabenezer

▲ 4 r/platform_engineering+3 crossposts

Mythos and observability: what happens after AI finds the vulnerability?

Hey folks, I work on the IBM Observability team and wanted to get your take on Project Glasswing and Claude Mythos Preview.

Mythos is being used by select partners, including IBM, to find and validate software vulnerabilities much faster than traditional workflows. IBM is also expanding tools like IBM Concert to unify application, infrastructure and network signals into a single operational view.

Curious how people think this should work in practice: if AI can surface more vulnerabilities faster, what should observability platforms show to help teams prioritize by business impact, reduce noise and move from detection to response?

u/therealabenezer — 3 days ago
▲ 2 r/u_therealabenezer+1 crossposts

Quick poll to set the stage. If your team is running a GenAI app at work (not solo side projects), what is it?

  • Customer-facing chatbot or support agent
  • Internal knowledge assistant (RAG over docs)
  • Industry-specific workflow automation
  • Summarization (meetings, docs, incidents)
  • Code generation or dev productivity
  • Agentic workflows

Drop your questions for Jayanth on instrumenting LLM and agent apps, tracing hybrid stacks, OpenTelemetry, sampling and cost, and where APM is heading.

u/therealabenezer — 22 days ago

Hey all, I'm Abenezer, a PM on the IBM Observability team. Wanted to share something from our Research group that I think is relevant to anyone thinking about where AI agents fit in IT operations.

ITBench is an open-source framework that spins up real Kubernetes environments, injects faults (service outages, compliance gaps, cost anomalies), and measures how well AI agents diagnose and fix them. It covers three domains: SRE, CISO, and FinOps.

It was presented at ICML 2025 and most recently at SREcon.

Repo: https://github.com/itbench-hub/ITBench

Curious what this community thinks. What incident types or environments would make a benchmark like this more realistic for what you deal with day to day?

If there's interest, I can bring in the IBM Research scientists behind ITBench for an AMA. Let me know.

u/therealabenezer — 1 month ago

I work on the IBM Observability team, and I will be joined by a PM who works on IBM Instana’s LLM observability feature. We are curious how folks are monitoring generative AI workloads in production. When you deploy large language models, it can be hard to see what is going on. We want to hear about your pain points around measuring the latency of each step, tracking how many tokens are processed, and understanding how much your model is costing you.
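To make those three numbers concrete, here is a minimal sketch in plain Python of recording per-step latency, token counts and request cost for a single LLM request. It is not the Instana or Traceloop API. The `RequestTelemetry` class and both price constants are hypothetical, and the token counts are assumed to come from whatever your model provider reports.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K_INPUT = 0.50
PRICE_PER_1K_OUTPUT = 1.50


@dataclass
class RequestTelemetry:
    """Collects per-step latency and token counts for one LLM request."""
    step_latency_s: dict = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0

    @contextmanager
    def step(self, name):
        # Time the body of the `with` block and store it under the step name.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.step_latency_s[name] = time.perf_counter() - start

    def add_tokens(self, input_tokens, output_tokens):
        # Accumulate token counts reported by the model provider.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost_usd(self):
        # Cost is tokens divided by 1,000, times the per-1K price, summed over both directions.
        return (self.input_tokens / 1000 * PRICE_PER_1K_INPUT
                + self.output_tokens / 1000 * PRICE_PER_1K_OUTPUT)
```

In use, you would wrap each stage (retrieval, prompt building, generation) in `with telemetry.step("..."):` and call `add_tokens` once the model responds. A real setup would export these values as spans and metrics instead of keeping them in a dict.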

For context, Instana’s GenAI observability delivers high‑fidelity telemetry with one‑second metric granularity and end‑to‑end tracing. It collects LLM‑specific metrics such as token usage, latency and request cost, and you can instrument applications using the Traceloop SDK, exporting traces through an agent or directly to Instana depending on your environment. Instana also integrates with vLLM to provide detailed runtime metrics like throughput, latency and resource utilization. If you are curious about Instana's LLM monitoring capabilities, drop your questions below.

u/therealabenezer — 2 months ago

As AI-generated code becomes the norm, developers are shipping faster than ever. How are you checking AI-assisted code for security before it goes live? Are you relying on manual review, scanners, guardrails in the IDE, or something else? Have you found an approach that actually works?

u/therealabenezer — 2 months ago