r/LLMAgenticLearning

Distributed tracing across stdio MCP: same trace_id on CrewAI client and FastMCP server (SEP-414 + OpenTelemetry + Jaeger)

I put together a short walkthrough of something that tripped me up when building agentic workflows: MCP over stdio is two processes, so your usual “single-app” tracing story breaks unless you propagate W3C Trace Context explicitly.

Problem: A CrewAI agent calls MCP tools (get_order, check_inventory, …) in a child process over a pipe. Logs show something failed; they don’t show which LLM round triggered which tool, or whether latency sits in the model or in a specific tools/call.

Approach: Use OpenTelemetry with MCP semantic conventions and SEP-414 trace context in params._meta, so client spans (MCP request: tools/call …) and server spans (MCP server handle request: tools/call) share the same trace_id even though transport is stdio—not HTTP.
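To make the _meta idea concrete, here is a rough stdlib-only sketch of the propagation step. It is not the repo's code. It assumes the context travels under a `traceparent` key in W3C format (version-trace_id-parent_id-flags), and the helper names and the `order_id` argument are made up for illustration. In a real setup you would more likely let OpenTelemetry's propagator inject into and extract from the `_meta` dict instead of hand-rolling the string.

```python
import json
import re
import secrets

# W3C traceparent layout: 2-hex version, 32-hex trace_id, 16-hex parent_id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(trace_id=None):
    """Hypothetical helper: build a traceparent value for the current client span."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"          # version 00, sampled flag set


def build_tools_call(request_id, tool, arguments, traceparent):
    """Client side: one JSON-RPC tools/call line with trace context in params._meta."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            "_meta": {"traceparent": traceparent},  # assumed key name
        },
    })


def extract_trace_context(raw_line):
    """Server side: return (trace_id, parent_span_id) from a request line, or None."""
    msg = json.loads(raw_line)
    meta = msg.get("params", {}).get("_meta", {})
    match = TRACEPARENT_RE.match(meta.get("traceparent", ""))
    if match is None:
        return None  # no usable context: the server would start a fresh trace
    return match.group(2), match.group(3)


# Round trip: what the client writes to the pipe is what the server reads back.
client_tp = new_traceparent()
line = build_tools_call(1, "check_inventory", {"order_id": 1842}, client_tp)
trace_id, parent_span_id = extract_trace_context(line)
print(trace_id == client_tp.split("-")[1])
```

The server span then uses the extracted trace_id and treats parent_span_id as its parent, which is what puts both processes in one waterfall.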

Stack (all local, reproducible):

  • CrewAI agent + Ollama (llama3.2)
  • FastMCP incident server (synthetic slow/failing inventory for order #1842)
  • OTLP → Jaeger
  • One-command demo: ./scripts/demo.sh

What you see in Jaeger: crewai.workflow → per-round .llm spans (with gen_ai.input.messages / output when enabled) → MCP client/server spans in one waterfall. The “money shot” is opening check_inventory and reading args + backorder error on the same trace as the agent’s LLM spans.

Video (12 min, architecture + live demo):
https://www.youtube.com/watch?v=qCHK4QlPXh8

Code (MIT):
https://github.com/ekb-dev-ai/mcp-trace-demo

Fast path without Ollama: ./scripts/quick_trace_demo.sh (~5s, MCP + Jaeger only).

Happy to hear how others are handling OTel for MCP—especially HTTP vs stdio and whether you’re standardizing on _meta vs custom headers.

u/Fabulous-Art4440 — 3 days ago