
Distributed tracing across stdio MCP: same trace_id on CrewAI client and FastMCP server (SEP-414 + OpenTelemetry + Jaeger)
I put together a short walkthrough of something that tripped me up when building agentic workflows: MCP over stdio is two processes, so your usual “single-app” tracing story breaks unless you propagate W3C context explicitly.
Problem: A CrewAI agent calls MCP tools (get_order, check_inventory, …) in a child process over a pipe. Logs show something failed; they don’t show which LLM round triggered which tool, or whether latency sits in the model or in a specific tools/call.
Approach: Use OpenTelemetry with the MCP semantic conventions and carry W3C trace context in params._meta, as SEP-414 proposes, so client spans (MCP request: tools/call …) and server spans (MCP server handle request: tools/call) share the same trace_id even though the transport is stdio, not HTTP.
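To make the propagation step concrete, here is a minimal stdlib-only sketch of the idea, not the code from the repo. It assumes the trace context travels as a W3C traceparent string under the key params._meta.traceparent. The helper names (new_traceparent, build_tools_call, extract_trace_context) and the order_id argument are hypothetical, chosen for illustration. In a real setup the OpenTelemetry propagator would produce and parse the header for you.

```python
import json
import re
import secrets

# W3C traceparent: version "00", 32-hex trace_id, 16-hex span_id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(trace_id=None):
    """Build a traceparent for a new span, reusing trace_id when one is given."""
    trace_id = trace_id or secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def build_tools_call(request_id, tool, arguments, traceparent):
    """Client side: serialize a tools/call request with trace context in _meta.

    The "_meta.traceparent" key is an assumption about how SEP-414 is applied.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            "_meta": {"traceparent": traceparent},
        },
    })


def extract_trace_context(raw_line):
    """Server side: pull the parent context out of one line read from stdin."""
    message = json.loads(raw_line)
    meta = message.get("params", {}).get("_meta", {})
    match = TRACEPARENT_RE.match(meta.get("traceparent", ""))
    if match is None:
        return None  # no usable context: the server would start a new trace
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": int(flags, 16) & 1 == 1,
    }


# Client process: open a span and send the request over the pipe.
client_tp = new_traceparent()
line = build_tools_call(1, "check_inventory", {"order_id": 1842}, client_tp)

# Server process: read the line and continue the same trace with a new span.
ctx = extract_trace_context(line)
server_tp = new_traceparent(ctx["trace_id"])
```

The point of the sketch is that nothing about stdio carries context implicitly: the trace_id only survives the process boundary because it is written into the JSON-RPC message itself and read back on the other side.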
Stack (all local, reproducible):
- CrewAI agent + Ollama (llama3.2)
- FastMCP incident server (synthetic slow/failing inventory for order #1842)
- OTLP → Jaeger
- One-command demo: ./scripts/demo.sh
What you see in Jaeger: crewai.workflow → per-round .llm spans (with gen_ai.input.messages / output when enabled) → MCP client/server spans in one waterfall. The “money shot” is opening check_inventory and reading args + backorder error on the same trace as the agent’s LLM spans.
Video (12 min, architecture + live demo):
https://www.youtube.com/watch?v=qCHK4QlPXh8
Code (MIT):
https://github.com/ekb-dev-ai/mcp-trace-demo
Fast path without Ollama: ./scripts/quick_trace_demo.sh (~5s, MCP + Jaeger only).
Happy to hear how others are handling OTel for MCP, especially HTTP vs stdio and whether you’re standardizing on _meta vs custom headers.