r/Observability
Looking for honest takes from teams shipping AI to real users (not weekend projects).
Trying to cut through vendor noise. There's a lot going on in this space, and from a parent-company PoV I'm trying to understand whether bundled products (e.g., Google Vertex in Gemini Enterprise Agent Platform) bring any real ecosystem advantages over something like Braintrust or LangSmith.
Some categories I had in mind, but feel free to mention whatever's in your stack:
- Standalone evals: Braintrust, Galileo, Confident AI, Maxim, Patronus, Coval
- Data-platform-bundled: Databricks + MLflow, ClickHouse (acquired Langfuse)
- Cloud-native: Google (Vertex AI evals, Stax), AWS (AgentCore evals, Strands evals), CoreWeave (acquired Weights & Biases)
- Frontier labs: OpenAI's evaluations (+ Promptfoo acquisition)
- Obs incumbents: Datadog evals, Cisco/Splunk o11y (acquired Galileo)
- ML obs extending in: Arize Phoenix, Comet Opik
Things I'd love to hear:
- What did you start with, what did you end up on, and why did you switch?
- Did you go with whatever your cloud/obs vendor bundled, or pay extra for standalone?
- Anything that looked great in trial but fell apart in real usage?
- If you're on an OSS framework (Ragas, Phoenix, DeepEval, MLflow) — did you stay or move to a hosted product?
For context — please share team size + what kind of AI you ship (agents / RAG / voice / etc.)
u/BeneficialAdvice3202 — 18 days ago