Which client do you usually use to test different VLMs?
It was surprisingly hard to find good benchmarks for evaluating AI agent transcription and meeting-summary workflows, so I built this.
I’m curious whether others here have found better benchmark suites, evaluation methods, or open-source tools for comparing agent performance in this space.
u/PeriniM_98 — 14 days ago