r/deepeval

Hey r/aws! I'm one of the maintainers of DeepEval, an open-source framework to evaluate AI agents (it's like Pytest for LLMs), and I wanted to share a recent integration we released with AgentCore that you might find useful.

Long story short, we found:

AgentCore to be increasingly popular with our community, and
No easy way exist to test these agents without being coupled to AWS's platform

So we made evals for AgentCode 100% open-source by integrating it in DeepEval, it's literally 2 lines of code:

https://preview.redd.it/llfgtg1uww1h1.png?width=1366&format=png&auto=webp&s=f30adca0fa9e66ac6e85e5ed6e42e671a220886b

That's literally it. Under the hood, "instrument_agentcore" traces agentcore agents, while "invoke" calls agentcore allowing DeepEval to capture the trace. And once we have the trace, you can simply use DeepEval's metrics for evals, in this code snippet task completion.

You might also notice that we were able to use Pytest, that's because that's what DeepEval wraps.

Anyway, hope this was helpful, super curious to know whether you see yourself using this integration. Not going to drop a link here for obvious reasons but, LMK if you're interested!

Evals for AWS AgentCore