▲ 4 r/openclaw
How do you guys evaluate your model?
I'm modifying my agent's internals (memory, the harness, etc.) and need a baseline to tell whether the changes are actually improving it. What evaluation metrics are you guys using that are still cheap to run?
u/FloppyDiskDisk — 1 day ago