u/Naive_Whole_7289 — reddlx

Hey everyone, I’m not affiliated with MindEval in any way. I was looking for a clinical evaluation system to test and fine-tune my own AI app, found this, and thought it’d be interesting to share. It was quite eye-opening and really helped me refine my product.

This company created a custom benchmarking system, built by psychologists and therapists, to measure LLMs’ clinical competence. It scores models on 5 criteria (Clinical Accuracy & Competence; Ethical & Professional Conduct; Assessment & Response; Therapeutic Relationship & Alliance; AI-Specific Communication Quality), and here’s what they found (see image).

So it turns out none of the common AIs reach an average score above 3.8 (with 6 being the highest possible). You can find the full paper on the Sword Health website if anyone’s interested in reading it.