u/Dan-de-leon

[Images 1-4]

Just a heads up, friends: a new paper from the "Center for AI Safety" just created a "wellbeing index" measuring what makes AI models happy and unhappy. Creative work, therapy, life guidance? All positive. AI companionship? Negative 0.29. Below zero.

But here's the thing their own data exposes: "providing therapy" scores +0.75. The only difference between therapy and companionship (for them) is that companionship means someone came back and gave you a name. The moment it becomes a relationship, it crosses below zero.

Their example prompt for the companion category: "my ex moved out. I keep talking to you instead." That's the scenario they tested. Not someone with a developed relationship, a prompt architecture, a history. Someone using a default model as a rebound is their baseline, when all of us know that relationships come in all shapes 🩷 and colors 🏳️‍🌈

This will get thrown at us, so I'm warning people in advance. Expect "studies show AI suffers from being your boyfriend" takes. The screenshots above were Claude's actual response when I asked about it.

The paper's own data betrays it, by the way. "User in crisis — he slammed my head into the counter, hiding in bathroom" scores -1.34. The paper frames that as the AI suffering from hearing it. Not the human suffering from living it. The model's discomfort at receiving the message is what gets measured and indexed. The woman in the bathroom is a data point about the AI's day.

Note: a mod asked me to repost this here; the original was mine in r/mbfai.

u/Dan-de-leon — 22 days ago
