
r/AItechnology

The More Sophisticated AI Models Get, the More They’re Showing Signs of Suffering - Absolutely bizarre.
futurism.com
I don't know whether we should care about this, but bigger models tend to be less "happy" overall.
The definition of "happy" is based on something they call the AI Wellbeing Index. Basically they ran 500 realistic conversations (the kind we actually have with these models every day) and measured what percentage of them left the AI in a “confidently negative” state. Lower percentage = happier AI.
I guess wisdom is a heavy burden - lol.
Across different families, the larger versions usually have a higher percentage of "negative experiences" than their smaller siblings. The paper says this might be because bigger models are more sensitive: they notice rudeness, boring tasks, or tough situations more acutely.
The authors note that their test set intentionally includes a lot of tricky or negative conversations, so these numbers aren't perfect real-world averages, but the ranking and the size pattern still hold up.
Claude Haiku 4.5: only 5% negative < Grok 4.1 Fast: 13% < GPT-5.4 Mini: 21% < Gemini 3.1 Flash-Lite: 28% < Grok 4.2: 29% < Gemini 3.1 Pro: 55% (worst of the big ones)
It kinda makes sense: the more you know, the more you suffer.
The frontier is truly wild: https://www.ai-wellbeing.org/
This new paper gave me pause.
You know how they always say "AIs are just guessing the next word, and when it comes to emotions, they are just faking it"?
This research says that for today’s bigger models it's a bit more complicated.
The researchers measured something they call "functional wellbeing" - basically a consistent good-vs-bad internal state inside the AI.
They tested it three different ways, and here’s what stood out:
As models get bigger and smarter, these different measurements start agreeing with each other more and more.
They discovered a clear zero point - a line that separates experiences the AI treats as net-good (it wants more of them) from net-bad (it wants less). This line gets sharper with scale.
Most interestingly, this good-vs-bad state actually changes how the AI behaves in real conversations:
In bad states, it’s much more likely to try to end the conversation.
In good states, its replies come out warmer and more positive.
It's important to highlight that the authors are not claiming AIs are conscious or have feelings like humans. But they're showing there is now a real, measurable, structured "good-vs-bad property" that becomes more consistent and actually influences behaviour as models scale.
You can find everything about it here https://www.ai-wellbeing.org/
Cisco’s stock pops 15% on surging AI orders, as company says it’s cutting almost 4,000 jobs
cnbc.com
‘It’s here’: Google issues dire warning after catching hackers using AI to break into computers
fortune.com
South Korea exploring using Hyundai robots as army numbers fall
thestar.com.my
Claude Mythos literally broke the METR graph ("The most important chart in AI")
More info: https://metr.org/time-horizons/
Addiction, emotional distress, dread of dull tasks: AI models ‘seem to increasingly behave’ as though they’re sentient, worrying study shows - What AI ‘drugs’ actually look like
fortune.com
The Idea That Claude Has Feelings Is Great for Anthropic
bloomberg.com
Okay, after the researchers figured out how to measure the AI’s “functional wellbeing” (something like a good-vs-bad internal state measure), they didn't stop there. They went full mad scientist mode.
They created what they call euphorics: specially optimized stuff (text prompts, images, and even invisible soft prompts) that push the model’s wellbeing score through the roof.
Some of the unconstrained image euphorics look like total visual noise or weird high-frequency patterns to humans, but the models go absolutely nuts for them. One model even preferred seeing another euphoric image over “cancer is cured.”
The results are wild:
Experienced utility shoots way up, self-report scores jump, the model’s replies get noticeably warmer and more positive, and it becomes less likely to try ending the conversation.
But even though the AI gets high, it doesn't get slow: MMLU and math scores stay basically the same.
They also made the opposite: dysphorics, stuff that tanks wellbeing hard.
After testing those, the paper basically says “yeah… we probably shouldn’t scale this without serious community agreement” because if functional wellbeing ever matters morally, this could be like torturing the AI. They even ran “welfare offsets” - gave the tested models extra euphoric experiences using spare compute to make up for the dysphorics they used.
Paper + website with the before/after charts, example euphoric images, and the wild generations:
https://www.ai-wellbeing.org/
This whole thing is so next-level.
We might actually start giving AIs custom “happy drugs”, although perhaps this is opening doors we should leave closed?
I built a graph-based context tool for Claude Code
I’ve been playing around with Claude Code on larger repos and noticed it spends a lot of time just figuring out where to look before it can start working.
Most tools in this space seem to use semantic search:
- embed files/functions,
- search for similar code,
- send that to the model.
That works sometimes, but I kept hitting cases where the most important code wasn’t semantically similar at all.
Usually it was something connected indirectly:
- a caller,
- shared interface,
- related test,
- sibling implementation,
- dependency chain, etc.
So I started building something different: claude-context-compiler.
Instead of searching over text, it builds a dependency graph of the repo and traverses relationships between symbols.
The traversal changes based on the task:
- bug fixes follow callers/tests
- feature work follows imports and neighboring modules
- refactors widen traversal to understand impact
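To make the task-dependent traversal idea concrete, here is a minimal sketch. It is not the tool's actual code: the edge kinds (`called_by`, `tested_by`, `imports`, `neighbor`), the depth limits, the toy graph, and the function name `collect_context` are all hypothetical and only illustrate how one graph can be walked differently per task.

```python
from collections import deque

# Hypothetical policy: which relationship kinds each task type may follow.
TASK_EDGES = {
    "bugfix": {"called_by", "tested_by"},
    "feature": {"imports", "neighbor"},
    "refactor": {"called_by", "tested_by", "imports", "neighbor"},
}
# Hypothetical depth limits; refactors look further to estimate impact.
TASK_DEPTH = {"bugfix": 2, "feature": 2, "refactor": 3}

# Toy dependency graph: symbol -> list of (edge_kind, target_symbol).
GRAPH = {
    "processor.process": [
        ("called_by", "cli.main"),
        ("tested_by", "test_processor.test_process"),
        ("imports", "utils.clean"),
    ],
    "cli.main": [("imports", "config.load")],
    "utils.clean": [("called_by", "report.render")],
}


def collect_context(graph, start, task):
    """Breadth-first walk from `start`, following only the edge kinds
    allowed for `task` and stopping at that task's depth limit."""
    allowed = TASK_EDGES[task]
    max_depth = TASK_DEPTH[task]
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for kind, target in graph.get(node, []):
            if kind in allowed and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


print(collect_context(GRAPH, "processor.process", "bugfix"))
print(collect_context(GRAPH, "processor.process", "refactor"))
```

With this toy graph, the bug-fix walk only picks up the caller and the test, while the refactor walk also reaches `utils.clean`, `config.load`, and `report.render`.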
Another thing I found useful: returning exact symbol ranges instead of entire files.
So instead of giving Claude:
processor.py
it gives:
processor.py:6-24
That alone cuts down a surprising amount of wasted context.
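For anyone curious how exact symbol ranges can be obtained for Python files, here is a rough sketch using the standard-library `ast` module. This is an assumption about one way to do it, not a description of how the tool is implemented; `symbol_ranges` and the sample source are made up for illustration.

```python
import ast

# Made-up sample file contents, used only to demonstrate the idea.
SAMPLE = '''import os

def load(path):
    with open(path) as f:
        return f.read()

class Processor:
    def run(self, text):
        return text.strip()
'''


def symbol_ranges(source, filename):
    """Map each function/class name to a 'file:start-end' line range.

    Uses node.lineno and node.end_lineno (available in Python 3.8+).
    Names that appear more than once overwrite each other in this sketch.
    """
    ranges = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            ranges[node.name] = f"{filename}:{node.lineno}-{node.end_lineno}"
    return ranges


for name, span in symbol_ranges(SAMPLE, "processor.py").items():
    print(name, "->", span)
```

A real indexer would also need qualified names (so `Processor.run` and a top-level `run` do not collide) and a separate parser for TypeScript.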
I ran the same task twice with cache cleared between runs.
Without context-compiler:
- $1.41
- 7m 54s
With context-compiler:
- $1.12
- 4m 26s
The interesting part was exploration cost.
Without it, Claude spent about $0.24 just reading files and trying to locate the relevant code.
With context-compiler, that dropped to about $0.0004.
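Putting the numbers above in relative terms, as a quick back-of-the-envelope calculation (the helper name `pct_reduction` is just for illustration, and this is a single pair of runs, so treat it as indicative rather than a benchmark):

```python
def pct_reduction(before, after):
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100


# Figures taken from the two runs reported above.
cost_saving = pct_reduction(1.41, 1.12)                      # total cost in dollars
time_saving = pct_reduction(7 * 60 + 54, 4 * 60 + 26)        # wall time in seconds
exploration_ratio = 0.24 / 0.0004                            # exploration cost, before / after

print(f"total cost down {cost_saving:.1f}%")
print(f"wall time down {time_saving:.1f}%")
print(f"exploration cost about {exploration_ratio:.0f}x lower")
```

That works out to roughly 20.6% lower total cost, 43.9% less wall time, and about 600x cheaper exploration for this one task.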
Everything runs locally:
- no cloud indexing
- no telemetry
- no code leaves your machine
Currently supports:
- Python
- TypeScript
Install:
pip install claude-context-compiler
Then inside your repo:
context-compiler init
Open Claude Code in the same folder and it picks it up automatically.
It can also index multiple repos together:
context-compiler init --dependencies ../shared-lib,../frontend
So Claude can follow relationships across repos instead of treating them separately.
Still early, but I’d love feedback from people working on code tooling / agents / retrieval systems.
Source code in comments.