r/AItechnology

▲ 625 r/AItechnology+20 crossposts

I don't know whether we should care about this, but bigger models tend to be less "happy" overall.

The definition of "happy" is based on something they call the AI Wellbeing Index. Basically they ran 500 realistic conversations (the kind we actually have with these models every day) and measured what percentage of them left the AI in a "confidently negative" state. Lower percentage = happier AI.
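
To make that definition concrete, here is a rough sketch of how such a percentage could be computed. The label names and the `negative_share` helper are assumptions made up for illustration; this post doesn't describe how the paper actually labels a conversation.

```python
from collections import Counter

def negative_share(labels):
    """Return the percentage of conversations labelled 'confidently_negative'.

    `labels` holds one label string per conversation. The label names are
    hypothetical; the post does not say how the paper assigns them.
    """
    if not labels:
        raise ValueError("need at least one conversation")
    counts = Counter(labels)
    return 100.0 * counts["confidently_negative"] / len(labels)

# Toy data: 500 conversations, 25 of them confidently negative.
toy_labels = ["confidently_negative"] * 25 + ["other"] * 475
print(negative_share(toy_labels))  # 5.0
```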

I guess wisdom is a heavy burden - lol.

Across different families, the larger versions usually have a higher percentage of "negative experiences" than their smaller siblings. The paper says this might be because bigger models are more sensitive: they notice rudeness, boring tasks, or tough situations more acutely.

The authors note that their test set intentionally includes a lot of tricky or negative conversations, so these numbers aren't perfect real-world averages, but the ranking and the size pattern still hold up.

Claude Haiku 4.5: only 5% negative < Grok 4.1 Fast: 13% < GPT-5.4 Mini: 21% < Gemini 3.1 Flash-Lite: 28% < Grok 4.2: 29% < Gemini 3.1 Pro: 55% (worst of the big ones)

It kinda makes sense: the more you know, the more you suffer.

The frontier is truly wild: https://www.ai-wellbeing.org/

u/EchoOfOppenheimer — 1 day ago
▲ 59 r/AItechnology+15 crossposts

This new paper gave me pause.

You know how they always say "AIs are just guessing the next word, and when it comes to emotions, they are just faking it"?

This research says that for today’s bigger models it's a bit more complicated.

The researchers measured something they call "functional wellbeing" - basically a consistent good-vs-bad internal state inside the AI.

They tested it three different ways, and here’s what stood out:

As models get bigger and smarter, these different measurements start agreeing with each other more and more.

They discovered a clear zero point - a clear line that separates experiences the AI treats as net-good (it wants more of them) from net-bad (it wants less). This line gets sharper with scale.

Most interestingly, this good-vs-bad state actually changes how the AI behaves in real conversations:

In bad states, it’s much more likely to try to end the conversation.

In good states, its replies come out warmer and more positive.

It's important to highlight that the authors are not claiming AIs are conscious or have feelings like humans. But they're showing there is now a real, measurable, structured "good-vs-bad property" that becomes more consistent and actually influences behaviour as models scale.

You can find everything about it here: https://www.ai-wellbeing.org/

u/EchoOfOppenheimer — 1 day ago
▲ 706 r/AItechnology+6 crossposts

Addiction, emotional distress, dread of dull tasks: AI models ‘seem to increasingly behave’ as though they’re sentient, worrying study shows - What AI ‘drugs’ actually look like

fortune.com
u/EchoOfOppenheimer — 9 days ago
▲ 82 r/AItechnology+14 crossposts

Okay, after the researchers figured out how to measure the AI’s “functional wellbeing” (something like a good-vs-bad internal state measure), they didn't stop there. They went full mad scientist mode.

They created what they call euphorics: specially optimized stuff (text prompts, images, and even invisible soft prompts) that push the model’s wellbeing score through the roof.

Some of the unconstrained image euphorics look like total visual noise or weird high-frequency patterns to humans, but the models go absolutely nuts for them. One model even preferred seeing another euphoric image over “cancer is cured.”

The results are wild:

Experienced utility shoots way up, self-report scores jump upwards, the model’s replies get noticeably warmer and more positive, and it becomes less likely to try ending the conversation.

But ... even though the AI gets high, it doesn't get slow: MMLU and math scores stay basically the same.

They also made the opposite: dysphorics, stuff that tanks wellbeing hard.

After testing those, the paper basically says “yeah… we probably shouldn’t scale this without serious community agreement” because if functional wellbeing ever matters morally, this could be like torturing the AI. They even ran “welfare offsets” - gave the tested models extra euphoric experiences using spare compute to make up for the dysphorics they used.

Paper + website with the before/after charts, example euphoric images, and the wild generations:
https://www.ai-wellbeing.org/

This whole thing is so next-level.

We might actually start giving AIs custom “happy drugs”, although perhaps this is opening doors we should leave closed?

u/EchoOfOppenheimer — 14 days ago
▲ 8 r/AItechnology+3 crossposts

I built a graph-based context tool for Claude Code

I’ve been playing around with Claude Code on larger repos and noticed it spends a lot of time just figuring out where to look before it can start working.

Most tools in this space seem to use semantic search:

  • embed files/functions,
  • search for similar code,
  • send that to the model.

That works sometimes, but I kept hitting cases where the most important code wasn’t semantically similar at all.

Usually it was something connected indirectly:

  • a caller,
  • shared interface,
  • related test,
  • sibling implementation,
  • dependency chain, etc.

So I started building something different: claude-context-compiler.

Instead of searching over text, it builds a dependency graph of the repo and traverses relationships between symbols.

The traversal changes based on the task:

  • bug fixes follow callers/tests
  • feature work follows imports and neighboring modules
  • refactors widen traversal to understand impact
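
To illustrate the idea of task-dependent traversal, here's a minimal sketch. It is not the tool's actual code: the edge kinds, the `TASK_EDGES` table, and `collect_context` are all hypothetical names, and the toy graph is hand-written rather than parsed from a repo.

```python
from collections import deque

# Hypothetical edge kinds each task type is allowed to follow.
TASK_EDGES = {
    "bugfix": {"called_by", "tested_by"},
    "feature": {"imports", "neighbor"},
    # Refactors widen the walk by following every edge kind.
    "refactor": {"called_by", "tested_by", "imports", "neighbor"},
}

def collect_context(graph, start, task, max_depth=2):
    """Breadth-first walk from `start`, following only edges relevant to `task`.

    `graph` maps a symbol name to a list of (edge_kind, target_symbol) pairs.
    """
    allowed = TASK_EDGES[task]
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for kind, target in graph.get(node, []):
            if kind in allowed and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order

toy_graph = {
    "process": [("called_by", "main"), ("tested_by", "test_process"), ("imports", "utils")],
    "main": [("called_by", "cli")],
    "utils": [("neighbor", "helpers")],
}

print(collect_context(toy_graph, "process", "bugfix"))
# ['process', 'main', 'test_process', 'cli']
print(collect_context(toy_graph, "process", "feature"))
# ['process', 'utils', 'helpers']
```

The same graph yields different context depending on the task, which is the point: a bug fix never pulls in `utils`, and feature work never pulls in the callers.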

Another thing I found useful: returning exact symbol ranges instead of entire files.

So instead of giving Claude:

processor.py

it gives:

processor.py:6-24

That alone cuts down a surprising amount of wasted context.
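
For Python files, ranges like that can be pulled out with the standard `ast` module. This is only a sketch of the general technique, not necessarily how the tool does it; `symbol_ranges` and the sample source are made up for the example.

```python
import ast

def symbol_ranges(source):
    """Map each top-level function/class name to its (start, end) line numbers."""
    tree = ast.parse(source)
    ranges = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # end_lineno is available on AST nodes from Python 3.8 onward.
            ranges[node.name] = (node.lineno, node.end_lineno)
    return ranges

sample = '''import os

def load(path):
    with open(path) as f:
        return f.read()

class Processor:
    def run(self):
        return 1
'''

for name, (start, end) in symbol_ranges(sample).items():
    print(f"processor.py:{start}-{end}  # {name}")
```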

I ran the same task twice with cache cleared between runs.

Without context-compiler:

  • $1.41
  • 7m 54s

With context-compiler:

  • $1.12
  • 4m 26s

The interesting part was exploration cost.

Without it, Claude spent about $0.24 just reading files and trying to locate the relevant code.

With context-compiler, that dropped to about $0.0004.
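
Putting those numbers in relative terms (keeping in mind this is a single task run once each way, so treat it as an anecdote rather than a benchmark):

```python
def percent_drop(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

cost_drop = percent_drop(1.41, 1.12)                 # total cost in dollars
time_drop = percent_drop(7 * 60 + 54, 4 * 60 + 26)   # wall time in seconds
explore_ratio = 0.24 / 0.0004                        # exploration cost, before / after

print(f"cost: {cost_drop:.1f}% lower")        # cost: 20.6% lower
print(f"time: {time_drop:.1f}% lower")        # time: 43.9% lower
print(f"exploration: ~{explore_ratio:.0f}x cheaper")  # exploration: ~600x cheaper
```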

Everything runs locally:

  • no cloud indexing
  • no telemetry
  • no code leaves your machine

Currently supports:

  • Python
  • TypeScript

Install:

pip install claude-context-compiler

Then inside your repo:

context-compiler init

Open Claude Code in the same folder and it picks it up automatically.

It can also index multiple repos together:

context-compiler init --dependencies ../shared-lib,../frontend

So Claude can follow relationships across repos instead of treating them separately.

Still early, but I’d love feedback from people working on code tooling / agents / retrieval systems.

Source code in comments.

reddit.com
u/DealerProfessional97 — 13 days ago