u/HobbesNik

▲ 15 r/antiai

I just watched this unbelievable video on another subreddit where Anthropic claims that their "natural language autoencoders" can translate Claude's next-token prediction into text.

They make a number of claims based on reading the "thoughts" of Claude that I'm having trouble wrapping my head around. Claims like, "We've found that Claude has internalized being a helpful AI model," and, "Claude knew that it was being given a test."

From what I know of next-token prediction, there is no way that you could translate it into coherent "thoughts" of English-language text. LLMs convert words into tokens and then analyze patterns between the tokens in their training data to predict what token/word is most likely to come next. An LLM doesn't think, "I'm being given a test"; rather, it says, "I'm being given a test," because it has predicted those are the most likely words to come next based on the map of word associations it has from its training data. If you were to translate an LLM's analysis, the daisy chains of tokens that constitute an LLM's "thoughts," back into words, wouldn't it just be a total jumble?
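To make that mental model concrete, here is a toy sketch of "predict the most likely next word from word associations in the training data." It is only a bigram counter, far simpler than a real LLM (which runs a neural network over subword tokens rather than a lookup table), and the names `train_bigram` and `predict_next`, along with the tiny corpus, are made up purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the corpus."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        # Crude whitespace "tokenizer"; real models use subword tokens.
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            follow_counts[current][nxt] += 1
    return follow_counts


def predict_next(follow_counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = follow_counts.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical three-sentence "training set".
corpus = [
    "i am being given a test",
    "i am being given a task",
    "i am being given a test today",
]
model = train_bigram(corpus)
```

Here `predict_next(model, "a")` picks "test" simply because "test" follows "a" more often than "task" does in the toy corpus. Whether anything like that table of counts is a fair picture of what happens inside a real model is exactly what I'm asking about.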

So is this total anthropomorphic BS from Anthropic or am I missing something?

u/HobbesNik — 16 days ago

>A court in eastern China's Hangzhou city, an AI hub, has ruled in favor of a senior tech worker whose company replaced him with artificial intelligence (AI).

>"The termination grounds cited by the company did not fall under negative circumstances such as business downsizing or operational difficulties, nor did they meet the legal condition that made it 'impossible to continue the employment contract,'" the court said in a published article.

u/HobbesNik — 19 days ago