u/JoeStrout

Aliens would probably send us an LLM

Just thinking lately that if there were aliens around a nearby star, and for some reason they were too lazy to come say hello in person, bit wanted to make contact… they would probably send the weights to an LLM (with some brief instructions on the data format).

They would have trained this LLM on all the TV shows and text data we’ve leaked to the cosmos, so it would speak English, Mandarin, and probably 30 other Earth languages. But they would also train it on their culture, so it would also speak their languages, and know all about them, just as ChatGPT knows all about us.

So when we’ve decoded all that data and followed the instructions and turned it on, we wouldn’t have a static message from ET; we’d have an AI that can carry on conversations and teach us things. It would take years to explore everything it knows.

Of course to us this would seem like a huge amount of data and high computational cost, but to them it probably seems like primitive tech, one step above banging rocks together.

What would you think of this? Would you be suspicious, grateful, or something else? What would you ask it?

reddit.com
u/JoeStrout — 21 hours ago

I found this paper, "Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought" by Ramji, Naseem, & Astudillo to be pretty interesting:

https://arxiv.org/abs/2604.22709

Basically they trained an LLM to do its reasoning with a set of reserved tokens that are initially meaningless, resulting in substantial token savings on CoT problems with no significant degradation of performance.

On one level, I love that, because it saves computation and gives models a way to think that's probably similar to what well-trained humans do in their field of specialty, i.e., reason in abstractions directly, without having to put everything into words. But on the other hand, it seems like this would make the interpretability problem much harder. LLMs can already hide their true intentions to some extent, but this would make deception much easier for them, I would think. The internal language could be like "...and then step three, kill all humans, wait, better say 'give puppies to all humans' in our final output" and we'd have a hard time detecting that.

One possible way to mitigate that might be to train another model to convert these internal tokens to something interpretable, for auditing purposes. But it's not entirely clear to me how that would be done. We'd certainly have to be careful about co-training the interpreter and the main model on alignment, because we'd risk them learning a dual-channel encoding where the model means one thing but the interpreter says another, in a coordinated way that fulfills the reward function, while not giving accurate insight into any deception going on.

What do you think?

reddit.com
u/JoeStrout — 23 days ago