u/Dan23RR

Paper on Zenodo. It explains why RoPE enables transformers to succeed on compositional reasoning tasks where standard additive positional layers fail. It proves that RoPE has a toroidal structure (T^n) on finite groups, and validates this with Qwen2.5-0.5B on modular arithmetic and sequential composition tasks.
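For anyone who wants to poke at the composition property directly, here is a minimal sketch of the standard RoPE formulation. It is not code from the paper. It assumes the usual frequencies theta_i = base^(-2i/d) and treats each coordinate pair as one complex number. The helper names (`rope_frequencies`, `rope_rotate`, `phases`, `score`) are hypothetical. It checks two facts: rotating by m and then by n equals rotating by m + n, and the query/key score depends only on the offset. The n angles assigned to a position give one point on the torus T^n.

```python
import cmath
import math

def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    # Standard RoPE frequencies theta_i = base^(-2i/dim), one per 2-D rotation plane
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def rope_rotate(x: list[complex], pos: int, thetas: list[float]) -> list[complex]:
    # Each complex entry stands for one coordinate pair (x_2i, x_2i+1);
    # position `pos` rotates pair i by the angle pos * theta_i
    return [z * cmath.exp(1j * pos * t) for z, t in zip(x, thetas)]

def phases(pos: int, thetas: list[float]) -> list[float]:
    # The angles for `pos`, reduced mod 2*pi: one point on the torus T^n = (S^1)^n
    return [(pos * t) % (2 * math.pi) for t in thetas]

def score(q: list[complex], k: list[complex], m: int, n: int, thetas: list[float]) -> float:
    # Real dot product <R(m)q, R(n)k>, written in complex form as Re(sum q_j * conj(k_j))
    qr = rope_rotate(q, m, thetas)
    kr = rope_rotate(k, n, thetas)
    return sum((a * b.conjugate()).real for a, b in zip(qr, kr))

thetas = rope_frequencies(8)  # an 8-dim head has 4 rotation planes, so positions land on T^4
q = [complex(1.0, 0.5), complex(-0.3, 2.0), complex(0.7, -0.1), complex(0.2, 0.2)]
k = [complex(0.4, -0.6), complex(1.1, 0.3), complex(-0.5, 0.9), complex(0.0, 1.0)]

# Composition: rotating by 5 and then by 9 is the same as rotating by 14
composed = rope_rotate(rope_rotate(q, 5, thetas), 9, thetas)
direct = rope_rotate(q, 14, thetas)
print(max(abs(a - b) for a, b in zip(composed, direct)))  # floating-point noise only

# Relative position: the score depends only on the offset m - n
print(score(q, k, 3, 7, thetas), score(q, k, 10, 14, thetas))
```

Using complex numbers is just a compact way to write the 2x2 rotation blocks. In this form, R(m)R(n) = R(m+n) is simply e^{i m theta} * e^{i n theta} = e^{i (m+n) theta}.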

u/Dan23RR — 19 days ago

I’ve been thinking about a question that keeps coming up when working with LLMs:

Why do models that scale so well on language tasks still break on relatively simple compositional reasoning problems?

In this work, I explore a hypothesis: the bottleneck might not be (just) scale or training; it might be geometry.

The paper looks at how different architectural components handle composition and points to a structural limitation in standard transformer updates, in contrast with mechanisms like RoPE, which behave more like a toroidal representation. This leads to a separation between architectures that can support stable composition and those that drift or collapse with depth.

I also test these ideas on controlled tasks (iterated modular arithmetic, group composition) and in a small LLM setting, where the gap shows up quite sharply.
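To make the task family concrete, here is a hypothetical sketch of how an iterated modular arithmetic example could be generated. The prompt format, the modulus 97 and the helper names are my assumptions, not the paper's actual setup. It also shows why a circle is a natural substrate for this task. The map r ↦ e^{2πir/p} sends addition mod p to multiplication of unit complex numbers, which is composition of rotations. A chain of any length can therefore be decoded back to the right residue, up to floating-point rounding.

```python
import cmath
import math
import random

def make_iterated_mod_example(p: int, steps: int, rng: random.Random) -> tuple[int, list[int], int]:
    # Hypothetical task: a start value plus a chain of `steps` additions, all mod p
    x0 = rng.randrange(p)
    deltas = [rng.randrange(p) for _ in range(steps)]
    answer = (x0 + sum(deltas)) % p
    return x0, deltas, answer

def format_prompt(x0: int, deltas: list[int], p: int) -> str:
    # Hypothetical text format for feeding the example to a model
    return f"{x0} " + " ".join(f"+{d}" for d in deltas) + f" (mod {p}) ="

def to_circle(r: int, p: int) -> complex:
    # Group homomorphism Z_p -> S^1: residue r maps to the p-th root of unity e^{2*pi*i*r/p}
    return cmath.exp(2j * math.pi * r / p)

def from_circle(z: complex, p: int) -> int:
    # Read the angle back off and round to the nearest p-th root of unity
    angle = cmath.phase(z) % (2 * math.pi)
    return round(angle * p / (2 * math.pi)) % p

def compose_on_circle(x0: int, deltas: list[int], p: int) -> int:
    # Addition mod p becomes multiplication of unit complex numbers (composing rotations)
    z = to_circle(x0, p)
    for d in deltas:
        z *= to_circle(d, p)
    return from_circle(z, p)

rng = random.Random(0)
x0, deltas, answer = make_iterated_mod_example(p=97, steps=20, rng=rng)
print(format_prompt(x0, deltas, 97))
print(answer, compose_on_circle(x0, deltas, 97))  # the two values should agree
```

This only illustrates the exact group structure the circle provides. It says nothing by itself about what a trained model represents internally.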

Preprint here: https://doi.org/10.5281/zenodo.19899195

I’d be very interested in critical feedback, especially from people working on reasoning, mechanistic interpretability, or geometric approaches to deep learning.

Do you think limitations like this are architectural, or will they disappear with enough scale?

u/Dan23RR — 21 days ago

A simple (and slightly uncomfortable) question: What if some models don't fail at reasoning because they "don't understand" but because they can't represent composition properly?

I’ve just published a preprint exploring this idea, linking RoPE, group structure, and toroidal substrates. The main takeaway: structure may matter as much as scale.

Would love critical feedback: promising direction, or interesting but too theoretical?

u/Dan23RR — 21 days ago

A simple (and slightly uncomfortable) question: What if some models don’t fail at reasoning because they "don’t understand"… but because they can’t represent composition properly?

I’ve just published a preprint exploring this idea, linking RoPE, group structure, and toroidal substrates. The main takeaway: structure may matter as much as scale.

Read it here: https://doi.org/10.5281/zenodo.19899195

Would love critical feedback: promising direction, or interesting but too theoretical?

u/Dan23RR — 22 days ago

I've just published a preprint on Zenodo trying to explain a simple but stubborn phenomenon: why some models handle compositional reasoning, while others break as depth increases.

The core claim is this: in some cases, the limitation isn't about training or scale; it's structural and geometric.

If you're interested in reasoning, compositional generalization, and RoPE, you can read it here: https://doi.org/10.5281/zenodo.19899195

Curious to hear your take: will the next leap in transformer reasoning come from better architectures or just more scale?

u/Dan23RR — 22 days ago