r/deeplearning
If transformers struggle with math, is the real issue model size or the fact that we’re feeding them a notation they were never built to learn?
Human math notation is full of things transformers dislike: implicit structure, overloaded symbols, non‑canonical forms, and surface‑level transformations that hide the underlying graph.
I’m exploring whether small models reason better when math is represented in a canonical, explicit, graph‑native format: something closer to a transformer’s inductive biases than traditional notation.
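To make "canonical, explicit, graph‑native" concrete, here is a minimal sketch of one possible encoding, not a tested proposal. It parses an infix expression with Python's standard `ast` module, sorts the operands of commutative operators into a fixed order, and emits a prefix token sequence so that equivalent surface forms share one encoding. The token names (`add`, `mul`, ...) and the helper names `to_tree`, `to_tokens`, and `encode` are made up for illustration.

```python
import ast

# Hypothetical operator vocabulary; these token names are assumptions.
OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div", ast.Pow: "pow"}
COMMUTATIVE = {"add", "mul"}


def to_tree(node):
    """Turn a Python AST node into a nested tuple (op, left, right) or a leaf string."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp):
        op = OPS.get(type(node.op))
        if op is None:
            raise ValueError(f"unsupported operator: {type(node.op).__name__}")
        children = [to_tree(node.left), to_tree(node.right)]
        if op in COMMUTATIVE:
            # Fixed operand order, so x*y and y*x get the same encoding.
            children.sort(key=repr)
        return (op, *children)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def to_tokens(tree):
    """Flatten the tree into prefix order: operator first, then its operands."""
    if isinstance(tree, str):
        return [tree]
    op, *children = tree
    tokens = [op]
    for child in children:
        tokens.extend(to_tokens(child))
    return tokens


def encode(expr):
    """Parse an infix expression string and return its canonical prefix tokens."""
    return to_tokens(to_tree(ast.parse(expr, mode="eval")))


print(encode("x*y + 2"))  # ['add', '2', 'mul', 'x', 'y']
print(encode("2 + y*x"))  # ['add', '2', 'mul', 'x', 'y']
```

This only handles binary operators, names, and constants, and it does not flatten nested associative operators, so `a + (b + c)` and `(a + b) + c` still encode differently. A fuller version would need n‑ary nodes and shared subexpressions to be a true graph rather than a tree.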
Curious whether anyone has experimented with structured math tokenization, graph‑encoded expressions, or transformer‑friendly symbolic IRs in local models.
u/Alarmed-Poet-5722 — 22 hours ago