r/allenai

Today we're releasing OlmoEarth v1.2, the latest in our family of open foundation models for Earth observation. 🌍

OlmoEarth breaks satellite images into tiles (patches) & represents each one as an embedding, a numerical representation the model uses for downstream tasks. Earlier versions tagged patches with a fixed position signal that surfaced as unwanted artifacts in those embeddings.

OlmoEarth v1.2 switches to rotary positional embeddings (RoPE). Instead of adding a position signal to each patch, RoPE rotates the vectors the model compares in attention by angles defined by each patch's position. The result is cleaner embeddings & a small performance boost on downstream tasks: across all model sizes, we see consistent improvement on our kNN/linear-probe evals.
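
For readers who want to see the rotation idea concretely, here is a minimal sketch of generic RoPE, not OlmoEarth's actual code. It assumes a single integer position per patch and the commonly used base of 10000; the function names `rope_rotate` and `dot` are made up for this illustration. It shows the key property: after rotation, the attention score between two vectors depends only on how far apart their positions are.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`.

    Assumption: a 1D integer position; real models may use 2D or
    temporal positions for patches.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (arbitrary values, for illustration only).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 13), rope_rotate(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Because position enters only through the rotation applied inside attention, nothing is added to the patch vector itself.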

This update came directly from partners asking for cleaner embeddings. OlmoEarth v1.2 comes in Nano, Tiny, Small, & Base—all open source + available now.

🤗 Models: https://huggingface.co/collections/allenai/olmoearth
💻 Training & fine-tuning code: https://github.com/allenai/olmoearth_pretrain
📄 Tech report: https://allenai.org/papers/olmoearth-v1-2

Hybrid (transformer–RNN) models are fast becoming a serious alternative to the transformer, but a big question remains: how do they process tokens differently & how does this impact performance? We compared our transformer (Olmo 3) & hybrid (Olmo Hybrid) models to find out.

A transformer’s attention layers can look back at any earlier token exactly. A hybrid model swaps most of those layers for recurrent ones that excel at sequential processing. Do these differences give hybrid models and transformers different strengths?

To find out, we scored how well Olmo 3 & Olmo Hybrid predicted different kinds of next tokens across articles, books, papers, code, HTML, & LaTeX. The models are matched on data, tokenizer, & training recipe, so a gap in their predictions points to architecture differences.

We found that the hybrid model's advantage is largest on meaning-bearing words: the nouns, verbs, & adjectives that say what a sentence is about. On function words like "the," "of," & "is," its advantage over the transformer remains but is more muted.

In contrast, the transformer matches the hybrid model when the next token completes an n-gram repeated verbatim from earlier in the passage. Transformers also match or beat hybrid models at predicting closing brackets like ], but not opening brackets like [.
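
As a rough illustration of what "completes an n-gram repeated verbatim" could mean in practice, the sketch below flags token positions whose trailing n-gram already appeared earlier in the passage. The function name `repeated_ngram_completions`, the whitespace tokenization, and the choice of n=3 are assumptions for this example; the exact definition used in the study may differ.

```python
def repeated_ngram_completions(tokens, n=3):
    """Return indices i where tokens[i-n+1 : i+1] already occurred earlier."""
    seen = set()
    hits = []
    for i in range(n - 1, len(tokens)):
        gram = tuple(tokens[i - n + 1 : i + 1])
        if gram in seen:
            hits.append(i)  # token i completes a previously seen n-gram
        seen.add(gram)
    return hits

tokens = "the cat sat on the mat and the cat sat down".split()
hits = repeated_ngram_completions(tokens, n=3)
print(hits)  # [9]: the second "sat" completes the repeated "the cat sat"
```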

We find that filtering loss by token type is a promising way to spot architecture differences. Overall loss makes transformers & RNNs appear even, with hybrids ahead. Filtered losses reveal where transformers & RNNs each shine.
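
A minimal sketch of the filtering idea, assuming you already have per-token losses from two models on the same tokens. The token categories here are a crude heuristic (a tiny function-word list and literal brackets), not the categorization used in the study, and the numbers are arbitrary placeholders rather than measurements. All names (`token_type`, `mean_loss_by_type`, `loss_gap_by_type`) are hypothetical.

```python
from collections import defaultdict

# Assumption: a tiny hand-picked list; a real analysis would use a POS tagger.
FUNCTION_WORDS = {"the", "of", "is", "a", "and", "to", "in"}

def token_type(token):
    t = token.strip().lower()
    if t in FUNCTION_WORDS:
        return "function_word"
    if t in {"]", ")", "}"}:
        return "closing_bracket"
    if t in {"[", "(", "{"}:
        return "opening_bracket"
    return "other"

def mean_loss_by_type(tokens, losses):
    """Average the per-token loss within each token category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for tok, loss in zip(tokens, losses):
        kind = token_type(tok)
        sums[kind] += loss
        counts[kind] += 1
    return {kind: sums[kind] / counts[kind] for kind in sums}

def loss_gap_by_type(tokens, losses_a, losses_b):
    """Positive gap means model B has lower loss on that category."""
    a = mean_loss_by_type(tokens, losses_a)
    b = mean_loss_by_type(tokens, losses_b)
    return {kind: a[kind] - b[kind] for kind in a}

# Placeholder values only; these are not results from any model.
tokens = ["the", "[", "model", "]", "of"]
losses_a = [1.0, 2.0, 3.0, 0.5, 1.0]
losses_b = [0.5, 2.0, 2.0, 1.0, 1.5]
gap = loss_gap_by_type(tokens, losses_a, losses_b)
print(gap)
```

In this toy input the two models tie on the overall function-word average while differing on the other categories, which is the kind of pattern a single aggregate loss would hide.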

We hope this kind of token-level comparison helps the field build better, more architecturally diverse models. Going forward, we plan to explore applications of these findings to pretraining evals.

✍️ Blog: https://allenai.org/blog/hybrid-token-prediction
📄 Tech report: https://arxiv.org/abs/2606.20936

🌍 OlmoEarth v1.2 switches to RoPE for cleaner satellite-image embeddings

🔍 New research: Hybrid models & transformers predict different kinds of tokens better
