r/allenai

🌍 OlmoEarth v1.1: 3x cheaper to run than v1 with the same SOTA performance, fully open
▲ 55 r/allenai+1 crossposts

Today we’re releasing OlmoEarth v1.1. It’s 3x cheaper to run than v1 while delivering the same state-of-the-art performance—and fully open.

Compute is the largest cost when running OlmoEarth at hundreds of thousands of square kilometers. Partners use v1 today for mangrove tracking, forest-loss classification, and country-scale crop-type mapping. v1.1 makes that work cheaper to sustain.

Where the savings come from: we feed the model about 3x fewer tokens per Sentinel-2 input. Since attention compute scales quadratically with token count, even modest reductions compound into real efficiency gains. Done naively, this hurts accuracy noticeably; recovering it took changes to how we pretrain the model. Read more in our tech report: https://allenai.org/papers/olmoearth_v1_1
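
To make the token-count arithmetic concrete, here is a rough back-of-envelope sketch. It is not taken from the tech report. It assumes a generic transformer layer whose forward cost has a linear term (projections and MLP) plus a quadratic attention term. The constants, the token count, and `d_model` are hypothetical placeholders.

```python
def layer_flops(n_tokens: int, d_model: int) -> float:
    """Approximate forward FLOPs for one generic transformer layer.

    Assumed textbook estimate, not OlmoEarth-specific:
      - 24 * n * d^2 for the QKV/output projections and a 4x-wide MLP
      - 4 * n^2 * d for attention scores and the attention-weighted sum
    """
    linear = 24 * n_tokens * d_model ** 2
    attention = 4 * n_tokens ** 2 * d_model
    return linear + attention


def speedup(n_tokens: int, d_model: int, reduction: int = 3) -> float:
    """Cost ratio between the full token count and a `reduction`x smaller one."""
    reduced_tokens = n_tokens // reduction
    return layer_flops(n_tokens, d_model) / layer_flops(reduced_tokens, d_model)


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the two regimes.
    for n_tokens in (300, 3_000, 300_000):
        print(n_tokens, round(speedup(n_tokens, d_model=768), 2))
```

Under these assumptions, a 3x token reduction yields a per-layer saving between 3x (linear terms dominate) and 9x (attention dominates), depending on how the sequence length compares with the model width.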

One useful property for researchers: we held the pretraining dataset constant from v1. The differences cleanly isolate the methodological change, not the data or the architecture family.

v1.1 is available now in the same sizes as v1: Nano, Tiny, and Base. All are open weights, with open training code available. If you're running v1 and v1.1 works for your task, expect significant speedups during fine-tuning and inference.

🤗 Models: https://huggingface.co/collections/allenai/olmoearth

📝 Blog: https://allenai.org/blog/olmoearth-v1-1

u/ai2_official — 2 days ago
▲ 18 r/allenai

🧪 Introducing MyScholarQA: AI-powered personalized scientific deep research

Now available in AstaLabs in limited research preview: MyScholarQA, a personalized version of ScholarQA for scientific deep research. 👇

ScholarQA helps synthesize evidence from 12M+ open-access papers. MyScholarQA adds user profiles to tailor that synthesis to you.

AstaLabs is where we share experimental research tools from Asta, our platform for AI-assisted scientific discovery. MyScholarQA builds on ScholarQA, which powers parts of Asta, to explore how deep research systems can better understand the researcher asking the question.

Researchers bring different expertise, methods, audiences, & goals to the same literature as they compile reports. MyScholarQA uses a profile built from papers you choose so reports reflect that context, from what you know to how you prefer research framed.

We tested MyScholarQA against deep research systems including OpenScholar, Perplexity Sonar Deep Research, and OpenAI deep research powered by o3. Its reports answered research questions more completely and cited sources more accurately & consistently.

How it works in AstaLabs:

1️⃣ Add papers by pasting Semantic Scholar paper URLs or an author profile URL. MyScholarQA infers your research interests, and you can review & customize each inference.

2️⃣ Then ask a research question. MyScholarQA proposes actions for the report—papers to look for, connections to your work, or framing to use. Adjust the plan, then generate a report grounded in ScholarQA's synthesis over millions of open-access papers.

Try MyScholarQA in AstaLabs and read the paper behind the system:

🔬 AstaLabs: https://personalized-scholarqa.apps.allenai.org/ 

📄 Paper: https://arxiv.org/abs/2603.16120 

📊 Analysis of user feedback collected in MyScholarQA: https://arxiv.org/abs/2604.23815

u/ai2_official — 9 days ago
▲ 18 r/allenai

📊 How Artificial Analysis is using Ai2's IFBench to probe frontier model instruction following

Artificial Analysis relies on our IFBench eval to test how closely models follow user prompts. 👇

Most evals in AA’s Intelligence Index saturate within months. IFBench hasn't because it measures what others miss—and what frontier models still struggle with. 

Accepted to NeurIPS 2025, IFBench tests how well language models follow precise output constraints. It asks models to do things like answer only with “yes” or “no,” mention a specific word at least three times, or hit an exact sentence, word, or character count.
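
For readers who want a feel for what a "precise output constraint" looks like in code, here is a minimal sketch of checkers for the examples above. These are illustrative functions with hypothetical names, not the IFBench implementation. The actual verification code is in the IFBench repository.

```python
import re


def answers_yes_or_no(response: str) -> bool:
    """True if the whole response is just 'yes' or 'no' (case-insensitive)."""
    cleaned = response.strip().rstrip(".!").lower()
    return cleaned in {"yes", "no"}


def mentions_word_at_least(response: str, word: str, minimum: int = 3) -> bool:
    """True if `word` appears as a whole word at least `minimum` times."""
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= minimum


def has_exact_word_count(response: str, target: int) -> bool:
    """True if the response has exactly `target` whitespace-separated words."""
    return len(response.split()) == target


def has_exact_char_count(response: str, target: int) -> bool:
    """True if the response has exactly `target` characters."""
    return len(response) == target


if __name__ == "__main__":
    print(answers_yes_or_no("Yes."))                      # a bare answer passes
    print(answers_yes_or_no("Yes, because it is open."))  # extra text fails
```

The second call shows the failure mode described below: the answer is on topic and still violates the constraint.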

Together, those constraints expose a common failure mode: a model can understand the topic and still miss part of a request. "IFBench measures instruction following in a way that feels closer to real-world use than earlier instruction following evals," says AA’s Declan Jackson.

Inside AA's Intelligence Index, IFBench surfaces where instruction-following is improving, where progress is uneven, and how models that score well overall can still struggle with precise prompts. That kind of granularity is hard to see in aggregate scores alone.

IFBench is fully open so anyone can inspect it and run it across models. Open benchmarks make adoption like this possible, and they're how the field builds shared evaluation standards. 

📝 Read more: https://allenai.org/blog/ifbench-artificial-analysis

📊 IFBench: https://github.com/allenai/IFBench

u/ai2_official — 10 days ago
▲ 38 r/allenai+1 crossposts

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors.

Most LLMs are trained and deployed as one monolithic system, even when an application only needs a narrow capability like code or math. MoEs seem to break this pattern by using only a few experts per token. But across a full task, standard MoEs still rely on many experts.

EMO’s key idea: use each training document as a weak signal for shared context. Instead of letting every token route independently, EMO restricts tokens from the same document to a shared expert pool, encouraging experts to organize around coherent domains.
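
A toy sketch of the routing idea, for intuition only. The post does not say how the shared pool is chosen, so the rule used here (keep the experts with the highest mean router score across the document) is an assumption, as are the function names and the tiny sizes. See the tech report for the actual method.

```python
from typing import List


def top_indices(scores: List[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def route_document(
    router_logits: List[List[float]], pool_size: int, top_k: int
) -> List[List[int]]:
    """Pick top_k experts per token, restricted to one pool per document.

    router_logits[t][e] is the router score of expert e for token t.
    """
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    # Assumed pool rule: experts with the highest mean score over the document.
    mean_scores = [
        sum(token[e] for token in router_logits) / n_tokens
        for e in range(n_experts)
    ]
    pool = top_indices(mean_scores, pool_size)
    routes = []
    for token in router_logits:
        # Each token still ranks experts itself, but only within the pool.
        ranked = sorted(pool, key=lambda e: token[e], reverse=True)
        routes.append(ranked[:top_k])
    return routes


if __name__ == "__main__":
    logits = [[5.0, 1.0, 0.0, 4.0], [1.0, 0.0, 5.0, 4.5]]
    print(route_document(logits, pool_size=2, top_k=1))
```

In this toy input, the second token would pick expert 2 if routed independently. Restricting it to the document's pool moves it to expert 3, which is the effect the post describes: tokens from one document end up sharing experts.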

EMO’s expert clusters look very different from a traditional MoE—they organize around semantic domains like health, news, politics, & film/music. Traditional MoEs often cluster around surface patterns like prepositions and articles, making selective expert use tougher.

EMO is a 1B-active, 14B-total MoE trained on 1T tokens with 8 of 128 experts active per token. Without any subsequent fine-tuning, EMO remains robust when only a subset of experts is kept: with 25% of experts, it loses ~1 percentage point in overall performance; with 12.5%, it drops ~3 points. Standard MoEs degrade sharply.
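
A quick arithmetic check on those subset sizes. It assumes the percentages apply to the 128 experts quoted above, which the post does not spell out.

```python
TOTAL_EXPERTS = 128    # from the post
ACTIVE_PER_TOKEN = 8   # from the post


def experts_kept(fraction: float, total: int = TOTAL_EXPERTS) -> int:
    """Number of experts retained when keeping `fraction` of `total`."""
    return int(total * fraction)


if __name__ == "__main__":
    for fraction in (0.25, 0.125):
        kept = experts_kept(fraction)
        ratio = kept // ACTIVE_PER_TOKEN
        print(f"{fraction:.1%} of experts -> {kept} kept ({ratio}x the active count)")
```

Under that assumption, the 25% subset keeps 32 experts and the 12.5% subset keeps 16, so even the smaller subset holds more experts than the 8 used per token.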

We also experiment in a smaller 130B-token setting, where EMO subsets match or outperform memory-matched models trained from scratch. Instead of training many separate small models for fixed memory budgets, one EMO model can provide many domain-specific expert subsets.

We're releasing EMO, a matched standard-MoE baseline, and training code to help the community study modularity & expert selection:

🧠 Models: https://huggingface.co/collections/allenai/emo
📝 Blog: https://allenai.org/blog/emo
📄 Tech report: https://allenai.org/papers/emo

📊 Visualization: https://emovisualization.netlify.app/

u/ai2_official — 13 days ago
▲ 15 r/allenai+1 crossposts

Today we’re bringing new NSF OMAI compute online with NVIDIA Blackwell Ultra-powered systems, turning a $152M national investment from NSF & NVIDIA into a foundation for truly open AI research.

Built on NVIDIA B300 systems and deployed with Cirrascale Cloud Services, the new cluster supports scaled training and experimentation across language, multimodal, and scientific AI, helping extend research directions behind models like Molmo 2 & Olmo Hybrid.

Our research estimates that in today’s model training efforts, 82% of compute goes into exploratory work. At closed labs, the output of that work stays within those labs. In an open system, models, datasets, & methods are shared, and the value compounds across the field.

With the new NSF OMAI compute now online, Ai2 is building toward open, reusable AI systems that researchers can deeply inspect, study, and customize.

→ Read more in our blog: https://allenai.org/blog/omai-compute-now-live

u/ai2_official — 14 days ago