u/plumb-moe

Zero-overhead MoE expert imbalance profiler for vLLM, with benchmarks + why we differ from vLLM's built-in EPLB

If you're running a MoE model with --enable-expert-parallel, your experts are probably imbalanced. We measured a 7.93× imbalance on layer 0 of OLMoE, meaning one GPU was doing nearly 8× the work. Plumb measures the imbalance and fixes it.

What it does:

Plumb attaches to a running vLLM or HuggingFace process via PyTorch hooks, with no fork or restart required. It captures per-layer, per-expert activation counts, computes imbalance ratios, and then produces an expert→GPU placement recommendation.
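To make the pipeline concrete, here is a rough sketch of the counting, ratio, and placement steps in plain Python. This is not Plumb's actual code. The function names are made up for illustration, the imbalance ratio is assumed to be "busiest GPU load divided by mean GPU load" (the post doesn't pin down the exact definition), and the placement step is a simple greedy heuristic standing in for whatever strategy the tool really uses.

```python
from collections import Counter


def count_expert_activations(topk_ids, num_experts):
    """Count how many token slots were routed to each expert in one layer.

    topk_ids: iterable of per-token lists of selected expert indices
    (hypothetical input; in practice this would come from a router hook).
    """
    counts = Counter()
    for token_experts in topk_ids:
        counts.update(token_experts)
    return [counts.get(e, 0) for e in range(num_experts)]


def gpu_loads(counts, placement, num_gpus):
    """Sum expert activation counts per GPU; placement[e] is the GPU of expert e."""
    loads = [0] * num_gpus
    for expert, gpu in enumerate(placement):
        loads[gpu] += counts[expert]
    return loads


def imbalance_ratio(loads):
    """Assumed definition: busiest GPU load divided by the mean GPU load."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def greedy_placement(counts, num_gpus):
    """Place hottest experts first, each on the least-loaded GPU with a free slot.

    Assumes the number of experts divides evenly across GPUs.
    """
    per_gpu = len(counts) // num_gpus
    loads = [0] * num_gpus
    slots = [per_gpu] * num_gpus
    placement = [None] * len(counts)
    for expert in sorted(range(len(counts)), key=lambda e: -counts[e]):
        candidates = [g for g in range(num_gpus) if slots[g] > 0]
        gpu = min(candidates, key=lambda g: loads[g])
        placement[expert] = gpu
        loads[gpu] += counts[expert]
        slots[gpu] -= 1
    return placement


# Toy example: 4 tokens, top-2 routing, 4 experts, 2 GPUs.
routed = [[0, 1], [0, 1], [0, 2], [0, 3]]
counts = count_expert_activations(routed, num_experts=4)
before = imbalance_ratio(gpu_loads(counts, [0, 0, 1, 1], num_gpus=2))
after = imbalance_ratio(gpu_loads(counts, greedy_placement(counts, 2), num_gpus=2))
print(counts, before, after)
```

In the toy example, expert 0 is hot, so the contiguous default placement puts both busy experts on GPU 0. The greedy pass splits them across GPUs, which lowers the ratio.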

Our benchmarks are prefill-only (max_tokens=1, ~11 input tokens). Full results across multiple concurrency levels and two models are in the repo, including a DeepSeek-V2-Lite run where blind rebalancing made things significantly worse.

On vLLM's native EPLB:

vLLM has its own EPLB and it works. A few differences:

vLLM's EPLB requires --num-redundant-experts, which costs extra VRAM per EP rank (~2.4GB for DeepSeek-V3). If you're memory-constrained, it can't run. Plumb has no such requirement.

vLLM's EPLB load-balances but ignores topology: it doesn't know which GPU is closest to which expert, so cross-NUMA dispatch cost stays high. After running EPLB, Plumb adds a NUMA fine-tuning pass that pins each layer's hottest experts to same-socket GPUs, which vLLM doesn't do.

vLLM's EPLB also runs unconditionally. We benchmarked blind rebalancing on DeepSeek-V2-Lite. That model peaks at 1.5× imbalance because it was trained with balance losses, so there's nothing to rebalance. The communication overhead alone pushes p95 up by 226% at c=16. Plumb checks the imbalance ratio first and won't apply anything without warning you.
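The "check first" idea can be sketched as a simple gate over per-layer imbalance ratios. This is only an illustration, not Plumb's real logic: the function name and the 2.0 threshold are hypothetical, and the actual cutoff the tool uses may well be different.

```python
def should_rebalance(layer_ratios, threshold=2.0):
    """Decide whether rebalancing looks worthwhile.

    layer_ratios: dict mapping layer index -> measured imbalance ratio.
    threshold: hypothetical cutoff; below it, rebalancing is skipped.
    Returns (decision, worst_layer, worst_ratio).
    """
    worst_layer = max(layer_ratios, key=layer_ratios.get)
    worst_ratio = layer_ratios[worst_layer]
    return worst_ratio >= threshold, worst_layer, worst_ratio


# A heavily skewed model (like the 7.93x case) passes the gate,
# while an already-balanced one (peaking at 1.5x) does not.
for name, ratios in {"skewed": {0: 7.93, 1: 1.2}, "balanced": {0: 1.5, 1: 1.1}}.items():
    apply, layer, ratio = should_rebalance(ratios)
    if apply:
        print(f"{name}: rebalance (layer {layer} at {ratio}x)")
    else:
        print(f"{name}: warning, peak imbalance is only {ratio}x; skipping")
```

Gating on the worst layer is one possible choice. A mean or percentile across layers would be less sensitive to a single outlier layer.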

GitHub: https://github.com/plumb-moe/plumb

Benchmark scripts and raw data are in the repo.

We'll be running more benchmarks and trying more strategies over the next few weeks. Hope you look forward to those results :)

u/plumb-moe — 3 days ago