u/Immediate_Ad263

r/rust

Every time I prepare a Dockerfile for a Rust project, I want the binary to be as fast as possible. The problem: with distributed deployments, you never know what hardware it'll run on. So you compile for the generic baseline CPU and leave performance on the table.

One command builds your binary for multiple CPU targets and wraps the builds into a single file. One file ships. At startup it picks the fastest version the host can run: no extra CI pipeline, no runtime dispatch code in your app.
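For anyone curious what "picks the fastest version the host can run" involves, here is a minimal, hypothetical Rust sketch of the general idea. It is not cargo-sonic's actual code. It detects CPU features at runtime with the standard `is_x86_feature_detected!` macro and picks the highest variant whose features are all present. The variant names and feature lists (`LEVELS`, `pick_variant`, `host_features`) are assumptions loosely modeled on the x86-64 psABI microarchitecture levels, using only a subset of each level's real requirements. The sketch covers x86_64 only and prints its choice instead of launching a binary.

```rust
use std::collections::HashSet;

// Hypothetical variant table, loosely modeled on the x86-64 psABI
// microarchitecture levels. Each list is a subset of that level's real
// requirements, and every level implies all the levels before it.
static LEVELS: &[(&str, &[&str])] = &[
    ("x86-64-v2", &["popcnt", "sse3", "sse4.1", "sse4.2", "ssse3"]),
    ("x86-64-v3", &["avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt"]),
    ("x86-64-v4", &["avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"]),
];

/// Returns the highest level whose features (and all lower levels' features)
/// are present on the host, or the generic baseline if none qualify.
fn pick_variant(host: &HashSet<&str>) -> &'static str {
    let mut best = "x86-64"; // generic build that any x86_64 CPU can run
    for &(name, required) in LEVELS {
        // Stop at the first unsatisfied level so we never skip ahead,
        // e.g. pick v4 on a CPU that lacks part of v3.
        if !required.iter().all(|f| host.contains(f)) {
            break;
        }
        best = name;
    }
    best
}

/// Queries the running CPU for the features referenced in `LEVELS`.
#[cfg(target_arch = "x86_64")]
fn host_features() -> HashSet<&'static str> {
    // The detection macro only accepts string literals, hence the explicit list.
    let detected = [
        ("popcnt", is_x86_feature_detected!("popcnt")),
        ("sse3", is_x86_feature_detected!("sse3")),
        ("sse4.1", is_x86_feature_detected!("sse4.1")),
        ("sse4.2", is_x86_feature_detected!("sse4.2")),
        ("ssse3", is_x86_feature_detected!("ssse3")),
        ("avx", is_x86_feature_detected!("avx")),
        ("avx2", is_x86_feature_detected!("avx2")),
        ("bmi1", is_x86_feature_detected!("bmi1")),
        ("bmi2", is_x86_feature_detected!("bmi2")),
        ("f16c", is_x86_feature_detected!("f16c")),
        ("fma", is_x86_feature_detected!("fma")),
        ("lzcnt", is_x86_feature_detected!("lzcnt")),
        ("avx512f", is_x86_feature_detected!("avx512f")),
        ("avx512bw", is_x86_feature_detected!("avx512bw")),
        ("avx512cd", is_x86_feature_detected!("avx512cd")),
        ("avx512dq", is_x86_feature_detected!("avx512dq")),
        ("avx512vl", is_x86_feature_detected!("avx512vl")),
    ];
    // Keep only the names of features the CPU reports as available.
    detected.iter().filter(|(_, ok)| *ok).map(|(name, _)| *name).collect()
}

/// Non-x86_64 hosts report no features, so only the generic build is chosen.
#[cfg(not(target_arch = "x86_64"))]
fn host_features() -> HashSet<&'static str> {
    HashSet::new()
}

fn main() {
    let host = host_features();
    // A real launcher would now execute the matching embedded build;
    // this sketch only reports which one it would pick.
    println!("selected variant: {}", pick_variant(&host));
}
```

Stopping at the first unsatisfied level is a deliberately conservative choice: a misdetected or unusual CPU falls back to a slower build rather than one that could crash with an illegal instruction.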

Benchmark on Raptor Lake, with zero hand-written SIMD: 154 ms vs 2771 ms for the generic build (about 18x faster).

Supports Linux on x86_64 and AArch64. It's early but working, and I'd love reports of which CPU target it actually selects on different hardware. I did my best to make the selection safe and correct, but the hardware variety is huge and some processors may not be detected properly.

https://crates.io/crates/cargo-sonic

u/Immediate_Ad263 — 25 days ago