Has anyone quantified the actual compute waste from training divergence at scale? Trying to understand how common rollback and restart really is in practice.
r/pytorch, +1 crossposts · reddit.com · u/Radianis · 3 days ago