u/Radianis

▲ 4 · r/pytorch · +1 crossposts

Has anyone quantified the actual compute waste from training divergence at scale? Trying to understand how common rollbacks and restarts really are in practice.

u/Radianis — 3 days ago