Hey guys, I'm looking to dive deep into inference optimization, and right now I only know the high-level stuff like weight-activation quantization, FlashAttention/SageAttention, and torch.compile.
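For reference, here's my rough understanding of weight-activation quantization at its simplest. It's a minimal pure-Python sketch that assumes symmetric per-tensor int8 scales. The function names are just mine for illustration, and real W8A8 kernels typically use per-channel/per-token scales and run the integer matmul on the GPU:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: returns (integer codes, float scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def int8_dot(w, x):
    """W8A8-style dot product: integer multiply-accumulate, then one float rescale."""
    qw, sw = quantize_int8(w)
    qx, sx = quantize_int8(x)
    acc = sum(a * b for a, b in zip(qw, qx))  # would be an int32 accumulator on real hardware
    return acc * sw * sx


weights = [0.5, -1.2, 0.3, 0.9]
activations = [1.0, 0.25, -0.75, 2.0]
exact = sum(a * b for a, b in zip(weights, activations))
approx = int8_dot(weights, activations)
print(exact, approx)
```

The win on real hardware comes from doing the inner loop in int8 with a single rescale at the end, which is exactly the part that needs custom kernels to be fast.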
How do I get better at optimizing models like a pro? Can anyone suggest a roadmap or any resources you might have? I guess I need to learn CUDA/Triton for deeper optimizations, but I'm really confused about how and where to start for image and video models.
u/IllustriousZone111 — 15 days ago