We built an open-source KEDA external scaler for GPU workloads - no Prometheus needed
Been running GPU inference workloads on k8s and got tired of the dcgm-exporter → Prometheus → PromQL → KEDA chain just to autoscale based on GPU utilization. 5 components, 15-30s metric lag, PromQL queries to maintain.
So I built keda-gpu-scaler, a KEDA external scaler that talks to NVML directly on each GPU node via a DaemonSet. It reads GPU utilization, memory, temperature, and power, and serves them to KEDA over gRPC. Sub-second metrics, no Prometheus in the loop.
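For anyone who hasn't worked with the external scaler contract, here's a rough sketch of the moving parts in plain Python. This is not the project's code: the real scaler speaks gRPC and reads NVML, and `GpuSample`, `get_metric_value`, `is_active`, `desired_replicas`, and every threshold below are made-up names and numbers for illustration. The first two functions loosely mirror the `GetMetrics` and `IsActive` RPCs in KEDA's external scaler interface; the third mirrors the HPA's math for an average-value target (ignoring its tolerance band). Summing utilization across GPUs is an assumption, not necessarily how keda-gpu-scaler aggregates.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One reading for one GPU (hypothetical shape, not the project's schema)."""
    node: str
    gpu_index: int
    utilization_pct: float  # 0-100, the range NVML uses for GPU utilization
    memory_used_mib: float


def get_metric_value(samples):
    """Rough analogue of GetMetrics: total utilization across all GPUs."""
    return sum(s.utilization_pct for s in samples)


def is_active(samples, activation_pct=5.0):
    """Rough analogue of IsActive: is any GPU above the activation threshold?"""
    return any(s.utilization_pct > activation_pct for s in samples)


def desired_replicas(metric_value, target_per_replica, max_replicas):
    """HPA-style math for an average-value target: ceil(total / target), capped."""
    return min(max_replicas, math.ceil(metric_value / target_per_replica))


# Three GPUs across two nodes, with invented readings.
samples = [
    GpuSample("gpu-node-a", 0, 90.0, 30000.0),
    GpuSample("gpu-node-a", 1, 80.0, 28000.0),
    GpuSample("gpu-node-b", 0, 70.0, 25000.0),
]
total = get_metric_value(samples)
print(is_active(samples), total, desired_replicas(total, 60.0, 10))
```

With a target of 60% per replica, 240 points of total utilization works out to 4 replicas. The point is that the scaler only has to answer "is there work?" and "how much?"; KEDA and the HPA do the rest.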
Wrote about the architecture and why it has to be an external scaler (not a native one) on the CNCF blog: https://www.cncf.io/blog/2026/05/27/gpu-autoscaling-on-kubernetes-with-keda-building-an-external-scaler/
It ships with pre-built profiles for vLLM, Triton, training jobs, and batch workloads. Scale-to-zero works too.
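If scale-to-zero in KEDA is new to you, here's a simplified model of the behavior in Python. It's a sketch under assumptions, not how keda-gpu-scaler or KEDA is implemented: `ScaleToZeroModel` and its method are hypothetical names, and the only real-world detail is the 300-second default for KEDA's `cooldownPeriod`. The idea: while the scaler reports activity, the workload keeps at least one replica; once it has been inactive for the whole cooldown, the workload drops to zero.

```python
from dataclasses import dataclass


@dataclass
class ScaleToZeroModel:
    """Toy model of KEDA's activate/deactivate behavior (illustrative only)."""
    cooldown_s: float = 300.0    # KEDA's default cooldownPeriod, in seconds
    last_active_at: float = 0.0  # timestamp of the last "active" report

    def replicas(self, now, active, hpa_replicas):
        """Return the replica count for this tick.

        now          -- current time in seconds
        active       -- what the scaler's IsActive-style check returned
        hpa_replicas -- what the HPA would ask for based on the metric
        """
        if active:
            # Activity resets the cooldown and guarantees at least one replica.
            self.last_active_at = now
            return max(1, hpa_replicas)
        if now - self.last_active_at >= self.cooldown_s:
            # Inactive for the full cooldown: scale to zero.
            return 0
        # Inactive, but still inside the cooldown window: keep running.
        return max(1, hpa_replicas)


model = ScaleToZeroModel()
print(model.replicas(now=0.0, active=True, hpa_replicas=3))
print(model.replicas(now=100.0, active=False, hpa_replicas=1))
print(model.replicas(now=400.0, active=False, hpa_replicas=1))
```

The cooldown is what keeps a briefly idle GPU workload from flapping between zero and one replica.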
GitHub: https://github.com/pmady/keda-gpu-scaler
Docs: https://keda-gpu-scaler.readthedocs.io
Still early (v0.1.0) so if you're running GPU workloads on k8s I'd appreciate feedback, bug reports, or contributions. Roadmap and open issues are on the repo.
u/Aware-Ticket-5585 — 1 day ago