

Running SAM3 on NVIDIA Jetson Nano
We’ve released Embedl SAM3 for TensorRT, a fully reproducible, end-to-end deployment of facebook/sam3 on NVIDIA GPUs (Jetson AGX Orin, Jetson Orin Nano) with INT8 post-training quantization. It is built with Embedl Deploy, which bridges the gap between PyTorch and the hardware constraints of edge devices.
It comes as one script (https://docs.embedl.com/embedl-deploy/latest/auto_tutorials/sam3.html) that requires only a single Python package, whose only dependency is PyTorch. The script takes you from a Hugging Face checkpoint to a running TensorRT engine, covering export, fusions, quantization, and compilation.
Use a smaller image size on the Jetson Orin Nano.
The performance:
NVIDIA Jetson AGX Orin (image size → latency)
224×224 → 40.4 ms / 24.7 FPS (real-time)
448×448 → 118.5 ms INT8, 10% faster than FP16
672×672 → 187.6 ms INT8, 27% faster than FP16
NVIDIA Jetson Orin Nano (image size → latency)
224×224 → 89.6 ms / 11.2 FPS
448×448 → 262.6 ms INT8, 20% faster than FP16
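To relate the latency and FPS figures, here is a minimal sketch of the arithmetic, assuming one image is processed at a time so that throughput is simply the inverse of latency. The helper name fps_from_latency_ms is ours for illustration and is not part of Embedl Deploy. Because the quoted latencies are themselves rounded, the computed FPS can differ from the figures above in the last digit.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second for back-to-back single-image inference."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies quoted in the post, in milliseconds.
reported_ms = {
    ("Jetson AGX Orin", "224x224"): 40.4,
    ("Jetson AGX Orin", "448x448"): 118.5,
    ("Jetson AGX Orin", "672x672"): 187.6,
    ("Jetson Orin Nano", "224x224"): 89.6,
    ("Jetson Orin Nano", "448x448"): 262.6,
}

for (device, size), ms in reported_ms.items():
    fps = fps_from_latency_ms(ms)
    print(f"{device:17s} {size}: {ms:6.1f} ms -> {fps:5.2f} FPS")
```

The same conversion gives a rough frame rate for the larger image sizes, where the post quotes latency only.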
The speed-up isn’t the headline. Getting the model running reliably is. SAM3’s ViT backbone, window attention, RoPE embeddings, and FPN neck create real deployment issues: memory pressure, quantization sensitivity, accuracy loss, and export or compilation failures. Embedl Deploy handles all of it: hardware-aware, accuracy-preserving, out of the box. And PyTorch is the only dependency: no graph surgery, no ONNX simplification scripts, no extra calibration tooling to wrangle. PTQ and QAT live in one unified workflow that needs only PyTorch and TensorRT.
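For readers new to the term, "quantization sensitivity" can be illustrated with a small, self-contained sketch of symmetric per-tensor INT8 quantization. This is a generic textbook scheme in plain Python, not Embedl Deploy's API or its actual calibration method; all function names and the sample values are hypothetical. It shows one common reason INT8 is hard: a single outlier stretches the scale and costs precision for every other value.

```python
def calibrate_scale(values, num_bits=8):
    """Max-abs calibration: map the largest magnitude to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(levels, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in levels]


def roundtrip_error(values, reference_count):
    """Worst-case error on the first `reference_count` values after a round trip."""
    scale = calibrate_scale(values)
    restored = dequantize(quantize(values, scale), scale)
    pairs = list(zip(values, restored))[:reference_count]
    return max(abs(a - b) for a, b in pairs)


# Hypothetical activations: a well-behaved tensor, then the same plus one outlier.
well_behaved = [-1.0, -0.5, 0.0, 0.25, 0.75, 1.0]
with_outlier = well_behaved + [50.0]

err_clean = roundtrip_error(well_behaved, len(well_behaved))
err_outlier = roundtrip_error(with_outlier, len(well_behaved))
print(f"max error without outlier: {err_clean:.4f}")
print(f"max error with outlier:    {err_outlier:.4f}")
```

The error on the original six values grows sharply once the outlier sets the scale, which is why calibration choices matter for INT8 accuracy.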
This is not just for Jetson or NVIDIA GPUs. We are building Embedl Deploy for any edge hardware. Whatever device you’re deploying to, we solve the same problem: take your model from PyTorch to production without months of debugging.
Any comments are welcome. The same workflow applies to any Torchvision model, and to more complicated models such as DinoV3, which we will release soon.
Other edge-friendly models can be found at https://huggingface.co/embedl