u/Any_Frame9721

We’ve released Embedl SAM3 for TensorRT, a fully reproducible, end-to-end deployment of facebook/sam3 on NVIDIA GPUs (Jetson AGX Orin, Jetson Orin Nano) with INT8 post-training quantization. It is built with Embedl Deploy, which bridges the gap between PyTorch and hardware constraints on the edge.

One script (https://docs.embedl.com/embedl-deploy/latest/auto_tutorials/sam3.html) is all it takes. It requires a single Python package whose only dependency is PyTorch, and it takes you from a Hugging Face checkpoint to a running TensorRT engine: export, fusions, quantization, compilation.

Use a smaller image size on the Orin Nano, or to get started faster.

Performance (image size → latency):

NVIDIA Jetson AGX Orin

224×224 → 40.4 ms / 24.7 FPS (real-time)

448×448 → 118.5 ms INT8, 10% faster than FP16

672×672 → 187.6 ms INT8, 27% faster than FP16

NVIDIA Jetson Orin Nano

224×224 → 89.6 ms / 11.2 FPS

448×448 → 262.6 ms INT8, 20% faster than FP16
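
For anyone who wants to compare the rows on a common scale, latency converts to FPS as sketched below. This assumes one image per inference call (batch size 1), which the numbers above suggest but do not state. The helper name is made up for illustration.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert per-image latency in milliseconds to frames per second.

    Assumption: one image per inference call (batch size 1).
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies quoted in the post, keyed by (device, image size).
reported_ms = {
    ("Jetson AGX Orin", "224x224"): 40.4,
    ("Jetson AGX Orin", "448x448"): 118.5,
    ("Jetson AGX Orin", "672x672"): 187.6,
    ("Jetson Orin Nano", "224x224"): 89.6,
    ("Jetson Orin Nano", "448x448"): 262.6,
}

for (device, size), ms in reported_ms.items():
    print(f"{device} {size}: {ms} ms -> {latency_ms_to_fps(ms):.2f} FPS")
```

The 224×224 results land within about 0.1 FPS of the quoted figures. The larger sizes work out to roughly 8.4 and 5.3 FPS on the AGX Orin and 3.8 FPS on the Orin Nano.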

The speed-up isn’t the headline. Getting the model running reliably is. SAM3’s ViT backbone, window attention, RoPE embeddings, and FPN neck create real deployment issues: memory pressure, quantization sensitivity, accuracy loss, and export and compilation failures. Embedl Deploy handles all of it: hardware-aware, accuracy-preserving, out of the box. And PyTorch is the only dependency: no graph surgery, no ONNX simplification scripts, no extra calibration tooling to wrangle. PTQ and QAT share one unified workflow that needs only PyTorch and TensorRT.
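
To make "quantization sensitivity" concrete, here is a minimal, dependency-free sketch of symmetric per-tensor INT8 quantization on toy data. It is a generic textbook scheme, not Embedl Deploy's or TensorRT's actual calibration method, and every function name is hypothetical. It shows how a single outlier activation stretches the quantization scale and costs precision for all the other values.

```python
def int8_scale(values):
    """Symmetric per-tensor scale: map the largest magnitude to code 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def fake_quantize(values, scale):
    """Round to INT8 codes in [-127, 127], then map back to floats."""
    restored = []
    for v in values:
        code = max(-127, min(127, round(v / scale)))
        restored.append(code * scale)
    return restored


def mean_abs_error(values, scale):
    """Average absolute error introduced by quantizing with this scale."""
    restored = fake_quantize(values, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Toy "activations": 101 evenly spaced values in [-1, 1].
well_behaved = [i / 50.0 - 1.0 for i in range(101)]

# Scale computed from the clean values only.
scale_clean = int8_scale(well_behaved)
# Scale computed when one large outlier (60.0) is part of the same tensor.
scale_outlier = int8_scale(well_behaved + [60.0])

# Error on the same small values under each scale.
err_clean = mean_abs_error(well_behaved, scale_clean)
err_outlier = mean_abs_error(well_behaved, scale_outlier)

print(f"scale without outlier: {scale_clean:.5f}, error: {err_clean:.5f}")
print(f"scale with outlier:    {scale_outlier:.5f}, error: {err_outlier:.5f}")
```

Quantization toolchains typically reduce this effect with techniques such as per-channel scales or calibrated clipping ranges. Which of these apply in this workflow is not stated here.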

This is not just for Jetson or NVIDIA GPUs. We are building Embedl Deploy for any edge hardware. Whatever device you’re deploying to, we solve the same problem: take your model from PyTorch to production without months of debugging.

Comments are welcome. The same workflow applies to any Torchvision model and to more complicated models such as DINOv3, which we will release soon.

Other edge-friendly models can be found at https://huggingface.co/embedl

Real-time edge AI vision just got better.
