
What it actually takes to build an AR overlay on a physical object in real time.
Everyone loves a clean AR demo. You put on a headset, a beanbag lands on a cornhole board, and a beautifully rendered score badge floats effortlessly right above it. It looks like magic.
But behind the scenes, AR on physical objects is roughly 80% coordinate system problems. I just broke down the technical architecture of what we're building for Quantum Caddy (a real-time AR scoring system) and how we are shifting from a fixed-camera ecosystem to head-tracked, spatial AR glasses. If you are building anything in the computer vision or spatial computing space, these are the architectural hurdles no one warns you about in the demo videos:
1. The Core Issue: 2D Pixels vs. 3D Space
A camera sees a flat 2D image, but a physical object exists in 3D. If your coordinate math is off by even two centimeters, your AR asset floats over the wrong spot. In a precision scoring or training system, that's a broken product, not a cosmetic bug.
- Phase 0 (Fixed): Right now, we use a static 2D homography from a fixed camera. We map the four board corners at session start, compute a transformation matrix, and use it to translate detected bounding boxes into zone coordinates. It works well for a fixed camera feeding a screen, but the matrix is only valid for the camera pose it was calibrated at, so it breaks the moment the camera moves.
- Phase 2 (Spatial AR): Moving to the Everysight Maverick AI glasses completely changes the architecture. The camera moves with the wearer's head while the physical object stays put. You can no longer rely on a static matrix; you need a live, continuous world-model updating from head pose in real time.
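To make the Phase 0 step concrete, here is a minimal sketch of a four-corner homography in plain Python. It is an illustration under stated assumptions, not our production code: the pixel coordinates are made up, the board is assumed to be a 24 x 48 inch rectangle, and the function names (`fit_homography`, `apply_homography`, `solve_linear`) are hypothetical. It fixes the bottom-right matrix entry to 1 and solves the resulting 8 x 8 linear system directly.

```python
# Sketch: fit a 2D homography from four point pairs, then map a pixel onto the board.
# All coordinates are illustration values, not real calibration data.

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner layout (collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_homography(src, dst):
    """Return a 3x3 matrix H (with H[2][2] = 1) mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h4*x + h5*y + h6) / (h7*x + h8*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, point):
    """Map a pixel (x, y) through H, including the perspective divide."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v

# Board corners as seen by the fixed camera (pixels) and on the board (inches).
pixel_corners = [(420, 310), (860, 310), (1010, 700), (270, 700)]
board_corners = [(0, 0), (24, 0), (24, 48), (0, 48)]

H = fit_homography(pixel_corners, board_corners)
bag_center_px = (640, 480)  # e.g. the center of a detected bounding box
bag_on_board = apply_homography(H, bag_center_px)
```

The key limitation is visible in the code: `H` is computed once from `pixel_corners`. If the camera moves, those pixel positions are no longer where the corners are, and every mapped point is wrong until you recalibrate.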
2. The Architectural Blueprint
To tackle a dynamic environment with a tight latency budget (we need under 400 ms from the bag landing to the AR display updating), we mapped out a decoupled system design:
- WorldState: Holds the canonical 3D position of the physical asset.
- TrajectoryRuntime: Runs a Kalman filter on detections from a front-facing camera to smooth the noisy, roughly parabolic flight path of each throw.
- GlassesAdapter: Translates system game events into hardware-specific HUD commands.
- Continuous Gemma Loop: A background LLM loop that proactively generates "coaching chips" because AR glasses lack a keyboard, and voice commands fail in loud venues.
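For the TrajectoryRuntime piece, here is a rough sketch of what a Kalman filter on a thrown bag can look like. It is a simplified stand-in, not the actual component: it tracks only the vertical axis, assumes heights have already been converted to meters, treats gravity as a known control input, and uses noise values and a 30 fps frame rate picked purely for illustration. The function name `kalman_track` is hypothetical.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2

def kalman_track(heights, dt, y0, vy0, meas_var=0.01, accel_var=1.0):
    """Filter measured heights with a [height, vertical velocity] state.

    heights[i] is assumed to be measured at time (i + 1) * dt after the
    initial state (y0, vy0). Only height is observed (H = [1, 0]).
    """
    y, vy = y0, vy0
    p = [[1.0, 0.0], [0.0, 1.0]]  # initial state covariance (assumed)
    # Process noise for unmodeled acceleration (drag, spin, detection jitter).
    q = [[accel_var * dt**4 / 4, accel_var * dt**3 / 2],
         [accel_var * dt**3 / 2, accel_var * dt**2]]
    estimates = []
    for z in heights:
        # Predict: constant-acceleration motion under gravity.
        y, vy = y + vy * dt - 0.5 * G * dt * dt, vy - G * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]]
        p00 = p[0][0] + dt * (p[1][0] + p[0][1]) + dt * dt * p[1][1] + q[0][0]
        p01 = p[0][1] + dt * p[1][1] + q[0][1]
        p10 = p[1][0] + dt * p[1][1] + q[1][0]
        p11 = p[1][1] + q[1][1]
        # Update: blend the prediction with the measurement.
        s = p00 + meas_var             # innovation variance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        residual = z - y
        y, vy = y + k0 * residual, vy + k1 * residual
        # P = (I - K H) P
        p = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append(y)
    return estimates

# Simulated throw: released at 1.0 m with 3.0 m/s upward velocity, 30 fps camera.
dt = 1 / 30
times = [dt * (i + 1) for i in range(15)]
truth = [1.0 + 3.0 * t - 0.5 * G * t * t for t in times]
rng = random.Random(0)
noisy = [h + rng.gauss(0.0, 0.05) for h in truth]  # 5 cm detection noise (assumed)
smoothed = kalman_track(noisy, dt, y0=1.0, vy0=3.0, meas_var=0.05 ** 2)
```

Because the motion model already contains the parabola, the filter only has to correct for measurement noise rather than rediscover the arc on every frame. A real version would track all three axes and would need its noise parameters tuned against recorded throws.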
3. Edge Cases That Will Break Your Model
If you take away one thing from our calibration refinement sprints, let it be this: Your math will look beautiful in the center of the frame and completely lie to you at the edges. Lens distortion and oblique camera angles mean that a homography or spatial anchor that boasts millimeter accuracy in the center can be an entire zone off near the corners. A homography also assumes a flat surface and an undistorted image, so you have to correct for lens distortion and account for non-planar surfaces before you ever ship a line of production code.
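To show why the edges lie, here is a small sketch using the common polynomial radial distortion model, where a point at normalized radius r from the image center is scaled by 1 + k1*r^2 + k2*r^4. The coefficient `K1`, the focal length `FOCAL_PX`, and the function names are made-up illustration values, not measurements from our cameras, and a real calibration would usually include tangential terms as well.

```python
import math

def distort(x, y, k1, k2=0.0):
    """Apply radial distortion to normalized coordinates (origin at the image center)."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

def undistort(xd, yd, k1, k2=0.0, iterations=20):
    """Invert distort() by fixed-point iteration (fine for mild distortion)."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y

def pixel_shift(x, y, k1, focal_px):
    """How far (in pixels) distortion moves an ideal point at normalized (x, y)."""
    xd, yd = distort(x, y, k1)
    return focal_px * math.hypot(xd - x, yd - y)

K1 = -0.2          # mild barrel distortion (assumed)
FOCAL_PX = 1000.0  # focal length in pixels (assumed)

shift_center = pixel_shift(0.0, 0.0, K1, FOCAL_PX)
shift_near_center = pixel_shift(0.1, 0.1, K1, FOCAL_PX)
shift_near_corner = pixel_shift(0.6, 0.4, K1, FOCAL_PX)
```

The displacement grows with the cube of the distance from the center, so an uncorrected homography that looks flawless mid-frame can be tens of pixels off near a corner. Undistorting points before applying the homography is the usual way to keep the two error sources from compounding.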
For those building in spatial audio, CV tracking, or smart glasses development: how are you handling dynamic spatial anchoring without overloading your hardware's compute budget?
(Full engineering breakdown with our file notes over at TruPath Labs)