u/Hamza-bkd09

I’m coming more from an NLP background and recently started digging into computer vision, so I might be missing some context here.

I’m trying to understand how realistic multi-camera person tracking systems are in practice — the kind where a person is consistently identified and followed across different cameras (like surveillance systems or what we see in movies).

From my current understanding, such a system would typically involve:

1. Person detection (YOLO / RT-DETR, etc.)
2. Multi-object tracking within each camera (ByteTrack / DeepSORT / BoT-SORT)
3. Cross-camera re-identification using embeddings (OSNet / TorchReID / ViT-based models)
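
To make the last stage concrete, here is a minimal sketch of how I picture cross-camera matching, so people can correct me if the mental model is off. It assumes each per-camera tracklet already carries a list of appearance embeddings from some re-ID model (the vectors below are toy values, not real model output). The function names, the 0.7 threshold, and the greedy one-to-one matching are my own illustrative assumptions; I understand real systems often use optimal assignment and add spatio-temporal constraints.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Average a tracklet's per-frame embeddings into one descriptor."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def match_tracklets(cam_a, cam_b, threshold=0.7):
    """Greedy one-to-one matching of tracklets between two cameras.

    cam_a, cam_b: dict mapping track_id -> list of embedding vectors.
    threshold: hypothetical minimum similarity to accept a match.
    Returns a dict mapping track ids in cam_a to track ids in cam_b.
    """
    means_a = {tid: mean_embedding(e) for tid, e in cam_a.items()}
    means_b = {tid: mean_embedding(e) for tid, e in cam_b.items()}

    # Score every cross-camera pair, best first.
    pairs = sorted(
        (
            (cosine_similarity(va, vb), ta, tb)
            for ta, va in means_a.items()
            for tb, vb in means_b.items()
        ),
        key=lambda p: p[0],
        reverse=True,
    )

    matches, used_a, used_b = {}, set(), set()
    for sim, ta, tb in pairs:
        if sim < threshold:
            break  # remaining pairs are all below the threshold
        if ta in used_a or tb in used_b:
            continue  # each tracklet is matched at most once
        matches[ta] = tb
        used_a.add(ta)
        used_b.add(tb)
    return matches


# Toy 3-D embeddings; a real re-ID model would output hundreds of dimensions.
cam_a = {"a1": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]], "a2": [[0.0, 1.0, 0.0]]}
cam_b = {"b1": [[0.0, 0.9, 0.1]], "b2": [[1.0, 0.05, 0.0]], "b3": [[0.0, 0.0, 1.0]]}
matches = match_tracklets(cam_a, cam_b)
```

With these toy vectors, `a1` pairs with `b2` and `a2` with `b1`, while `b3` stays unmatched because nothing in the first camera resembles it. My suspicion is that the hard part in practice is exactly where this sketch is naive: a single fixed threshold and appearance-only features.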

My questions are:

1. How mature is this field today in real-world deployments?
2. Is consistent identity tracking across multiple non-overlapping cameras actually reliable, or still very brittle?
3. What are the main failure points in practice (lighting, clothing similarity, occlusion, etc.)?
4. Are there any solid open-source end-to-end systems worth studying?
5. At what point does this stop being a “CV engineering problem” and become an open research problem again?

I’m not expecting movie-level perfect tracking — just trying to understand how close we are to a robust real-world system and what the real limitations are today.

Is multi-camera person tracking + re-identification actually feasible today? How close are we to “movie-style” systems?