u/Substantial_Border88

Auto-annotation has recently been commoditised. Thanks to advances in foundation models like SAM3 and the DINO family, VLMs like Gemini 3.0 Flash, and T-Rex and other models from IDEA Research, it has become much easier to generate bounding boxes and use them to train domain-specific models. Review and QA of AI-generated annotations then becomes the bottleneck, since no model is 100% accurate in what it sees.
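To make the review bottleneck concrete, here is a rough sketch of how I picture the QA step: auto-accept high-confidence boxes, drop the obvious junk, and send only the uncertain middle band to a human. Everything in it is an assumption for illustration: the `Box` structure, the `split_for_review` helper and both thresholds are hypothetical, not the output format or API of any particular model.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One AI-generated bounding box (hypothetical format)."""
    image_id: str
    label: str
    score: float   # model confidence, assumed to be in [0, 1]
    xyxy: tuple    # (x1, y1, x2, y2) in pixels


def split_for_review(boxes, accept_at=0.8, drop_below=0.2):
    """Split boxes into auto-accepted, needs-human-review, and dropped.

    The thresholds are illustrative defaults and would need tuning
    per dataset and per model.
    """
    accepted, review, dropped = [], [], []
    for box in boxes:
        if box.score >= accept_at:
            accepted.append(box)
        elif box.score < drop_below:
            dropped.append(box)
        else:
            review.append(box)
    return accepted, review, dropped


if __name__ == "__main__":
    boxes = [
        Box("a.jpg", "cat", 0.95, (0, 0, 10, 10)),
        Box("a.jpg", "dog", 0.50, (1, 1, 5, 5)),
        Box("b.jpg", "cat", 0.10, (0, 0, 2, 2)),
    ]
    accepted, review, dropped = split_for_review(boxes)
    print(len(accepted), "accepted,", len(review), "to review,", len(dropped), "dropped")
```

Confidence alone will not catch confident mistakes, so in practice you would probably still spot-check a random sample of the auto-accepted boxes.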

I annotated hundreds of images manually a couple of years ago, and using AI to annotate feels much easier now, but the ChatGPT moment still seems really far away.

The following question matters to everyone in this sub and to anyone who trains specialised models, professionally or as a hobby.

Just as LLMs still leave plenty of room for fine-tuning and pre-training specialised models for specific use cases, do vision models have similar scope, where people will keep training object detection models for their own use cases? Or will there come a time when some AI lab launches a model efficient enough to detect anything without any pre-training or fine-tuning?

Consider this an open discussion: suggest techniques, or simply act on your insecurities about gradually becoming obsolete (hehe).

Did SAM3 change the image annotation game completely?