r/opencv

▲ 15 r/opencv+2 crossposts

Synthetic DMS Training Data Generation with Video Models

I like spending my free time testing new AI tools and seeing where they might fit into real computer vision workflows. This time I experimented with synthetic training data generation for Driver Monitoring Systems using Seedance 2.0.

The inspiration came from Vision Banana: https://vision-banana.github.io/

The idea that really caught my attention is simple but powerful: many vision tasks can be represented as RGB outputs. A segmentation mask, an instance mask, a depth map, or another dense prediction target can all be treated as an image-like output.

So I tried to apply this thinking to video.

The workflow:

  1. Generate a realistic synthetic driver monitoring video
  2. Use the same video to generate a semantic segmentation mask
  3. Use the same video to generate an instance segmentation mask
  4. Combine the outputs into a dataset-like structure

The mosaic video shows the result:

RGB video + semantic mask + instance mask, aligned frame by frame.

The scene is a fictional driver gradually becoming drowsy behind the wheel. This kind of scenario is useful for DMS development, but difficult to collect and annotate at scale with real-world data.

Of course, generated annotations still need QA. They are not perfect ground truth.

But for prototyping, rare-case simulation, and early dataset generation, this feels like a very promising direction.

The interesting part is that the final output is not just a nice synthetic video. It can become structured training data:

  • RGB frames from the generated video
  • semantic classes from the semantic mask
  • object regions and bounding boxes from the instance mask
  • YOLO / COCO-style annotations after post-processing
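
As a rough illustration of the last two bullets, here is a minimal sketch of turning one instance-mask frame into YOLO-style box lines. It is not the pipeline from the experiment. It assumes each instance is painted in a single flat RGB color on a black background, which generated masks rarely satisfy without color quantization first. The function name, `class_id`, and `min_area` are hypothetical choices for this example only.

```python
import numpy as np


def instance_mask_to_yolo(mask_rgb, class_id=0, background=(0, 0, 0), min_area=4):
    """Return YOLO lines "class cx cy w h" (normalized), one per instance color.

    Assumption: every distinct non-background color is exactly one instance.
    class_id is a placeholder; in practice it could be looked up from the
    semantic mask at the same pixels.
    """
    h, w, _ = mask_rgb.shape
    # Collect the distinct colors present in this frame.
    colors = np.unique(mask_rgb.reshape(-1, 3), axis=0)
    lines = []
    for color in colors:
        if tuple(int(c) for c in color) == tuple(background):
            continue
        # Pixel coordinates belonging to this instance color.
        ys, xs = np.nonzero(np.all(mask_rgb == color, axis=-1))
        if ys.size < min_area:
            continue  # skip tiny specks, which are likely generation noise
        x0, x1 = xs.min(), xs.max() + 1
        y0, y1 = ys.min(), ys.max() + 1
        # YOLO format: box center and size, normalized by image dimensions.
        cx, cy = (x0 + x1) / 2 / w, (y0 + y1) / 2 / h
        bw, bh = (x1 - x0) / w, (y1 - y0) / h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return lines


# Tiny synthetic frame: one red rectangle on a black background.
frame = np.zeros((10, 20, 3), dtype=np.uint8)
frame[2:6, 4:12] = (255, 0, 0)
print(instance_mask_to_yolo(frame))
```

Exact color matching keeps the sketch short, but it is also the weak point: compression artifacts and color drift between frames would split one object into many "instances", so this step is where the QA mentioned above matters most.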

I wrote a more detailed blog post about the experiment here:

https://www.antal.ai/blog/synthetic_dms_training_data.html

u/Gloomy_Recognition_4 — 2 days ago
▲ 2 r/opencv

[Question] OpenCV interview prep

It's for a computer vision internship with a fitness org. Serious help only, please.

I've used YOLO and OpenCV before, but I've never had an interview. What in-depth questions about them can I expect? I have a call tomorrow, so any quick responses are genuinely appreciated! Extra points if you're open to letting me ask questions in DM.

They want me to be good with GPU programming (CUDA) and GPU performance optimization. Besides that, what else should I be ready to deal with? It's a small-scale startup.

u/EnchantedHawk — 4 days ago
▲ 18 r/opencv+2 crossposts

[Question] Fine-tuning Gemma 4 Vision in Unsloth Studio for Medical Image Classification

Hi everyone,

I'm planning to fine-tune Gemma 4 for medical image classification (specifically species identification) using Unsloth Studio.

My current dataset is a simple table: one column with the image and one column with the species name (label). However, I’ve noticed that Unsloth Studio’s UI doesn't seem to have a dedicated field to define the "input text prompt" (e.g., "What species is in this image?") when loading a custom dataset.

My Questions:

  1. How should I reformat my image + label dataset so Unsloth Studio recognizes it correctly for multimodal training?
  2. Do I need to convert my data into a ChatML-style messages format before uploading?
  3. Does the "instruction" need to be a hardcoded column in my CSV/Parquet file for every single row?

Setup:

  • Model: Gemma 4 (E2B or E4B)
  • Task: Medical Image Classification (Microscopic images)
  • Environment: Unsloth Studio (Local/RunPod)

Any advice on the specific dataset schema required for the Studio would be greatly appreciated!

u/Electrical-Ebb4002 — 9 days ago