u/Ezequiel_CasasP — reddlx

Hey! I made a minimal CustomTkinter app for captioning many images through LM Studio's local OpenAI-compatible server to create training datasets for text-to-image models such as Flux, Qwen, Z-Image, Ernie Image, etc..

Each image is sent as a separate request, so context does not accumulate across the batch.

Easy install!

The idea came from testing Gemma 4 and its vision capabilities for images inside LM Studio chat. I really liked the results!

You can use any model you have installed in LM Studio, as long as it supports Vision.

Link to repo with full instructions and a example system prompt:

https://github.com/Mixomo/LM_Studio_Server_Batch_Image_Captioner

https://preview.redd.it/97fwxcksjuzg1.png?width=1477&format=png&auto=webp&s=3c569746e2a89f54e9b9f1543d2d27a6364fc18c