u/stealth_nsk

So, for example, I can create some characters in 3D on a white background, upload them to, say, Gemini, and ask it to place those characters in a specific environment and make them realistic while preserving their clothes, poses, etc. With this request, Gemini generates exactly what I asked for, and the characters are placed in the environment with correct lighting, shadows, etc.

When I use an image-to-image workflow in ComfyUI, I'm unable to get the same results.

I understand why it happens: LLMs use multimodal models where text and images are processed together, while ComfyUI processes each media type separately. But is it possible to recreate a similar experience in ComfyUI?

Could ComfyUI process queries the way LLMs do?