u/8RETRO8

[Images 1-11: Flux2.Klein Tile Upscaler Node (basically USDU with extra features)]
▲ 10 r/comfyui

Flux2.Klein Tile Upscaler Node (basically USDU with extra features)

About 2 weeks ago, I saw a post about tile upscaling using Flux2.Klein. In the comment section, I pointed out that this was a "glorified" Ultimate SD Upscale (USDU) workflow and proposed my own alternative. Later that day, I realized my workflow had a serious mistake: it did not use the reference latent node and instead relied on a SplitSigmas node to control denoising. Therefore, it didn't utilize the Klein model's abilities to its fullest. However, the workflow from the original author wasn't producing super clean results either. While it actually utilized the reference latent, it always produced vastly different tiles on my images, making the whole image look like a grid (I wasn't using upscale or consistency LoRAs).

So, I decided to vibecode a node that would work for USDU-style upscaling, since I have always been a fan of upscalers that can both upscale images and fix details. To this day, the best tool I have tried for "creative" upscaling is SeedVR2 + SDXL tile controlnet.

And I think I achieved a very good result, considering that I don't know how to code and this node is 100% vibecoded.

Features:

  • Auto Slicing: Dynamically divides your canvas into equal-sized tiles close to your target size.
  • Adaptive Tiling: Dynamically reduces denoising steps in low-detail zones (like skies or walls) to save render time. Flat areas scale down to 50% steps (2 steps), while detailed zones keep 100% steps (4 steps).
  • Built-in Color Match: Performs linear histogram matching of each tile against the original upscaled canvas.
  • Adaptive Tiling Strategy: Analyzes the scene and processes the highly textured tiles first. Flat zones are processed last, allowing them to anchor cleanly to the finalized, sharp boundaries of the foreground details.
  • Not Only for Upscaling: You can do any type of work that Klein supports and that is applicable to a tile workflow. For example, you can change styles on large images without losing details due to downscaling.
  • VRAM Friendly (mostly): Since tiles are processed one by one, you can choose a tile size that your graphics card can handle. The only bottleneck might be the VAE encode/decode process, as the standard Flux2 VAE increased color differences between tiles during my testing.
  • LoRA Support (optional): All your LoRAs should work as expected, which is something you can't do with SeedVR2, for example.
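
To make the slicing, adaptive steps, color match, and processing order from the list above more concrete, here is a minimal pure-Python sketch. It is not the node's actual code. The function names, the standard-deviation detail score, the hard `flat_threshold` cutoff, and the mean/spread reading of "linear histogram matching" are all assumptions made for illustration.

```python
import math
import statistics

def slice_canvas(width, height, target_tile):
    """Split a canvas into equal-sized tiles no larger than target_tile."""
    cols = max(1, math.ceil(width / target_tile))
    rows = max(1, math.ceil(height / target_tile))
    tile_w = math.ceil(width / cols)
    tile_h = math.ceil(height / rows)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Clamp the last column/row so every tile keeps the same size;
            # edge tiles overlap their neighbours slightly instead of shrinking.
            x = min(c * tile_w, width - tile_w)
            y = min(r * tile_h, height - tile_h)
            boxes.append((x, y, tile_w, tile_h))
    return boxes

def detail_score(gray, box):
    """Assumed detail metric: standard deviation of the tile's gray values."""
    x, y, w, h = box
    pixels = [gray[j][i] for j in range(y, y + h) for i in range(x, x + w)]
    return statistics.pstdev(pixels)

def steps_for_tile(detail, flat_threshold, full_steps=4, min_fraction=0.5):
    """Flat tiles get 50% of the steps, detailed tiles keep all of them."""
    if detail < flat_threshold:
        return max(1, round(full_steps * min_fraction))
    return full_steps

def plan_tiles(boxes, details, flat_threshold):
    """Attach a step count to each tile and put the most textured tiles first."""
    plan = [
        {"box": box, "detail": d, "steps": steps_for_tile(d, flat_threshold)}
        for box, d in zip(boxes, details)
    ]
    plan.sort(key=lambda item: item["detail"], reverse=True)
    return plan

def match_color(tile_values, reference_values):
    """Linearly remap tile values so mean and spread match the reference."""
    src_mean = statistics.fmean(tile_values)
    ref_mean = statistics.fmean(reference_values)
    src_std = statistics.pstdev(tile_values)
    ref_std = statistics.pstdev(reference_values)
    gain = ref_std / src_std if src_std > 0 else 1.0
    return [(v - src_mean) * gain + ref_mean for v in tile_values]

# 2x upscale of a 1792x1392 image, with a hypothetical 1024 px target tile.
boxes = slice_canvas(3584, 2784, 1024)
print(len(boxes), boxes[0][2:])  # 12 (896, 928)

# Tiny synthetic grayscale canvas: flat left half, striped right half.
gray = [[100, 100, 100, 100, 0, 255, 0, 255] for _ in range(4)]
small = slice_canvas(8, 4, 4)
plan = plan_tiles(small, [detail_score(gray, b) for b in small], flat_threshold=10.0)
for item in plan:
    print(item["box"], item["steps"])
```

In the toy example the striped tile is scheduled first with 4 steps and the flat tile last with 2 steps. A real implementation would work on image tensors and would probably blend tile borders, which this sketch leaves out.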

The examples use a 2x upscale, but the node can go higher. I stuck with 2x because a 4x upscale takes over 10 minutes for 1792x1392 px images (the resolution I got from Flux2Klein text-to-image) on a 3090, and I don't want to wait a full day.

https://github.com/Gavr728/ComfyUI_KleinTiledUpscaler

u/8RETRO8 — 2 hours ago

If you saw the post with the ZIT sampler/scheduler test, you might know that all of them produced roughly the same result. But for Ernie-Turbo that turned out not to be the case. Some of the combinations have a HUGE impact on image composition.

Generation Info:

8 steps

cfg 1

No prompt enhancer

Full model

Ideally I should have tried a different combination of steps, but that would be too much work to analyze by hand.

Link to all images:

https://drive.google.com/drive/folders/1E7Kklh-5Gh41GT6h0HpzFIxqVfKONws9?usp=sharing

All images that drew my attention are marked as "not bad" in the file name. My taste is subjective, so you might want to go through them yourself. All marked combinations are counted in the table below.

Sampler beta karras kl_optimal linear_quadratic normal sgm_uniform sgm_unirform simple uniform (Other) Total
ddim 1 1
dpm_2 2 1 3
dpm_2_ancestral 2 3 1 6
dpmpp_2m_sde 1 1 1 1 4
dpmpp_2m_sde_gpu 2 2 1 2 7
dpmpp_2m_sde_heun 1 1 1 3
dpmpp_2m_sde_heun_gpu 1 2 1 4
dpmpp_2s_ancestral 2 2 3 2 9
dpmpp_sde 1 1 1 3
dpmpp_sde_gpu 2 1 1 1 1 6
er_sde 1 1 2
euler 1 1
euler_ancestral 1 1
euler_ancestral_cfg_pp 2 2
euler_cfg_pp 1 1 2
exp_heun_2_x0 1 1 1 3
exp_heun_2_x0_sde 2 1 2 1 1 7
gradient_estimation 1 1
heun 1 1
heunpp2 1 1
lcm 1 2 3
res_multistep 1 1
sa_solver 2 2
sa_solver_pece 1 1 2
seeds_2 2 1 1 1 5
seeds_3 3 1 1 1 2 8
uni_pc 1 1 1 3
uni_pc_bh2 1 1 2
Total 27 1 2 19 10 20 1 1 12 1 93
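
A tally like the one in the table does not have to be done by hand if the sampler and scheduler are encoded in the file names. The sketch below assumes a hypothetical naming scheme, `<sampler>__<scheduler>__<note>.png`, with "not bad" somewhere in the name. The real files in the linked folder may be named differently, so the separator and marker would need adjusting.

```python
from collections import Counter

def tally_marked(filenames, marker="not bad", sep="__"):
    """Count marked images per sampler and per scheduler from file names."""
    per_sampler, per_scheduler = Counter(), Counter()
    for name in filenames:
        if marker not in name:
            continue  # unmarked image, skip it
        stem = name.rsplit(".", 1)[0]  # drop the file extension
        parts = stem.split(sep)
        if len(parts) < 2:
            continue  # name does not follow the assumed scheme
        sampler, scheduler = parts[0], parts[1]
        per_sampler[sampler] += 1
        per_scheduler[scheduler] += 1
    return per_sampler, per_scheduler

# Hypothetical file names, for illustration only.
files = [
    "seeds_3__linear_quadratic__not bad.png",
    "seeds_3__beta__not bad.png",
    "seeds_3__karras.png",
    "euler__beta__not bad.png",
]
samplers, schedulers = tally_marked(files)
print(samplers.most_common())
print(schedulers.most_common())
```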

So, as you can see, beta is objectively the best scheduler by count (27 of the marked images). sgm_uniform is also fine (20). However, subjectively my favorite scheduler is linear_quadratic (19): it has a big impact on composition and details, but on some images it can feel too "clean" for the given subject.

For samplers, I think the best option is seeds_3; it looks very good on some images. As a downside, it can add too much texture where it's not required, for example on human faces. If that's the case, you can go with seeds_2. Also, seeds_3 is one of the slowest.

One of the samplers that I didn't even know existed but produced good results is exp_heun_2_x0_sde. Give it a try.

As for more traditional samplers, dpmpp_2s_ancestral, dpmpp_2m_sde_gpu, and dpm_2_ancestral are all fine.

List of samplers that produce garbage (at 8 steps): dpm_fast, dpmpp_2s_ancestral_cfg_pp, dpmpp_2m_ancestral_cfg_pp, dpmpp_2m_cfg_pp, dpmpp_3m_sde, dpmpp_3m_sde_gpu, res_multistep_cfg_pp, res_multistep_ancestral, res_multistep_ancestral_cfg_pp, gradient_estimation_cfg_pp, lms

List of schedulers that produce garbage: ddim_uniform
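
If you want to rerun a sweep like this, the two lists above can be used to skip the combinations that are known to fail at 8 steps. This is a small sketch, not part of any existing tool. `build_sweep` is a hypothetical helper, and the candidate lists in the example are just a few names taken from this post.

```python
from itertools import product

# Samplers and schedulers reported as producing garbage at 8 steps.
BAD_SAMPLERS = {
    "dpm_fast", "dpmpp_2s_ancestral_cfg_pp", "dpmpp_2m_ancestral_cfg_pp",
    "dpmpp_2m_cfg_pp", "dpmpp_3m_sde", "dpmpp_3m_sde_gpu",
    "res_multistep_cfg_pp", "res_multistep_ancestral",
    "res_multistep_ancestral_cfg_pp", "gradient_estimation_cfg_pp", "lms",
}
BAD_SCHEDULERS = {"ddim_uniform"}

def build_sweep(samplers, schedulers):
    """Return every sampler/scheduler pair that is not on a skip list."""
    return [
        (sampler, scheduler)
        for sampler, scheduler in product(samplers, schedulers)
        if sampler not in BAD_SAMPLERS and scheduler not in BAD_SCHEDULERS
    ]

candidates = ["seeds_3", "seeds_2", "exp_heun_2_x0_sde", "lms"]
schedules = ["beta", "linear_quadratic", "ddim_uniform"]
sweep = build_sweep(candidates, schedules)
print(len(sweep))  # 6
for pair in sweep:
    print(pair)
```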

Since I'm most interested in "stock image" type pictures, my favorite combination is seeds_3/linear_quadratic. But it's probably not the best option for every scenario. I would like to hear what you think; maybe I missed something in the results.

All of this analysis should also apply to the base models at 50 steps (side note: the Comfy workflow suggests only 20 steps; don't believe it, it all looks like shit. Use 50 steps). The problem is that 50 steps is slow. It can often produce images that are better than turbo; interiors with seeds_3/linear_quadratic in particular have really good composition, texture, and details. But it also takes 12 min for one picture. There is probably a better setting (steps/cfg), but I don't plan to dig that deep.

u/8RETRO8 — 24 days ago
