r/StableDiffusion

▲ 391 r/StableDiffusion

Tifa in Krea 2 :)

LoRA used for this post:
Tifa Lockhart [Krea2] - Improved Likeness
Workflow:
Krea2 Uncensored - Image-to-Prompt + Prompt Enhancer + 4K Upscaler + CivitAI Metadata

u/Brief-Leg-8831 — 7 hours ago

▲ 16 r/StableDiffusion

Help: Can't get her head to show up

Can you help me or give me some advice on how to fix this?

I don't know what I'm doing wrong. I've tried so many prompts, but for some reason, it's always cropping the image, with her head missing.
I could use some help with this one, please.

Here's the prompt: A painting in a contemporary impressionist style featuring a woman wearing a pink summer dress with lace trim, her body positioned amidst a landscape of wildflowers and butterflies. The artwork incorporates thick oil paint textures and watercolor-style washes across the scene. The color palette is composed of teal, blue, pink, purple, and golden yellow. A shallow depth of field creates a soft blur on the distant background elements. Layered transparency effects create highlights on the edges of the wildflowers. Digital texture overlays are visible throughout the composition to simulate an aged, luminous aesthetic. The main subject is the woman, full body, uncropped, on the right side of the image,

u/KlitoriaPierce — 5 hours ago

▲ 89 r/StableDiffusion

Wan SCAIL-2 Segmentation Control (Update)

https://civitai.red/models/2699283/wan-scail-2-segmentation-control

Features:

Image Analyzer
LoRA Support
Interpolate | Upscale | Color Match
Color Correction
Sage Attention
Choose between 2 Samplers
Background Remover (RMBG) to keep the background of the input video
SCAIL-2 Identity Tracker
Load an alternative audio file for the final video output
Installation Paths & Download Links (in the Workflow)

Note:

If you have any questions, please first read the information in the red boxes within the workflow.

Additional options are available within the subgraph. Click the icon in the top-right corner of the Main Settings node to open it. Help is available by hovering your mouse cursor over the values inside the subgraph.

The workflow offers two samplers that deliver similar results, and you can easily switch between them for testing. Personally, I use SCAIL-2 Infinity, which seems to produce fewer color shifts.

Wan SCAIL-2 is not perfect, but it delivers good results in most cases. If you encounter issues, setting a new seed or switching the sampler usually helps.

u/External_Trainer_213 — 6 hours ago

▲ 34 r/StableDiffusion+2 crossposts

Lovely

u/Prestigious_Dot3797 — 6 hours ago

▲ 43 r/StableDiffusion

Krea2 vs Z-Image Turbo?

I’ve been living under a rock. The last time I touched ComfyUI was 6 months ago, and Z-Image Turbo was the best model around for my hardware.

I checked on the AI world this week and it looks like Krea 2 is the new cool kid on the block. I downloaded it and ran some quick tests against Z-Image Turbo, but I found Z-Image’s results to be better: more realistic and sharper.

I have zero experience with Krea2, which is why I'm asking you guys. I feel like I'm blind to this model's capabilities. All I see in the sub is Krea2 images. Does that mean people no longer care about, or rarely use, Z-Image Turbo?

u/nobody----cares — 9 hours ago

▲ 62 r/StableDiffusion

Krea V2 Understands Camera Settings

u/audax8177 — 9 hours ago

▲ 15 r/StableDiffusion

What art style is this? Is it possible to generate such images with this level of detail, depth, and visual quality locally?

The scenes are large in scale and extremely detailed. I tried replicating them with Z-Image, Krea, and Flux but didn't succeed, even after trying hundreds of prompt variations and different LoRAs. Can we generate similar images locally?

u/Large_Election_2640 — 7 hours ago

▲ 0 r/StableDiffusion

Flux Lora suggestions

I’m very new to running a local, private AI workstation, so I’m still learning where and how to search and what I’m looking for. I’m starting with the Flux model and looking for a LoRA that will help with realistic images: scary stuff, and my wife and I also want to do image-to-video (realistic naughty situations).
Can you fine folk recommend checkpoints or LoRAs that would help with this?
TIA

u/SnooMachines1543 — 3 hours ago

▲ 3 r/StableDiffusion

Long form WAN 2.2 videos

I'm trying to generate longer videos from an image, but I can't seem to figure out (or find) a good workflow for it. I'm limited to an RTX 3060 12GB, but I've heard it's still possible. Could anyone offer some advice, or even better, a workflow I can use? Thank you!

u/AnyHighway420 — 7 hours ago

▲ 1 r/StableDiffusion

unable to keep face consistency with ltx2.3 first last frame workflow

How do I maintain image consistency and preserve the characters' identity?

u/wallofroy — 3 hours ago

▲ 11 r/StableDiffusion

Krea2 and eye contact. The thousand yard stare

I've been using Krea2 Turbo and I must say, I'm impressed with the results, mainly its adherence to prompt language.

One thing I'm struggling with is expressions and eye contact. Sometimes it works and sometimes it doesn't, and I can't seem to nail down why.
I've tried several LoRAs and conditional rebalances. I know eye contact and expressions can be difficult, but I'm really struggling with Krea2.

Anyone have any advice / tips please?

u/tonyg3d — 6 hours ago

▲ 122 r/StableDiffusion+3 crossposts

I was the guy from a few months ago who released a SOTA music sample generator - Soon I'll be releasing a text-to-synth with the same rich capabilities - all free & open source.

For context, my last post is here:

https://www.reddit.com/r/StableDiffusion/s/fqdfn2RUQv

I put out an update on my socials about an upcoming release so I thought you guys may get a kick out of it given the response from the first release.

The model will be a fully playable text-to-keybed, exportable to any DAW, with rich prompting / metadata. I'll also put together a longer video on how I did it so other researchers can replicate it (training strategies and the like).

u/RoyalCities — 11 hours ago

▲ 256 r/StableDiffusion+2 crossposts

Kyutai's Pocket TTS clones a voice from 5 seconds of audio, on CPU, under MIT. Benchmarked against Kokoro, Supertonic, and Inflect-Nano for Eng. TTS

Kyutai dropped Pocket TTS a bit ago and I've been sitting on it for a benchmark. Finally ran it head to head against the three CPU TTS models that have been getting attention (Kokoro 82M, Supertonic 3, Inflect-Nano-v1). 180 timed runs, 36 audio samples, objective MOS scores via UTMOS.

Short version: Pocket TTS is the slowest of the six configs I tested, and it's still the most interesting model in the field. Here's why.

What Pocket TTS actually is:

It's a ~100M param streaming language model that generates audio tokens over Kyutai's Mimi neural codec, then decodes to 24kHz. So instead of the usual acoustic-model-plus-vocoder setup, it's more like an autoregressive LLM but for audio. Token by token.

Two consequences of that architecture:

Latency is dead flat across text lengths. Its RTF is 0.69 to 0.76 whether you feed it 12 chars or 1712 chars. No fixed overhead to amortize. Compare with Kokoro PyTorch which climbs from 0.49 on tiny text to 0.83 on long text.
It streams. Which matters if you're building anything interactive.
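
For anyone unfamiliar with the metric: real-time factor (RTF) is conventionally the wall-clock synthesis time divided by the duration of the audio produced, so anything below 1.0 is faster than real time. A minimal sketch of that calculation follows. The `synthesize` callable is a hypothetical stand-in, not the Pocket TTS API, and this is not the benchmark script from the post.

```python
import time
from typing import Callable, Sequence

SAMPLE_RATE = 24_000  # the post says Pocket TTS decodes to 24 kHz


def real_time_factor(synth_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("no audio produced")
    return synth_seconds / audio_seconds


def timed_rtf(synthesize: Callable[[str], Sequence[float]], text: str) -> float:
    """Time one call to a hypothetical synthesize(text) -> samples function."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples))


# 7.14 s of compute for 10 s of audio corresponds to an RTF of about 0.714
rtf = real_time_factor(7.14, 10 * SAMPLE_RATE)
```

A flat RTF across text lengths means the compute time scales in proportion to the audio length, with no noticeable fixed startup cost per request.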

Zero-shot voice cloning from 5 seconds. On CPU.

This is the headline feature. Hand it a 5-second reference clip of any voice and it speaks in that voice. Accent, timbre, pacing, even the mic character of the reference. No fine-tuning. No GPU. MIT license.

None of the other CPU-friendly models can do this at all. Kokoro and Inflect-Nano ship fixed voice sets, Supertonic same. If you want a user-supplied voice on a CPU box, Pocket TTS is currently in a category of one.

I ran the benchmark with Pocket TTS pinned to a preset voice (alba) for a fair speed/quality comparison. The cloning capability isn't in the numbers below because you can't benchmark it against models that don't have it.

Full results:

Config	Mean RTF	UTMOS MOS	Params	License
Supertonic 3 (2-step)	0.121	1.53	~99M	OpenRAIL-M
Inflect-Nano-v1	0.145	3.48*	4.6M	Apache 2.0
Supertonic 3 (5-step)	0.240	4.32	~99M	OpenRAIL-M
Kokoro 82M (ONNX)	0.641	4.44	82M	Apache 2.0
Kokoro 82M (PyTorch)	0.665	4.46	82M	Apache 2.0
Pocket TTS	0.714	4.10	~100M	MIT

Hardware: Intel Xeon 8272CL, 4 cores, 16GB RAM, no GPU. UTMOS is utmos22_strong, an objective MOS predictor, so it's not just my ears this time.

The Inflect-Nano asterisk: UTMOS gave it 3.48 but to the ear it's buzzy and robotic. Known UTMOS failure mode where it over-rates small HiFi-GAN vocoders for being clean rather than natural. Also it has a hard ~15 second output cap I discovered mid-benchmark, so its RTF on long inputs is inflated.

Practical picks:

Need voice cloning on CPU → Pocket TTS, no other option in this field
Fixed voice, highest quality → Kokoro 82M
Latency-critical with acceptable quality → Supertonic 3 at 5 steps
Tiny footprint for short utterances → Inflect-Nano-v1, if you can live with the buzz and the 15s cap
Prototyping only → Supertonic 3 at 2 steps

Two things worth calling out:

Pocket TTS install is genuinely painless. pip install pocket-tts, no CUDA build, no HuggingFace-repo-plus-sys.path wiring. Downloads weights on first load. The least fussy of the six.

The MIT license is a big deal. Kokoro is Apache 2.0 (also great). Supertonic is OpenRAIL-M with commercial restrictions. Pocket TTS being MIT means you can do essentially whatever with it commercially.

Repo with raw CSV (180 rows), all 36 WAV samples, and the benchmark script is in comments below 👇

If anyone here has run Pocket TTS voice cloning with a real reference clip, would love to hear how it holds up on different voice types (accented English, non-English, singing, etc). That's the next thing I want to test but I need a clean dataset.

u/gvij — 12 hours ago

▲ 48 r/StableDiffusion

GrainScape and AnalogCore For Krea2

u/FortranUA — 10 hours ago

▲ 49 r/StableDiffusion

holy krea2 - II

https://pastebin.com/cNsTjJCL

u/9_Absurd — 11 hours ago

▲ 0 r/StableDiffusion

should I get an RTX 3080 TI ?

I’m currently saving up to buy an RTX 3080 Ti. On paper, the specs look great, but we all know real-world performance can be a different story for some GPUs. Right now, I’m running an RTX 2070 (8GB VRAM). It handles SDXL generations (base resolution, 20 steps) in about 20 seconds on average. If there are any RTX 3080 Ti owners here, I’d really appreciate some insight into its actual horsepower. Specifically, how fast does it handle SDXL generations, and how does it hold up overall today? (Quick side note for anyone wondering why I’m buying an "old gen" GPU: I live in a third-world country, and this RTX 3080 Ti is costing me about 3 to 4 months of hard work to afford. Upgrading to a 40-series just isn't realistic for me right now).

u/FuckUImBack — 7 hours ago

▲ 193 r/StableDiffusion+1 crossposts

I created a node for Krea2 that adds Multi-LORA support with no identity bleeding and per region bounding box control like Ideogram 4 - Workflow, Examples and Github link included

# Krea 2 Regional Multi-LoRA — Multi-Character + Bounding-Box Layout Control

Put multiple character LoRAs in a single Krea 2 image, each one locked to its own bounding box — no bleed, no merged faces, no averaging. And it's not just for LoRAs: draw and describe boxes for objects, props, backgrounds, and extra subjects too, exactly like Ideogram 4's bounding-box prompting.

## What it does

Normal LoRA loading applies everywhere, so two character LoRAs smear into each other. This node injects each LoRA's effect only into the image tokens inside its box, at forward time — outside the box the effect is multiplied by zero. It's a hard spatial guarantee, not an attention-bias nudge the model can ignore.
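
The masking idea can be illustrated with a toy sketch. This is not the node's code: the function names, the normalized box format, and the use of one scalar per token (real activations are vectors) are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to 0..1


def box_token_mask(grid_h: int, grid_w: int, box: Box) -> List[float]:
    """Return 1.0 for image tokens whose center lies inside the box, else 0.0."""
    x0, y0, x1, y1 = box
    mask = []
    for row in range(grid_h):
        for col in range(grid_w):
            cx = (col + 0.5) / grid_w  # token center, normalized
            cy = (row + 0.5) / grid_h
            inside = x0 <= cx < x1 and y0 <= cy < y1
            mask.append(1.0 if inside else 0.0)
    return mask


def apply_masked_delta(base: List[float], lora_delta: List[float],
                       mask: List[float]) -> List[float]:
    """Add the LoRA delta where mask is 1; elsewhere it is multiplied by zero."""
    return [b + m * d for b, d, m in zip(base, lora_delta, mask)]


# 2x4 token grid with the LoRA confined to the left half of the image
mask = box_token_mask(2, 4, (0.0, 0.0, 0.5, 1.0))
out = apply_masked_delta([1.0] * 8, [0.5] * 8, mask)
```

In this toy case only the four tokens in the left half receive the delta; the right half keeps the base activations unchanged.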

Pair it with an Ideogram 4-style prompt builder and every box does double duty:

- Every box places its described content via Krea 2's Qwen3-VL text encoder (a table, a neon sign, a dog on the left — Krea 2 honors the placement).

- LoRA boxes additionally lock in a specific trained identity on top of that placement.

Sketch the whole scene as boxes, describe each one, and drop LoRAs into the boxes that need a precise face. Objects and characters, all placed by the same boxes.

## Features

- Unlimited regions — 2 characters or 10, add a row per box

- Region rows auto-sync to the boxes you draw (draw a box, a row appears)

- Hard per-region LoRA masking (activation-delta injection)

- Bounding-box layout control for non-LoRA elements too

- fp8-safe — never touches quantized model weights

- Runs at Krea 2's native CFG 1

## Requirements

- ComfyUI with Krea 2 support (recent build)

- Models: krea2_turbo_bf16 (UNet), qwen3vl_4b_bf16 (CLIP, type krea2), qwen_image_vae (VAE)

- Custom node: ComfyUI-Krea2-Regional-MultiLoRA (this workflow's node)

- ComfyUI-KJNodes (for the box-drawing prompt builder)

- Character LoRAs trained against Krea 2 (e.g. via ai-toolkit)

## How to use

1. Write your scene prompt in the box builder (setting/lighting/camera, not the characters).
2. Draw one box per element, in order. Rows appear automatically in the LoRA node.
3. Assign a LoRA to each character box; leave object/scenery boxes as plain descriptions.
4. Sampler: euler / bong_tangent / 8–12 steps / CFG 1.
5. Queue.

## Tips

- Keep character boxes from overlapping to avoid bleed.

- If seams show, raise seam_feather a touch (0.12–0.15).

- Row order must match box order (row 1 = first box drawn).

- Character LoRAs must be Krea 2-trained — FLUX/SDXL LoRAs load but look wrong.

- Recommended LoRA strength is 1.2–1.6.

## Node + workflow (GitHub)

Full source, install instructions, and the example workflow:

https://github.com/CliffNodes/Krea2-Multi-Character-Lora-Node-w-bounding-box-By-Fedor

Example images use 2 generic LoRAs from Civitai.

Workflow Link - https://pastebin.com/67m8kBF2

u/tekprodfx16 — 17 hours ago

▲ 4 r/StableDiffusion

just made an open-source app for people who want ComfyUI power without the ComfyUI headache

A lot of us have been there.

You see a crazy new AI workflow.

New image model.

Video workflow.

Upscaler.

Inpainting setup.

Some wild thing someone posted on Reddit.

Then you realize it needs ComfyUI.

And suddenly the excitement drops a bit.

Not because ComfyUI is bad.

ComfyUI is amazing.

But if you are not already comfortable with nodes, model folders, custom nodes, Python dependencies, and red error boxes, it can feel like walking into a cockpit.

Powerful.

But also… where do I even start?

I kept thinking about this.

There are so many cool AI workflows being shared all the time, but a lot of people never try them because the setup wall is too high.

So I built Noofy.

The idea is simple:

Keep ComfyUI as the engine.

But put a cleaner app interface on top.

Instead of opening a giant node graph and guessing what to touch, you get a dashboard.

Prompt.

Image upload.

Strength.

Seed.

Model choice.

Style.

Preview.

Run button.

Only the controls the workflow creator wants you to use.

The rest can stay behind the curtain.

Noofy also handles the annoying setup parts.

Missing models.

Custom nodes.

Python dependencies.

Model folders.

The goal is to make trying new AI models and workflows feel much closer to:

open
run
test
tweak
share

Instead of spending the first hour fixing folders.

Once a workflow is prepared, you can reuse it like a small local app.

Open the dashboard.

Change the useful settings.

Run it again.

No need to untangle the whole graph every time.

I also added a model management page, so you can see what is installed in Noofy and in your connected ComfyUI models folder, and clean things up when your disk starts crying.

There are 32 starter workflows included too, mainly so people can test recent models without spending the first hour setting everything up.

This is not meant to replace ComfyUI.

I love ComfyUI.

Noofy is for people who want access to the power of ComfyUI, but do not want to learn the whole node system before they can try one cool workflow.

Of course, I made it open source ;)

https://github.com/menahem121/Noofy

I would love feedback, especially from people who always wanted to use ComfyUI but bounced off because it felt too complicated.

Yes this is still ComfyUI running in the background XD

u/Otherwise_Kale_2879 — 7 hours ago

▲ 0 r/StableDiffusion

Krea2 with ComfyUI: getting error

I have used Krea2 with Stable Diffusion cpp with no problem, but when I try to use it in ComfyUI I get the following error:

>Krea2 expects conditioning with 12x2560=30720 features (a 12-layer Qwen3-VL stack) but got 2560. Load the text encoder with CLIPLoader type 'krea2'.

I use exactly the same text encoder as with sd-cpp which is: Huihui-Qwen3-VL-4B-Instruct-abliterated.i1-Q5_K_M.gguf

What am I supposed to do?
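
The numbers in the error can be read as a width check: 12 layers times 2560 features per layer is 30720, and receiving only 2560 suggests the output of a single layer reached the model instead of the 12-layer stack. The message itself names the fix it expects (CLIPLoader type 'krea2'). The sketch below only illustrates that arithmetic; it is not ComfyUI's code, and the function names are made up.

```python
from typing import List

HIDDEN_SIZE = 2560   # per-layer width quoted in the error message
NUM_LAYERS = 12      # "a 12-layer Qwen3-VL stack"
EXPECTED = NUM_LAYERS * HIDDEN_SIZE  # 30720


def check_conditioning_width(feature_dim: int) -> None:
    """Illustrative check only; not ComfyUI's actual implementation."""
    if feature_dim != EXPECTED:
        raise ValueError(
            f"expected {NUM_LAYERS}x{HIDDEN_SIZE}={EXPECTED} features "
            f"but got {feature_dim}"
        )


def stack_layers(per_layer: List[List[float]]) -> List[float]:
    """Concatenate the hidden states of several layers for one token."""
    return [value for layer in per_layer for value in layer]


token = stack_layers([[0.0] * HIDDEN_SIZE for _ in range(NUM_LAYERS)])
check_conditioning_width(len(token))  # passes: the stacked width matches
```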

u/Mordimer86 — 9 hours ago

▲ 10 r/StableDiffusion

[Update] CreaPrompt now has a built-in LLM Prompt Enhancer (Qwen3-VL, local, with image & multi-image fusion)

Hi everyone,

I just pushed a major update to ComfyUI_CreaPrompt, my prompt builder node that assembles prompts from category files (manually or randomly). It now includes a local LLM prompt enhancer designed for modern models like Flux, Krea 2, Z-Image, Qwen-Image and Wan — because these models want natural language prose, not the classic SDXL "tag soup" that random category picking produces.

What it does

Toggle Enhancer: enabled on the CreaPrompt Dynamic node and your assembled keywords get rewritten by a local Qwen3-VL model (default: hfmaster/Qwen3-VL-4B, auto-downloaded to models/LLM/ on first run) into a fluent, detailed prompt matching your target model.

Preset profiles included:

Flux (natural prose)
Krea 2 (natural clauses, no quality tokens, no negations — follows the official prompting guidelines)
Z-Image / Qwen-Image (detailed description)
SDXL (enriched tags)
Video / Wan
Or write your own instruction

Vision features

The node now has 3 image inputs + 1 video input:

Single image → the LLM describes it as a ready-to-use generation prompt in your target model's style
Multiple images (up to 3) → fusion mode: takes the main subject of each image and blends the style, lighting and mood of each into ONE coherent scene
Image(s) + categories → combines your visual reference(s) with your category keywords into a single prompt
Video input → describes the clip (frames subsampled automatically), handy for Wan prompting
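
To give an idea of what frame subsampling can look like, here is a minimal sketch that picks evenly spaced frames from a clip. The function name, the `max_frames` parameter, and the even-spacing strategy are assumptions for illustration; the node's actual subsampling logic may differ.

```python
from typing import List


def subsample_indices(total_frames: int, max_frames: int) -> List[int]:
    """Pick up to max_frames indices spread evenly across the clip."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    # Take the frame at the middle of each equal-length segment.
    return [int(step * i + step / 2) for i in range(max_frames)]


indices = subsample_indices(total_frames=120, max_frames=8)
# indices == [7, 22, 37, 52, 67, 82, 97, 112]
```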

Practical stuff

Everything runs locally, no API key, no cloud
fp16 by default, int4/int8 available if you have bitsandbytes (~3-5GB VRAM instead of ~9GB for the 4B)
Unload_after_generation frees the VRAM right after enhancement (on by default) — fits fine alongside your diffusion model on a 16GB card
Batch-aware: Prompt_count: 5 = 5 individually enhanced prompts, each with its own seed
If the LLM fails for any reason, the raw prompt passes through — your workflow never crashes
Enhancer disabled = zero overhead, nothing is imported or loaded
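
The pass-through and per-prompt seeding behavior described above could be sketched roughly like this. It is a sketch under assumptions, not the node's implementation: `enhance` is a hypothetical stand-in for the local LLM call, and the seed derivation is made up.

```python
import random
from typing import Callable, List


def enhance_batch(raw_prompt: str, prompt_count: int, base_seed: int,
                  enhance: Callable[[str, int], str]) -> List[str]:
    """Enhance prompt_count copies of raw_prompt, each with its own seed.

    If the hypothetical enhance() call raises, the raw prompt is passed
    through unchanged so the workflow keeps running.
    """
    rng = random.Random(base_seed)
    results = []
    for _ in range(prompt_count):
        seed = rng.randrange(2**32)  # independent seed per prompt (assumed scheme)
        try:
            results.append(enhance(raw_prompt, seed))
        except Exception:
            results.append(raw_prompt)  # fall back to the unenhanced prompt
    return results


def broken_llm(prompt: str, seed: int) -> str:
    """Simulates an enhancer that always fails."""
    raise RuntimeError("model failed to load")


prompts = enhance_batch("castle, fog, dawn", 3, base_seed=42, enhance=broken_llm)
```

With the failing enhancer, all three outputs are the raw prompt, which is the "never crashes" behavior the post describes.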

Requirements for the enhancer: transformers, accelerate (optional: bitsandbytes for int4/int8).

GitHub: https://github.com/tritant/ComfyUI_CreaPrompt

Feedback welcome, especially on the preset instructions — happy to add profiles for other models.

https://preview.redd.it/sic1detsjnbh1.png?width=2557&format=png&auto=webp&s=77dd3c6e6d5b91da3ca9f22dbeabc6418e3bae17

u/Away_Exam_4586 — 9 hours ago
