
Krea 2 - Multi-Character Lora and LOKR (That last one will suprise you) - My personal holy-grail is at finger-reach...
hehe... click bait title..
Disclaimer: I don't want to pass this over ChatGPT for correction, so bare with my rushed grammar and spelling.
Objective: Multi-character lora for 2 characters, ttprz and rgpz
Based on/Inspired by: https://www.youtube.com/watch?v=v6h_zbFW_XY <== This here explains a Flux 1 multichar LORA strategy, I basically took it with me and tried in Krea 2 as below.
1st test
Approach
- Hardware: RTX3090, Windows 10 (yes... I know), 64GB RAM
- AI-Toolkit (config file below). Model: Krea Raw
- Dataset, One unified datase, 15 photos of husband, 15 photos of wife, 5 photos together, Resolution 512/768
- Tokens: One for the Lora in general (cpnl), and then each character their own tokens (ttprz, rgpz)
- Descriptions: They all start with the Lora token (cpnl, ) then describe the character (Description instructions for AI-toolkit embedded Qwen3 VL). Example of descriptions below
- Training: Scheduler: Automagic2, LR: 0.0001 (Lower than my regular Automagic2 LR 0.001), 5K steps (due to lower LR), Lower VRAM Yes, Layer offloading Yes (15% and 15%)
- LORA: Linear 96 (I wanted to try a large LORA, ends ~660MB, I know, single chars I use network of 64 or 32, may reduce it in next test, large LORA comes with it's other set of downsides), Saved last 40 states (meaning all of them basically)
- Samples: 3, one ttprz, one rgpz , and one together
- Everything else pretty much unchanged
Training run:
- Training goes over 4hrs or so, but sampling, and 3 samplers each, and at every 250 steps, adds like 2.5 hrs in itself, sampling is painfully slow always
- Loss goes down slowly
- Samples are kind of messy, you start seeing good identity cloning around 2K, the samplers are way worse than the actual LORA once finished.
LORA performance in Comfy (latest version, overnight):
- Pretty solid, first time I'm able to actually call out two characters from a home-made LORA.
- Best LORA based on number of steps: Between 3.5 and 4.5 K steps.
- Other Loras: You can stack LORAS but you have to play with the strenght and also your sample and workflow
- Artifacts? A few, but I'd say 80% of images come out Ok
- Other comments: Not sure if it's Krea as I also experience this in single-char LORA, but passing from the Photo-based LORA to illustration absolutely requires other Style-LORAs, else there is no resemblance
- My workflow: Modified KREA 2 ComfyFlow, FlowMatch Euler Discrete Sigma (Dynamic Shifting, .5/1.15), SamplerEulerAncestralCFG++ 1/1, found it way better than default Comfy Flow
- Prompt strategy: Using Qwen 8B VL via llama.cpp on a separate RTX 3060 12GB to expand the prompt with node "LLM Chat"
- Sample below, Datasets had no photos of characters in formal attire, sillyness added to showcase GenAI role. Tokens : ncpl (general Lora trigger), ttprz and rgpz.
​
Prompt: ncpl, Award-winning high-resolution photograph featuring a ttprz latina wearing a luxurious night gown seated elegantly next to an rpgz middle-aged man with a beard and glasses dressed in a formal tuxedo, sharing an intimate fine dining experience. The scene centers on a whimsical contrast: a large, vibrant bowl of colorful cereal is placed prominently on an elegant mirrored table, surrounded by sophisticated dining ware, soft golden ambient lighting, and blurred background details of an upscale restaurant interior to emphasize the call of luxury. The composition captures a moment of playful luxury with crisp details on the texture of the cereal and fabrics, using a shallow depth of field to keep the subjects and the colorful bowl in sharp focus while creating a dreamy, high-end atmosphere.
multi-character Image generation with LORA size 96, 4K steps, Prompt included in post.
2nd test: LOKR. Same as first training approach, same dataset, same captioning. Changes below.
Learning rate lowered to .0005
LOKR, left Size as for LORA network, 96 but AI-Tookit doesn't care as it calculates maximum size
Training run: added 2 hours, like 2 seconds per iteration
LOKR Safetensor size: 6 MB... no joke, carries.. 95% appearance of origin. This is the surprise that came out of it. I thought this was both the network and embeddings, need to understand more. I need to further test as I think that the internal consistency of the model is a bit impacted, but from say a Network of say 64~250MB/LORA (I know I'm testing first with 96) but down to 6MB.. some powerful stuff right there
Dataset caption examples: Photo with both: ncpl, rgpz with glasses and a beard holding an umbrella, wearing a white shirt with a blue collar and a white scarf, smiling slightly. ttprz wearing a red shirt with Mickey Mouse designs and a white headscarf with polka dots, smiling broadly. rgpz is on the left, ttprz is on the right. Photo of individual char: ncpl, rgpz with a graying beard and mustache, smiling slightly, wearing a dark gray t-shirt, positioned in front of reflective spherical sculptures.
AI-Toolkit config file for reference, LORA experiment
job: "extension" config: name: "cpl_v1" process: - type: "diffusion_trainer" training_folder: "xxxxxxxxxxxxxxx" sqlite_db_path: "./aitk_db.db" device: "cuda" trigger_word: null performance_log_every: 10 network: type: "lora" linear: 96 linear_alpha: 96 lokr_full_rank: true lokr_factor: -1 network_kwargs: ignore_if_contains: [] save: dtype: "bf16" save_every: 250 max_step_saves_to_keep: 40 save_format: "diffusers" push_to_hub: false datasets: - folder_path: "xxxxxxxxxxxxxxxxxx" mask_path: null mask_min_value: 0.1 default_caption: "" caption_ext: "txt" caption_dropout_rate: 0.05 cache_latents_to_disk: false is_reg: false network_weight: 1 resolution: - 512 - 768 controls: [] shrink_video_to_frames: true num_frames: 1 flip_x: false flip_y: false num_repeats: 1 train: batch_size: 1 bypass_guidance_embedding: false steps: 5000 gradient_accumulation: 1 train_unet: true train_text_encoder: false gradient_checkpointing: true noise_scheduler: "flowmatch" optimizer: "automagic2" timestep_type: "linear" content_or_style: "balanced" optimizer_params: weight_decay: 0.00005 unload_text_encoder: false cache_text_embeddings: true lr: 0.0001 ema_config: use_ema: false ema_decay: 0.99 skip_first_sample: false force_first_sample: false disable_sampling: false dtype: "bf16" diff_output_preservation: false diff_output_preservation_multiplier: 1 diff_output_preservation_class: "person" switch_boundary_every: 1 loss_type: "mse" logging: log_every: 1 use_ui_logger: true model: name_or_path: "krea/Krea-2-Raw" quantize: true qtype: "qfloat8" quantize_te: true qtype_te: "qfloat8" arch: "krea2" low_vram: true model_kwargs: {} compile: false layer_offloading: true layer_offloading_text_encoder_percent: 0.15 layer_offloading_transformer_percent: 0.15 sample: sampler: "flowmatch" sample_every: 250 width: 1024 height: 1024 samples: - prompt: "ncpl, solo photo of ttprz latina with red hair" - prompt: "ncpl,solo photo portrait of rgpz holding a coffee cup, in a beanie, sitting at a cafe" - prompt: "ncpl, photo portrait of ttprz and rgpz next to each other, smilling to the camera" neg: "" seed: 42 walk_seed: true guidance_scale: 4 sample_steps: 30 num_frames: 1 fps: 1 meta: name: "[name]" version: "1.0"