u/Brojakhoeman — reddlx

Looks like RuneXX made that dub LoRA for LTX: turn any silent video into a speaking one

Video-2-Video/LTX-2.3_-_V2V_Just_Talk_dub_any_silent_video_multilanguage.json · RuneXX/LTX-2.3-Workflows at main

Apologies for my earlier post, I should have tested it first! Doh! I just did not want to stop LoRA training, as I have an issue and it takes nearly 2 hours to resume at 55k steps ._. My bad, won't happen again.

Video breakdown
- First few seconds: default. STR 1.0, video 1.0, audio 1.0
- Wednesday, different voice: STR 1.0, video 1.0, audio 0.0
- Blonde Wednesday: STR 1.0, video 0.0, audio 1.0

How it works

LTX-2.3 is an audio-visual model — it generates video and audio simultaneously from a single transformer. Inside that transformer, the weights are split into two completely separate branches: a video branch (attn1, attn2, ff) that handles all the visual generation, and an audio branch (audio_attn1, audio_attn2, audio_ff) that handles sound.

When you load a LoRA, both branches get applied together by default. This node loads each LoRA and splits the weights before applying them, letting you scale each branch independently.

STR is the master strength — works exactly like any normal LoRA loader.

V× multiplies only the video branch weights. Set to 0.0 and the LoRA contributes nothing visual.

A× multiplies only the audio branch weights. Set to 0.0 and the LoRA contributes nothing to audio.
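To picture how the three controls could combine, here is a minimal sketch, not the node's actual code. It assumes a key belongs to the audio branch if its name contains one of the audio block names above, and that the effective strength for a key is STR times the matching branch multiplier. The function names and the example key names are made up for illustration.

```python
# Hypothetical sketch: how STR, V× and A× could combine per LoRA key.
# Assumption: audio-branch keys contain one of these block names and
# everything else is treated as video. The real node may match differently.
AUDIO_MARKERS = ("audio_attn1", "audio_attn2", "audio_ff")


def is_audio_key(key: str) -> bool:
    # Test for the audio markers explicitly, because "attn1" is also a
    # substring of "audio_attn1" and would otherwise look like a video key.
    return any(marker in key for marker in AUDIO_MARKERS)


def per_key_strength(keys, strength=1.0, video_mult=1.0, audio_mult=1.0):
    """Map each key to the multiplier applied to that key's LoRA delta."""
    return {
        key: strength * (audio_mult if is_audio_key(key) else video_mult)
        for key in keys
    }


example_keys = [
    "blocks.0.attn1.to_q.lora_A.weight",        # made-up video key
    "blocks.0.audio_attn1.to_q.lora_A.weight",  # made-up audio key
]

# STR 1.0, V× 0.0, A× 1.0: the LoRA adds only its audio contribution.
voice_only = per_key_strength(example_keys, strength=1.0, video_mult=0.0, audio_mult=1.0)
```

The multiplier is kept per key instead of being baked into the tensors, so the same LoRA can be reused with different V× and A× values without reloading it.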

The key count display (V:1152 A:2112) scans each LoRA on load so you know upfront whether its audio branch is worth using — a LoRA trained on silent footage will show A:0 and audio controls will do nothing.
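A rough sketch of how a label like that could be produced, assuming the branch is decided purely by the block names mentioned above. The function name and the key names are hypothetical, and the node itself may count differently.

```python
# Hypothetical sketch of the "V:<n> A:<m>" label shown on each slot.
# Assumption: a key is an audio key if it contains one of these block names.
AUDIO_MARKERS = ("audio_attn1", "audio_attn2", "audio_ff")


def branch_key_label(keys) -> str:
    keys = list(keys)
    audio = sum(1 for key in keys if any(m in key for m in AUDIO_MARKERS))
    video = len(keys) - audio
    return f"V:{video} A:{audio}"


# Made-up key names for a LoRA trained on silent footage (no audio keys).
silent_lora_keys = [
    "blocks.0.attn1.to_q.lora_A.weight",
    "blocks.0.attn2.to_k.lora_A.weight",
    "blocks.0.ff.net.0.lora_A.weight",
]
print(branch_key_label(silent_lora_keys))  # V:3 A:0
```

For a real file, the key names could be read without loading the tensors by opening it with `safetensors.safe_open(path, framework="pt")` and calling `keys()` on the handle.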

Important: this controls the LoRA's contribution to audio, not the base model's output. The base LTX-2.3 model generates audio on its own — this node only controls what each LoRA adds on top of that.

LoRA loader - Link

More information and images in the link.

u/Brojakhoeman — 20 days ago

▲ 9 r/StableDiffusion

GitHub link

What it does

LTX-2.3 is unique because it generates video AND audio from a single model. The transformer has completely separate branches for each — video keys (attn1, attn2, ff) and audio keys (audio_attn1, audio_attn2, audio_ff, cross-modal attention). This node exploits that architecture.

10 LoRA slots, each with independent control over:

STR — master LoRA strength, works exactly like any normal LoRA loader
V× — video branch multiplier (0.0 = visuals completely off)
A× — audio branch multiplier (0.0 = audio completely off)

Why does this matter?

Say you have a celebrity LoRA trained on video footage of them speaking. That LoRA learned their face AND their voice in separate branches. With this node you can:

Load it with V:0.0 A:1.0 — get their voice only, on your own character
Stack a different celebrity LoRA with V:1.0 A:0.0 — their face, someone else's voice
Fix a LoRA with hissy/crackling audio by setting A:0.7 and leaving visuals at full
Mix 10 LoRAs at once without getting a jumbled mess of competing audio
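The combinations above can be written down as a small slot table. This is only an illustration of the idea, assuming each branch's effective strength is STR times its multiplier. `LoraSlot` and the file names are hypothetical and are not part of the node.

```python
from dataclasses import dataclass


@dataclass
class LoraSlot:
    # Hypothetical stand-in for one of the node's 10 slots.
    name: str
    strength: float = 1.0    # STR
    video_mult: float = 1.0  # V×
    audio_mult: float = 1.0  # A×

    @property
    def video_strength(self) -> float:
        return self.strength * self.video_mult

    @property
    def audio_strength(self) -> float:
        return self.strength * self.audio_mult


slots = [
    LoraSlot("celebrity_a.safetensors", video_mult=0.0, audio_mult=1.0),  # voice only
    LoraSlot("celebrity_b.safetensors", video_mult=1.0, audio_mult=0.0),  # face only
    LoraSlot("hissy_style.safetensors", audio_mult=0.7),                  # tone down noisy audio
]

# Which LoRAs still contribute to each branch after the multipliers apply.
audio_sources = [s.name for s in slots if s.audio_strength > 0]
video_sources = [s.name for s in slots if s.video_strength > 0]
```

Listing the audio sources like this makes it easy to see at a glance which LoRAs are still competing for the voice when many slots are filled.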

Key count indicator

Each loaded slot automatically scans the LoRA file and shows V:1152 A:2112 — so you know immediately whether a LoRA even has audio keys worth using before you waste a generation. A:0 means it was trained on silent data and audio mode does nothing.

6 themes — Jade, Neon, Studio, Chrome, OLED, Wood. Click the theme button in the node to cycle live, no restart. Saves with your workflow.

TL;DR - Can't test if it works at the moment, but it should, as it was built from my other node. Training can't be paused right now for the current LoRA.
Any issues, I'll remove it temporarily.

u/Brojakhoeman — 20 days ago