
u/Brojakhoeman

Apologies for my earlier post, I should have tested it first! Doh! I just did not want to stop LoRA training, as I have an issue where it takes nearly 2 hours to resume at 55k steps ._. My bad, won't happen again.
Video breakdown
- First few seconds: default. STR 1.0, video 1.0, audio 1.0
- Wednesday, different voice: STR 1.0, video 1.0, audio 0.0
- Blond Wednesday: STR 1.0, video 0.0, audio 1.0
How it works
LTX-2.3 is an audio-visual model — it generates video and audio simultaneously from a single transformer. Inside that transformer, the weights are split into two completely separate branches: a video branch (attn1, attn2, ff) that handles all the visual generation, and an audio branch (audio_attn1, audio_attn2, audio_ff) that handles sound.
When you load a LoRA, both branches get applied together by default. This node loads each LoRA and splits the weights before applying them, letting you scale each branch independently.
STR is the master strength — works exactly like any normal LoRA loader.
V× multiplies only the video branch weights. Set to 0.0 and the LoRA contributes nothing visual.
A× multiplies only the audio branch weights. Set to 0.0 and the LoRA contributes nothing to audio.
The key count display (V:1152 A:2112) scans each LoRA on load so you know upfront whether its audio branch is worth using — a LoRA trained on silent footage will show A:0 and audio controls will do nothing.
Important: this controls the LoRA's contribution to audio, not the base model's output. The base LTX-2.3 model generates audio on its own — this node only controls what each LoRA adds on top of that.
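To make the split concrete, here is a rough sketch of the idea, not the node's actual code. It assumes audio keys can be spotted by an "audio_" substring in the key name and that each branch's effective strength is simply STR times its multiplier. The function names and the example key names are made up for illustration.

```python
def branch_of(key: str) -> str:
    """Classify a LoRA key by name. Assumption: audio keys contain 'audio_'."""
    return "audio" if "audio_" in key else "video"


def effective_strengths(keys, strength=1.0, video_mult=1.0, audio_mult=1.0):
    """Map each key to the strength it would be applied at.

    Assumption: effective strength = master strength * branch multiplier.
    """
    scale = {
        "video": strength * video_mult,
        "audio": strength * audio_mult,
    }
    return {key: scale[branch_of(key)] for key in keys}


# Hypothetical key names, only for illustration.
example_keys = [
    "blocks.0.attn1.to_q.lora_A.weight",
    "blocks.0.ff.net.0.lora_A.weight",
    "blocks.0.audio_attn1.to_q.lora_A.weight",
]

# Voice only: video branch zeroed, audio branch at full strength.
voice_only = effective_strengths(example_keys, strength=1.0,
                                 video_mult=0.0, audio_mult=1.0)
```

With V× at 0.0 every video key ends up at strength 0.0, so the LoRA adds nothing visual, while the audio keys still get the full STR.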
LoRA loader - Link
More information and images in the link.
What it does
LTX-2.3 is unique because it generates video AND audio from a single model. The transformer has completely separate branches for each — video keys (attn1, attn2, ff) and audio keys (audio_attn1, audio_attn2, audio_ff, cross-modal attention). This node exploits that architecture.
10 LoRA slots, each with independent control over:
- STR — master LoRA strength, works exactly like any normal LoRA loader
- V× — video branch multiplier (0.0 = visuals completely off)
- A× — audio branch multiplier (0.0 = audio completely off)
Why does this matter?
Say you have a celebrity LoRA trained on video footage of them speaking. That LoRA learned their face AND their voice in separate branches. With this node you can:
- Load it with V:0.0 A:1.0 to get their voice only, on your own character
- Stack a different celebrity LoRA with V:1.0 A:0.0 for their face with someone else's voice
- Fix a LoRA with hissy/crackling audio by setting A:0.7 and leaving visuals at full
- Mix 10 LoRAs at once without getting a jumbled mess of competing audio
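The recipes above could be written down as a stack of slots like this. It is only a sketch: the LoraSlot class, its field names and the file names are hypothetical, and it assumes each branch is scaled by STR times its multiplier.

```python
from dataclasses import dataclass


@dataclass
class LoraSlot:
    """One loader slot (hypothetical structure, not the node's real one)."""
    name: str                 # hypothetical LoRA file name
    strength: float = 1.0     # STR
    video_mult: float = 1.0   # V×
    audio_mult: float = 1.0   # A×

    def branch_scales(self):
        # Assumption: each branch is applied at STR * its multiplier.
        return {
            "video": self.strength * self.video_mult,
            "audio": self.strength * self.audio_mult,
        }


slots = [
    LoraSlot("celebrity_a.safetensors", video_mult=0.0, audio_mult=1.0),  # voice only
    LoraSlot("celebrity_b.safetensors", video_mult=1.0, audio_mult=0.0),  # face only
    LoraSlot("hissy_style.safetensors", audio_mult=0.7),  # tame noisy audio
]

for slot in slots:
    print(slot.name, slot.branch_scales())
```

Only one of the three slots contributes audio at full strength here, which is the point of the per-slot controls when stacking.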
Key count indicator
Each loaded slot automatically scans the LoRA file and shows V:1152 A:2112 — so you know immediately whether a LoRA even has audio keys worth using before you waste a generation. A:0 means it was trained on silent data and audio mode does nothing.
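A minimal sketch of how a counter like that could work, assuming audio keys are identified by an "audio_" substring and everything else counts as video. The function name and the key names are made up, and the real node may count differently.

```python
def key_count_label(keys):
    """Build a 'V:<n> A:<n>' label from a list of LoRA key names.

    Assumption: any key containing 'audio_' belongs to the audio branch,
    everything else is counted as video.
    """
    audio = sum(1 for key in keys if "audio_" in key)
    video = len(keys) - audio
    return f"V:{video} A:{audio}"


# Hypothetical key names for a LoRA trained with sound.
with_audio = [
    "blocks.0.attn1.to_q.lora_A.weight",
    "blocks.0.ff.net.0.lora_A.weight",
    "blocks.0.audio_attn1.to_q.lora_A.weight",
]

# Hypothetical key names for a LoRA trained on silent footage.
silent = [
    "blocks.0.attn1.to_q.lora_A.weight",
    "blocks.0.attn2.to_k.lora_A.weight",
]

print(key_count_label(with_audio))
print(key_count_label(silent))
```

A label ending in A:0 is the cue that the audio multiplier has nothing to act on for that slot.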
6 themes — Jade, Neon, Studio, Chrome, OLED, Wood. Click the theme button in the node to cycle live, no restart. Saves with your workflow.
TL;DR - Can't test if it works at the moment, but it should, since it was built from my other node. Training can't be paused right now for the current LoRA.
Any issues, I'll remove it temporarily.