u/Brojakhoeman

Apologies for my earlier post, I should have tested it first! Doh! I just did not want to stop LoRA training, as I have an issue where it takes nearly 2 hours to resume at 55k steps. ._. My bad, won't happen again.

Video breakdown
- First few seconds: default. Str 1.0, video 1.0, audio 1.0
- Wednesday with a different voice: Str 1.0, video 1.0, audio 0.0
- Blond Wednesday: Str 1.0, video 0.0, audio 1.0

How it works

LTX-2.3 is an audio-visual model — it generates video and audio simultaneously from a single transformer. Inside that transformer, the weights are split into two completely separate branches: a video branch (attn1, attn2, ff) that handles all the visual generation, and an audio branch (audio_attn1, audio_attn2, audio_ff) that handles sound.

When you load a LoRA, both branches get applied together by default. This node loads each LoRA and splits the weights before applying them, letting you scale each branch independently.
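To make the splitting step concrete, here is a minimal sketch of how LoRA keys could be sorted into the two branches. It is not the node's actual code. The key names in the example are hypothetical, and it assumes keys are dot-separated so that `attn1` and `audio_attn1` show up as whole path components. Cross-modal attention keys are not named in this post, so the sketch leaves them in an "other" bucket.

```python
# Branch markers taken from the post; everything else here is an assumption.
AUDIO_MARKERS = ("audio_attn1", "audio_attn2", "audio_ff")
VIDEO_MARKERS = ("attn1", "attn2", "ff")


def classify_key(key: str) -> str:
    """Return 'audio', 'video' or 'other' for one LoRA weight key."""
    parts = key.split(".")  # assumes dot-separated key names
    if any(part in AUDIO_MARKERS for part in parts):
        return "audio"
    if any(part in VIDEO_MARKERS for part in parts):
        return "video"
    return "other"


def split_lora(state_dict: dict) -> dict:
    """Partition a LoRA state dict into per-branch dicts."""
    branches = {"video": {}, "audio": {}, "other": {}}
    for key, value in state_dict.items():
        branches[classify_key(key)][key] = value
    return branches


# Hypothetical key names, for illustration only.
example = {
    "transformer_blocks.0.attn1.to_q.lora_A.weight": "w0",
    "transformer_blocks.0.audio_attn1.to_q.lora_A.weight": "w1",
    "transformer_blocks.0.audio_ff.net.0.lora_B.weight": "w2",
    "proj_out.lora_A.weight": "w3",
}
branches = split_lora(example)
```

Matching whole path components rather than substrings matters here, because `attn1` is a substring of `audio_attn1` and a naive `in` check would put audio keys in the video bucket.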

STR is the master strength — works exactly like any normal LoRA loader.

Video multiplies only the video branch weights. Set to 0.0 and the LoRA contributes nothing visual.

Audio multiplies only the audio branch weights. Set to 0.0 and the LoRA contributes nothing to audio.
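As a rough sketch of how the three values could combine: this assumes the branch multiplier is simply multiplied with the master strength, and that keys outside both branches get the master strength only. Neither assumption is confirmed by the node's source. The function name is made up for illustration.

```python
def effective_strength(branch: str, strength: float, video: float, audio: float) -> float:
    """Strength applied to one LoRA key, given which branch it belongs to.

    Assumption: branch multipliers scale the master strength multiplicatively.
    """
    if branch == "video":
        return strength * video
    if branch == "audio":
        return strength * audio
    return strength  # assumption: keys in neither branch use the master strength


# The three settings from the video breakdown, as (str, video, audio).
settings = {
    "default": (1.0, 1.0, 1.0),
    "different voice": (1.0, 1.0, 0.0),
    "blond": (1.0, 0.0, 1.0),
}
for name, (s, v, a) in settings.items():
    video_eff = effective_strength("video", s, v, a)
    audio_eff = effective_strength("audio", s, v, a)
    print(f"{name}: video x{video_eff}, audio x{audio_eff}")
```

With audio at 0.0 the LoRA's audio weights are scaled to nothing and the base model's own voice comes through. With video at 0.0 only the LoRA's voice is applied.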

The key count display (V:1152 A:2112) scans each LoRA on load so you know upfront whether its audio branch is worth using — a LoRA trained on silent footage will show A:0 and audio controls will do nothing.
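A key count like this could be produced along the following lines. This is a sketch, not the node's implementation: it assumes dot-separated key names, and the example keys are hypothetical. Only the branch markers come from this post.

```python
AUDIO_MARKERS = ("audio_attn1", "audio_attn2", "audio_ff")
VIDEO_MARKERS = ("attn1", "attn2", "ff")


def key_count_label(keys) -> str:
    """Build a 'V:<n> A:<m>' label from an iterable of LoRA key names."""
    video_count = 0
    audio_count = 0
    for key in keys:
        parts = key.split(".")  # assumes dot-separated key names
        if any(part in AUDIO_MARKERS for part in parts):
            audio_count += 1
        elif any(part in VIDEO_MARKERS for part in parts):
            video_count += 1
    return f"V:{video_count} A:{audio_count}"


# Hypothetical keys from a LoRA trained on silent footage: no audio keys at all.
silent_lora_keys = [
    "transformer_blocks.0.attn1.to_q.lora_A.weight",
    "transformer_blocks.0.attn2.to_k.lora_A.weight",
    "transformer_blocks.0.ff.net.0.lora_B.weight",
]
label = key_count_label(silent_lora_keys)
```

For a `.safetensors` LoRA, the key names can be listed without loading any tensors by using `safe_open(path, framework="pt")` from the `safetensors` package and calling `.keys()` on the handle, which keeps a scan like this cheap.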

Important: this controls the LoRA's contribution to audio, not the base model's output. The base LTX-2.3 model generates audio on its own — this node only controls what each LoRA adds on top of that.

LoRA loader - Link

More information and images in the link.

u/Brojakhoeman — 20 days ago

GitHub link

What it does

LTX-2.3 is unique because it generates video AND audio from a single model. The transformer has completely separate branches for each — video keys (attn1, attn2, ff) and audio keys (audio_attn1, audio_attn2, audio_ff, cross-modal attention). This node exploits that architecture.

10 LoRA slots, each with independent control over:

  • STR: master LoRA strength, works exactly like any normal LoRA loader
  • Video: video branch multiplier (0.0 = the LoRA's visual contribution is completely off)
  • Audio: audio branch multiplier (0.0 = the LoRA's audio contribution is completely off)

Why does this matter?

Say you have a celebrity LoRA trained on video footage of them speaking. That LoRA learned their face AND their voice in separate branches. With this node you can:

  • Load it with V:0.0 A:1.0 — get their voice only, on your own character
  • Stack a different celebrity LoRA with V:1.0 A:0.0 — their face, someone else's voice
  • Fix a LoRA with hissy/crackling audio by setting A:0.7 and leaving visuals at full
  • Mix 10 LoRAs at once without getting a jumbled mess of competing audio

Key count indicator

Each loaded slot automatically scans the LoRA file and shows V:1152 A:2112 — so you know immediately whether a LoRA even has audio keys worth using before you waste a generation. A:0 means it was trained on silent data and audio mode does nothing.

6 themes — Jade, Neon, Studio, Chrome, OLED, Wood. Click the theme button in the node to cycle live, no restart. Saves with your workflow.

TL;DR - Can't test if it works at the moment, but it should, as it was built from my other node. Training can't be paused right now for the current LoRA.
If there are any issues, I'll remove it temporarily.

u/Brojakhoeman — 20 days ago