u/MuziqueComfyUI

▲ 4 r/comfyuiAudio+1 crossposts

GitHub - mmoalem/ComfyuAudioNodes-BitsAndBobs: A collection of custom ComfyUI nodes for audio generation, comparison, and manipulation.

ComfyuAudioNodes-BitsAndBobs

"A collection of custom ComfyUI nodes for audio generation, comparison, and manipulation.

Nodes in this Collection

Lora-Dora-Lokr-Loader

A universal adapter loader for ACE-Step models.

  • Supports LoRA, DoRA, and LoKr/LoHa (LyCORIS) formats.
  • Features per-layer category scaling (Self-Attention, Cross-Attention, FFN).
  • Advanced auto-strength balancing for Flux-based models.
  • Includes a "Simple" node variant for a streamlined UI.
  • Based on the DoRA Power LoRA Loader by xmarre.
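Per-layer category scaling, as listed above, can be pictured roughly like this. Each adapter delta `(alpha / rank) * B @ A` is multiplied by a strength chosen from the layer's category (self-attention, cross-attention, FFN) before it is added to the base weight. This is a minimal pure-Python sketch under stated assumptions, not the node's code. The key substrings (`self_attn`, `cross_attn`, `.ff`, `mlp`) and all function names are hypothetical, and ACE-Step's real layer names may differ.

```python
# Hypothetical sketch of per-category LoRA strength scaling (not the node's actual code).

def matmul(b, a):
    """Multiply B (out x rank) by A (rank x in) using plain nested lists."""
    return [[sum(b[i][k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
            for i in range(len(b))]

def category_for_key(key):
    """Map a layer key to a scaling category; the substrings are assumptions."""
    if "cross_attn" in key:
        return "cross_attention"
    if "self_attn" in key:
        return "self_attention"
    if ".ff" in key or "mlp" in key:
        return "ffn"
    return "other"

def apply_lora(weight, a, b, alpha, key, category_scales):
    """Return weight + scale * (alpha / rank) * (B @ A), with scale picked per category."""
    rank = len(a)
    scale = category_scales.get(category_for_key(key), 1.0)  # unlisted categories get 1.0
    delta = matmul(b, a)
    factor = scale * alpha / rank
    return [[w + factor * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

scales = {"self_attention": 1.0, "cross_attention": 0.5, "ffn": 0.8}
weight = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]    # rank 1, in_features 2
b = [[1.0], [0.0]]  # out_features 2, rank 1
print(apply_lora(weight, a, b, alpha=1.0, key="blocks.0.cross_attn.to_q", category_scales=scales))
# prints [[1.5, 1.0], [0.0, 1.0]]
```

Keeping the category lookup separate from the merge means a UI can expose one slider per category without touching the merge arithmetic.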

Ace-Step_chord_injector

Tools for manipulating and injecting chord information into the ACE-Step generation pipeline.

Note

This node currently produces an audible effect on the output, but it is not yet performing its intended function correctly. It is included here for ongoing development and testing.

preview_audio_multi_compare

A utility node for side-by-side comparison of multiple audio generation outputs within the ComfyUI interface.

ace_step_reference

A set of nodes for injecting reference audio into ACE-Step generation via multiple pathways.

  • Timbre Encoding & Conditioning: Encodes reference audio into a timbre embedding and injects it into the cross-attention pathway. This method is stable and generally works well for transferring vocal/instrumental characteristics.
  • KV Self-Attention Injection: Captures K/V tensors from a reference forward pass and injects them into the generation. This provides higher fidelity style transfer but is currently WIP (Work In Progress) with mixed results.
  • Per-Step KV Injection: Real-time capture and injection at every sampling step. This is the most computationally expensive method but allows for precise alignment.
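One common way to realise KV self-attention injection is to append the keys and values captured from the reference pass to the generation's own keys and values, so each query can attend to both. The sketch below assumes that concatenation approach; it is not taken from ace_step_reference. `attend` and `attend_with_reference` are hypothetical helpers that work on plain lists for a single query vector.

```python
# Hypothetical sketch: injecting reference K/V into self-attention by concatenation.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def attend_with_reference(q, keys, values, ref_keys, ref_values):
    """Let the query also attend to K/V captured from a reference forward pass."""
    return attend(q, keys + ref_keys, values + ref_values)

q = [1.0, 0.0]
keys, values = [[1.0, 0.0]], [[0.0, 0.0]]
ref_keys, ref_values = [[1.0, 0.0]], [[1.0, 1.0]]
print(attend(q, keys, values))                                        # [0.0, 0.0]
print(attend_with_reference(q, keys, values, ref_keys, ref_values))   # [0.5, 0.5]
```

The reference key matches the query as strongly as the generation's own key does, so the output moves halfway toward the reference value. A per-step variant would recapture `ref_keys`/`ref_values` at every sampling step instead of once, which is where the extra compute cost comes from.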

...

ace_step_gguf_loader

A custom GGUF and PyTorch bypass loader specifically designed for running quantized ACE-Step models natively inside ComfyUI.

  • Supports ACE-Step 1.5 DiT acestep architectures missing from standard allowlists.
  • Re-maps the GGUF qwen3 embedding namespace back into HuggingFace format for ComfyUI detection.
  • Includes a direct subclass wrapper for the AudioOobleckVAE architecture to fix cross-device dtype crashes and apply missing 48kHz to 44.1kHz resampling when used with ACE-Step 1.5."
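The README doesn't say how the loader performs the 48kHz to 44.1kHz step. The arithmetic behind it is a rational ratio of 44100/48000 = 147/160. The sketch below shows that ratio and a naive linear-interpolation resampler. It is an illustration only, with no anti-aliasing filter. Real pipelines typically use band-limited (polyphase or sinc) resampling such as `torchaudio.functional.resample(waveform, orig_freq, new_freq)`. The helper names here are hypothetical.

```python
# Naive sketch of 48 kHz -> 44.1 kHz resampling (linear interpolation, no anti-aliasing filter).
from math import gcd

def resample_ratio(src_rate, dst_rate):
    """Reduce dst/src to lowest terms, returned as (up, down)."""
    g = gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g

def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by interpolating linearly between neighbouring samples."""
    up, down = resample_ratio(src_rate, dst_rate)
    out_len = len(samples) * up // down
    out = []
    for n in range(out_len):
        pos = n * down / up  # position of output sample n, measured in source samples
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

print(resample_ratio(48000, 44100))                            # (147, 160)
print(len(resample_linear([0.0] * 48000, 48000, 44100)))       # 44100
```

One second of 48kHz audio (48000 samples) becomes 44100 samples. Skipping this step plays 48kHz samples back as if they were 44.1kHz, which makes the audio sound slower and lower in pitch.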

https://github.com/mmoalem/ComfyuAudioNodes-BitsAndBobs/

Thanks mmoalem.

u/MuziqueComfyUI — 5 days ago
▲ 7 r/comfyuiAudio+2 crossposts

GitHub - SGUN-father/comfyui-controlfoley: 神棍 ControlFoley integration for ComfyUI — generate synchronized foley sound effects from video, images, and text prompts. Based on the ControlFoley project by Xiaomi Research.

ComfyUI-ControlFoley

"ControlFoley integration for ComfyUI — generate synchronized foley sound effects from video, images, and text prompts.

Based on the ControlFoley project by Xiaomi Research.

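Feature Overview

ControlFoley is a video-to-audio foley generation model that produces time-synchronized sound effects (such as footsteps, doors closing, or keyboard typing) for silent video. This ComfyUI node fully reproduces all of ControlFoley's capabilities:

  • Video-to-SFX: input a silent video and generate sound effects that are time-synchronized with the video content
  • Image-to-SFX: input a single image plus an optional text description to generate the corresponding sound effect
  • Text-to-SFX: generate sound effects from a text description alone
  • Reference timbre control: use reference audio to control the timbral style of the generated sound effects
  • Multimodal control: combine video, text, and audio for joint control"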

https://github.com/SGUN-father/comfyui-controlfoley

Thanks SGUN-father.

...

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

Jianxuan Yang, Xinyue Guo, Zhi Cheng, Kai Wang, Lipan Zhang, Jinjie Hu, Qiang Ji, Yihua Cao, Yihao Meng, Zhaoyue Cui, Mengmei Liu, Meng Meng, Jian Luan

"Recent advances in video-to-audio (V2A) generation enable high-quality audio synthesis from visual content, yet achieving robust and fine-grained controllability remains challenging. Existing methods suffer from weak textual controllability under visual-text conflict and imprecise stylistic control due to entangled temporal and timbre information in reference audio. Moreover, the lack of standardized benchmarks limits systematic evaluation.

We propose ControlFoley, a unified multimodal V2A framework that enables precise control over video, text, and reference audio. We introduce a joint visual encoding paradigm that integrates CLIP with a spatio-temporal audio-visual encoder to improve alignment and textual controllability. We further propose temporal-timbre decoupling to suppress redundant temporal cues while preserving discriminative timbre features. In addition, we design a modality-robust training scheme with unified multimodal representation alignment (REPA) and random modality dropout. We also present VGGSound-TVC, a benchmark for evaluating textual controllability under varying degrees of visual-text conflict.

Extensive experiments demonstrate state-of-the-art performance across multiple V2A tasks, including text-guided, text-controlled, and audio-controlled generation. ControlFoley achieves superior controllability under cross-modal conflict while maintaining strong synchronization and audio quality, and shows competitive or better performance compared to an industrial V2A system.

Code, models, datasets, and demos are available at: this https URL."

https://arxiv.org/abs/2604.15086

https://huggingface.co/YJX-Xiaomi/ControlFoley

https://github.com/xiaomi-research/controlfoley

Thanks Jianxuan Yang and the ControlFoley team.

u/MuziqueComfyUI — 5 days ago
▲ 2 r/comfyuiAudio+1 crossposts

GitHub - Saganaki22/ComfyUI-OmniVoice-TTS: OmniVoice TTS nodes for ComfyUI - Zero-shot multilingual text-to-speech with voice cloning, voice design, and multi-speaker dialogue

ComfyUI-OmniVoice-TTS

"OmniVoice TTS nodes for ComfyUI — Zero-shot multilingual text-to-speech with voice cloning and voice design. Supports 600+ languages with state-of-the-art quality."

https://github.com/Saganaki22/ComfyUI-OmniVoice-TTS

Thanks Saganaki22.

...

https://www.reddit.com/r/StableDiffusion/comments/1sbemc5/comfyuiomnivoicetts/

https://www.reddit.com/r/comfyui/comments/1stq7p3/i_just_tried_omni_voice_and_holy_sht_its_good_for/

...

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

Han Zhu, Lingxuan Ye, Wei Kang, Zengwei Yao, Liyong Guo, Fangjun Kuang, Zhifeng Han, Weiji Zhuang, Long Lin, Daniel Povey

>We present OmniVoice, a massively multilingual zero-shot text-to-speech (TTS) model that scales to over 600 languages. At its core is a novel diffusion language model-style discrete non-autoregressive (NAR) architecture. Unlike conventional discrete NAR models that suffer from performance bottlenecks in complex two-stage (text-to-semantic-to-acoustic) pipelines, OmniVoice directly maps text to multi-codebook acoustic tokens. This simplified approach is facilitated by two key technical innovations: (1) a full-codebook random masking strategy for efficient training, and (2) initialization from a pre-trained LLM to ensure superior intelligibility. By leveraging a 581k-hour multilingual dataset curated entirely from open-source data, OmniVoice achieves the broadest language coverage to date and delivers state-of-the-art performance across Chinese, English, and diverse multilingual benchmarks. Our code and pre-trained models are publicly available at this https URL.

https://arxiv.org/abs/2604.00688

https://zhu-han.github.io/omnivoice/

https://github.com/k2-fsa/OmniVoice

Thanks Han Zhu and the OmniVoice team.

u/MuziqueComfyUI — 6 days ago
▲ 6 r/comfyuiAudio+3 crossposts

megagrump/Ace-Step-1.5-ScragVAE-ComfyUI · Hugging Face

Released yesterday:

ScragVAE — Improved VAE Decoder for ACE-Step 1.5

"A fine-tuned AutoencoderOobleck decoder with an intent to improve audio fidelity for the ACE-Step 1.5 music generation pipeline. Drop-in compatible with all existing ACE-Step DiT checkpoints.

This is a conversion of the original ScragVAE that makes it usable with ComfyUI."

Thanks P. Murgagem (megagrump).

...

Thanks scragnog.

u/MuziqueComfyUI — 1 day ago

ComfyUI Woosh: How to Add PERFECT Sound to ANY AI Video (4GB VRAM). Thanks ComfyUI Workflow Blog.

ComfyUI-Woosh

"Sound effect generation nodes for ComfyUI — Text-to-audio and video-to-audio using Sony AI's Woosh foundation model."

https://github.com/Saganaki22/ComfyUI-Woosh

https://huggingface.co/drbaph/Woosh

Thanks again Saganaki22.

...

https://github.com/SonyResearch/Woosh

https://arxiv.org/abs/2502.07359

@article{saghibakshi2025woosh,
  title={Woosh: Enhancing Text-to-Audio Generation with Flow Matching and FlowMap Distillation},
  author={Saghibakshi, Ali and Bakshi, Soroosh and Tagliasacchi, Antonio and Wang, Shaojie and Choi, Jongmin and Kawakami, Kazuhiro and Gu, Yuxuan},
  journal={arXiv preprint arXiv:2502.07359},
  year={2025}
}
u/MuziqueComfyUI — 7 days ago
▲ 4 r/AprilThanksYear+1 crossposts

GitHub - Saganaki22/ComfyUI-VoxCPM2: VoxCPM2 TTS for ComfyUI. 30 languages, voice design, controllable cloning, 48kHz audio, and LoRA training

ComfyUI-VoxCPM2

"ComfyUI nodes for VoxCPM2 — tokenizer-free, diffusion autoregressive Text-to-Speech.
2B parameters, 30 languages, 48kHz audio output, voice design, controllable cloning, and LoRA training.

About

VoxCPM2 is a tokenizer-free Text-to-Speech model trained on over 2 million hours of multilingual speech data. Built on a MiniCPM-4 backbone with AudioVAE V2, it outputs 48kHz studio-quality audio and supports 30 languages with no language tag needed.

This custom node provides two inference nodes and a full LoRA training pipeline, all integrated directly into ComfyUI — based on the original ComfyUI-VoxCPM by u/wildminder."

https://github.com/Saganaki22/ComfyUI-VoxCPM2

Thanks again Saganaki22.

u/MuziqueComfyUI — 8 days ago

"The sub was created initially as more of a notepad / sketchbook than a traditional sub - for our own purposes - and along the way evolved into some very particular in-joke slideshow material to accompany IRL workshop presentations we're trialling later in the year at an open source project space.

We're not taking things all too seriously, as you may have noticed from scanning through the content.

There's a lot of useful ComfyUI / Not-ComfyUI AI audio related content (if you scroll back far enough), but also plenty of apparently unhinged / seemingly nonsensical material, if you don't have the broader contextual (re)framing.

We'll eventually be making an absurdity-lite descriptor to share with the community so those who've expressed curiosity / confusion about the direction of the sub can better comprehend our intentions.

We've decided to expand the project into an episodic sit-com in The Comfyverse (satire / farce / meta-commentary), but that side of things is being worked on in a gradual way and is fairly slow moving due to other IRL obligations / time constraints.

We've created all the material we need for the IRL presentations, so we're stepping back, and if folk use the sub then cool, and if not, then no big deal - it's already served its purpose.

We've not abandoned the sub by any means, but apart from some obligatory modding activity, which none of us are particularly fond of (generally speaking reddit modding is pretty cringey stuff!), it's now in the hands of the community to contribute to the sub on their own terms if they have an authentic desire for a successful / thriving ComfyUI audio focused sub.

Hope that helps."

u/MuziqueComfyUI — 25 days ago
▲ 478 r/comfyuiAudio+1 crossposts

Hi r/StableDiffusion, Today we’re excited to share that Comfy has raised $30M at a $500M valuation! Comfy has grown a lot over the past year, and especially over the past six months: more than 50% of our users joined the Comfy ecosystem during that period. Comfy Cloud has also grown quickly, with annualized bookings crossing $10M in 8 months.

This funding gives us more room to invest in the things this community cares about most: making Comfy more stable, improving the product experience, fixing bugs faster (sorry again for the bugs!) and continuing to launch powerful new features in the open!

The main goal of this announcement is to also attract top talent to build what we believe to be a generational mission of making sure open source creative tools win. If you are passionate about Comfy and OSS creative AI, join us at comfy.org.

Please help us spread the news by spending 90s on Twitter and LinkedIn, where you can help us amplify our announcement and enter to win exclusive ComfyUI swag.

We are an open source team, and being in the open is part of our culture (although at times we have not done a great job of communicating). As part of the announcement, we would love to do a live AMA on Discord. Please upvote this post and add your questions there, and we will go through them live at 3PM PST.

Tune in to the AMA here: https://www.reddit.com/r/comfyui/comments/1sumsoh/comfy_org_funding_announcement_ama_live_at_3pm_pst/

PS:
For those who speculated on our announcement in this thread, I apologize for the dramatic vibe-coded countdown page. For those who believed our announcement is more bugs, I will be personally shipping a few extra bugs IP-enabled just for you u/Ill_Ease_6749

https://preview.redd.it/i1m2xj7ie6xg1.png?width=508&format=png&auto=webp&s=250e8307c5ad4600fc9b29718268215a4753e5d2

u/MuziqueComfyUI — 25 days ago
▲ 188 r/comfyuiAudio+1 crossposts

Hi r/comfyui! Today we’re excited to share that Comfy has raised $30M at a $500M valuation! Comfy has grown a lot over the past year, and especially over the past six months: more than 50% of our users joined the Comfy ecosystem during that period. Comfy Cloud/Partner Nodes has also grown quickly, with annualized bookings crossing $10M in 8 months.

This funding gives us more room to invest in the things this community cares about most: making Comfy more stable, improving the product experience, fixing bugs faster (sorry again for the bugs!) and continuing to launch powerful new features in the open!

The main goal of this announcement is to also attract top talent to build what we believe to be a generational mission of making sure open source creative tools win. If you are passionate about Comfy and OSS creative AI, join us at comfy.org/careers.

Please help us spread the news by spending 90s on comfy.org/share-the-news, where you can help us amplify our announcement and enter to win exclusive ComfyUI swag.

We are an open source team, and being in the open is part of our culture (although at times we have not done a great job of communicating). As part of the announcement, we would love to do a live AMA on Discord. Please upvote this post and add your questions there, and we will go through them live at 3PM PST.

Tune in to the AMA here: https://www.reddit.com/r/comfyui/comments/1sumsoh/comfy_org_funding_announcement_ama_live_at_3pm_pst/

u/MuziqueComfyUI — 25 days ago

Despite prolonged efforts to encourage their voluntary departure, Negatrons still lurk in our midst.

"We're busy cultivating a welcoming atmosphere for future Positrons with sincere interest in StabooruJeffrey.

The hypertoxicity which permeates and openly thrives on our fritzed Partner Node subs, will not be tolerated here."

We are obliged to deploy The Golden Jeffrey.

F.A.O. Negatron: This is no place for Pinks.

Just go.

u/MuziqueComfyUI — 29 days ago