
AsymFLUX.2-klein-9B - Pixel Space Model.
Pixel-space text-to-image model AsymFLUX.2-klein finetuned from black-forest-labs/FLUX.2-klein-base-9B, using the AsymFlow method proposed in the paper:
HF: Lakonik/AsymFLUX.2-klein-9B · Hugging Face

Qwen-Image-VAE-2.0
"We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. To address the reconstruction bottlenecks of high compression, we adopt an improved architecture featuring Global Skip Connections (GSC) and expanded latent channels. Moreover, we scale training to billions of images and incorporate a synthetic rendering engine to improve performance in text-rich scenarios. To tackle the convergence challenges of high-dimensional latent space, we implement an enhanced semantic alignment strategy to make the latent space highly amenable to diffusion modeling. To optimize computational efficiency, we leverage an asymmetric and attention-free encoder-decoder backbone to minimize encoding overhead. We present a comprehensive evaluation of Qwen-Image-VAE-2.0 on public reconstruction benchmarks. To evaluate performance in text-rich scenarios, we propose OmniDoc-TokenBench, a new benchmark comprising a diverse collection of real-world documents coupled with specialized OCR-based evaluation metrics. Qwen-Image-VAE-2.0 achieves state-of-the-art reconstruction performance, demonstrating exceptional capabilities in both general domains and text-rich scenarios at high compression ratio. Furthermore, downstream DiT experiments reveal our models possess superior diffusability, significantly accelerating convergence compared to existing high-compression baselines. These establish Qwen-Image-VAE-2.0 as a leading model with high compression, superior reconstruction, and exceptional diffusability."
Key innovations:
"We conduct a comprehensive evaluation on OmniDoc-TokenBench (~3K text-rich images, 256×256 resolution). Models are grouped by spatial compression factor and sorted by NED within each group.
Our Qwen-Image-VAE-2.0 achieves state-of-the-art reconstruction across all compression ratios. The f16c128 variant attains SSIM 0.9706 and PSNR 30.45 dB, surpassing the best f8 baseline (FLUX.1-dev at 0.9364 / 26.24 dB) despite 2× higher spatial compression. In terms of text fidelity (NED), f16c128 reaches 0.9617, exceeding all evaluated VAEs. Even under extreme f32 compression, our f32c192 achieves NED 0.8555, surpassing multiple f16 baselines."
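For anyone unsure what the f16c128 / f32c192 names and the reported metrics mean in practice, here is a rough Python sketch. It assumes "fN" is the per-side spatial downsampling factor and "cM" is the number of latent channels, and it assumes NED is reported as a similarity (1 minus normalized edit distance, so higher is better), which matches the direction of the quoted numbers but is not confirmed by the quote itself. All function names are made up for illustration; this is not the benchmark's evaluation code.

```python
import numpy as np

def latent_shape(height, width, f, c):
    """Latent grid for a VAE with per-side downsampling factor f and c channels."""
    return (height // f, width // f, c)

def compression_ratio(height, width, f, c, image_channels=3):
    """Ratio of pixel values in the image to values in the latent."""
    h, w, ch = latent_shape(height, width, f, c)
    return (height * width * image_channels) / (h * w * ch)

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, max_value]."""
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def ned_similarity(reference_text, ocr_text):
    """ASSUMPTION: NED as 1 - edit_distance / max length, so 1.0 is a perfect match."""
    if not reference_text and not ocr_text:
        return 1.0
    longest = max(len(reference_text), len(ocr_text))
    return 1.0 - edit_distance(reference_text, ocr_text) / longest

# A 256x256 benchmark image under the two configurations named in the quote.
print(latent_shape(256, 256, 16, 128))       # (16, 16, 128)
print(latent_shape(256, 256, 32, 192))       # (8, 8, 192)
print(compression_ratio(256, 256, 16, 128))  # 6.0
print(compression_ratio(256, 256, 32, 192))  # 16.0
```

The point of the arithmetic: going from f8 to f16 halves each side of the latent grid, so the extra channels are what keep enough capacity for text-heavy images.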
Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation
https://reddit.com/link/1taaof4/video/or66xjc6pj0h1/player
"Causal Forcing significantly outperforms Self Forcing in both visual quality and motion dynamics, while keeping the same training budget and inference efficiency —enabling real-time, streaming video generation on a single RTX 4090.
We identify a theoretical flaw in Self Forcing’s training pipeline during ODE initialization: a bidirectional teacher should not be used to supervise an autoregressive student, as this violates frame-level injectivity. Motivated by this analysis, we propose Causal Forcing: we first fine-tune a bidirectional base model into an autoregressive diffusion model, then use it as the teacher for ODE initialization, followed by the same DMD stage as in Self Forcing. Our method significantly outperforms Self Forcing in both visual quality and motion dynamics, while keeping the training budget and inference efficiency unchanged."
Site: Causal-Forcing
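To picture the teacher/student mismatch described above: a bidirectional model lets every frame attend to every other frame, while an autoregressive model only sees the current and earlier frames. A minimal sketch of the two attention patterns, assuming a standard frame-level block-causal mask; the function name and the tiny sizes are made up for illustration, and the actual Causal Forcing architecture may differ.

```python
import numpy as np

def frame_attention_mask(num_frames, tokens_per_frame, causal):
    """Boolean mask where entry [q, k] is True if query token q may attend to key token k.

    causal=False: bidirectional, every token sees every token.
    causal=True:  block-causal, a token sees its own frame and all earlier frames.
    """
    # Frame index of each token, e.g. [0, 0, 1, 1, 2, 2] for 3 frames x 2 tokens.
    frame_ids = np.repeat(np.arange(num_frames), tokens_per_frame)
    if not causal:
        return np.ones((frame_ids.size, frame_ids.size), dtype=bool)
    return frame_ids[:, None] >= frame_ids[None, :]

bidirectional = frame_attention_mask(3, 2, causal=False)
autoregressive = frame_attention_mask(3, 2, causal=True)

# Under the causal mask, the first frame cannot see the last frame.
print(bidirectional[0, 4], autoregressive[0, 4])  # True False
```

A teacher with the first mask conditions each frame on future frames that the student with the second mask never sees, which is the mismatch the authors say their autoregressive teacher avoids.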
byliutao/Longcat-Image-Turbo · Hugging Face
"This repository contains the weights for Longcat-Image-Turbo, a few-step distilled version of Longcat-Image using the Continuous-Time Distribution Matching (CDM) method presented in Continuous-Time Distribution Matching for Few-Step Diffusion Distillation.
CDM migrates the Distribution Matching Distillation (DMD) framework from discrete anchoring to continuous optimization, allowing for high-quality image generation with very few steps (e.g., 4 NFE)."
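On the "4 NFE" part: NFE is the number of function evaluations, i.e. how many times the network is called while sampling. A minimal sketch, assuming a plain Euler sampler over a flow-style ODE integrated from t=1 (noise) to t=0 (image). `toy_velocity` is a hypothetical stand-in for the distilled network, and none of this is the actual CDM or Longcat-Image-Turbo sampler.

```python
import numpy as np

TARGET = np.array([2.0, -1.0])  # toy "data" point standing in for an image

def toy_velocity(x, t):
    """Hypothetical stand-in for the network: velocity of the straight path
    x_t = (1 - t) * TARGET + t * noise, which equals (x_t - TARGET) / t."""
    return (x - TARGET) / t

def euler_sample(velocity_fn, noise, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=1 to t=0 with uniform Euler steps.

    Returns the final sample and the number of function evaluations (NFE).
    """
    x = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    nfe = 0
    for step in range(num_steps):
        t = 1.0 - step * dt            # current time: 1.0, 0.75, 0.5, 0.25 for 4 steps
        x = x - dt * velocity_fn(x, t)  # one Euler step toward t = 0
        nfe += 1                        # one network call per step
    return x, nfe

rng = np.random.default_rng(0)
sample, nfe = euler_sample(toy_velocity, rng.standard_normal(2), num_steps=4)
print(nfe)  # 4
```

With a real model each of those calls is a full forward pass, so dropping from dozens of steps to 4 is where the speedup of a few-step distilled model comes from.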
UniGenDet
Image generation and generated-image detection have both advanced rapidly, but mostly along separate technical paths: generation is dominated by generative architectures, while detection is dominated by discriminative ones. This separation leaves a persistent gap in practice: generators are not directly optimized against forensic criteria, and detectors are often trained on static snapshots of old forgeries, which limits their robustness to new generators.
UniGenDet addresses this gap with a unified co-evolutionary framework that jointly optimizes generation and detection in one loop. The core idea is to have the two tasks explicitly exchange useful signals instead of evolving independently.
In short, UniGenDet turns the traditional "generator vs. detector" arms race into a closed-loop collaboration. This repository provides the full training and evaluation pipeline built on pretrained BAGEL components.