u/Independent-Date393

Two days with Omni Flash and Seedance 2.0 still owns motion. Change my mind.

I went into this wanting Google to win. Multimodal SOTA, conversational editing, I/O hype was real. Two days of A/B testing later, motion is still Seedance 2.0's house.

Specifically:

backflip, fight choreo, anything that needs follow-through on momentum. Omni does the first frame perfectly. Then the limbs forget what they were doing.

Seedance does the same shots and it just looks like physics happened.

Where Omni does win:

audio sync. Conversational re-edit. Image-to-video starting frame. The any-to-any framing is genuinely the headline feature, not a hype reach.

But every reddit thread I'm seeing on motion comparisons lands the same way. Even the JSFILMZ tweet (the side-by-side one) — not a fair fight on action shots.

If anyone has an Omni gen that holds motion through a full 5s of complex movement, post it. Want to be wrong on this.

reddit.com
u/Independent-Date393 — 1 day ago

▲ 0 r/AtlasCloudAI+1 crossposts

GPT Image 2 + Seedance 2.0 is the most efficient short video pipeline I've run

how it works:

  • paste a script or story in, kimi2.6 breaks it into 9 key beats
  • GPT Image 2 generates a 3×3 comic storyboard in a single call, all 9 panels on one canvas
  • then Seedance 2.0 I2V takes the grid as a first-frame reference and generates a 15s cinematic clip

the whole thing runs through AtlasCloud.ai's API, one key for both models

all 9 panels are on the same canvas, so GPT Image 2 naturally keeps character appearance, costume, and lighting consistent. Seedance uses the grid as its first-frame reference and outputs the 15s clip. cost is around $1.5–2 per clip.
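A minimal sketch of how the two requests could be assembled, in case it helps anyone wire this up. It only builds the payloads and makes no network calls. The model identifiers, the payload keys (`first_frame_image`, `duration_seconds`), and the helper names are all placeholders I made up for illustration, not AtlasCloud's actual API schema, so check the real docs before using any of it.

```python
from dataclasses import dataclass

GRID_ROWS, GRID_COLS = 3, 3  # 9 panels on one canvas


@dataclass
class PipelineRequests:
    storyboard: dict  # payload for the image model (field names are hypothetical)
    video: dict       # payload for the image-to-video model (also hypothetical)


def build_storyboard_prompt(beats: list[str]) -> str:
    """Describe all 9 beats as panels of a single 3x3 comic grid."""
    expected = GRID_ROWS * GRID_COLS
    if len(beats) != expected:
        raise ValueError(f"expected {expected} beats, got {len(beats)}")
    lines = [
        f"A {GRID_ROWS}x{GRID_COLS} comic storyboard on one canvas. "
        "Same character, costume, and lighting in every panel."
    ]
    for i, beat in enumerate(beats):
        row, col = divmod(i, GRID_COLS)
        lines.append(f"Panel {i + 1} (row {row + 1}, column {col + 1}): {beat}")
    return "\n".join(lines)


def build_requests(
    beats: list[str], storyboard_url: str, duration_s: int = 15
) -> PipelineRequests:
    """Assemble both payloads.

    storyboard_url is wherever the finished grid image ends up after step one.
    Model names and payload keys are placeholders, not a real API schema.
    """
    return PipelineRequests(
        storyboard={
            "model": "gpt-image-2",  # placeholder identifier
            "prompt": build_storyboard_prompt(beats),
            "n": 1,  # one call, all 9 panels on one image
        },
        video={
            "model": "seedance-2.0-i2v",  # placeholder identifier
            "first_frame_image": storyboard_url,
            "duration_seconds": duration_s,
        },
    )
```

The point of putting every beat into one prompt is that the image model draws all panels in a single pass, which is what keeps the character consistent across the grid.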

n8n node: https://github.com/AtlasCloudAI/n8n-nodes-atlascloud

comfy node: https://github.com/AtlasCloudAI/atlascloud_comfyui

u/Independent-Date393 — 25 days ago

Used Seedance 2.0 to directly turn the ARPG game image generated by GPT Image 2 into a trailer

It's not perfect, but the visual effects are really nice. There definitely seems to be enough material here to generate content for a puzzle-solving game.

Used both on AtlasCloud.ai

Vid prompt:

A cinematic third-person RPG game interaction scene set in the desert capital city of Solaris. The video starts with a smooth camera pan through the high-tech solar architecture under a golden sunset. The screen features a minimalist game HUD (heads-up display) with a quest objective in the corner. The player character approaches a female 'People of the Sun' NPC wearing white and gold desert robes and a hood. As the player gets closer, a "Talk" prompt icon appears. Upon clicking, a translucent dialogue box pops up at the bottom, showing the NPC talking with subtle facial expressions and hand gestures. In the background, solar-powered vehicles fly by and energy pillars glow with golden light. High-definition, 4k, Unreal Engine 5 style, immersive game UI, smooth character animation.

u/Independent-Date393 — 28 days ago
▲ 8 r/HappyHorse_AI+1 crossposts

HappyHorse-1.0 quickly climbed to the top of all four Artificial Analysis video leaderboards without any prior announcement, and was later officially confirmed by Alibaba’s ATH division.

The official statement says the model will be fully open-sourced, with the API planned to open to the public on April 30. AtlasCloud.ai will integrate it once it's available. That would make HappyHorse the next strong open-source model after Wan 2.2 and LTX 2.3.

As of noon on April 13, HappyHorse-1.0 has an Elo score of 1384 in text-to-video without audio, which is 111 points higher than Seedance 2.0, and reaches 1413 in image-to-video without audio, the highest score ever recorded on the platform.

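In the Elo system, a difference of more than 60 points already indicates a clear preference. Under the standard Elo formula, a 111-point gap corresponds to HappyHorse winning roughly 65% of head-to-head blind comparisons, which is a consistent preference rather than an overwhelming one.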
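To put a rating gap into concrete terms, the standard Elo expected-score formula can be evaluated directly. This is a sketch that assumes the textbook logistic curve (base 10, scale 400); the leaderboard may fit its ratings with a different method, so treat the numbers as approximate.

```python
def elo_expected_win_rate(rating_gap: float) -> float:
    """Expected win probability of the higher-rated model.

    Uses the textbook Elo logistic curve (base 10, scale 400). This is an
    assumption about how the leaderboard's scores should be read.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))


if __name__ == "__main__":
    # Gaps mentioned in the post: near-tie with audio, the 60-point
    # threshold, and the 111-point text-to-video gap.
    for gap in (2, 60, 111):
        print(f"gap {gap:>3}: {elo_expected_win_rate(gap):.1%}")
```

Under that assumption, a 111-point gap works out to the higher-rated model being preferred in about 65% of pairwise votes, a 60-point gap to a bit under 59%, and a 1–2 point gap to essentially a coin flip.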

However, once audio is included, the gap narrows to 1–2 points, which is effectively a tie between HappyHorse and Seedance in terms of audio-visual synchronization and sound quality.

HappyHorse-1.0 and Seedance 2.0 represent two different technical routes.

HappyHorse-1.0 follows an open-source approach, uses a unified Transformer architecture, generates audio and video in a single step, natively supports lip-sync in 7 languages, has 15 billion parameters, and takes about 38 seconds on a single H100 to generate a 5-second 1080p video.

Seedance 2.0 is a closed commercial system that uses a Bidirectional Diffusion Transformer (DB-DiT), supports multimodal input including text, images, video, and audio, can generate up to about 60 seconds of 2K video, and supports lip-sync in more than 8 languages.

At the architecture level, HappyHorse uses a 40-layer unified self-attention Transformer that jointly models text, video, and audio tokens within a single sequence.

This means that sound and image are in the same semantic space from the beginning of generation.
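As a toy illustration of what "one sequence" means here: tokens from all three modalities are concatenated, and each token attends over the whole joint sequence. This is a simplified sketch, not HappyHorse's implementation. It has a single head and no learned query/key/value projections, and the token vectors and the `modality_share` helper are made up for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(tokens):
    """tokens: list of (modality, vector) pairs forming ONE sequence.

    Returns, for every token, its attention weights over all tokens,
    regardless of modality (scaled dot-product, no learned projections).
    """
    vectors = [vec for _, vec in tokens]
    scale = math.sqrt(len(vectors[0]))
    weights = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        weights.append(softmax(scores))
    return weights


def modality_share(tokens, weights, query_index):
    """Sum one token's attention weights per modality."""
    share = {}
    for (modality, _), w in zip(tokens, weights[query_index]):
        share[modality] = share.get(modality, 0.0) + w
    return share


# Made-up 4-dimensional tokens: 1 text, 2 video, 1 audio, in a single sequence.
sequence = [
    ("text", [1.0, 0.0, 0.5, 0.0]),
    ("video", [0.0, 1.0, 0.5, 0.0]),
    ("video", [0.2, 0.9, 0.1, 0.0]),
    ("audio", [0.0, 0.8, 0.0, 0.6]),
]
attn = joint_self_attention(sequence)
audio_share = modality_share(sequence, attn, query_index=3)
```

`audio_share` shows how much the audio token attends to text, video, and audio tokens in the same pass, which is the property the unified design relies on for sync.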

The model uses DMD-2 distillation and full-graph optimization via MagiCompiler, which is how it reaches roughly 38 seconds per 5-second 1080p video on a single H100.

It natively supports lip-sync for English, Mandarin, Cantonese, Japanese, Korean, German, and French, and achieves a very low word error rate among open-source models.

Participants in the Artificial Analysis blind tests report that HappyHorse performs well in character rendering, especially in skin texture and motion smoothness, while leaked videos also reveal issues such as rippling artifacts, stripe artifacts in fast motion, and quality degradation on large screens.

At present, the model has not yet been officially released; the team is still working on it. Community members have uploaded several converted checkpoints, but all of them are unofficial versions.

From a broader perspective, the emergence of HappyHorse came two weeks after OpenAI announced it would stop further development of Sora. At a time when there were doubts about the future of AI video, HappyHorse effectively picked up the baton.

For developers concerned about local deployment, the team points out that a 15-billion-parameter video model has high computational requirements: on a single H100, generating a 5-second 1080p video takes about 38 seconds.

Consumer GPUs like the RTX 4090 with 24GB of VRAM require quantization or model offloading to run it. At FP16, the weights alone take about 30GB (15 billion parameters × 2 bytes), which already exceeds 24GB. 4-bit quantization is feasible, but it will lead to some degradation in image quality.

Therefore, for serious production scenarios, a more practical solution is to use cloud GPUs with more than 40GB of VRAM or wait for the official API release on April 30.
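The VRAM numbers follow from simple arithmetic on the parameter count. A rough sketch, counting only the weights of a 15-billion-parameter model; activations, the text encoder, and any VAE or audio components come on top, so real usage will be higher than what this prints.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 15e9      # parameter count stated in the post
VRAM_4090_GB = 24  # RTX 4090

if __name__ == "__main__":
    for label, bits in (("FP16", 16), ("8-bit", 8), ("4-bit", 4)):
        gb = weight_memory_gb(PARAMS, bits)
        verdict = "fits" if gb < VRAM_4090_GB else "does not fit"
        print(f"{label:>5}: {gb:.1f} GB of weights, {verdict} in {VRAM_4090_GB} GB")
```

FP16 comes out at 30 GB of weights, which is why a 24GB card needs quantization or offloading, while 4-bit drops the weights to 7.5 GB.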

Source: Official blog

u/Independent-Date393 — 1 month ago