Why isn't there a video model specifically made for anime?
Most current video models are completely focused on realism. The few that try to handle anime usually end up producing results that look like a weird mix of 3D and realism instead of something that actually feels 2D.
Wouldn't it actually be easier to create a smaller model similar to Anima, but trained exclusively on anime datasets? In theory, excluding realism and other styles should reduce compute requirements and simplify training quite a bit.
Personally, I'm already tired of almost every video model chasing the exact same goal: cinematic realism. There are dozens of models doing that already; some better, some worse, but in the end they all feel pretty similar.
Meanwhile, there’s barely anything that truly understands 2D anime physics, exaggerated expressions, or the way traditional animation moves. Or at least I don't know of any open-source model that comes close.
Back then, Sora was probably the best AI model for anime-style video because it understood 2D expressions and physics surprisingly well. Right now, Seedance seems to be the closest thing to that, with Grok somewhere behind it, but on the open-source side I still don't see anything remotely similar.
Maybe instead of trying to build one massive all-in-one model that does every style imaginable, it would make more sense to have smaller specialized models focused on specific styles.
I don't know, maybe I'm completely wrong and anime-style video generation is actually harder or more computationally expensive than realism. It's just something I've been wondering about for a while.