u/Fun_Walk_4965

DeepSeek R1 keeps inventing pandas methods that don't exist. Ran 50 tasks against Qwen3.6 last week — wasn't close.

Last week DeepSeek R1 confidently generated a pandas method that doesn't exist. Took me 20 minutes to figure out why my pipeline was throwing AttributeError. I've been on R1 since January and this isn't the first time it's happened.

So I ran a head-to-head against Qwen3.6 35B. 50 tasks pulled from my actual workflow — python refactoring, SQL optimization, edge case debugging. Both through the same provider (atlas, single endpoint) so routing wasn't a confound. Same prompts, same temperature.

Qwen3.6 won 31. DeepSeek took 14. 5 were basically a wash.
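For anyone replicating, here's a rough sketch of the kind of harness that produces a tally like this. It is not my exact script: `call` and `judge` are hypothetical callables you'd supply yourself (an API wrapper and whatever scoring you trust, including doing it by hand), and the 0.2 temperature default is a placeholder.

```python
from collections import Counter
from typing import Callable

# Hypothetical signature: (model_name, prompt, temperature) -> completion text.
ModelCall = Callable[[str, str, float], str]
# Hypothetical judge: (prompt, output_a, output_b) -> "a", "b", or "tie".
Judge = Callable[[str, str, str], str]


def head_to_head(prompts, call: ModelCall, judge: Judge,
                 model_a: str, model_b: str, temperature: float = 0.2) -> Counter:
    """Run every prompt through both models with identical settings and tally verdicts."""
    tally = Counter()
    for prompt in prompts:
        out_a = call(model_a, prompt, temperature)
        out_b = call(model_b, prompt, temperature)
        verdict = judge(prompt, out_a, out_b)
        if verdict not in ("a", "b", "tie"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        tally[verdict] += 1
    return tally


def summarize(tally: Counter) -> dict:
    """Turn raw counts into percentage shares of all tasks."""
    total = sum(tally.values())
    return {key: round(100 * tally[key] / total, 1) for key in ("a", "b", "tie")}
```

Plugging in the counts above, `summarize(Counter(a=31, b=14, tie=5))` gives 62.0 / 28.0 / 10.0 percent.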

The hallucination gap was the part I didn't expect. DeepSeek kept generating pandas methods that don't exist, confidently. Same for SQL — invented postgres functions that aren't real. Qwen3.6 caught itself maybe 6-7 times across the run and said "I'm not 100% sure about this syntax, verify it." Which sounds soft until you've shipped code that hallucinated a method name into production.
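A cheap guard against this, as a minimal sketch: parse the generated code and check every attribute accessed on a given variable against the real class before running anything. The function name and arguments are mine, not from any library. It only catches direct access like `df.something`, not chained calls, and it will false-positive on column-style access such as `df.price`.

```python
import ast


def unknown_attributes(source: str, var_name: str, target_type: type) -> list[str]:
    """Return attribute names accessed on `var_name` that `target_type` does not define."""
    missing = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == var_name
            and not hasattr(target_type, node.attr)
        ):
            missing.add(node.attr)
    return sorted(missing)
```

For pandas you'd call something like `unknown_attributes(generated_code, "df", pd.DataFrame)` and refuse to run the snippet if the list comes back non-empty.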

Where DeepSeek still wins: pure chain-of-thought stuff. I threw in some proof-style math problems and DeepSeek handled them more cleanly. So for reasoning/math I'm still routing to R1. But for daily "this function is broken, fix it" coding work, Qwen3.6 is my default now.
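The routing itself can be as dumb as a lookup table. A sketch, assuming you tag each task with a category yourself; the model identifiers are placeholders, not real endpoint names.

```python
# Model identifiers are placeholders; use whatever names your provider exposes.
ROUTES = {
    "math": "deepseek-r1",
    "reasoning": "deepseek-r1",
    "coding": "qwen3.6-35b",
}
DEFAULT_MODEL = "qwen3.6-35b"


def pick_model(task_type: str) -> str:
    """Map a task category to a model, falling back to the daily-driver default."""
    return ROUTES.get(task_type.strip().lower(), DEFAULT_MODEL)
```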

Latency felt faster on Qwen3.6 too but I didn't formally measure it — and the routing could be biasing things either way, so not building a story on that.

Not saying DeepSeek is bad. Still my go-to for reasoning. But the hallucination gap was big enough that I'm not letting it touch library-specific code anymore until it tightens up.

Happy to share the task list if anyone wants to replicate. Both are on atlas if you want a single-endpoint A/B, otherwise direct works too.

reddit.com
u/Fun_Walk_4965 — 1 day ago
▲ 3 r/GeminiOmni_AI+1 crossposts

Hit Omni Flash rate limit after 5 generations on Pro — is this just me?

New to Omni. Bought into the launch hype, paid for Pro thinking I'd have room to play around, ran 5 short clips, and got locked out. The screen says "try again in 4 hours and 23 minutes."

Is 5 the actual cap? Or is there some quota refresh I'm missing?

Asking because I was about to upgrade to Ultra, but if Ultra is also gonna 5-and-out I'd rather wait for whatever API access drops in "the weeks ahead" lol.

u/Fun_Walk_4965 — 2 days ago

Two weeks into AI short drama, the wall isn't picking a model, it's the asset workflow

Been experimenting with AI-generated short dramas (vertical 60-second story clips for TikTok and Shorts) for a couple weeks. Sharing what's gotten me past the obvious failure modes.

The thing people keep asking is "which software should I install?" The real bottleneck isn't tools. It's workflow. I was platform-hopping for the first week, writing scripts in one place, generating characters elsewhere, scenes in a third tool, rendering somewhere else. Context switching killed momentum more than any single tool ever did.

https://preview.redd.it/eqkmuzydgn0h1.jpg?width=2048&format=pjpg&auto=webp&s=7c209d788ca8246f9a9a30bc460585a6ece2f719

Step 1. Script before any tool opens.

Not a vague idea. An actual breakdown with:

  • Characters involved
  • Conflict or setup
  • Dialogue
  • Shot list

I use an LLM to expand a single theme into a 30-second script in about two minutes. The script becomes the blueprint for every asset you generate after. Start with one 30-second scene before attempting a 50-episode arc.
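If it helps to make the breakdown concrete, here's one way to hold it as data so you can tell at a glance what's still missing before any generation starts. The class and field names are my own illustration, not a format any tool requires.

```python
from dataclasses import dataclass, field


@dataclass
class SceneScript:
    """One scene's breakdown: the four parts listed above plus the theme it grew from."""
    theme: str
    characters: list[str] = field(default_factory=list)
    conflict: str = ""
    dialogue: list[str] = field(default_factory=list)
    shot_list: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Names of the breakdown parts that are still empty."""
        parts = {
            "characters": self.characters,
            "conflict": self.conflict,
            "dialogue": self.dialogue,
            "shot_list": self.shot_list,
        }
        return [name for name, value in parts.items() if not value]
```

An empty `missing_parts()` is the signal that the script is ready to drive asset generation.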

Step 2. Character images in 9:16 portrait.

These are casting photos, the visual anchor every later scene refers to. Character consistency issues are almost never the video model's fault. They usually come from inconsistent reference images, changing prompt styles between scenes, or swapping reference photos mid-project because "this one looks slightly better." Pick the look once and never change it.

Step 3. Scene backgrounds in 16:9 landscape.

https://preview.redd.it/023yx1rohn0h1.png?width=1080&format=png&auto=webp&s=17df47e4dcca07b40be31daeacb54b17c637c818

Classrooms, offices, streets, whatever the script needs. Match the scene style to the character style. Photorealistic characters in CG-looking scenes read as fake immediately.

Step 4. Video generation with Seedance 2.0.

For the 15-second clips, Seedance 2.0 held up best on multi-camera shots and character motion for the kind of beats short drama needs. Tried a few alternatives, none worked out.

One thing nobody flags: if your characters are photorealistic humans, you'll hit content review. The fix is to import your character reference images to the asset library first and let them get reviewed before video generation tries to use them. Skip this and the real-person video calls keep failing without an obvious reason.

Prompt template I use:

@image1 as [character name], @image2 as [character name], @image3 as [scene]. [Character 1] walks toward [Character 2], angry expression, medium shot, cinematic lighting.

Label assets clearly. The model isn't going to guess who's who.
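A small helper that fills the template so the labels never drift between clips. This is a sketch: `tag_format` is a parameter because the exact reference syntax depends on the tool, and the default shown is my assumption. The character names in the usage line are made up.

```python
def build_prompt(characters, scene, action, tag_format="@image{n}"):
    """Fill the labeled-asset template: one numbered tag per character, then the scene."""
    if not characters:
        raise ValueError("need at least one character")
    labels = [
        f"{tag_format.format(n=i)} as {name}"
        for i, name in enumerate(characters, start=1)
    ]
    # The scene reference takes the next number after the last character.
    labels.append(f"{tag_format.format(n=len(characters) + 1)} as {scene}")
    return f"{', '.join(labels)}. {action}"
```

Usage: `build_prompt(["Mia", "Leo"], "classroom", "Mia walks toward Leo, angry expression, medium shot, cinematic lighting.")`.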

Step 5. Multi-clip assembly.

https://preview.redd.it/full8ynygn0h1.jpg?width=2048&format=pjpg&auto=webp&s=a0183b2a897a3ee4b06e431d1fed0784cd643d29

Don't generate one long 60-second take. Four stable 15-second segments stitched together beat one shaky 60-second video every time. After the clips are in, add subtitles, layer in sfx, cut awkward transitions. Generation is roughly half the work. The edit is the rest.
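If you'd rather script the stitching than drag clips around an editor, one option is ffmpeg's concat demuxer. To be clear, ffmpeg is my suggestion here, not something the workflow above depends on, and the file names are placeholders. The sketch only builds the list file and the command. Stream copy (`-c copy`) assumes all clips share codec and resolution, and paths containing single quotes would need escaping.

```python
from pathlib import Path


def write_concat_list(clips, list_path):
    """Write the text file ffmpeg's concat demuxer reads: one `file '...'` line per clip."""
    lines = [f"file '{Path(clip).as_posix()}'" for clip in clips]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def concat_command(list_path, output):
    """Build (not run) the ffmpeg call that joins the clips without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", str(list_path), "-c", "copy", str(output),
    ]
```

Pass the returned list to `subprocess.run(..., check=True)` once the clips exist on disk.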

About the model orchestration. By the time you've followed those five steps you've called four different model families: an LLM for the script, two image generators (one for characters, one for scenes), and a video model. Each one has its own SDK, its own API key, its own quota meter. That accounted for half of my context-switching pain in week one.

I ended up consolidating to one API host that exposes all four model families under a single key and a single dashboard. Doesn't make generation faster, but the operational friction drops a lot. Anyone stitching a multi-model pipeline together hits this eventually.

Per-episode cost runs around $5-7 once you account for the four video segments at roughly a dime a second on Seedance, plus a few cents each for image gen and the script LLM call. Compared to outsourcing the same minute of edited short drama (anywhere from $500 up), the cost gap is two orders of magnitude.
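The arithmetic behind that range, as a quick sketch. The video numbers come straight from the paragraph above. The image and script figures are placeholder assumptions, since "a few cents each" is all I actually tracked. Everything is in integer cents to avoid float noise.

```python
SEGMENTS = 4                 # four clips per episode
SECONDS_PER_SEGMENT = 15
VIDEO_CENTS_PER_SECOND = 10  # "roughly a dime a second"

# Assumptions for illustration only: the real per-call prices vary.
IMAGE_CALLS = 3              # e.g. two characters plus one background
CENTS_PER_IMAGE = 5
SCRIPT_CALL_CENTS = 5


def episode_cost_cents():
    """Return (video, extras, total) cost of one episode in cents."""
    video = SEGMENTS * SECONDS_PER_SEGMENT * VIDEO_CENTS_PER_SECOND
    extras = IMAGE_CALLS * CENTS_PER_IMAGE + SCRIPT_CALL_CENTS
    return video, extras, video + extras


video, extras, total = episode_cost_cents()
print(f"video ${video / 100:.2f} + extras ${extras / 100:.2f} = ${total / 100:.2f}")
print(f"outsourcing at $500 is about {50000 / total:.0f}x more")
```

With these placeholders the video alone is $6.00 and the total lands at $6.20, so the $500 floor for outsourcing is roughly 80x higher.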

Seedance 2.0 prompt templates I've been collecting for short drama beats: https://github.com/AtlasCloudAI/awesome-seedance-2-prompt

Last note: if you're sitting on whether to start, just write the script first. Everything after that gets figured out on the way.

u/Fun_Walk_4965 — 11 days ago

tested GPT Image 2 on AtlasCloud.ai, its text rendering is way better now. scene understanding is also noticeably better. complex multi-object scenes with layered elements used to fall apart, now they hold together. response speed is solid, image-to-image editing feels more coherent than 1.5

But background detail still gives itself away, better than six months ago tho

during my tests, the camera angle kept defaulting to something slightly unconventional. not always bad, sometimes interesting, but not what I asked for. the visuals also look a bit off, and the resolution seems kind of low

but overall, it's great, it might be as good as nb pro imo, or even better for some use cases.

u/Fun_Walk_4965 — 29 days ago