u/BullBullGo

▲ 9 r/Agent_AI+4 crossposts

Hardcore Benchmark: Gemini 3.5 Flash vs OpenAI Codex API in Agentic Workflows

Just ran extensive tests on the newly released Gemini 3.5 Flash ($20/mo, Google One AI Pro) with a desktop browser-automation agent (Anti-gravity compiler). Verdict: fast but fundamentally broken.

1️⃣ Speed Over Substance: Throughput is incredibly high and the step-by-step logic output is crystal clear, but it fails to close the loop and actually solve the problem. The gap between the promo video and real use is massive.

2️⃣ Data Corruption: When handling website translation and typesetting, the frontend output gets flooded with garbled text and heavy noise, which I suspect is caused by over-tuned safety layers.

3️⃣ The Codex Alternative: Running the same data payload through the OpenAI Codex API mapped the sample article perfectly.

Google's recent sharp stock increase does not reflect actual model capability. The utility and value of Gemini 3.5 and 3.1 Flash have degraded severely compared to 3 months ago.

👇 Fellow devs, are you seeing similar text corruption in your agentic pipelines?

u/BullBullGo — 23 hours ago

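[Hardcore Test] Gemini 3.5 Flash vs OpenAI Codex API: A Serious Regression in Model Capability?

Last night I ran heavy tests of the newly released Gemini 3.5 Flash (Google One AI Pro, $20/month) with a desktop browser-automation agent (Anti-gravity compiler). Verdict: polished on the outside, rotten on the inside.
1️⃣ Blazing speed, but the logic is dead: responses are extremely fast and the step-by-step output is clear enough, but in the end it cannot close the loop and solve the problem. The real experience falls clearly short of the launch event.
2️⃣ Noise overload: when handling website translation, article layout, and video streams, the frontend output is mixed with large amounts of garbled text. I suspect overly dense safety layers are filtering out normal logic.
3️⃣ OpenAI Codex comparison: when I switched the same workflow back to the OpenAI Codex API, it handled the sample article perfectly and produced consistent results.

Google's soaring stock price cannot hide the collapse in its products' value for money. Compared with 3 months ago, Gemini 3.5/3.1 Flash has regressed badly.

👇 Fellow engineers, have you run into these problems in real automation scenarios?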

#Gemini35Flash #OpenAICodex #AIAgent #LLM #GoogleAI #TechTruth

u/BullBullGo — 1 day ago

Does anyone else feel Gemini 3.5 Pro is getting heavily nerfed lately?

I’m paying $20/month for Gemini 3.5 Pro.
Back then I could run prompts for 2–3 hours continuously without issues.
Now?
Less than ONE hour and it tells me:
“Come back in 2 days.”
Seriously?
Feels like Google is silently reducing token limits / compute access while keeping the same subscription price.
Is anyone else noticing:
worse reasoning?
lower context performance?
more aggressive throttling?
shorter usage windows?
At this point I’m seriously considering switching back to ChatGPT or Claude Code.

u/BullBullGo — 14 days ago