Hardcore Benchmark: Gemini 3.5 Flash vs OpenAI Codex API in Agentic Workflows
Just ran extensive tests on the newly released Gemini 3.5 Flash ($20/mo Google One AI Pro) using a desktop browser automation Agent (Anti-gravity compiler). Verdict: Fast but fundamentally broken.
1️⃣ Speed Over Substance: Throughput is incredibly high with crystal-clear step-by-step logic outputs, but it fails to close the loop and actually solve the problem. The gap from the promo video is massive.
2️⃣ Data Corruption: When managing website translations and typesetting, the frontend gets flooded with garbled text and heavy noise data, likely triggered by over-tuned safety layers.
3️⃣ The Codex Alternative: Reverting to OpenAI Codex API with the same data payload successfully mapped the sample article perfectly.
Google's recent sharp stock increase does not reflect actual model capability. The utility and value of Gemini 3.5 & 3.1 Flash have severely degraded compared to 3 months ago.
👇 Fellow devs, are you seeing similar text corruption in your agentic pipelines?