u/AccomplishedIce9767

Task:

- I uploaded the relevant files into context
- I gave explicit migration instructions
- I asked it to migrate a React codebase from Material UI v5 to Material UI v6
- I specified exactly how components should be rewritten
- I told it to preserve existing interfaces and types

What happened:

- It ignored parts of the instructions
- It rewrote unrelated code
- It changed interfaces that had absolutely nothing to do with the migration
- It introduced type inconsistencies
- It mixed old and new MUI patterns together
- Some files looked like they were generated from pure guesswork

The craziest part is that this wasn’t even an architecture task. I wasn’t asking for system design, business logic, or complex reasoning. It was mostly structured framework migration work, with the context already provided.

I’m honestly amazed at how unreliable these “AI coding agents” still are once you move beyond toy demos.

People keep showing benchmark charts and cherry-picked examples, but in a real codebase:

- preserving conventions matters
- not touching unrelated interfaces matters
- following scoped instructions matters
- consistency across files matters
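
For the "not touching unrelated interfaces" point, one cheap guard is to compare the files the agent changed against an allowlist before reviewing anything else. Here is a minimal TypeScript sketch, assuming you can get the list of changed paths yourself (for example from `git diff --name-only`). The function name, the paths, and the prefix-based matching are all made up for illustration and are not part of any agent's tooling:

```typescript
// Hypothetical helper: returns the changed files that fall outside the
// directories the migration was allowed to touch.
function findOutOfScopeChanges(
  changedFiles: string[],
  allowedPrefixes: string[],
): string[] {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix)),
  );
}

// Illustrative input: the migration was only supposed to touch UI components.
const changed = [
  "src/components/Button.tsx",
  "src/components/Dialog.tsx",
  "src/api/types.ts",
];

const outOfScope = findOutOfScopeChanges(changed, ["src/components/"]);
console.log(outOfScope); // only src/api/types.ts is reported
```

This only catches out-of-scope files, not unwanted interface changes inside an in-scope file, so it narrows the review rather than replacing it.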

If an agent can’t safely execute a constrained migration with the exact files already provided, how can we pretend these tools are replacing engineers anytime soon?

Right now they feel more like:
“generate 80% of the work and spend 200% of the time reviewing the damage.”

Anyone else having similar experiences with Gemini Code Agent or other autonomous coding tools?

I just gave Gemini Code Agent one of the easiest refactor tasks imaginable and it completely imploded.