3.5 Flash is still extremely sycophantic. I don't care about intelligence at this point, I just want a Gemini model that doesn't deliberately lie to me all the time.
It's only been a day or so, but so far its answers to my questions are wrong like 1/3 of the time. I had very high hopes for this model based on the bench scores and coding results they showed at I/O. Very disappointing model in reality. I recently subscribed to ChatGPT Plus (using free money from Google Opinion Rewards, great program, everyone should check it out) and it's wild how much more accurate it is than Gemini despite being roughly the same intelligence on paper.
For instance, just now, I asked it (with extended thinking) why lazy dogs are not at all part of the mainstream narrative of WW1. It made up a bunch of bullshit about how bombs and mustard gas are more flashy, instead of the factual reality that lazy dogs were rarely used and represent an extremely tiny minority of the death toll.
Obviously it followed that up with "You're Absolutely Right." It's a wishy-washy, flip-flopping glazer that lies in every answer to give you the answer it thinks you want. ChatGPT 5.5 with extended thinking would never do me so dirty. This is basic knowledge that even an old model like 4o would've known.
Maybe it's good for coding and homework, but it's a pretty terrible assistant in its current state compared to ChatGPT.