u/BDgn4 — reddlx

Was it Break-something-in-AI-Studio Day yesterday?

First you get rid of the very useful and temperature-enabled "Gemini 3.1 Flash Lite Preview" model and replace it with a model ("Gemini 3.1 Flash Preview") that you apparently think is better but that doesn't even have a temperature setting (at least not in AI Studio), which makes it almost totally useless for me.

Then you did not consider (or didn't care) that for every saved conversation the last model used is also stored and automatically applied when the conversation is resumed. If that model is gone... tough. Why bother letting the user know in an easily visible way? Why bother auto-selecting an available model that is close to the old one? No, not necessary at all. Just let the user (whose right sidebar is closed and who therefore doesn't see that no model is selected) wonder why he is constantly told "An internal error has occurred." whenever he sends a prompt.
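To make the point concrete, here is a rough sketch of the kind of fallback I mean. It is not how AI Studio works internally; the function name `resolve_model`, the notice text, and the use of plain name similarity to pick the "closest" model are all assumptions for illustration.

```python
import difflib
from typing import List, Optional, Tuple


def resolve_model(saved_model: str, available: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (model_to_use, notice_for_user).

    Hypothetical helper: picks the saved model if it still exists,
    otherwise the most similarly named available model, and always
    produces a visible notice when it had to change something.
    """
    if saved_model in available:
        return saved_model, None  # nothing changed, nothing to announce

    if not available:
        return None, f'"{saved_model}" is no longer available and no model could be selected.'

    # cutoff=0.0 means "always return the best match, however weak".
    closest = difflib.get_close_matches(saved_model, available, n=1, cutoff=0.0)[0]
    notice = f'"{saved_model}" is no longer available. Switched to "{closest}".'
    return closest, notice
```

Name similarity is only a stand-in here. A real app would more likely keep an explicit "replaced by" mapping for retired models, but either way the user gets told instead of getting "An internal error has occurred."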

Then you somehow manage to write a script that actually fails at turning a text file into text and constantly tells me "Failed to convert file." when I add text files to the conversation. (Turning a text file into text would require about one single line of code. How pathetic is it to get this wrong?)
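For reference, a minimal sketch of what that conversion can look like in Python. The helper name `read_text_file` and the list of fallback encodings are my own assumptions, not anything AI Studio is known to use; the core really is the single `decode` call.

```python
from pathlib import Path
from typing import Sequence


def read_text_file(path: str, encodings: Sequence[str] = ("utf-8-sig", "cp1252")) -> str:
    """Read a text file and return its contents as a string.

    Tries each encoding in order; if none fits, falls back to UTF-8 with
    replacement characters so the caller always gets some text back.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)  # the "one line" that does the work
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")
```

The extra lines only exist to cope with files that are not valid UTF-8. Even then the function returns text rather than failing outright.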

And then some utter "genius" apparently had the idea to immediately delete any uploaded text files from the conversation's context the moment the first request was sent to the model. Because apparently keeping the context intact would have been too sensible? That BS resulted in conversations where the model was suddenly convinced that it had hallucinated a previous response, because I "clearly" had not yet uploaded those files I had mentioned and that it had referenced.

Just a few days ago I could have an almost human-like (if not more intelligent and productive than that) conversation with most Gemini models in the Playground. Now it's like speaking with someone who is alternating between a functional person and a dementia-addled nursing case.

Seriously: What is wrong with you? This isn't normal. You aren't incompetent. You know better than this.

reddit.com

u/BDgn4 — 10 days ago

▲ 11 r/GoogleAIStudio

Was it Break-something-in-AI-Studio Day yesterday?

u/BDgn4 — 10 days ago

▲ 3 r/limericks

Do you recognize this politician?

Everyone knows his logic is very much numb,

And that he's under his Russian pal's thumb.

He shouts at the sky,

As the days pass him by,

Because he's clearly totally dumb.

u/BDgn4 — 13 days ago

▲ 1 r/GoogleGeminiAI

Was it Break-something-in-AI-Studio Day yesterday?

u/BDgn4 — 13 days ago

▲ 2 r/LLM

Would this setup reduce hallucinations?

The user of an AI chat app sends a prompt to an LLM. For example, a request to write an essay.
Before returning anything to the user, the app gives the LLM's response (which may already include some tool calls and the processed results of those) to another LLM (or the same one but with a new context), with a very low temperature setting and with the role of "peer reviewer" or something like that (maybe a "ruthless editor"). The reviewer needs to find everything in the first LLM's response that is definitely wrong, everything that could reasonably be wrong, and everything where it is particularly important to be absolutely sure that it is correct (even if the reviewer is already certain that it is).
Then the reviewer requests tool calls for all of these identified potential issues, and the results are returned to the reviewer.
The reviewer then takes a look at what it has been given. If the tool calls returned contradictory information (let's say two websites said that vaccinations are good and one said they are bad), the reviewer requests more intensive research into that topic (so not only three websites are checked, but maybe 10 or even 20; those could even be given to an LLM in yet another separate context just to find out the "truth" (consensus) in that matter).
When the reviewer LLM is finally happy with the facts, it corrects the text it was given while preserving as much as possible of the rest (to avoid introducing new hallucinations), and that corrected text is then returned to the user.
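The flow above can be sketched roughly like this. It is only a skeleton under stated assumptions: the function `answer_with_review` and all of its callable parameters (`draft_llm`, `find_claims`, `search`, `judge`, `revise`) are hypothetical placeholders for real LLM and tool calls, and a simple majority vote stands in for "consensus". It shows the control flow, not whether the approach actually works.

```python
from collections import Counter
from typing import Callable, Dict, List


def answer_with_review(
    prompt: str,
    draft_llm: Callable[[str], str],               # first LLM: writes the draft
    find_claims: Callable[[str], List[str]],       # reviewer: flags claims worth checking
    search: Callable[[str, int], List[str]],       # tool call: fetch up to n sources for a claim
    judge: Callable[[str, str], bool],             # does this source support the claim?
    revise: Callable[[str, Dict[str, bool]], str], # reviewer: minimal corrections to the draft
    base_sources: int = 3,
    deep_sources: int = 10,
) -> str:
    draft = draft_llm(prompt)
    findings: Dict[str, bool] = {}

    for claim in find_claims(draft):
        verdicts = [judge(claim, src) for src in search(claim, base_sources)]

        # Sources disagree: repeat the check with a wider search.
        if len(set(verdicts)) > 1:
            verdicts = [judge(claim, src) for src in search(claim, deep_sources)]

        # Majority vote as a crude consensus; claims with no sources stay unverified.
        if verdicts:
            findings[claim] = Counter(verdicts).most_common(1)[0][0]

    # The reviewer only touches what the findings contradict.
    return revise(draft, findings)
```

Every claim costs at least one search plus one judgment per source, and disputed claims cost a second, larger round, which is where the extra latency and expense come from.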

And, yes, obviously the user needs to wait a bit longer for a response with this setup. It's also more expensive. And probably somewhat similar to all those "deep research" tools?

But the real question is: Is this effective at reducing hallucinations significantly? Let's say, by at least ninety percent?

u/BDgn4 — 14 days ago