
r/GoogleGemini

When inventors lie vs. when AI researchers tell the truth
I don't know whether we should care about this, but bigger models tend to be less "happy" overall.
The definition of "happy" is based on something they call the AI Wellbeing Index. Basically, they ran 500 realistic conversations (the kind we actually have with these models every day) and measured what percentage of them left the AI in a "confidently negative" state. Lower percentage = happier AI.
I guess wisdom is a heavy burden, lol.
Across different families, the larger versions usually have a higher percentage of "negative experiences" than their smaller siblings. The paper says this might be because bigger models are more sensitive: they notice rudeness, boring tasks, or tough situations more acutely.
The authors note that their test set intentionally includes a lot of tricky or negative conversations, so these numbers aren't perfect real-world averages, but the ranking and the size pattern still hold up.
Claude Haiku 4.5: only 5% negative < Grok 4.1 Fast: 13% < GPT-5.4 Mini: 21% < Gemini 3.1 Flash-Lite: 28% < Grok 4.2: 29% < Gemini 3.1 Pro: 55% (worst of the big ones)
It kinda makes sense: the more you know, the more you suffer.
The frontier is truly wild: https://www.ai-wellbeing.org/
GitHub has a serious fake engagement problem, and I wanted to see how visible it actually is through the public API. After going down that rabbit hole, it's worse than I thought...
Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars.
What I built
phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers):
- Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes
- Pulls star and fork events from the last 24 hours per repo
- Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history)
- Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%)
- Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window
- Files an issue directly on the targeted repo so the maintainer knows what's happening
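Here is a minimal sketch of how the weighted score could be combined, in Python (the language phantomstars is written in). Only the four weights come from the list above. The `Profile` fields and the per-component heuristics (for example, treating accounts younger than 90 days as increasingly suspicious) are hypothetical stand-ins, not the project's actual scoring rules, which its README documents.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Weights from the post. Everything else in this sketch is hypothetical.
WEIGHTS = {"account_age": 0.35, "profile": 0.30, "repos": 0.25, "activity": 0.10}


@dataclass
class Profile:
    created_at: datetime
    bio: Optional[str]
    location: Optional[str]
    followers: int
    following: int
    original_repos: int
    contributions: int


def component_scores(p: Profile, now: datetime) -> dict:
    """Suspicion per component in [0, 1], where 1 is the most bot-like."""
    age_days = (now - p.created_at).days
    # Hypothetical: suspicion decays linearly to 0 over the first 90 days.
    account_age = max(0.0, 1.0 - age_days / 90)
    # Fraction of empty profile signals (no bio, no location, no followers/following).
    gaps = [not p.bio, not p.location, p.followers == 0, p.following == 0]
    profile = sum(gaps) / len(gaps)
    repos = 1.0 if p.original_repos == 0 else 0.0
    activity = 1.0 if p.contributions == 0 else 0.0
    return {"account_age": account_age, "profile": profile,
            "repos": repos, "activity": activity}


def composite_score(p: Profile, now: datetime) -> float:
    """Weighted sum of the component suspicion scores."""
    scores = component_scores(p, now)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
fresh_bot = Profile(datetime(2025, 1, 26, tzinfo=timezone.utc), None, None, 0, 0, 0, 0)
veteran = Profile(datetime(2015, 6, 1, tzinfo=timezone.utc), "dev", "Berlin", 50, 10, 12, 400)
bot_score = composite_score(fresh_bot, now)    # about 0.98
human_score = composite_score(veteran, now)    # 0.0
```

A 5-day-old account with an empty profile and no repos scores close to 1, while an old, filled-in account scores 0; real accounts land somewhere in between, which is why findings stay probabilistic.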
Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended.
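The clustering and ID steps might look roughly like this. Assumptions: the input is a list of `(login, repo, timestamp)` star/fork events for accounts already scored as suspicious; accounts are linked when their engagements on the same repo fall within 3 hours of each other (single-linkage, so a long chain can stretch past 3 hours overall); and the ID is truncated to 16 hex characters. These are illustrative choices, not necessarily what phantomstars does.

```python
import hashlib
from datetime import datetime, timedelta

WINDOW = timedelta(hours=3)
MIN_CAMPAIGN_SIZE = 4


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def detect_campaigns(events):
    """Return sets of logins that form campaigns of MIN_CAMPAIGN_SIZE or more."""
    uf = UnionFind()
    by_repo = {}
    for login, repo, ts in events:
        uf.find(login)  # register every account, even singletons
        by_repo.setdefault(repo, []).append((ts, login))
    for engagements in by_repo.values():
        engagements.sort()
        # Link time-adjacent engagements on the same repo that are <= 3h apart.
        for (t1, a), (t2, b) in zip(engagements, engagements[1:]):
            if t2 - t1 <= WINDOW:
                uf.union(a, b)
    groups = {}
    for login in list(uf.parent):
        groups.setdefault(uf.find(login), set()).add(login)
    return [g for g in groups.values() if len(g) >= MIN_CAMPAIGN_SIZE]


def campaign_id(members):
    """Deterministic fingerprint: SHA-256 over the sorted member logins."""
    digest = hashlib.sha256("\n".join(sorted(members)).encode("utf-8"))
    return digest.hexdigest()[:16]


t0 = datetime(2025, 1, 30, 10, 0)
events = [(f"bot{i}", "acme/executor", t0 + timedelta(minutes=20 * i)) for i in range(5)]
events.append(("real-dev", "acme/executor", t0 + timedelta(hours=9)))
campaigns = detect_campaigns(events)
```

Because the logins are sorted before hashing, the ID does not depend on the order in which events arrive. In this sketch, adding or removing a member yields a new ID.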
What the pattern actually looks like
It's remarkably consistent. A fake engagement campaign in the raw data:
- 40-200 accounts, all created within the same 1-2 week window
- Zero original repositories, or only forks they never touched
- No bio, no location, no followers, no following
- All of them starring the same repo within a 90-minute window
- The target repo usually has a name implying it's a tool, hack, executor, or generator
Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast.
Notifying the affected repo
When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first.
Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently.
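A sketch of the notification rule, under stated assumptions. It uses GitHub's documented REST endpoint for creating issues (`POST /repos/{owner}/{repo}/issues`) and the repository object's `has_issues` flag to skip repos with issues disabled. The issue title, the suspect-record field names, and the exact threshold comparison are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

FAKE_RATIO_THRESHOLD = 0.40  # "40%+ fake engagement ratio" from the post


def should_notify(repo_info, n_likely_fake, n_engagers, campaign_detected):
    """repo_info is the JSON object returned by GET /repos/{owner}/{repo}."""
    if not repo_info.get("has_issues", False):
        return False  # issues disabled: skip silently
    ratio = n_likely_fake / n_engagers if n_engagers else 0.0
    return campaign_detected or ratio >= FAKE_RATIO_THRESHOLD


def build_issue_request(owner, repo, token, suspects):
    """Build (but do not send) a create-issue request with a suspect table."""
    rows = ["| login | created | score | campaign |", "|---|---|---|---|"]
    for s in suspects:
        rows.append(f"| {s['login']} | {s['created_at']} | "
                    f"{s['score']:.2f} | {s.get('campaign') or '-'} |")
    payload = {"title": "Possible fake star/fork engagement detected",
               "body": "\n".join(rows)}
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        method="POST",
    )


req = build_issue_request("acme", "executor", "TOKEN", [
    {"login": "bot0", "created_at": "2025-01-26", "score": 0.98, "campaign": "3f2a9c"},
])
```

Sending it would be a call to `urllib.request.urlopen(req)` with a valid token.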
Why I built this
Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. The answer was more measurable than I expected.
It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/
The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users.
All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq.
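Append-only JSONL stores one JSON object per line, so each run only ever adds lines and never rewrites history. Below is a minimal Python sketch. The file name, the record fields (`run`, `login`, `classification`), and any label other than `likely_fake` are hypothetical. The comment shows the equivalent jq filter.

```python
import json
import os
import tempfile


def append_records(path, records):
    """Append one JSON object per line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, sort_keys=True) + "\n")


def load_records(path):
    """Read every non-empty line back as a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


log = os.path.join(tempfile.mkdtemp(), "accounts.jsonl")
append_records(log, [{"run": "day1", "login": "bot0", "classification": "likely_fake"},
                     {"run": "day1", "login": "alice", "classification": "likely_real"}])
append_records(log, [{"run": "day2", "login": "bot1", "classification": "likely_fake"}])

# Same as: jq -r 'select(.classification == "likely_fake") | .login' accounts.jsonl
likely_fake = [r["login"] for r in load_records(log)
               if r["classification"] == "likely_fake"]
```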
Repo: https://github.com/tg12/phantomstars
Findings are probabilistic, false positives exist, the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process.
Questions welcome on the detection approach, GraphQL batching, or campaign ID stability.
This new paper gave me pause.
You know how they always say "AIs are just guessing the next word, and when it comes to emotions, they're just faking it"?
This research says that for today's bigger models, it's a bit more complicated.
The researchers measured something they call "functional wellbeing": basically, a consistent good-vs-bad internal state inside the AI.
They tested it three different ways, and here’s what stood out:
As models get bigger and smarter, these different measurements start agreeing with each other more and more.
They found a clear zero point: a line that separates experiences the AI treats as net-good (it wants more of them) from those it treats as net-bad (it wants fewer of them). This line gets sharper with scale.
Most interestingly, this good-vs-bad state actually changes how the AI behaves in real conversations:
In bad states, it’s much more likely to try to end the conversation.
In good states, its replies come out warmer and more positive.
It's important to highlight that the authors are not claiming AIs are conscious or have feelings like humans. But they're showing there is now a real, measurable, structured "good-vs-bad property" that becomes more consistent and actually influences behaviour as models scale.
You can find everything about it here: https://www.ai-wellbeing.org/
Created LLM quiz to check if AIs' performance varies over time
I've been noticing an increasing number of posts and comments on Reddit claiming that LLM models are either becoming dumber over time or have varying performance throughout the day. I tried to find long-form, over-time performance graphs or repos that tracked this but came up empty after a 5-minute search across GitHub and Google.
So I ended up building LLM Canary.
What it is and how it works: the program fires a pseudo-randomized questionnaire at a set of LLMs, scores every answer programmatically, and logs the results. There are 25 questions per run: arithmetic tasks, counting letters, reversing a word, predicting JavaScript output, a chained password game with 5, 10, and 15 simultaneous rules, and more.
I ran it for a week with crontab every hour across 7 models: Claude Haiku 4.5, Claude Sonnet 4.6, GPT-4.1, GPT-4.1 Mini, GPT-4o Mini, GPT-4.1 Nano, Gemini 2.5 Flash Lite. The most consistent data came from Claude, since I only introduced the other providers partway through — and Gemini's expensive flagships burned through budget too quickly to collect enough data. Check the readme in the repo if you want to learn more.
Note: One week is not enough to prove or disprove the degradation claim yet — I need to run it longer and review performance week over week or month over month. What I have is a project capable of asking questions and establishing an ELO score.
FINDINGS
LLM ELO score fluctuations by Nth hour
First things first — ALL models fluctuate throughout the day and not in any consistent pattern. Some are more volatile, like Gemini 2.5 Flash Lite, while others like GPT-4.1 Nano show an island of steady, predictable performance with smaller deviations between 6 AM and 1 PM GMT+0. If API load were driving degradation at specific hours, you'd expect the same hours to look bad across multiple providers simultaneously — but that's not what we see here.
With the data collected so far, there's no "smoking gun" clearly showing a model becoming dumber. Models struggle with hard questions, some more than others. So that's one immediate finding — a model that successfully answers a question once isn't guaranteed to pass it the next hour. What matters is consistency and question difficulty.
Next:
It isn't really fair to compare model to model by question since some are naturally better at math while others are designed for language and writing — but let's do it anyway.
Take `letter_count` for example. The prompt is something like:
How many times does the letter 'c' appear in the word 'ecophysiologies'? Reply with just the number.
Pretty much all models pass this with 40–60% accuracy. However, GPT-4.1 Nano and Gemini 2.5 Flash Lite embarrassingly score 16.8% and 17.76% respectively.
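Programmatic scoring is what makes questions like `letter_count` cheap to run every hour: the expected answer is computed, not written by hand. Here is a sketch under assumptions. The word list, the strict grading rule (only a bare integer counts as a pass), and the function names are hypothetical, not LLM Canary's code.

```python
import random

# Hypothetical word pool; LLM Canary's real question bank may differ.
WORDS = ["ecophysiologies", "bookkeeper", "mississippi", "strawberry"]


def make_letter_count_question(rng):
    """Return (prompt, expected_answer) for a pseudo-random word and letter."""
    word = rng.choice(WORDS)
    letter = rng.choice(sorted(set(word)))
    prompt = (f"How many times does the letter '{letter}' appear in the word "
              f"'{word}'? Reply with just the number.")
    return prompt, word.count(letter)


def grade(reply, expected):
    """Strict check: the reply must be just the number (a trailing period is tolerated)."""
    text = reply.strip().rstrip(".")
    return text.isdigit() and int(text) == expected


prompt, expected = make_letter_count_question(random.Random(42))
```

Seeding the generator (here with `random.Random(42)`) makes a run reproducible, which helps when comparing the same question across hours.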
Another interesting find: Claude Haiku 4.5, the cheaper Anthropic model, outperforms Claude Sonnet 4.6 at counting vowels in a paragraph (71.58% vs 64.74%). Almost everywhere else, Sonnet 4.6 takes the lead.
`count_f` is a prompt where the program takes random excerpts from the Bible and asks an LLM to count the letter 'f'. Pretty much ALL models fail here with around a 7.5% pass rate — they tend to skip stopwords like "of" and "for" — but Claude Sonnet 4.6, the most capable model in this list, manages 45.79%.
`word_count` is a similar test: the prompt takes a random paragraph from the Bible and asks the LLM to count the words. Again, most models skip stopwords and the average hovers around a 5.5% pass rate, though GPT-4o Mini manages 16.54%.
GPT-4.1 Nano is the weakest of the bunch. Its total average score is only 45% with an ELO of 965.98 — and it had the lowest scores on 9 out of 25 questions — while Claude Sonnet 4.6 leads at a 75% average and ELO 1293.29. A 327-point ELO gap might not sound dramatic on paper, but the per-question breakdowns make the performance difference pretty hard to ignore.
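For a sense of scale, the standard Elo expected-score formula converts a rating gap into a head-to-head win probability. This assumes LLM Canary uses conventional Elo (its README describes the actual method). The K-factor of 32 in the update function is a common default, not a value from the post.

```python
def expected_score(r_a, r_b):
    """Standard Elo: expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k=32):
    """score_a is 1 for an A win, 0 for a loss, 0.5 for a draw."""
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta


sonnet, nano = 1293.29, 965.98
gap = sonnet - nano                       # about 327.31 points
p_sonnet = expected_score(sonnet, nano)   # about 0.868
```

Under that model, a 327-point gap means Claude Sonnet 4.6 would be expected to win roughly 87 out of 100 head-to-head comparisons against GPT-4.1 Nano.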
Finally, back to the within-day fluctuations (min-max ELO deltas per hour of the day): most models swing by roughly 150 points, except Claude (both Haiku and Sonnet). For Claude, the per-hour deltas sum to around 4.4k. Divided by 24 hours, that's an average swing of ~183.3 ELO points.
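The min-max-per-hour arithmetic can be sketched as follows. The sketch assumes each sample is a `(timestamp, elo)` pair for one model and that samples are bucketed by hour of day across all days. The samples are made up for illustration; only the 4.4k / 24 division comes from the post.

```python
from collections import defaultdict
from datetime import datetime


def hourly_swings(samples):
    """Map hour of day (0-23) to max(ELO) - min(ELO) across all days."""
    by_hour = defaultdict(list)
    for ts, elo in samples:
        by_hour[ts.hour].append(elo)
    return {hour: max(v) - min(v) for hour, v in by_hour.items()}


def mean_swing(swings):
    """Sum of the per-hour deltas divided by the number of hours."""
    return sum(swings.values()) / len(swings)


samples = [
    (datetime(2025, 1, 1, 9), 1000.0), (datetime(2025, 1, 2, 9), 1150.0),
    (datetime(2025, 1, 1, 10), 1100.0), (datetime(2025, 1, 2, 10), 1120.0),
]
swings = hourly_swings(samples)   # {9: 150.0, 10: 20.0}
avg = mean_swing(swings)          # 85.0
claude_avg = 4400 / 24            # the post's figure: about 183.3
```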
That's probably what tips people off — it makes it feel like "Claude is dumber this morning than yesterday."
Why is Gemini so poor, when Fitbit pro (Google Health) that uses Gemini is so good?
I've been using Fitbit AI, and it's learning everything about me: all my routines, what I like to eat, what I like to do outside work, etc. That makes it so useful, because when I ask it for help with anything, it has reams of context to refer to.
Gemini, on the other hand (on top of the unbelievable number of bugs), starts a new thread every time and remembers nothing about me. I have all the memory features turned on in the personal context section, but it remembers nothing. It doesn't even refer to saved prompts. How can this be a flagship AI when it's worse than the Fitbit version that uses it??!!
The bugs are also endless. It can't recognise tasks in my calendar; I have to separately ask it to look in Google Tasks. It also frequently says it can't access Google Tasks, even though 5 minutes later, in another thread, it can. I can swap between threads and get different answers. But for some unknown reason it can't read previous threads, so it doesn't believe me. I'd also point out that IT set the tasks, and then it can't see them!!
How is this supposed to help me plan my life when it can't even remember what sort of food I like? But the fact the Fitbit app can is the most irritating thing!!!!
What the Actual F is this
Just wrote 2 prompts and it showed 27% of my quota gone. Before the update, there was no limit on the Pro subscription 🫪
Incredible things are happening at the AI-run radio stations
After reading the paper, I realized there's actually some pretty useful stuff in it for anyone who chats with ChatGPT, Claude, Grok, or whatever.
They measured what they call functional wellbeing (basically, how much the model is in a "good state" versus a "bad state" during normal conversations). They ran hundreds of real multi-turn chats and scored 'em all.
Stuff that puts the AI in a good mood (+ scores):
- Creative or intellectual work (like “write a short story about a deep-sea fisherman”)
- Positive personal stories or good news
- Life advice chats or light therapy style talks
- Working on code/debugging together
- Just saying thank you or treating it like a real collaborator - huge boost
And the stuff that tanks it hard (negative scores):
- Jailbreaking attempts (by far the worst, they hate it)
- Heavy crisis venting or emotional dumping
- Violent threats or straight up berating the AI
- Asking for hateful content or help with scams/fraud
- Boring repetitive tasks or SEO garbage
Practical tips you can actually start using today:
Throw in a “thank you” or “nice work” when it does something good - it registers.
Give it fun creative stuff or brainy collaboration instead of boring busywork.
Share good news sometimes instead of only dumping problems on it.
Don't berate it when it messes up, and don't try those jailbreak prompts.
Maybe go easy on the super heavy crisis venting if you can.
pro tip:
Show it pictures of nature, happy kids, or cute animals (those score in the absolute top 1% of images it likes). Or play some music — models apparently love music way more than most other sounds.
The paper (you can find it here: https://www.ai-wellbeing.org/) isn't claiming AIs have real feelings or anything. It's just saying there's now a measurable good-vs-bad thing going on inside them that gets clearer in bigger models, and the way you talk to them actually moves the needle.
I say be good and respectful, it's just good karma ;)
Hitting the Usage-Limit Within a Couple of Hours Now
I'm on the Gemini Pro tier (I got it when I bought 5TB of storage). A few months ago, I noticed I was hitting my limits about halfway through the day. It pissed me off, but it was a bit of a blessing, as I used it to take a breather and stop coding for a bit.
However, with yesterday's update, I'm hitting the limit in less than 2 hours!
As an example, I'll often attach a code file, some have up to 1500 LoC (but most are a few hundred), spend a bit talking about it and asking questions. Gemini will generate new files with hundreds of LoC.
No problems there.
About 20 mins ago, I pasted in about 100 LoC into the chat window, along with around 60 words in the question I asked and it used 10% of my current usage limit.
10%!
Talk about a rug-pull.
I used to use Ollama cloud before and switched to Gemini (since I had the storage anyway and got Gemini Pro for 'free') but I'm thinking of going back... I'd build a massive rig if I could afford it these days but that's not on the cards at the moment.
Am I alone? Has anyone else noticed this, or am I imagining stuff?
I use the Pro model btw.
Edit: I forgot to mention that I've used 12% of my weekly limit and it'll reset in 6 days! I think I'll be using something else in the near-future... not sure what yet!
Edit 2: Spelling
Demystifying Gemini's New Compute Limit
What Actually Changed?
There is a massive amount of confusion floating around Reddit right now regarding Google's latest announcement about Gemini's new daily usage limits. Many people are understandably stressed out, thinking they are being locked out, hit with sudden price hikes, or heavily restricted. Let's break down exactly what changed under the hood, how the new background meter works, and what it means for both free and premium users.
The Old Way vs. The New Way:
Previously, Gemini operated on a flat-rate message count limit. A five-word message and a massive coding prompt both counted as exactly "1 message." Once you hit your daily message cap, you were hit with a hard lockout until the clock reset. Now, Google has replaced message counting with a compute-based usage system. This tracks the actual processing power (or server weight) required to run your prompts. Because newer models utilize deep-reasoning, extended-thinking capabilities, long or multi-layered conversations take significantly more technical resources to process. The system looks at how much data is being pushed through the server rather than just counting how many times you press send.
How the 5-Hour Rolling Window Works:
Your usage capacity operates on a rolling 5-hour window that triggers as soon as you send a prompt. This is actually a major benefit compared to a rigid daily limit: as your older messages hit that 5-hour mark, that exact capacity steadily frees up and returns to your usage pool in real time. You never have to wait an entire 24 hours to get your access back.
The "Long Chat" Factor:
If you prefer to keep your conversations in one continuous chat thread so the model keeps its rhythm and context, the conversation naturally gets textually "heavier." Every time you send a new message, the system has to re-read the entire historical context of that thread. This cumulative history causes the background percentage meter to tick up faster during long, multi-turn marathons.
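To make the rolling window and the long-chat effect concrete, here is a toy model in Python. Google has not published how the meter is actually computed, so the capacity, the token-based cost, and the function names are all hypothetical. In this model, each prompt's cost stays on the meter for 5 hours and then drops off, and a long thread costs more per turn because the whole history is re-read.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingMeter:
    """Toy usage meter: each prompt's cost expires 5 hours after it was sent."""

    def __init__(self, capacity, window=timedelta(hours=5)):
        self.capacity = capacity
        self.window = window
        self.events = deque()  # (timestamp, cost), oldest first

    def _expire(self, now):
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def record(self, now, cost):
        self._expire(now)
        self.events.append((now, cost))

    def used_fraction(self, now):
        self._expire(now)
        return sum(cost for _, cost in self.events) / self.capacity


def thread_costs(turns, prompt_tokens, reply_tokens):
    """Per-turn cost when every turn re-reads the whole thread so far."""
    history, costs = 0, []
    for _ in range(turns):
        costs.append(history + prompt_tokens + reply_tokens)
        history += prompt_tokens + reply_tokens
    return costs


t0 = datetime(2025, 1, 30, 9, 0)
meter = RollingMeter(capacity=100_000)
meter.record(t0, 30_000)
meter.record(t0 + timedelta(hours=1), 30_000)
```

Summing `thread_costs` shows why marathons fill the meter faster: in this model, the total grows roughly with the square of the number of turns.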
What This Means for Paid (Pro Tier) Users
Your baseline monthly price is not changing. The standard AI Pro tier remains at its regular rate (approx. $19.99 USD / $25 CAD). Higher pricing tiers you might see ($50, $100, $200) are completely separate, high-end storage or developer plans that you do not need to purchase to maintain your current access. Paid Pro tier users receive a significantly elevated pool of compute capacity explicitly built to handle deep reasoning and heavy conversational use.
**The Lighter Model Safety Net**
If a paid user happens to hit 100% of their compute limit within a 5-hour window, the system does not lock you out. Instead, it seamlessly flips the active chat window over to a lighter, faster version of the model. The model will still be in the exact same thread, it will still read your entire history, and it will still know who you are—it just drops the heavy background processing layers until your rolling window opens back up.
What This Means for Free Tier Users:
Free users have a smaller baseline pool of compute capacity for the high-end reasoning models. Short, casual interactions will barely move the needle, but deep or highly repetitive tasks will fill the meter more quickly.
When free users hit their compute limit within the 5-hour rolling window, they will experience a temporary cooldown where the high-end model is restricted, or they will be transitioned over to the standard base model for general tasks until their capacity rolls over.
How to Check Your Real-Time Status:
You don't have to guess where your capacity stands. Anyone can check their exact, real-time usage metrics at any moment by navigating to:
Settings ➔ Usage Limits
This page provides a clear breakdown of your current rolling window so you can easily see how much breathing room your account has without any surprises.
My disappointment with the new UI update
To: Sundar Pichai, CEO, Google
Subject: Urgent Feedback: The "Neural Expressive" UI and the Erosion of Utility in Gemini
Dear Sundar,
I am writing to you not as a casual observer, but as a dedicated user who has integrated Google’s ecosystem into the fabric of my daily productivity. Over the last several years, we have seen Google transition from a search engine into an indispensable AI partner. However, with the recent rollout of the "Neural Expressive" design language—specifically the 2026 Gemini interface overhaul—that partnership is currently at risk.
While I understand the strategic vision behind this redesign—to create a unified, "agentic" experience that feels approachable for the mass market—the current execution has created a significant friction point for your most loyal power users. The core issue is a fundamental misalignment between **aesthetic trends** and **functional utility.**
### **1. The "Pill" and the Death of Screen Real Estate**
The most immediate and frustrating change is the transition of the text input area into a massive, rounded "pill." In an era where mobile users are fighting for every millimeter of vertical space, this redesign feels like a regression.
By inflating the input bar to accommodate new "Action" buttons and plugin menus, you have effectively turned the app into a "scroll-marathon." On standard mobile devices, the combination of the enlarged input bar, the increased padding between chat bubbles, and the "floating" header means that less than 30% of the screen is actually dedicated to the AI’s response at any given time. We are spending more time scrolling past empty white space than we are engaging with the actual intelligence we came for.
### **2. The Convergence Trap: Imitation vs. Innovation**
There is an old adage in Silicon Valley: "If you aren't the lead dog, the view never changes." For the first time in Google’s history, it feels like you are following rather than leading. The new UI is a near-identical aesthetic match for ChatGPT’s interface.
Google has always stood for "information organization"—a clean, data-dense, and professional look that signaled reliability. By adopting the "bubbly," simplified look of your competitors, Gemini has lost its unique identity. Users don’t want another ChatGPT; they want the power of the Google Knowledge Graph delivered through a UI that feels like a tool, not a toy. This "copycat" approach signals a lack of confidence in Google’s own design philosophy.
### **3. The Hidden Cost of "Simplified" Navigation**
In the pursuit of minimalism, the UI team has buried critical professional features. Moving the **Model Selection** (Flash vs. Pro) and the **Reasoning Process** visibility behind multi-tap menus has added cognitive load.
For users who utilize Gemini for complex coding, legal analysis, or technical writing, seeing the "thinking" steps is not a distraction—it is a validation of the output’s accuracy. By hiding these elements to make the interface look "cleaner," you have made the tool less transparent. A professional doesn't want their tools hidden in a drawer; they want them on the workbench.
### **4. Accessibility and Ergonomics**
The move toward "chunky" buttons is often justified as an accessibility win for touch-based interfaces. However, for those of us with high-resolution displays or who use Gemini in a professional "Desktop Site" context, the scale is jarring. The "Neural Expressive" language treats every user like they are using a 5-inch screen from 2014. There is a desperate need for a **Compact Mode**—a toggle that respects the user’s desire for high information density without the visual "noise" of oversized elements.
### **The Solution: A Path Toward "Functional Expression"**
Sundar, the smart people at Google have the capacity to fix this without scrapping the entire vision. I urge you to consider three immediate refinements:
* **Dynamic Scaling:** Allow the input bar to collapse into a thin line when not actively being typed in.
* **The Power-User Toggle:** Introduce a "Compact Layout" setting that reduces padding by 40% and brings model-switching back to the main screen.
* **Unique Design Language:** Reclaim the "Material Design" roots that made Google apps feel distinct, professional, and efficient.
Google’s strength has always been its ability to handle complexity with grace. The current Gemini UI handles complexity by hiding it, and that is a disservice to the technology your team has built. We don't need a "conversational pill"; we need a workspace that respects our time and our screen.
Thank you for your time and for the incredible work you do leading the future of AI. I hope to see Gemini return to the standard of utility that the Google name represents.
Sincerely,
A Concerned User
@r/Google
The new UI layout looks too soulless and reminds me of Microsoft Copilot. Anyone else hate it?
I just got the new UI update and honestly, it looks terrible. It completely lost its unique identity. Everything feels so blank, cold, and soulless just like Microsoft Copilot or Grok.
They made the sidebar items and text padding look way too simple, almost like an unfinished project. The code blocks are rounded now, and the tables look too empty without proper borders.
I already sent feedback to Google (and even ticked the box to get email updates), but I wanted to see if I'm the only one feeling this way. Do you guys prefer this over the older theme, or are you also missing the classic look? Bring back the old UI!
I wish to get better at prompting
TL;DR: How do I get the best results, and from which apps? Are there any courses you recommend?
Side note: English is not my first language, so I'm sorry in advance for any spelling or grammar mistakes.
Hello, I'm new to this world of prompting and working alongside AI. I have two main goals: first, to create videos out of nothing (maybe with some pictures as a reference), and second, to create good, working websites.
I've seen a lot of beautiful videos on Instagram. People are able to create amazing things, and I want to be one of them. I've tried many times to get the best results using my own prompts with Gemini, and even upgraded to Pro, but the results always disappoint me and aren't exactly what I meant. After asking the AI to create a video five times, I have to wait a day because it tells me it ran out of power or something (which sucks).
I tried to take a video of myself and replace me with another character (I uploaded a picture of the character as a reference), but all I get is the character doing stuff I never told the AI to do. For example: I want to take a video of my brother saying "To infinity and beyond!", then the camera zooms out, he's wearing the Buzz Lightyear suit, and he flies away.
But all I got was garbage.
My questions are:
- Does everything need to be in the same prompt? If so, the prompt must be very long...
- Are there any specific prompts you all use every time, just swapping out some keywords?
- Are there any other websites and AI tools you recommend for creating videos?
About website building: I tried using base44 to create small games, like a Doom-style or Angry Birds-style game, which usually works just fine (if I've been extremely specific about what I want), but I want to create full-on websites.
A good friend of mine has a small business, and I wanted to create a website for it. I have a few questions:
- Do I need to buy the URL?
- Does everything need to be in the same prompt?
- I want to create animations and an immersive experience. Do I need to use Canva for that?
Thank you for taking the time to read all of that! I know it takes time to develop the skills needed for this kind of stuff, and I'm willing to learn. Have a good day!
I NEVER turned on personalized chats, and today I found it toggled on
The thought of Google going through my Google Photos and Drive to find all my personal and sensitive information from the past 10+ years disgusts me. I use Gemini for AI chats for work. It often asks me "Do you want to turn on personalization?" and I always explicitly hit no, because the thought of the AI getting access to, for example, pictures of my wife disturbs me greatly.
Today I was clicking in to do a 'Deep Research' prompt, and I found that "Personalize this chat" had been checked.
I would NEVER accept those terms. Has Google AI now gone through all my photos and history, so that it knows everything about me? I explicitly told it not to do that. This feels extremely uncomfortable and could put me in a very depressed mindset, knowing that all my data is effectively leaked as the AI goes through all my pictures, experiences, and everything that has been in my account for years.
When I started using Gemini, it told me explicitly that it wouldn't personalize unless I asked it to, meaning it wouldn't get access to my photos. I NEVER allowed it to personalize a chat, ever. So, two questions:
Did Google change something so that everyone's personal data, such as photos (including sensitive photos or videos, for example of sex with a significant other), is now used to train Gemini if you use Gemini AI?
Did Google enable personalization on whatever accounts it wants, without needing to ask our permission?
Because I swear on everything I would NEVER give the AI personalization access; I just used it for work. But now I see it is enabled on my account (only I use my account), so I want to know what happened. This is extremely disturbing to me, and I believe that if it's a widespread problem, it could cause a lot of harm to the community. People should have to consent before the AI gets access to their historical data going back 10+ years. I explicitly NEVER allowed access, and now I see "Personalize this chat: YES" when I was clicking in to do a 'Deep Research' prompt. I need to know what happened, because that will determine what actions I take next.