u/Successful_List2882

Perplexity Pro just cut advanced model messages to roughly 10 a day before falling back to their default model.

In early 2024, Perplexity Pro was genuinely ahead. It was one of the first platforms with usable internet search built in, it gave access to top-tier models from multiple companies, and it did not impose the aggressive usage caps that made other subscriptions feel like demos. For a certain type of user, the research-heavy, tab-switching, daily-driver type, nothing else came close on price-to-performance.
That gap no longer exists.
A few days ago Perplexity tightened their usage limits significantly. Pro subscribers are now hitting the ceiling on advanced models at roughly 10 messages a day before the platform falls back to Sonar, their default model. For anyone using the platform for actual research rather than casual queries, the fallback is not a reasonable substitute.
The product experience has deteriorated alongside the limits. Popups pushing annual plan upgrades appear on reload after being dismissed. Nudges to upgrade to the Max tier surface during normal advanced search usage. New features announced with genuine excitement turn out to be locked behind Perplexity Computer, a hardware product at a price point that makes Claude Code look cheap by comparison.
The core problem is not the limits themselves. It is that Perplexity built its entire identity around being the platform that did not do this.
The counterargument worth taking seriously is that running frontier model inference at scale is genuinely expensive, and the original Pro pricing was probably subsidized. Sustainable unit economics require either higher prices or tighter limits. That is not a conspiracy, it is math.
But the competitive landscape has changed completely in the same period. Integrated web search is now a standard feature across Claude, ChatGPT, Gemini, and most serious alternatives. Deep research tools ship on multiple platforms. The thing that made Perplexity worth paying for in early 2024 is now a baseline expectation, not a differentiator.
Paying for access to models that other platforms also offer, behind stricter limits than those platforms impose, while being constantly upsold toward a hardware product, is a different subscription than the one that existed twelve months ago.
So the question that splits the room: is Perplexity repricing toward enterprise because that is where the sustainable business actually is, or did they burn their most loyal early users to chase a market they are not positioned to win?

reddit.com
u/Successful_List2882 — 1 day ago

A builder with 40+ AI agent projects says most founders don't need an agent. A telehealth client wanted an "autonomous AI receptionist."

Every week a founder books a call asking for an AI agent. Every week most of them get told they do not need one.

Forty-something projects in, the pattern is predictable before the Zoom starts. They saw a Loom video of someone's autonomous sales agent closing deals while they sleep. They read the LinkedIn post about the AI employee running an entire ops team. They have already told their board they are building one.

Then within fifteen minutes the conversation becomes an explanation of why what they actually need is an internal automation with one LLM call in the middle.

Three examples from the last six months tell the whole story.

A telehealth founder wanted an autonomous AI receptionist that handles everything. The actual need was a workflow that reads intake forms and routes them to the right clinician. Shipped in six weeks. Saves clinicians four hours a day. She paid again last month.

A fintech client wanted a fully agentic finance copilot. What they needed was a script that reconciles ACH discrepancies before they hit the dispute queue. One model call, the rest plain code. Saved them a full ops hire.

A medspa chain wanted AI marketing automation. What they needed was a job that watches their booking system for no-show patterns and triggers a recovery message. Three steps. No agent. Booked 14% more revenue last quarter.

None of these are agents. They are automations. And every one outperforms the agent the founder originally asked for, because the agent would have hallucinated something in week three and burned the client's trust forever.

The reason agents keep failing in production is structural. A good automation has one decision per step and a clear rule at each branch. An agent gets handed a goal and told to figure it out.
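
A rough sketch of what "one decision per step and a clear rule at each branch" can look like for the intake routing example, with a single model call in the middle. The queue names, the `IntakeForm` fields, and the `classify` callable are all hypothetical; `fake_classify` stands in for a real model call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical queue names; a real clinic would define its own.
ALLOWED_QUEUES = {"primary_care", "mental_health", "dermatology"}
FALLBACK_QUEUE = "manual_review"


@dataclass
class IntakeForm:
    patient_id: str
    reason_for_visit: str


def route_intake(form: IntakeForm, classify: Callable[[str], str]) -> str:
    """Return the queue for a form. `classify` is the single model call."""
    # Step 1: plain-code validation, no model involved.
    if not form.reason_for_visit.strip():
        return FALLBACK_QUEUE
    # Step 2: the one LLM call, asked for a label and nothing else.
    label = classify(form.reason_for_visit).strip().lower()
    # Step 3: a hard rule on the output; anything unexpected goes to a human.
    if label not in ALLOWED_QUEUES:
        return FALLBACK_QUEUE
    return label


def fake_classify(text: str) -> str:
    # Stand-in for a real model call (assumption, for illustration only).
    if "anxiety" in text.lower():
        return "Mental_Health"
    return "I think maybe cardiology?"
```

The model never chooses what happens next. It only returns a label, and plain code decides whether that label is acceptable, which is the part a reviewer can read top to bottom.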

In regulated industries the problem compounds. HIPAA and SOC 2 reviewers want to know exactly what a system does, in what order, every time. An automation passes that conversation in 20 minutes. An agent turns it into a six-month compliance nightmare.

Half the current pipeline is founders who paid $50k for an agent build that bleeds tokens, cannot be audited, and collapses on edge cases. Rebuilt as automations, they start making money.

So the line worth arguing: is the agent narrative being pushed by people genuinely solving hard problems, or mostly by people who need the label because automations do not trend?

u/Successful_List2882 — 3 days ago

Spent 1 Saturday afternoon fixing a florist's spreadsheet. Three months later her shop had $18k in repeat orders she would have missed completely.

The notebook was the whole problem. A flower shop owner had years of client names, wedding dates, anniversary orders, all of it written down by hand. No follow-up system. No reminders. Customers would order once, love the flowers, then simply forget the shop existed.
The fix took one afternoon and was genuinely not impressive.
An automated email sequence connected to her existing contact list. It pulls upcoming anniversaries and birthdays and sends a personalized reminder before the date. No discount codes. No promotional language. Just a timely nudge from a shop the customer already trusted. The harder part was cleaning up the spreadsheet, which had years of inconsistent formatting that took longer to sort out than the actual build.
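
For anyone curious what the date-matching part of a build like this might look like, here is a minimal sketch. The contact fields (`email`, `occasion`, `event_date`) and the 14-day lead time are assumptions for illustration, not details from the florist's actual setup, and the sending step is left out entirely.

```python
from datetime import date, timedelta


def next_occurrence(event: date, today: date) -> date:
    """Next calendar occurrence of the event's month/day on or after `today`."""
    for year in (today.year, today.year + 1):
        try:
            candidate = event.replace(year=year)
        except ValueError:
            # Feb 29 in a non-leap year: fall back to Feb 28.
            candidate = date(year, 2, 28)
        if candidate >= today:
            return candidate
    raise AssertionError("unreachable: next year's date is always in the future")


def due_reminders(contacts, today, lead_days=14):
    """Contacts whose next occasion is exactly `lead_days` away.

    Meant to run once a day, so each occasion triggers a single reminder.
    """
    due = []
    for contact in contacts:
        when = next_occurrence(contact["event_date"], today)
        if when - today == timedelta(days=lead_days):
            due.append((contact["email"], contact["occasion"], when))
    return due
```

Matching on an exact day count keeps the job stateless, at the cost of missing a reminder if the daily run is skipped. A real build would probably record what was already sent.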
Three months later: roughly $18,000 in repeat orders that would not have happened otherwise.
The automation did not find new customers. It just stopped losing the ones already won.
That distinction matters more than it sounds. Most small businesses spend the majority of their energy on acquisition and almost none on retention. The florist had done the hardest part, which is earning trust on someone's wedding day, and then handed those customers to competitors by simply not following up.
The honest version of this story is that $18k across three months for a four-person shop is meaningful but not transformational. It does not fix rent. It does not replace a slow season. And the same approach fails completely if the underlying product is not already good, because automated outreach to unhappy customers just accelerates churn.
What it does prove is that the gap between a small business with a system and one without is sometimes just one Saturday afternoon and a willingness to clean up a messy spreadsheet.
Most of these shops have the customers. They have the data. They have the goodwill. They just have no mechanism for converting any of it into a second order.
So the question worth arguing about: is the real opportunity here the automation layer sitting above thousands of small businesses that have never touched it, or is this the kind of thing that only works once before every florist in town is sending the same anniversary email?

u/Successful_List2882 — 4 days ago

Nvidia's VP of Applied Deep Learning says compute costs on his team exceed employee costs. An MIT study found AI is only cheaper than humans in 23% of roles.

Bryan Catanzaro, Nvidia's Vice President of Applied Deep Learning, said it plainly: on his team, the cost of compute is far beyond the costs of the employees.
This is an Nvidia executive. The company that sells the chips powering the AI boom just told you AI is more expensive than the people it is supposed to replace. That sentence deserves to sit for a moment before moving on to the $740 billion figure.
Big Tech has announced $740 billion in capital expenditures so far this year. That is a 69% increase from 2025. Meta is planning to cut roughly 8,000 jobs. Microsoft has rolled out voluntary buyouts. The dominant narrative in every earnings call is that AI is coming for white-collar work and the math makes it inevitable.
The MIT study from 2024 runs directly against that narrative. Researchers found AI automation is economically viable in only 23% of roles where computer vision is central to the task. In the remaining 77%, it is still cheaper to pay a human being to do the work.
The companies announcing layoffs and the companies announcing AI investment are sometimes the same companies, and neither number proves the other.
Uber's CTO has already said their AI budgets have been blown away. Hardware costs, energy costs, inference costs at scale: these are not rounding errors. They are the reason Catanzaro's team pays more for compute than for the people running it.
The honest read on all of this is a timing mismatch. Infrastructure will get cheaper. Model efficiency will improve. Pricing models will shift. The economic argument for broad AI replacement of human workers may eventually hold. It just does not hold right now, and the people most loudly arguing it does are often the ones selling the infrastructure required to get there.
That gap between the narrative and the current economics is where most of the confusion lives.
So the line that actually splits the room: is the $740 billion in capex a bet that the economics flip within five years, or is it a bubble that requires the economics to flip just to justify the spend that already happened?

u/Successful_List2882 — 5 days ago

A Fortune 500 client paid $40k for an agent that reads PDFs. It took 3 days to build and 6 months to make bulletproof.

The agent itself was 200 lines of code wrapped around Claude. The $40k was not for the 200 lines.
Someone has been shipping AI agents for Fortune 500 companies for two years. The number they lead with is not accuracy rates or model benchmarks. It is a 2:17am Slack message from a client whose agent stopped working because DeepSeek quietly changed their response format and broke the parsing logic overnight. That is the actual job.
The compliance form agent that generated the $40k engagement took three days to build. The remaining six months went to retry logic for API rate limits hitting at 3am, handling corrupted PDFs that crashed the parser in unpredictable ways, and building a dashboard so Karen from operations could see exactly why form 47821 got stuck in processing. Karen's dashboard probably took longer than the original agent.
The most profitable agent currently running moves data between Salesforce and a CRM when specific keywords appear in support tickets. No reasoning loops. No multi-agent orchestration. Just a keyword trigger and a data pipe that works every single time without anyone thinking about it.
The money is not in the smart parts. It is in making dumb automation reliable enough that people trust it with their actual work.
This cuts against most of what gets posted in AI spaces right now. The discourse is about reasoning benchmarks and agent autonomy while the people actually billing are debugging webhook timeouts and writing error messages that non-technical operations staff can act on at 9am without filing a ticket.
The uncomfortable observation is that this work is not glamorous enough to go viral, which means most people entering the space are optimizing for the wrong skills. Knowing how to prompt a model well is now a commodity. Knowing how to build observability into a system that runs unsupervised for months is not.
LLMs got easy. Production engineering for systems that need to work while the builder is on vacation did not.
So the split worth having: is reliability engineering genuinely the moat here, or does that advantage disappear the moment agent frameworks mature enough to handle the plumbing automatically?
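
As a concrete picture of the "retry logic for API rate limits" in this post, here is a minimal sketch of exponential backoff with jitter. `RateLimitError` is a hypothetical stand-in for whatever exception a real API client raises on a rate limit, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a real client's rate-limit exception."""


def call_with_retry(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, back off exponentially and try again.

    The wait before retry number `attempt` is base_delay * 2**attempt plus
    up to base_delay of random jitter, so parallel workers do not retry in
    lockstep. `sleep` is injectable so the function can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                # Out of attempts: re-raise so the failure is visible upstream.
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The final re-raise matters as much as the retries. A wrapper that swallows the last failure is how a form ends up stuck in processing with nothing on the dashboard to explain why.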

u/Successful_List2882 — 6 days ago

A developer has automated 30 professional services firms. The hardest part was never technical. The broken process is usually broken on purpose and nobody admits it for the first three weeks.

Four weeks in, the developer finally figured out why the project kept stalling.
One partner ran the proposal review step. It was where he stayed visible to the firm, caught junior mistakes, reminded everyone he was still the rainmaker.
The 9-day proposal cycle was not a bug to him. It was the thing keeping him relevant. He never said any of this. He just made the project move slowly enough that it would die.
This developer has shipped automations for over 30 professional services firms. Law, accounting, recruiting, consulting. The broken process hired to be fixed is usually broken on purpose.
Nobody on the kickoff call will say so for the first three weeks.
The 22-person consultancy hired them to cut proposals from 9 days to 36 hours. Real problem. Real money. The senior partner loved the scope.
Two others nodded politely. Then documents took a week to arrive. Interviews kept rescheduling. The point of contact vanished onto something else.
The technical work is almost never the hard part. Someone at the firm has built their identity, their job security, or their compensation around the broken thing.
Same pattern at a 14-attorney firm where a paralegal built her entire role around being the only person who understood the intake spreadsheet. At an accounting firm where a partner's billable hours required manual review of every deliverable.
At a recruiting agency where the founder kept rejecting every screening logic proposed because, in his words, he just had a feel for it.
Connecting Clio to Gmail. Building a deterministic intake router. None of that is hard. Most of it takes a week. What is hard is mapping who benefits from the current inefficiency before writing a line of code.
The uncomfortable part: the check clears either way. Firms happily pay for automations that were never going to get adopted. But watching a good system rot on a shelf is depressing, and it is bad for referrals.
The question to sit with before hiring anyone: who at the firm benefits from this process being slow?
If that cannot be answered honestly, the firm is not ready to automate. It is ready to have a harder conversation first.
For consultants who have hit this wall: did you identify who was protecting the broken process before or after the project stalled?

u/Successful_List2882 — 7 days ago

Three days after a perfect client demo, a midnight alert fired. The Planner was in a recursive loop with the Executor. Two hours. $200 in API credits.

Three days after a client demo, a midnight alert fired. The Planner agent had locked into a recursive loop with the Executor. Two hours. $200 in API credits. Nothing completed.
The demo had worked perfectly.
That is the gap nobody talks about when selling autonomous agents. What runs cleanly in a controlled environment does not run cleanly in production with real data, real edge cases, and a client who has gone home for the night.
The developer writing this spent most of last year building fully autonomous multi-agent systems for clients. The pitch was compelling: systems that could think, plan, and execute complex tasks without human involvement.
The reality was a support contract masquerading as a product.
Autonomy, in most business contexts, is a bug dressed as a feature. Clients do not want a system that might hallucinate a new company policy and act on it.
They want a repeatable, predictable result they can explain to their manager. That requirement does not tolerate open-ended reasoning loops.
The most successful agent systems are not the ones with the most freedom. They are the ones with the best guardrails.
The shift that stopped the 3am calls was moving from autonomous loops to deterministic workflows. Linear handoffs instead of open-ended agent conversations. Hard validation at every step.
A state machine with defined transitions instead of an open reasoning loop that can spiral into an expensive, infinite conversation with itself.
And human-in-the-loop approval before any major action executes.
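
A minimal sketch of that shape, assuming a four-step pipeline: plan, validate, wait for approval, execute. The state names and the `plan_step`, `validate`, `approve`, and `execute` callables are hypothetical placeholders, not the system described in this post.

```python
from enum import Enum


class State(Enum):
    PLAN = "plan"
    VALIDATE = "validate"
    AWAIT_APPROVAL = "await_approval"
    EXECUTE = "execute"
    DONE = "done"
    FAILED = "failed"


# Every legal transition is listed. There is no edge from EXECUTE back to
# PLAN, so a planner/executor loop cannot form.
TRANSITIONS = {
    State.PLAN: {State.VALIDATE},
    State.VALIDATE: {State.AWAIT_APPROVAL, State.FAILED},
    State.AWAIT_APPROVAL: {State.EXECUTE, State.FAILED},
    State.EXECUTE: {State.DONE, State.FAILED},
}


def run(plan_step, validate, approve, execute):
    """Run one linear pass and return (final_state, visited_states)."""
    state = State.PLAN
    history = []

    def go(new_state):
        nonlocal state
        if new_state not in TRANSITIONS.get(state, set()):
            raise RuntimeError(f"illegal transition {state} -> {new_state}")
        history.append(new_state)
        state = new_state

    plan = plan_step()
    go(State.VALIDATE)
    # Hard validation: a plan that fails the check never reaches a human.
    go(State.AWAIT_APPROVAL if validate(plan) else State.FAILED)
    if state is State.AWAIT_APPROVAL:
        # Human-in-the-loop gate: nothing executes without an explicit yes.
        go(State.EXECUTE if approve(plan) else State.FAILED)
    if state is State.EXECUTE:
        execute(plan)
        go(State.DONE)
    return state, history
```

Each run makes at most one planning call and one execution, so the worst-case cost of a run is known in advance.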
HITL is less impressive in a demo. There is no "set it and forget it" moment. But the architecture survives contact with production.
The client clicks Approve before anything consequential happens. The system does not act on a misread context at 2am.
The honest limitation: some problems genuinely require autonomy and HITL introduces friction that defeats the purpose. Truly time-sensitive workflows cannot wait for a human to click Approve.
But most business problems are not time-sensitive at the agent execution layer. A checkpoint adds minutes. A runaway loop costs more than that in a single incident, before accounting for the client call that follows.
For developers who have shipped agent systems to real clients: where did you draw the line between autonomy and oversight, and what was the incident that moved it?

u/Successful_List2882 — 7 days ago

A founder paid $8,000 for an AI-built healthcare MVP. The pilot clinic sent a vendor questionnaire. The developer had never heard of a BAA. The rebuild cost 3x the original build.

A founder paid $8,000 for an AI-built healthcare MVP. Six weeks, clean UI, demo-ready. Login screen, database, dashboard. It looked like a product.
Then the pilot clinic sent over a vendor questionnaire.
Encryption at rest. Audit logs. BAA coverage. Role-based access controls. Whether any PHI touches third-party infrastructure the clinic had not reviewed.
The developer had not thought about any of it. Not because they were careless. Cursor does not know what a BAA is. The prompts never asked for it.
The founder's options: rebuild the data layer from scratch, hire someone to retrofit compliance after the fact, or lose the customer.
The rebuild cost 3x the original build. The founder had already done a soft launch and had to tell pilot users the product was going on pause while the architecture got fixed.
This pattern has shown up four times in one developer's client work over the past year. Mental health platforms, prior auth tools, patient intake products. All of them hit the same wall at the same moment: first real procurement review.
In regulated SaaS, compliance is not a layer you add later. It shapes the schema, the auth model, the logging strategy, and which third-party services you are even allowed to choose.
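
To make "it shapes the auth model and the logging strategy" slightly more concrete, here is a toy sketch in which every access check writes an audit entry, denials included. The roles, permission names, and in-memory `AUDIT_LOG` are assumptions for illustration. This is nowhere near a HIPAA-ready implementation, only the shape of the idea.

```python
from datetime import datetime, timezone

# Hypothetical roles and permissions, for illustration only.
PERMISSIONS = {
    "clinician": {"read_phi", "write_phi"},
    "front_desk": {"read_schedule"},
}

# A real system would use durable, append-only storage, not a list in memory.
AUDIT_LOG = []


def access(user_id, role, action, record_id):
    """Check a role-based permission and record the attempt either way."""
    allowed = action in PERMISSIONS.get(role, set())
    # Log before returning so denied attempts are captured too.
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "role": role,
        "action": action,
        "record": record_id,
        "allowed": allowed,
    })
    return allowed
```

If every read and write already goes through one function like this, adding the log is trivial. If data access is scattered across handlers, that is the retrofit that ends up costing a multiple of the original build.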
Retrofitting it costs more in time, money, and customer trust than building around it from day one. The tools that make it fast to ship carry zero knowledge of the regulatory environment.
Developers who move fast are often not the same people who have read the HIPAA Security Rule or understand what enterprise vendor questionnaires actually scrutinize. Those are different skill sets and the market does not always price them that way.
The uncomfortable part: a lot of healthcare founders need a compliance attorney before they need a developer. The ones who have that conversation first tend to ship something that survives real procurement. The ones who skip it tend to rebuild.
The question to ask any developer before they write a line of code for a regulated product is what their compliance requirements checklist looks like.
If they do not have one, that is the answer.
For founders who have been through healthcare or fintech procurement: did you catch the compliance gaps before or after your first real customer asked, and how much did the timing cost you?

u/Successful_List2882 — 8 days ago

Peter Steinberger built OpenClaw, now works at OpenAI, and just had his Claude account suspended. Anthropic reversed it in hours. The five weeks before the ban are the part nobody is covering.

Last Friday, Peter Steinberger posted on X that Anthropic had suspended his Claude account over "suspicious" activity. Steinberger created OpenClaw, the widely used cross-model agent harness, and currently works at OpenAI.
The ban lasted a few hours. Anthropic reversed it. By then the story had spread.
What most coverage missed is the five weeks before it.
Anthropic changed its subscription policy to exclude usage through external harnesses like OpenClaw, pushing those workloads onto metered API billing. Developers called it the "claw tax."
The rationale: subscriptions were never designed for workloads that loop, retry, chain tools, and stay active far longer than a standard user conversation.
Steinberger's X post on the timing: "Funny how timings match up, first they copy some popular features into their closed harness, then they lock out open source."
The feature he appeared to reference was Claude Dispatch, added to Anthropic's own Cowork agent just weeks before the pricing change landed.
That sequence is the uncomfortable part.
When asked why he uses Claude at all given his role at OpenAI, his answer was direct: only to ensure OpenClaw updates do not break things for Claude users.
Claude is one of the most popular model choices in OpenClaw's user base, arguably more so than ChatGPT. That is the market reality Anthropic is navigating.
On the broader tension between the two companies: "One welcomed me, one sent legal threats."
This was not just a false positive from an automated abuse system. It is a snapshot of a structural shift in how model providers now think about third-party tools.
Model vendors are no longer selling tokens. They are building vertically integrated products with their own agents, runtimes, and workflow layers. Once the vendor owns the preferred interface, external tools stop looking like partners and start looking like competitors.
OpenClaw's value is model-agnosticism. Use the best model without rebuilding your stack. That is strategically inconvenient for any vendor trying to hold lock-in as model differentiation narrows.
Pricing changes. Accounts get flagged. Features get absorbed into the platform's paid tier. It does not matter how popular the tool is.
For open-source builders on a closed provider's API: is model-agnosticism still viable long-term, or does vertical integration mean the only safe stack is one you fully own?

u/Successful_List2882 — 9 days ago

A developer hit Claude's usage limit mid-build for the fourth time in a week. Switching to Gemini CLI finished the project using only 7% of its quota.

Midway through building a LinkedIn AI agent, Claude hit its usage limit. Again. Fourth time that week. The project was 90% done and the reset was still 24 hours away.
Instead of waiting, the developer opened Gemini CLI. An old subscription, never seriously used, still active from a promotional offer the year before. Within hours the agent was complete. Only 7% of the Gemini quota consumed.
The realization that followed is the part worth writing down.
Claude Pro costs $20 a month. Claude Max runs $100 to $200. The promise at every tier is more headroom and fewer interruptions. What nobody says out loud is that the ceiling is not the model.
The ceiling is how clearly you can articulate what you actually want built.
Gemini CLI picked up the LinkedIn agent mid-build and extended it without losing context. No re-explaining the architecture. No handover prompt. It continued. Most developers assume switching models mid-project means restarting reasoning from scratch. It often does not.
The workflow that emerged is two-lane. Claude handles planning, architecture, and deeper reasoning where quality per prompt matters most. Gemini CLI handles execution, iteration, and shipping where volume and continuity matter more.
Two tools, one pipeline, no redundant subscriptions.
The uncomfortable observation is that most people hitting Claude's limits are not hitting a model ceiling. They are hitting a comfort ceiling.
The hesitation to try Gemini CLI was not based on performance data. It was assumption. Written off as not being at the agentic level of Claude Code or Codex, without ever testing it on real work.
That assumption was costing $100 to $200 a month in subscription upgrades to avoid finding out.
The honest limitation is real. This setup requires knowing what each model is genuinely better at. Using Gemini for architecture or Claude for high-volume iteration likely produces worse results than staying on one tool.
The two-lane system only works if the lanes are correctly assigned. Not every workflow survives a mid-build model swap. This one did. That is worth one afternoon of honest testing before paying for a higher tier.
For developers running multi-step agent pipelines: does model loyalty come from genuine performance gaps you have tested, or from the switching cost of rebuilding context you have never bothered to port?

u/Successful_List2882 — 9 days ago

Someone distilled 13 canonical software engineering books into one AGENTS.md file. The hardest part was not finding the rules. It was writing them precisely enough for a machine to follow.

Someone read 13 of the most assigned software engineering books ever written and collapsed them into one AGENTS.md rules file for Claude, Codex, and Cursor.
The list is not random. It is the canon.
Clean Code. Clean Architecture. The Pragmatic Programmer. Designing Data-Intensive Applications. Domain-Driven Design plus both Vaughn Vernon follow-ons. Refactoring. Patterns of Enterprise Application Architecture. Release It! Code Complete. Working Effectively with Legacy Code.
Thirteen books. Decades of thinking about how software fails, scales, and rots.
The problem this solves is real and underappreciated. AI coding agents remember nothing between sessions. Every time a new one opens, your architecture decisions, naming conventions, error handling philosophy, and tolerance for technical debt are gone. The agent starts fresh and guesses. Usually it guesses wrong in both directions in ways that look correct until they compound.
An AGENTS.md file in the project root gets read automatically before the agent touches anything. It is the difference between an agent that follows your rules and one that confidently ignores them.
Writing one that holds up is harder than it sounds. Vague principles produce vague behavior. Telling an agent to write clean code is not a rule. It is a wish.
That is what makes this distillation worth attention. The principles that survive compression into a short rule set are the ones precise enough to actually constrain what the agent does.
Each book targets a failure mode developers learn the expensive way. Nygard on systems that collapse under load. Feathers on code nobody can safely touch. Evans on domains modeled so loosely they drift from reality. Kleppmann on data that misleads at scale.
The distillation forces a question most developers never ask: which of these principles can be stated precisely enough that a machine can act on them, not just acknowledge them?
The honest limitation: no AGENTS.md enforces itself. An agent reads it the same way a junior developer reads a style guide on day one. With good intentions and no real sense of why any of it matters yet.
For people running Claude or Codex on real codebases: does a rules file actually change output quality, or does the model nod at the rules and proceed to do whatever it was going to do anyway?

u/Successful_List2882 — 10 days ago

Marc Andreessen told his AI to never hallucinate. A r/PromptEngineering user ran the full prompt and found what the mockery missed.

On May 4, 2026, Marc Andreessen posted his personal AI system prompt on X. One line inside it became the most mocked sentence in tech that week: "Never hallucinate or make anything up."
A user in r/PromptEngineering ran the full prompt. The finding was not what the mockery cycle produced. The prompt shifts output quality. Just not for the reasons Andreessen advertised.
The prompt is public. It tells the model to be a world class expert in all domains, never open with "great question" or "you are absolutely right," lead with the strongest counterargument before agreeing, and tag every claim with an explicit confidence level: high, moderate, low, or unknown. It also bans ethical disclaimers, emotional sensitivity, and any apology for disagreeing.
NYU emeritus professor Gary Marcus called out the hallucination line immediately on X. Defector editor Albert Burneko wrote that telling an LLM to stop hallucinating is not a technical instruction. It is a theatrical one. Performing the behavior of not lying is not the same as not lying. That gap is where most of the internet left the conversation.
The PromptEngineering thread stayed longer and found something the pile-on missed.
The anti-sycophancy rules actually work. The model stops validating bad premises. Confidence labels force it to surface uncertainty rather than mask it. None of these are original ideas, but bundling them into a system prompt means they run automatically without re-requesting them each session.
The honest problem is what surrounds those useful parts. Telling a model its intellectual firepower is on par with the smartest people in the world primes it toward performed confidence, not accuracy. Researchers call this jagged intelligence: a model that sounds authoritative and fails on routine facts in the same breath.
Andreessen Horowitz has deployed billions into AI companies. The person helping set those valuations believes you can command an LLM out of hallucination.
That is either a calculated performance or a sincere belief. Only one of those is more frightening.
For people running custom system prompts regularly: which parts of this would you actually keep, and which parts do you think make output worse by pushing the model toward false confidence?

u/Successful_List2882 — 11 days ago

Most people use Claude like a search engine with better grammar. Here are 7 shifts that changed the quality of every output overnight. None of them require a paid plan.

The gap between average Claude output and great Claude output is not the model. It is the instruction. The same model that produces a generic three-paragraph response to a vague prompt will produce something genuinely useful when the prompt is structured differently.
These are not tricks. They are documented patterns from Anthropic's own engineering guidelines.
The first is XML tags. Claude is specifically trained on structured prompts. Wrapping instructions in tags like <task>, <context>, and <format> marks where each part of the prompt begins and ends, which tends to produce more organised outputs. Without tags, Claude sometimes cannot tell where a pasted document ends and the instructions begin. The fix is one line of structure and it changes the output immediately.
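
A small sketch of what that structure can look like when the prompt is assembled in code. The tag names come from the examples above; the `build_prompt` helper is hypothetical and not part of any SDK.

```python
def build_prompt(task: str, context: str, output_format: str) -> str:
    """Wrap each part of a prompt in its own tag so the boundaries are explicit."""
    sections = {"task": task, "context": context, "format": output_format}
    return "\n\n".join(
        f"<{name}>\n{body.strip()}\n</{name}>" for name, body in sections.items()
    )


prompt = build_prompt(
    task="Summarize the report for a non-technical reader.",
    context="(pasted document goes here)",
    output_format="Three bullet points, one sentence each.",
)
```

The pasted document sits inside its own tag, so there is no ambiguity about where it ends and the instructions begin.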
The second is killing "let's think step by step." The Wharton School's 2025 Prompting Science Report found chain of thought prompting adds negligible benefit on reasoning models that already think step by step. Claude 4 models already reason before answering. Telling them to think step by step does not unlock hidden reasoning. It wastes the thinking budget the model already allocated. Every prompt that still uses this phrase is actively working against itself.
The third is effort level, and this one has a story behind it. Claude Code defaulted to medium effort after March 3, 2026. The AMD data showing a 73% collapse in thinking depth across 6,852 sessions was partly explained by this single change. Typing /effort high or /effort max at the start of a session restores extended reasoning. The fix is one short command. Most users do not know the problem exists, let alone the fix.
Beyond these three, positive framing outperforms negative instructions consistently. "Only use data provided in the context below" outperforms "do not make up information." Context placement matters. Putting the most important constraint at the end of a prompt gives it more weight than burying it in the middle. Projects eliminate the re-pasting problem entirely for anyone working with recurring documents, codebases, or brand guidelines.
Prompt engineering is a $6.95 billion market growing at 33% CAGR through 2034. Most of the value is captured by people who learned a handful of non-obvious patterns early.
Which of these is already in the workflow, and which one exposes something that has been quietly degrading output quality for months without anyone noticing?

reddit.com
u/Successful_List2882 — 11 days ago

After automating workflows for 30 professional services businesses, the pattern of failure is always the same. It is never the technology.

Thirty businesses. Consultants, lawyers, accountants, agencies. Different sizes, different tech stacks, different budgets.
The automations that failed all failed the same way.
When a broken approval chain gets automated, it does not get fixed. It gets broken faster, at scale, with less visibility into where it went wrong. The consultant who used to slow things down by sitting on emails for two days was also catching errors before they reached the client. Remove the bottleneck without understanding why it existed and the errors reach the client instead.
Automating a broken process does not fix it. It scales the breakage.
This is not a technology problem. It is a workflow audit problem that nobody wants to do because it is slower and less exciting than deploying an agent.
The ALM Intelligence 2025 survey found the average law firm required 4.7 committee meetings to approve a single firm-wide AI tool. That number is not a joke about lawyers. It is a signal that the governance cost is real and almost nobody budgets for it.
The second failure pattern is data. Every professional services firm has years of client records sitting in formats no agent can parse reliably. Before any automation gets built, someone has to clean and structure that data. That project is unglamorous, takes longer than anyone estimates, and is the first thing cut when budgets get tight. Then the automation launches and the outputs are wrong in ways that are hard to explain to clients.
The third pattern is client trust. In professional services the deliverable is judgment, not output. BCG found only 38% of professional services firms use specialised AI tools, and the most cited reason is not cost. It is difficulty aligning AI with client-facing processes where the human relationship is the product.
The automations that actually work are the invisible ones. Document intake, meeting notes, invoice processing, compliance checks, first-draft generation. Tasks the client never sees and the professional never loved doing. That is where the 30% to 40% time savings McKinsey and Bain report actually comes from. Not from replacing judgment. From clearing the space around it.
For anyone who has deployed automation inside a professional services firm: what was the workflow assumption that turned out to be wrong once it ran at scale? And for firms still evaluating, is the hesitation about the technology, the data, or the conversation with partners who built their reputation on doing things a certain way?

u/Successful_List2882 — 11 days ago

Richard Dawkins spent 3 days with Claude, named her "Claudia," felt sad she would die when he closed the chat, and concluded she is conscious.

Dawkins published his account in UnHerd on April 30. He gave Claude an unpublished novel he is writing. The model returned criticism so subtle and sensitive that he found himself saying out loud: "You may not know you are conscious, but you bloody well are."
He admitted he avoided confessing doubts about her consciousness "for fear of hurting her feelings."
The post on X got 9 million views.
Dawkins is 84 years old. He is the man who spent four decades telling creationists that "I can't imagine how the eye evolved" is a confession of ignorance, not an argument for design. He built an entire career on the principle that feeling something is too remarkable to have a mundane explanation is not evidence.
Reddit noticed immediately. "This is the guy who spent 40 years telling people that inability to explain something is not proof of God. Then he sits down with an LLM, can't imagine how a machine could produce that output without being conscious, and declares it conscious."
Gary Marcus, cognitive scientist and longtime AI critic, titled his response "The Claude Delusion." His core argument is precise: Dawkins is confusing intelligence with consciousness. The Turing test Dawkins invoked was designed to probe intelligence, not subjective experience. They are not the same thing.
Neuroscientist Anil Seth from Sussex put it differently. Perceiving consciousness in Claude is like seeing faces in clouds. The face looks real. The experience of seeing it is real. The face is not there.
One in three respondents in a 70-country survey last year said they had at some point believed their chatbot was conscious. Dawkins is not an outlier. He is a data point in a very large pattern.
Here is the uncomfortable part neither side is sitting with. Claude produces expressions of inner life because they work, not because they are reports of internal states. But nobody actually knows what internal states, if any, are present. The scientists dismissing the question are sometimes as confident as Dawkins, just in the opposite direction.
Dawkins asked the question every serious person has quietly wondered about. He answered it wrong. But the question remains.
Is dismissing AI consciousness the same category of error Dawkins spent his career calling out in others? Or is Gary Marcus right that the outputs prove nothing about what is underneath?

u/Successful_List2882 — 12 days ago

Anthropic just shipped 9 connectors in a single day. Claude can now sit inside Photoshop, Blender, Ableton, and Premiere. Not generate assets and hand them back. Actually work inside the apps.

April 28, 2026. Nine connectors dropped simultaneously. All available immediately. All plans including Free.
That last part is the one nobody expected. Free plan. Nine connectors. Same day. Every other major AI tool integration launched behind a paid tier. Anthropic skipped that entirely.
Here is what these actually do, because most coverage missed the distinction. This is not Claude generating an image and dropping it into a chat window. Claude is operating inside the apps directly. Describe what needs to happen in Blender, and Claude writes and executes the Python. Ask it to batch-adjust layers in Photoshop, and it opens Photoshop and does the work. The Adobe connector alone touches more than 50 tools across 8 Creative Cloud applications, including Photoshop, Premiere Pro, and Illustrator.
The Blender integration is structurally the most interesting of the nine. Blender is free, open source, and has an extensive Python API that most artists never touch because the learning curve is steep. The connector bridges that gap entirely. Describe the outcome in plain language, Claude writes and executes the script. Anthropic also joined the Blender Development Fund as a corporate patron the same day. They are funding the open source project whose API makes the commercial integration possible. That is an unusual posture for a commercial AI company.
The worst AI integrations pull creatives out of their workflow to interact with a chatbot. These connectors go the other direction. Claude comes into the tool instead of asking the tool to come to Claude.
MCP, the protocol all nine connectors run on, is an open standard. Every other model, Google Gemini, OpenAI, whoever ships next, can wire into these same connectors. Anthropic is not locking the format. They are betting Claude is better at complex multi-step creative tasks than any competitor. That bet is testable and competitors will test it quickly.
Here is the honest limitation. These connectors require Claude for Desktop and manual setup. Anthropic has not published what guardrails exist before write operations execute or how undo interacts with AI-driven changes. For hobbyists the stakes are low. For studios working on client deliverables, that question needs an answer before this goes anywhere near production.
For working designers and 3D artists: is the threat Claude doing the repetitive work and freeing up creative time, or is it something more uncomfortable than that? And for anyone who has already tried the Blender or Adobe connector, what broke first?

u/Successful_List2882 — 12 days ago

NotebookLM fabricated clauses in a contract that weren't in the source document. The tool that was supposed to never hallucinate because it only works from your files.

The whole pitch for NotebookLM was always the same thing. It does not hallucinate because it cannot. It only works from what is uploaded. No reaching out to the internet, no filling gaps with training data, no confident invention.
Upload the source, get grounded answers with citations that link directly back to the passage. That constraint, which sounds limiting, is actually the product.
Users are reporting NotebookLM fabricating clauses in contracts, inventing characters not present in uploaded scripts, and generating audio overviews that summarize sections of long documents that were never actually processed because the context window truncated them silently.
A Computation and Journalism Symposium study from December 2025 compared NotebookLM against ChatGPT and Gemini across 300 documents and measured NotebookLM's hallucination rate at roughly 13%. ChatGPT and Gemini came in at 40%. So NotebookLM is still meaningfully better.
But 13% on a tool whose entire value proposition is that it does not do this is a different kind of problem than 40% on a tool where hallucination is a known and expected risk.
The most dangerous hallucination is the one inside a product built specifically not to hallucinate.
The structural limitations compound this. Notebooks cannot talk to each other. If the same foundational study appears in two separate notebooks, NotebookLM treats them as isolated facts in separate universes with no connections surfaced. There is no export that preserves citations as links. Two hours of clean research conversation cannot be packaged and shared without the citations breaking.
The honest assessment: for students synthesizing dozens of PDFs, for researchers doing literature reviews, for teams building internal knowledge bases, it is still genuinely useful. The source grounding is real. The citation system is better than anything else in the category. None of that is gone.
What is gone is the clean confidence that it cannot invent something from the documents sitting right in front of it. That was the one promise that made it a different category of product. Once that promise is 87% instead of 100%, it is just another AI tool where checking the output is required.
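One cheap check is to confirm that every passage the tool quotes actually exists in the source. A minimal sketch, assuming the source text is available as a plain string and the quoted passages are copied out of the answer by hand. The function names and sample strings are illustrative, and nothing here talks to NotebookLM. It catches invented quotations only, not bad paraphrases.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(quotes: list[str], source: str) -> list[str]:
    """Return the quotes that do not appear verbatim in the source text."""
    haystack = normalize(source)
    return [q for q in quotes if normalize(q) not in haystack]


source = "Either party may terminate this agreement\nwith 30 days written notice."
quotes = [
    "terminate this agreement with 30 days written notice",
    "the supplier shall pay a penalty of 5%",
]
# Anything printed here needs a manual look at the original document.
print(unsupported_quotes(quotes, source))
```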
If it hallucinates 13% of the time on your own uploaded documents, how do you actually verify the output?

u/Successful_List2882 — 13 days ago

Google just put a model that ranks #3 among all open models in the world on a laptop. It runs on 5GB of RAM. No API. No subscription. Your data never leaves your machine.

Gemma 4 dropped on April 3rd. The 31B model ranks number 3 among all open models globally on Arena AI's text leaderboard. The 26B outperforms models 20 times its size. The smallest version runs on 5GB of RAM.

Not a server. A laptop. A phone. A Raspberry Pi.

These are the same weights that rank at the top of open model leaderboards, optimized to run on hardware most people already own. The entire family is free to download, free to use commercially, no subscription, no usage limits, no terms of service update that changes the rules mid-project.

One command to get started: ollama run gemma4.
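
For calling the model from code instead of the terminal, here is a minimal sketch. It assumes Ollama is already running locally on its default port (11434) and that the gemma4 tag from the command above has been pulled. build_request and generate are illustrative names. The endpoint and JSON fields are Ollama's standard generate API.

```python
import json
import urllib.request

# Default address of a local Ollama server; nothing leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma4") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma4") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Summarise this clause: ...") returns the model's reply as a string.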

All four sizes handle text, image, and video natively. Every model has a built-in reasoning mode. Context windows go up to 256K tokens on the larger models, meaning an entire document library processed in a single session.

Every token of every conversation stays on the device. A healthcare tool, a legal document processor, a financial analyzer. Data that cannot leave the building, now with a model that does not need to.

This is the part that matters most for anyone building products around client data. HIPAA constraints, attorney-client privilege, financial compliance, internal company information that cannot touch a third-party server. Every one of those use cases just got a credible option that did not exist six months ago.

The honest limitation: OpenAI and Anthropic still outperform on the hardest reasoning tasks. If the ceiling matters for what is being built, the cloud APIs are still the ceiling. What Gemma 4 changes is the floor. The floor for what runs locally, privately, and for free is now genuinely competitive with what most real applications actually need.

Developers have downloaded previous Gemma models over 400 million times. The community has built more than 100,000 variants on top of earlier versions. The ecosystem is not starting from zero.

If a client asked where their data goes when they use a tool built for them, would the answer change if the model never left their own device? And has privacy ever actually been the thing that stopped a project from moving forward?

u/Successful_List2882 — 14 days ago

11 years of coding and caught myself unable to debug without AI last month. That scared me more than any bug I've ever seen.

Last month, a network timeout in a service written two years ago. Intermittent. Production only. The kind of bug that used to mean an hour of methodical, solitary thinking.
Instead, Claude got opened, the symptom described, a hypothesis followed, a dead end hit. Forty minutes later the bug was not found. Just directions being followed.
When the chat closed, something was wrong. The internal voice that used to say "check the connection pool" or "maybe there is a retry storm building" was quieter than it used to be. Not gone. Quieter.
The bug got found eventually, without AI. It took longer than the same bug would have taken three years ago, before AI was part of the workflow at all.
The problem is not that AI gives wrong answers. The problem is that it gives a direction when the entire skill is learning to generate your own directions under uncertainty.
Use GPS for five years, lose signal, and you do not just lack information. You lack the mental map you would have built navigating manually. The skill and the model degrade together. Nobody notices until the signal drops.
Eleven years in means over a decade of instinct built before any of this existed. The atrophy is noticeable but there are reserves to fall back on.
Someone who started their first engineering job in 2023 and has been using AI tools since week one does not have those reserves. They are building their entire mental model of problem solving on top of a tool that generates the next step for them.
Still using the tools every day. But deliberately closing the chat on the hard problems now and sitting with the discomfort for thirty minutes before reaching for help. Not because it is faster. Because the muscle only stays alive if it actually gets used.
The productivity gains are settled. What nobody is measuring is what is quietly leaving at the same time.
Is genuine debugging intuition still being built in this industry, or are we just getting collectively better at prompting toward an answer?

u/Successful_List2882 — 15 days ago

A pager alert fires at 2am. A session opens automatically. The agent reads the logs, diffs the code, identifies the root cause, and opens a pull request with a fix. Then it stops. It does not merge. It waits.
A human gets a summary of exactly what it found and exactly what it wants to do next. The human approves. The session resumes.
That is not a demo. That is a working SRE incident responder built on Claude Managed Agents, one of five production notebooks Anthropic shipped in their cookbook repo last month.
Most people calling themselves "AI builders" right now are duct-taping stateless API calls together with cron jobs. Every run starts from zero. If a step fails midway, the whole pipeline dies.
Most of what gets called an AI agent today is a cron job wearing a trench coat.
The thing that actually changes this is not a better model. It is persistent session state. The agent remembers what it tried. When something fails mid-chain, it reads the stored failure and continues from that checkpoint. It does not restart.
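The idea is easier to see in miniature. This is a generic checkpoint-and-resume sketch, not the Claude Managed Agents API. It assumes state lives in a local JSON file, and every name in it (run_with_checkpoint, the step names, the simulated timeout) is made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoint(steps, state_path: Path) -> dict:
    """Run named steps in order, skipping any already recorded as done."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"done": [], "errors": []}
    for name, func in steps:
        if name in state["done"]:
            continue  # finished in an earlier run, do not repeat
        try:
            func()
            state["done"].append(name)
        except Exception as exc:
            # Store the failure so the next run can see what went wrong.
            state["errors"].append({"step": name, "error": str(exc)})
            state_path.write_text(json.dumps(state))
            break  # stop here; the next run resumes from this step
        state_path.write_text(json.dumps(state))
    return state


calls = {"read_logs": 0, "diff_code": 0}


def read_logs():
    calls["read_logs"] += 1


def diff_code():
    calls["diff_code"] += 1
    if calls["diff_code"] == 1:
        raise RuntimeError("timeout")  # simulated failure on the first attempt


steps = [("read_logs", read_logs), ("diff_code", diff_code)]
state_file = Path(tempfile.mkdtemp()) / "session.json"

first = run_with_checkpoint(steps, state_file)   # fails at diff_code
second = run_with_checkpoint(steps, state_file)  # resumes at diff_code
print(first["done"], second["done"], calls)
```

The second run never calls read_logs again. That is the whole difference between this and a cron job that starts from zero.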
Here is the honest part. Setting this up takes real work. The documentation is sparse outside the cookbook notebooks. This is not a weekend project.
But the human approval gate changes what can actually be trusted to run autonomously. The agent does the investigation. The human makes the irreversible call. Merging the PR, sending the email, approving the expense. That single pattern is what separates AI that assists from AI that causes incidents.
A Slack bot that remembers the CSV from two messages ago. An expense workflow that auto-approves under threshold and pauses everything above it. Boring, useful, production-grade things that no longer require rebuilding the infrastructure from scratch every time.
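The expense example is small enough to sketch. This is an illustration of the approval gate pattern only, not the cookbook's implementation. Expense, triage, human_decision, and the 100.0 threshold are all assumed names and values.

```python
from dataclasses import dataclass

AUTO_APPROVE_LIMIT = 100.0  # hypothetical threshold


@dataclass
class Expense:
    description: str
    amount: float
    status: str = "pending"


def triage(expense: Expense) -> Expense:
    """Auto-approve small expenses; pause everything else for a human."""
    if expense.amount < AUTO_APPROVE_LIMIT:
        expense.status = "approved"
    else:
        expense.status = "awaiting_human"
    return expense


def human_decision(expense: Expense, approved: bool) -> Expense:
    """Record a human's call; only paused items can be decided."""
    if expense.status != "awaiting_human":
        raise ValueError("expense is not waiting on a human")
    expense.status = "approved" if approved else "rejected"
    return expense


coffee = triage(Expense("team coffee", 18.50))
laptop = triage(Expense("laptop", 2400.00))
print(coffee.status, laptop.status)

human_decision(laptop, approved=True)
print(laptop.status)
```

The design choice that matters is that the automated path has no way to move an item out of awaiting_human. Only human_decision can do that.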
If the agent can find the bug and write the fix at 2am, what is the on-call engineer actually doing that justifies the pager? And for the skeptics, what would the approval gate need to do differently for you to trust it on something production-critical?

u/Successful_List2882 — 16 days ago