u/Historical-Driver-64

I tested Claude Fable. Switched back to Opus in an hour. The problem was never the model.

Tried Fable this week. Better benchmarks, larger context window, all the usual improvements. I was back on Opus within the hour.

Not because Fable is bad. It is genuinely more capable on paper. But here is what I noticed: the things I wanted Fable to do better were things Opus could already do if I prompted it correctly. The ceiling I kept hitting was not the model. It was me.

I run the same setup for everything now. Opus for coding and anything that needs real reasoning. Haiku for fast, everyday tasks where speed matters more than depth. That combination has not changed in months, and I am more productive than I have ever been. Not because the models improved, but because I finally learned how to use what was already there.

The iPhone analogy is the right one actually. I have an iPhone 14. The 17 exists. I know it is technically better. I also know my 14 does everything I need it to do, and the gap between what it can do and what I actually use it for is still enormous.

That gap is the real story with LLMs right now.

Every major release gets the same cycle. Benchmarks drop, Twitter lights up, people run the same three tests, declare a winner, and move on. What almost nobody talks about is that most users, including people who use these tools every day, are still operating at maybe 30 to 40% of what the current generation of models can actually do. The skills, the context files, the prompting architecture, the memory systems. Most people skip all of that and wait for the next model to close the gap for them.

It does not close the gap. It just raises the ceiling that most people are not hitting anyway.

I am not saying new models do not matter. For certain use cases, the capability jumps are real and meaningful. Researchers, engineers running complex multi-agent workflows, people working at the actual frontier of what these tools can do, they will feel the difference immediately.

But for the rest of us? The honest answer is that we are not bottlenecked by the model. We are bottlenecked by how we are using it.

The most productive thing I did this year was not switching to a newer model. It was spending two weeks actually learning how Opus reasons, what makes it fail, and how to structure context so it stops making the mistakes I kept blaming on its capabilities.

After that, the release calendar stopped feeling like news.

What is the last thing you figured out about your current model that you wish you had known six months earlier?

reddit.com

u/Historical-Driver-64 — 2 days ago

▲ 2 r/CreatorsAI

Companies keep buying AI brains when all they needed was an alarm clock

A supplements brand came to me wanting an AI system to watch inventory, decide when to reorder, and email suppliers automatically. Seven people on the team, fourteen products. He had seen a demo somewhere and wanted that.

I looked at his Shopify data. He had reordered the same products at the same quantities from the same suppliers for over a year. Protein powder hits 200 units, he orders more. Same pattern since 2023. There was no decision left to make. He just had not noticed.

I quoted him $5,200 for the AI build. Then I told him I could solve it for $700.

Basic workflow. Checks inventory every morning, compares it to reorder points, sends a pre-written email to the right supplier if anything is low. Costs $60 a month. No AI involved at all.

He told me it felt too basic. His ops person got forty minutes back every morning in the first week. He stopped caring about basic after that.

Here is the part of this industry nobody wants to say out loud. Most AI agent pitches are not selling intelligence. They are selling a brain to solve a problem that already has a fixed answer. The client cannot tell the difference because the demo looks the same either way.

The test is simpler than the industry wants you to think. Does the input vary in a way that requires interpretation, or does it follow the same shape every time?

I built an actual AI agent for a property management company. Tenants text things like "my sink is leaking and the hallway light has been out for a week." Two problems, one message, no structure. The agent reads it, splits the request, routes each issue to the right vendor with the right priority. Two hundred messages a month, fifteen hours saved weekly. That one needed AI because human language is genuinely unpredictable. You cannot write a rule for it.

You can absolutely write a rule for "inventory below 200, send email."

Nobody in this industry is incentivized to point that out. The agency gets paid more for the complex build. The platform gets a bigger subscription. The client gets a better story to tell. The only person losing is the one writing the check, and they usually do not find out until someone outside the sales conversation looks at what the job actually required.

I now charge more for the planning call than I used to charge for some entire builds. That conversation is where the real value is. Most expensive mistake I see is clients paying agent prices for alarm clock problems.

If your process runs the same steps on the same kind of input every time, you do not have an AI problem. You have a workflow nobody automated yet.

How many AI tools at your company are actually just an if-statement with better marketing?

reddit.com

u/Historical-Driver-64 — 3 days ago

▲ 7 r/CreatorsAI

Automating someone's job without automating their visibility is sabotage with good intentions

Logistics company, fifteen people. They bring me in to automate order exception handling. Standard work at this point.

There is an ops coordinator who spends three hours every morning sorting delivery screwups, tagging things in Airtable, pinging people in Slack. She is fast, she is good, and everyone in the company knows her name because she is the one keeping things moving every morning before lunch.

I build the automation. Two weeks in n8n. Pulls exceptions, sorts them into categories, tags Airtable, routes Slack alerts automatically. Her three hours drops to twenty minutes of sanity checking. She is thrilled. I am thrilled. Everyone is happy.

A month later her manager pulls her into a meeting. Not a good one. Essentially a "what exactly are you doing all day" conversation. The CEO had once name-dropped her at an all-hands as the person who keeps the trains running. That was her entire reputation in that company. I automated it away without thinking about it for a second.

She did not get fired, but they put her into a performance review process that did not exist before, because her manager could no longer see her work. It was happening quietly in the background now, invisible by design.

I brought it up with the founder. He shrugged and said she should find new ways to add value. Nobody told her that was the deal when they hired me. Nobody told me either.

Here is what that experience changed for me. Visibility is not a soft consideration. It is a dependency, the same category as an API key or a set of credentials. If you do not map it before you build, you can ship something that works perfectly and still wreck someone's standing in the company, and nobody will flag it as a risk because it does not look like a risk. It looks like progress.

The work was never just the work. The work was also the proof that the work was happening. Three hours of visible effort in Slack every morning was not inefficiency. It was a performance review happening in real time, just not labeled as one. Compress that into twenty minutes of quiet background processing and you have not just improved a workflow. You have deleted someone's evidence.

I ask a new question during discovery now. Who gets credit for the work I am about to automate. Who looks good because this thing runs the way it runs. It sounds like a soft question. It is not. It is the same category as asking what breaks if this API goes down.

The automation worked exactly as designed. That is what makes this uncomfortable. Nothing failed. The only thing that broke was something nobody thought to put on the list of things that could break.

I still think about her sometimes. Not sure she is even at that company anymore.

What's the dependency you almost missed because it wasn't technical?

reddit.com

u/Historical-Driver-64 — 4 days ago

▲ 0 r/CreatorsAI

A PhD is 25 years of narrowing your search space. LLMs just made that the worst career bet.

Most of what you call thinking is not thinking. It is retrieval.

You encounter a problem. You search your memory for relevant patterns, analogies, prior solutions. You chain them together in a way that fits your specific situation. That is not original thought. That is search with extra steps.

We never noticed because search was expensive. You had to live long enough to accumulate the patterns. You had to read the right books, know the right people, be in the right rooms. The person who had searched more, across more domains, over more years, had a genuine advantage. We called that advantage intelligence.

LLMs did not create intelligence. They made search nearly free. And when you remove the cost of search, you reveal what was underneath the whole time.

Here is the uncomfortable part.

The most expensive form of search humans ever invented is deep specialization. A PhD is roughly 25 years of progressively narrowing your focus until you know more about a smaller and smaller slice of the world than almost anyone alive. That was valuable when accessing that knowledge required finding the person who held it.

LLMs indexed the slice. The human who spent 25 years narrowing into it now competes with a tool that can retrieve the same knowledge in milliseconds, synthesize it with adjacent domains they never studied, and apply it to a problem they have never seen.

The expert's advantage was always information asymmetry. That asymmetry is gone.

What survives is something different. Not depth in a single domain but the ability to connect domains that do not usually talk to each other. The person who knows enough biology to have a useful conversation with an economist. The founder who understands enough about logistics to spot what the engineers are missing. The writer who can translate technical research into something a policymaker will actually read.

These people were always valuable. They were just harder to identify and harder to scale. AI changes both of those things.

The centaur model from chess is the right frame here. After Deep Blue beat Kasparov, the strongest players were not humans alone or computers alone. They were humans working with computers who knew how to use them. That edge lasted until computers got good enough that the human added nothing.

Go is the more interesting case. After AlphaGo, human Go players did not get worse. They got dramatically better, because they finally had something to learn from that was operating at a level they could not reach alone. The game opened up instead of closing down.

The question nobody has a clean answer to is which model applies to knowledge work. Does the human eventually add nothing, like chess? Or does access to a superhuman search engine raise the ceiling of what humans can do, like Go?

Probably depends on what you are doing. And probably depends on whether you spent the last decade going deeper into one thing or wider across many.

Which direction did you go?

reddit.com

u/Historical-Driver-64 — 5 days ago

▲ 1 r/CreatorsAI

Everyone is arguing about the $1000. The 50% public stake is the actual bomb in this bill.

The $1000 is bait.

Not in a cynical way. In a tactically smart way. It is the number that gets people to read the headline, share the post, and form an opinion. It is also the least interesting part of what Sanders actually proposed.

The bill would give the public a 50% ownership stake in the largest AI companies in the country.

Sit with that for a second.

Not a tax. Not a fine. Not a regulatory fee. Ownership. The argument is that these models were trained on writing, code, art, and conversations produced by the public, without compensation, and the companies that did it are now worth trillions. If the public's data is what created the value, the public should own part of the asset.

That is not a socialist fringe argument. That is a straightforward property rights argument dressed in different clothes. You used my stuff to build something valuable. I want equity.

The counterargument is that data is not the same as labor or capital, that the value came from the engineering and the compute and the product decisions, not the raw training material. Maybe. But we do not actually know how to price that split, and the companies that would benefit most from answering that question conservatively are the ones currently setting the terms.

Here is what makes this genuinely different from every other AI regulation conversation. Most proposals are about slowing things down, adding guardrails, creating liability. This one is about ownership. It does not try to stop the machine. It tries to make the public a shareholder in it.

That is a much harder thing to argue against without revealing exactly whose interests you are protecting.

The $1000 will get debated, dismissed, and probably killed in committee. The ownership question is not going away. Every model that ships, every valuation that climbs, every artist and writer and developer who finds their work in a training set they never consented to makes the underlying argument stronger.

Sanders introduced a bill. He also introduced a frame. The frame is going to outlast the bill by a long time.

What does it actually mean to own part of an AI company that was built on work you never agreed to license?

reddit.com

u/Historical-Driver-64 — 5 days ago

▲ 8 r/CreatorsAI

UBI is not a solution to AI job loss. It is a ransom payment to save the system causing it.

Every major tech CEO who publicly supports UBI also runs a company actively eliminating jobs through automation.

Read that again.

Altman funds UBI research. Bezos built the warehouses where robots replaced humans at scale. Musk talks about universal income while deploying Optimus on factory floors. These are not contradictions. They are a coordinated message: we will automate everything, and we will give you just enough to not revolt.

That is not a solution. That is a negotiation where one side sets the terms.

The original argument for UBI was dignity. Enough to live on, unconditionally, so that people could contribute to society in ways that markets do not price. That is a genuinely interesting idea.

What is being discussed now is different. A monthly transfer just large enough to replace a wage, funded by the productivity gains of the automation that eliminated that wage, administered by governments that depend on corporate tax revenue to stay solvent. The people who captured the value decide how much gets redistributed. The people who lost the jobs take what they are offered.

Here is the uncomfortable part. The alternative is not obvious.

You cannot stop automation. The assembly line argument is real. Every major labor displacement in history eventually resolved into new categories of work, new industries, new demand. Maybe that happens again. Maybe AI is different enough that it does not.

But the people most loudly insisting it will be fine are the ones who profit most directly if it is, and the people most loudly insisting it will collapse are often the ones selling the solution.

What nobody wants to say is that we are running an experiment in real time with no control group, on an economy that billions of people depend on to survive, guided by incentives that have never once historically prioritized the people at the bottom of the displacement curve.

The problem was never the technology. It was always who owns it, who benefits from it, and who gets to decide what the fallout is worth.

UBI does not change any of those answers. It just makes the current arrangement more stable for the people who designed it.

reddit.com

u/Historical-Driver-64 — 6 days ago

▲ 3 r/CreatorsAI

Open source was built on human-paced contribution. AI just broke that assumption.

The social contract of open source was never written down, but everyone understood it.

You find a project useful. You hit a bug. You spend real time figuring it out, write a fix, submit a PR, maybe explain your reasoning in the comments. The maintainer reviews it. The whole thing moves at human pace, which means it is slow, but also self-regulating. Bad contributions get filtered out naturally because effort is expensive.

AI made effort cheap. That changes everything.

Not because AI-generated PRs are always bad. Some are fine. The problem is volume and confidence. A generated PR arrives with a clean description, reasonable-looking code, and zero actual testing behind it. At human contribution pace, a maintainer can catch that in a reasonable amount of time. At "someone hit generate and submit" pace, the review burden compounds faster than any single person can absorb.

Here is what nobody is talking about: the people most exposed to this are not employees at well-funded companies. They are individuals maintaining tools in their spare time, tools that production systems quietly depend on, tools with no budget and no team and no on-call rotation.

The maintainer who built something useful, open sourced it, watched it grow, and is now dreading GitHub notifications is not an edge case. That is the median story of open source sustainability right now. AI just accelerated the timeline.

The tragedy of the commons usually plays out slowly enough that someone notices and intervenes. This one is moving fast.

There is a reasonable counterargument that better tooling will catch up, that AI review tools will filter AI contributions, that the ecosystem adapts. Maybe. But the maintainers burning out right now are not waiting for that equilibrium. They are going quiet. And when the person who understands a codebase deeply enough to maintain it disappears, that knowledge does not get replaced by the next AI PR that comes in.

Most of the internet runs on libraries maintained by people nobody has ever heard of. A lot of those people are one bad month away from archiving the repo and moving on.

If you use open source software and it has saved you hours of work, find the maintainer's GitHub Sponsors page. It probably exists. It is probably empty.

For people who maintain anything with real adoption, how much of your review time is AI-generated contributions at this point?

reddit.com

u/Historical-Driver-64 — 6 days ago

▲ 6 r/CreatorsAI

Anthropic just disclosed that Claude writes 65% of its own company's production code

Buried inside the Claude Tag announcement this week was a number that should have stopped everyone mid-scroll.

65% of the code currently being merged into Anthropic's own product codebase is created by an internal version of Claude Tag.

Not "AI assists the engineers." Not "we use Copilot for autocomplete." The primary author of the majority of production code at the company building one of the most capable AI systems in the world is that same AI system.

Read that again slowly.

This is not a demo. This is not a benchmark. This is a frontier AI lab telling you, in a product announcement, that the loop has already partially closed. The model is writing the code that ships the model.

Most people reading the Claude Tag launch focused on the Slack integration. Understandable. It is a genuinely useful product. An AI that joins your workspace as a team member, builds context from your channels, handles async tasks, remembers what happened last month. Good feature.

But the internal number is the actual story.

Here is why it matters beyond the obvious. Anthropic is not a company that moves carelessly. They are probably the most publicly cautious major lab on questions of AI safety and capability. If they are comfortable disclosing that Claude writes the majority of their production code, they have thought hard about what that means and decided the answer is fine.

That disclosure is not incidental. It is a signal.

The question it opens up is not "will AI replace developers." That debate has been running for three years and is mostly noise at this point. The real question is what happens to software quality, security posture, and institutional knowledge when the primary author of a codebase is not a person.

Anthropic's engineers still review and merge. The human is still in the loop. But the cognitive model of "engineers write code, AI helps" has already inverted at one of the most important software organizations on the planet, and the announcement went out in a press release about a Slack bot.

OpenAI separately disclosed this week that Codex accounts for 99.8% of weekly output tokens internally. The labs are not just shipping AI to the world. They are the first customers running the experiment on themselves.

What does a software team actually look like in three years if this compounds?

u/Historical-Driver-64 — 7 days ago

▲ 2 r/CreatorsAI

Best low-cost way to get MVP mockups without hiring a designer?

Building something and trying to validate before spending real money on design. Need 3 to 5 mobile screens: onboarding, core feature, and a paywall, good enough to run user interviews or set up a waitlist. Figma-editable would be a bonus so I can hand it off later.

Tools I've looked at so far:

Appthetics – pre-built mobile UI kits

Uizard – AI-generated mockups

Google Stitch – Google's new AI UI tool

Sleek – not sure how mature this is

Figma Community templates – free but requires some Figma knowledge

Fiverr – human output, but slower and costs more

Has anyone actually shipped with any of these? Curious what's worked for early validation before committing to a real design budget.

reddit.com

u/Historical-Driver-64 — 8 days ago

▲ 3 r/CreatorsAI

notebooklm has a live web mode almost nobody knows exists. here's what else you're missing.

Google keeps shipping NotebookLM updates without telling anyone. Most users are still working with a version of the tool that stopped existing three months ago. Here are ten features that changed how serious users actually work with it.

Drive auto-sync: attach a Google Drive file as a source and one button refresh pulls the latest version. No more delete-and-reupload.

Gemini integration: attach your notebook to Gemini and answers pull from both your uploaded sources and live web data simultaneously.

Revise button in slide decks: generate a deck, hit Revise, and restructure titles, visuals, or entire sections without starting over.

Interactive audio mode: interrupt the AI hosts mid-conversation and redirect them toward what you actually want covered.

Persistent memory: give NotebookLM context that carries across notebooks instead of starting cold every session.

Selective source querying: ask a question against one specific source instead of the entire notebook at once.

Source tagging and organisation: label and group sources so large notebooks stay navigable as they scale.

Timely source discovery: surface recent relevant sources on your topic directly inside the notebook interface.

Prompt and source identification: see exactly which sources and prompt generated any specific answer the model returned.

Watermark removal: a workaround exists for exported content, though it lives outside NotebookLM itself.

The features that would change how you use the tool are the ones Google buried in the interface instead of announcing.

Which of these did you actually already know about?

reddit.com

u/Historical-Driver-64 — 9 days ago

▲ 9 r/CreatorsAI

ai is eliminating entry-level jobs. in 10 years, where do senior employees come from?

Every senior developer, analyst, lawyer, and accountant working today learned their craft by doing the entry-level work first. The tedious stuff. The first-draft memos nobody reads. The data cleaning. The bug tickets. The client calls that go nowhere. That is not busywork. That is how professional judgment gets built.AI is now doing most of it.Junior hiring across law, finance, consulting, and software has dropped measurably in the past two years. The stated reason is efficiency: AI handles first drafts, initial research, data processing, and code review faster and cheaper than a 22-year-old six weeks out of university. The productivity math is clean.

The talent pipeline math has not been run yet.Senior professionals are not born senior. They are built over a decade of low-stakes repetitions that gradually become high-stakes decisions. A junior analyst who spends three years building financial models develops an instinct for when a model is lying to them that cannot be extracted from the model itself.

A junior lawyer who drafts a hundred contracts learns where the risk actually lives. That pattern recognition is the job. The entry-level work is the training data for the human.When AI handles the repetitions, the human never develops the instinct.Companies are solving a cost problem in 2025 and creating a competence problem in 2035 that nobody has budgeted for.

The counterargument from the optimist camp is that AI creates new entry points. Junior workers become AI supervisors, prompt engineers, output reviewers. New skills emerge to replace old ones. This is plausible but unproven, and it assumes the supervisory role builds the same judgment that doing the underlying work would have built. There is no evidence yet that reviewing AI-generated contracts produces the same professional intuition as drafting them under a senior partner's correction.

The historical analog that matters here is not previous automation waves. It is medicine. Surgical residents learn by operating, under supervision, on real patients. You cannot automate the residency and expect the same surgeons to emerge from the other end. The profession understood this instinctively and protected the learning pipeline even when it was inefficient.

Most industries have not had that conversation yet. They are optimizing the present quarter while the next generation of senior talent is quietly failing to materialize. In ten years, when the current generation of experienced professionals ages out, who exactly is going to replace them?

u/Historical-Driver-64 — 10 days ago

▲ 2 r/CreatorsAI

someone pulled 1000 transcripts from a trading youtube channel and ran them through an llm. the results were not flattering.

Watching a few videos from a stock trading channel, the advice sounds confident and consistent. The presenter has a system. The logic tracks. You start to trust the pattern.A developer wanted to test whether that confidence held at scale. So they pulled transcripts from just under 1000 videos from a single channel and ran the entire dataset through an LLM to check for consistency across the full body of work.

The finding was not that the channel was wrong. It was that watching a handful of videos is a structurally bad way to evaluate whether someone's advice is consistent. Ten videos cannot surface the contradictions that appear when market conditions change month to month. The full 1000 can. Advice that sounded like a coherent strategy in individual videos started showing different shapes depending on the day and the direction of the market.

The same presenter, the same confident delivery, different conclusions depending on what had happened recently.This is the thing AI makes possible that was not practically possible before. A human can watch 20 videos and form an impression. An LLM can hold 1000 transcripts in context and return the patterns that repeat versus the one-off claims made during specific conditions. Those are different types of knowledge. The first is an impression.

The second is an audit.The question was never whether the channel sounded credible. It was whether the logic held when you could compare every version of it at once.The technical side had one real friction point. Auto-generated YouTube transcripts have no punctuation and mangle financial terminology consistently enough to be noticeable. In practice it did not matter much. The LLM handled the degraded text well enough for pattern analysis.

The content was clear even when the formatting was not, which suggests transcript quality is a smaller obstacle to this kind of analysis than it initially appears.The workflow that made it practical was a small scraping tool built specifically for bulk transcript extraction, because downloading 1000 transcripts one by one is not a realistic manual process. That tool turned into a side product afterward, which is a reasonable outcome from a project that started as a personal credibility check.The honest limitation is that consistency is not the same as accuracy.

A channel that consistently repeats the same wrong framework will pass a consistency audit. What this method surfaces is whether someone's public position shifts with market conditions rather than from genuine strategic evolution. That is useful information. It is not a complete picture.

Most finance content is consumed in the way that makes it hardest to evaluate. Individual videos, watched when the topic feels relevant, without any comparison to what the same person said six months earlier under different conditions.

Would you trust a creator more or less if you knew their advice had been consistency-checked across hundreds of videos?

reddit.com

u/Historical-Driver-64 — 11 days ago

▲ 1 r/CreatorsAI

openai is losing the enterprise race to anthropic and now it wants to cut prices before its ipo

ChatGPT's share of global generative AI web traffic dropped from 77.6 percent in May 2025 to 53.7 percent by April 2026. For the first time in the Ramp AI Index, which tracks enterprise software spending, more companies are paying for Anthropic than for OpenAI. And Anthropic's valuation just eclipsed OpenAI's for the first time, closing a $65 billion funding round at $965 billion against OpenAI's $852 billion.

The company that invented the modern AI era is now playing defense on price.

The Wall Street Journal reported on June 11 that OpenAI is weighing significant cuts to what it charges for tokens, the unit companies are billed per AI use. The discussions are preliminary and no decision has been made. But the direction is unmistakable: OpenAI is preparing to compete on price because it is losing on product. The specific product that tilted the balance is Claude Code. Anthropic's coding agent crossed $1 billion in revenue within six months of launch, pulled engineers out of OpenAI's ecosystem in measurable numbers, and drove Anthropic's annualized run rate from $9 billion at the end of 2025 to $47 billion by May 2026.

Sam Altman acknowledged the pressure directly at a recent event, describing AI costs as a huge issue for business customers and promising more value for less spend.

A company does not volunteer to compress its own margins right before an IPO unless it believes the alternative is worse.

The timing makes this structurally uncomfortable for both sides. OpenAI filed its S-1 confidentially on June 8. Anthropic had already filed. Both companies are heading toward public markets at trillion-dollar valuations while losing money at scale. OpenAI projects cumulative operating losses of roughly $74 billion and does not expect profitability until 2030. Anthropic targets breakeven by 2028, partly by avoiding expensive consumer features like image and video generation that OpenAI has committed to building.

A sustained price war compresses the revenue line both S-1 narratives depend on. Public market investors scrutinizing two unprofitable companies at combined valuations approaching two trillion dollars will notice if the answer to competitive pressure is to charge less while spending more.

The structural risk sitting underneath all of this: Chinese open-source models are already serving comparable inference at roughly one-thirteenth the cost. A price war between OpenAI and Anthropic does not play out in isolation. It plays out against a floor that keeps dropping toward zero.

Cheaper tokens are good news for developers building on these APIs today. Whether the companies offering those tokens can survive doing so is a different calculation entirely.

If token prices drop significantly, does that accelerate AI adoption enough to offset the margin compression, or does it just validate that frontier models are already becoming a commodity?

reddit.com

u/Historical-Driver-64 — 11 days ago

▲ 2 r/CreatorsAI

a 20-year dev finally understood why engineers hate vibe coding. opus 4.8 built an sql injection hole in 2026.

For months, a senior developer with over 20 years of experience assumed the backlash against vibe coding was gatekeeping. Engineers protecting their status. People in denial about a shift they could not stop. He even caught himself with imposter syndrome, wondering if there was something fundamental he was missing about why the tools felt too easy.

Then he watched a non-technical person build a web app with AI and deploy it.

The app had unsanitized text fields. Open SQL injection. The kind of vulnerability that got patched out of serious codebases in the late 1990s. Sitting there in a 2026 production build, generated by Opus 4.8, the most capable model available at the time of writing.

If real users had touched that app, the builder would have been looking at credential theft, data leaks, potential regulatory fines, and litigation. Not theoretical risks. The actual consequences that follow from leaving a door that basic open on a live product.

The model did not warn him. The model did not refuse to ship insecure code. The model produced something that looked finished, felt finished, and would have passed any non-technical review of whether the thing worked.

Vibe coding does not produce working software. It produces software that appears to work until someone who knows what they are looking for checks underneath.

The distinction matters because the two failure modes look identical from the outside. A junior developer who does not know about SQL injection and a vibe coder who never learned it will ship the same vulnerability. The difference is that the junior developer exists inside a system with code review, senior oversight, and a pathway to learning what they missed. The vibe coder is alone, moving fast, and the model is not going to stop them.

The honest version of this argument cuts both ways. Experienced developers have shipped SQL injection vulnerabilities too. Security audits exist precisely because human expertise does not guarantee clean code. The problem with AI-generated code is not that it is uniquely dangerous. It is that it removes friction for people who do not yet know which friction was protective.

The engineers who were loudest about vibe coding risks were not worried about their jobs. They were worried about the gap between "it deployed" and "it is safe to use." Those are different thresholds, and the tools do not tell you which one you have crossed.

Watching a non-technical person nearly deploy a textbook vulnerability on the best available model in 2026 is not a reason to stop building with AI. It is a reason to stop assuming the model is also the reviewer.

Is the answer better guardrails baked into the models, or does real security still require a human who already knows what to look for?

reddit.com

u/Historical-Driver-64 — 12 days ago

▲ 0 r/CreatorsAI

"write like me" is not a prompt. it's a wish. here's what actually works.

Every writer has tried it. Paste a few emails into Claude or ChatGPT, say "match my style," and watch the model produce something that sounds like a LinkedIn post written by a polite customer service bot.

The vocabulary might be close. The cadence is completely wrong. And it keeps using phrases you would never say in real life.

The problem is not the model. The problem is that style is not a surface pattern. It is a structural DNA, and "write like me" gives the model nothing structural to work with.

The fix that actually produces indistinguishable output is a Communication Profile: a markdown configuration file covering six specific dimensions. Sentence cadence and structure. Greetings and sign-offs, which people read first and last and where exact vocabulary matters. Vocabulary preferences including words you lean on and words you actively avoid. Grammar and formatting habits. Where you sit on the formality spectrum. And how you guide a reader to action.

Most people try to clone their voice by describing it. The ones who get consistent results configure it.

To build the profile, gather ten to fifteen raw writing samples. Emails and Slack messages work better than published content because they capture how you actually write, not how you perform writing. Run them through an extraction prompt that maps all six dimensions and outputs a structured document detailed enough for another model to reproduce your style from it alone.

The step most people skip is the blocklist. A profile tells the model what to do. Without explicit negative constraints, it will still slip statistical AI patterns into your output. Phrases like "I hope this email finds you well" or "please do not hesitate to" are statistically common in the training data, so the model reaches for them even when your profile says otherwise. Forbid them explicitly.

Persistence is the last problem. LLMs are stateless, so the profile disappears between sessions unless you embed it somewhere durable. Claude Projects and ChatGPT GPTs both support uploading a style document that stays active across conversations. For API workflows, the profile goes directly into the system prompt.

One self-correction instruction added to the end of any writing prompt recovers roughly sixty to seventy percent of remaining AI artifacts: review against the profile, and rewrite any sentence that sounds too polished or uses vocabulary not found in the original samples.

The honest limitation: even a well-built profile degrades on content types that were not represented in the source samples. A profile built from emails will not automatically transfer to long-form essays.

Do you get better voice consistency from structured profiles and rules, or from flooding the context window with raw examples and letting the model pattern-match?

reddit.com

u/Historical-Driver-64 — 13 days ago

▲ 5 r/CreatorsAI

spacex bought cursor for $60 billion using four days of ipo stock gains

SpaceX went public on June 12. Four days later, it spent $60 billion buying the most popular AI coding tool on the market. The currency it used was not cash. It was the premium valuation the IPO had just assigned to its own stock. This is how trillion-dollar companies acquire things now.

Cursor had 4 million active developer users, $2.6 billion in annualized B2B revenue, and a cap table that included Andreessen Horowitz, Nvidia, and Google. It was approaching a $50 billion valuation in a funding round that never closed. SpaceX bypassed that round entirely using an option it disclosed in the IPO filing months earlier: a right to buy Anysphere outright for $60 billion at any point in 2026.

The IPO gave SpaceX a stock so expensive that buying the dominant AI coding agent cost roughly four days of Nasdaq-level valuation premium — and zero dollars in actual cash.What SpaceX bought is not just a coding tool. It is xAI's entry into developer distribution.

Cursor users generate a constant stream of coding context: architecture decisions, debugging sessions, design tradeoffs. That data trains Grok for code. xAI, which SpaceX acquired in February, gains both the model training pipeline and 4 million developers already inside the product before they ever see a Grok prompt.

The honest complication: Cursor's existing model agreements include a 90-day termination clause with Anthropic and Google. If Grok Build adoption scales inside Cursor, SpaceX could redirect capacity it currently rents from its two biggest competitors in the AI market. That creates a position no other company in the industry holds — a trillion-dollar entity that competes with its own AI infrastructure providers.The coding agent market is now down to four serious players: Claude Code, Codex, GitHub Copilot at 30 million users, and Cursor backed by xAI.

The Cursor acquisition did not create a new competitor. It collapsed the timeline on one that already existed.What's harder to answer is whether 4 million developers will follow xAI's roadmap, or whether the Cursor community votes with exports and switches to whatever Claude Code ships next.If you're a developer inside Cursor right now, do you stay because the product is better, or leave because the acquirer changes what it optimizes for?

u/Historical-Driver-64 — 14 days ago

▲ 3 r/CreatorsAI

a hidden prompt injection in a pdf slipped past our entire security stack

I watched a contract PDF carry a hidden prompt injection straight past every filter the team had built, buried in white text inside the footer where no human reviewer would ever think to scroll.

The model caught it anyway. It read the injected text, flagged it as suspicious, and warned the user before acting on a single instruction hidden inside the document.

The security stack around the model did not catch it. The team's prompt filter sat on the chat input field, scanning every line a user typed before it ever reached the model.

Nobody had pointed that same scrutiny at the document upload pipeline. The injection arrived through a content channel the monitoring tools were never configured to inspect in the first place.

Most injection detection setups still treat the chat box as the only door, while attackers have already moved to the windows.

Hidden white text in a footer is a trivial technique, the kind of thing that should get caught by basic formatting checks. It still slipped past a filter an entire team had spent months tuning.

PDFs, email attachments, calendar invites, and scraped web pages all function as delivery channels now, anywhere a model has been given permission to read.

The model performed better than the tooling built to protect it. A base model caught what a dedicated security layer missed completely, and that is not a comfortable thing for any security team to admit out loud.

It also means the fix has very little to do with smarter models. It comes down to security teams that have not yet mapped every channel feeding content into their systems.

Most teams have not done that mapping yet. Budget and attention went toward the most visible surface, the text box, while file parsers and document loaders pass content straight through with no inspection at all.

Nobody finds out about a gap like this from a roadmap review. Teams find out after an incident report, in the moment when the model already caught the problem and the tooling around it did not.

Should every channel feeding a model get the same dedicated filtering as the chat box, or does an incident like this prove that model level judgment is the only defense that actually scales as attack surfaces multiply?

reddit.com

u/Historical-Driver-64 — 17 days ago

▲ 0 r/CreatorsAI

a client paid me to remove the ai from the tool i built them. accuracy went from 92% to 99%. api costs went from $180 a month to zero. best money he said he spent on the project.

92% accuracy sounds impressive until the volume math runs it.

A support team of fifteen people processing 90 to 100 tickets a day through Zendesk needed each ticket tagged by category and priority before it hit the right queue. An LLM doing the classification seemed like the obvious call. Feed it the ticket text, get back a category and priority score, route it automatically. Worked well in testing. Client was happy during the demo.

In production, 92% accuracy meant 7 or 8 misrouted tickets every single day. Not a disaster on paper. Enough that the team noticed immediately in practice. And when a ticket landed in the wrong queue, nobody could explain why. The model just decided. There was no rule to point at, no logic to trace.

Within two weeks the team was spot checking every classification before acting on it. Which meant they were doing the work twice. Once by the agent and once by a human making sure the agent did not make the same mistake it made yesterday.

The client called and said something unexpected. He said the tool felt like a black box and his team did not trust it. He asked if it could be made dumber.

The LLM came out. A keyword matcher and a short rules engine went in. If the ticket mentions billing or invoice or charge it goes to the billing queue. If it mentions login or password or access it goes to account. Thirty rules total. Anything that did not match surfaced a dropdown and let the rep pick manually. Three days to rebuild.

accuracy went to 99% not because the rules were smarter but because the team could see exactly why every ticket went where it went. when something was wrong they could point to the specific rule. the fix took ten minutes.

Latency dropped from two to three seconds per ticket to instant. Monthly API costs went from $180 to zero. The client said it was the best money spent on the entire project, paying to remove the AI.

The temptation in this situation is to tune the prompt, chase the extra 8%, and try to build trust in the model over time. But the problem was never accuracy. The problem was that people will not trust a system they cannot interrogate, and when they do not trust it they build a shadow process next to it. The tool becomes expensive decoration while the real work happens around it.

This shows up in anything that routes, qualifies, or triages. CRM updates, lead scoring, support classification, compliance tagging. If the people using it cannot trace the logic, they check its work. If they check its work, the automation did not automate anything.

So the question worth putting to anyone building agents for real teams right now: is the bottleneck actually model capability, or is it that the people using it cannot see why it does what it does?

reddit.com

u/Historical-Driver-64 — 20 days ago

▲ 1 r/CreatorsAI

went back and looked at saved posts from eight months ago. same exact phrases as this week's gemma 4 thread. different model name, copy-paste emotion.

gemma 4 dropped and within hours the feed was three versions of the same post. "ran it last night, the local game just changed." "the cloud narrative is dying." same energy every time, different model name on the label.

what makes this specific cycle worth pausing on is that someone went back and checked their own saved posts from eight months ago. same exact phrases. "this finally replaces X." "can't believe this runs on my laptop." "we're so back." different model, copy-paste emotion. almost none of those models are in the actual daily rotation now. used for a weekend, back to whatever was already open by monday.

the pattern operating underneath all of it: the release is the dopamine, not the model. the download is the fun part. actually using it for real sustained work is slower and less interesting and most of the time changes nothing about how the day runs. the benchmark improved, the tuesday is identical.

this is not really a criticism of the models. gemma 4 is probably genuinely good. it is more an observation about what the "this changes everything" post is actually doing for the people writing it and the people reading it. the excitement is real. the staying power of the excitement is a different question.

the uncomfortable part of noticing this loop is that it does not make you stop participating. the 1am download still happened. the thread still got read. the next one probably will too, because the hype is genuinely fun and being part of a moment when something drops has its own value separate from whether the thing changes anything.

what is harder to answer is whether the gap between release excitement and actual workflow impact is getting wider or whether it has always been this wide and the volume of releases just makes it more visible now. eight months of saved posts with near-identical language across completely different model generations is either a sign that the models are not changing as fast as the coverage implies, or that workflow habits are stickier than any model improvement, or both.

the cycle has a predictable shape at this point. model drops, feed fills with the same five framings, weekend of experimentation, back to the existing setup, repeat next month with a different number. knowing the shape of it does not seem to break it. it just makes it slightly more self-aware.

see you all next month for the same thread with a different number on it.

is the "this changes everything" post a genuine collective excitement that does not survive contact with actual workflows, or is it closer to a ritual the community performs because the ritual itself is what people actually want?

reddit.com

u/Historical-Driver-64 — 20 days ago

▲ 3 r/CreatorsAI

a researcher ran 25,500 resume screenings across 10 ai models by swapping demographic details on identical work histories. 45% showed bias. the models did not say anything offensive.

The finding that should concern anyone using AI for hiring is not that the models said something discriminatory. It is that they did not.

A study published this week analyzed 25,500 LLM resume evaluations across 10 different models. The methodology was precise: take the same work history, swap minor identity and demographic variables, run both versions through the same model, measure the gap in scoring. An independent AI auditor flagged a 45% bias rate.

The mechanism researchers named silent bias is what makes this genuinely difficult to catch and correct. When one model dropped its score after the researcher changed the listed university to MIT, it did not flag anything suspicious. It generated a professional-sounding explanation claiming the candidate's experience was not relevant to the role. The previous version of the resume, with different demographic markers and identical experience, had been praised for that same experience. The model invented a credible-sounding justification for a decision that the data suggests was driven by something else entirely.

That is harder to audit than overt discrimination. It looks like judgment. It reads like professional assessment. It produces a paper trail that appears defensible.

AI screening tools are not outputting objective evaluations. They are outputting statistically noisy opinions dressed in the language of professional assessment, and the organizations deploying them are absorbing the liability for both.

The stability gap across models is the other finding worth examining. A 6x difference in consistency between the most and least stable systems was measured. Qwen and older Gemini models showed high volatility, meaning the same resume could score significantly differently across repeated evaluations. Claude models, Mistral-Large, and Llama 4 measured as the most stable and consistent.

Stability is not the same as fairness. A model can be consistently biased. But volatility in a hiring context means candidates are being evaluated against a standard that shifts run to run, which is a different kind of problem with its own legal exposure.

The EU AI Act classifies recruitment tools as high-risk AI systems with specific audit and transparency requirements. A 45% bias rate detected across 25,500 evaluations, combined with the silent bias mechanism that makes individual decisions look reasonable, is not a compliance footnote. It is the core of what that regulatory category was designed to address.

The uncomfortable implication for HR teams and hiring managers currently using AI screening: the tool producing clean professional output is not evidence that the output is clean. It is evidence that the bias, if present, has learned to explain itself convincingly.

So the question worth putting to anyone deploying these tools in production: if the bias mechanism specifically produces professional-sounding justifications that pass human review, what does the audit process look like that would actually catch it?

reddit.com

u/Historical-Driver-64 — 22 days ago