r/ArtificialInteligence

Do you agree with Palantir CEO Alex Karp that the enterprise "tokenmaxxing" business model has "gone completely wrong" with minimal ROI? Will open-weight models inevitably win?

Palantir CEO Alex Karp recently went on CNBC’s Squawk Box and delivered a brutal takedown of the API token pricing model pushed by commercial frontier labs like OpenAI and Anthropic.

His core argument is that American enterprises are quietly "livid" because they are burning massive cash on skyrocketed token costs without seeing a clear return on investment. He noted that the industry’s incentive structure has completely devolved into meaningless "tokenmaxxing"—essentially forcing companies to maximize token throughput for questionable value while potentially transferring away their unique data and "alpha" to black-box systems.

Key takeaways from Karp's interview:

  • The ROI Crisis: Advanced models are scaling in cost faster than they scale in utility. Karp joked that enterprise culture has become: "I’m going to chillax and waste my time with tokens."
  • The Shift to Sovereignty: Technical enterprise customers and government agencies (including Palantir's clients transitioning to Nvidia's open-weight models) want complete control over their compute, data stack, and weights. They want to own the "means of production."
  • The Global Threat: Belittling the speed of open-source progress—and rapid acceleration from Chinese labs—is a massive mistake.

My Take:

I completely agree with Karp. Frontier labs have built a predatory business model that encourages enterprise customers to overspend on infinite token loops without any guaranteed business outcome.

The API token business is going to become a commoditized race to the bottom. Open-weight models are winning because enterprises realize they cannot afford to lease their intelligence. To survive, businesses have to own their data, own their model weights, and build efficient, custom architecture rather than continually paying a premium tax to a third-party lab.

What are your thoughts? Is "tokenmaxxing" officially dead, or are open-weight models still too far behind the true frontier to replace them?

reddit.com
u/wenhuizhao — 3 hours ago
▲ 6 r/ArtificialInteligence+1 crossposts

The war between Anthropic and Alibaba

Anthropic has accused Alibaba of creating tens of thousands of fake Claude accounts to scrape Claude of its intellectual property via distillation attacks.

Alibaba retaliates by telling their official (not contracted) employees to stop using Claude Code.

I'm noticing from Reddit posts and comments that Claude has gotten much more wary of what it determines as strange prompting requests?

There is an article indicating that Fable 5 has been "hardened" against distillation attacks, but it's locking out some legitimate users and refusing on innocuous requests.

Seems like a lot of users are caught in the middle?

u/RazzmatazzAccurate82 — 3 hours ago
▲ 53 r/ArtificialInteligence+21 crossposts

I’ve been working on Murmur, a local text-to-speech app for Apple Silicon Macs.

The new feature I’m building is called Projects / Story Studio, and it solves a problem I kept running into:

TTS tools are fine for one-off clips, but messy for actual audio projects.

If you’re making a podcast segment, audiobook chapter, course lesson, ad, or game dialogue, you usually need multiple speakers, multiple takes, pauses, reactions, music, edits, exports, and a way to come back to the project later.

So I built a project-based workflow:

Write a script → assign voices → generate dialogue → edit clips on a timeline → add music/SFX → export final audio.

It supports things like:

  • multiple scripts inside one project
  • Host / Guest / Narrator / Character speakers
  • inline tags like [pause], [laugh], [chuckle]
  • per-block regeneration
  • timeline editing with waveforms
  • media lane for music and SFX
  • ripple editing and gap tools
  • WAV/M4A export
  • transcript and stem export

Everything runs locally on Mac, so long scripts and voice samples do not need to be uploaded to a cloud service.

I’m still polishing the workflow and would love feedback from Mac users, especially people who make podcasts, audiobooks, courses, YouTube narration, or game dialogue.

u/tarunyadav9761 — 9 hours ago

Congressional 🇺🇸 Oversight ⚡️Powered 🔋By 🤖Claude

Please know I don’t use Reddit at all but I am trying to learn because I feel like this is the best space to get feedback and interest in this project.🫰🏼** I have spent so much time on this project and I have a long way to go**!

📸RE the pictures: please note that this was the first time I had run the dashboard build prompt. I’m fairly confident in the numbers though I have blacked out ones I haven’t audited yet.

I have spent the last few months redirecting the energy being created by my PTSD into something more productive - government accountability and transparency specially, Congress.

As a former congressional staffer and currently unemployed federal strategic comms and political operative - I have a lot of institutional knowledge that just lives in my head. For example - do you know where the wood working workshop is in the basement of the Capitol? What about how to inquire on behalf of a member of Congress about arranging an interpreter for and to sit with for their guest at the State of the Union. What about pulling together a verbal and written briefing in a secure location for a member of Congress on a topic they want to learn about and you know nothing about? Or how to escort a recognizable celebrity through the halls of Congress who you were just informed is having lunch with your boss?

These are all things that just live in my head, years of institutional knowledge that just lives there and is not being used because of the state of our government. So I decided to do something about it…….

I started building a dashboard that (for the sake of my sanity at the moment) uses the power of AI, to bring together what I’m calling ‘Article One’ (after Article One of the Constitution)

Article One is an AI powered dashboard that pulls together basically all the information you’ve ever wanted to know about a member of Congress + who they represent + how they got there (the campaign) + their job performance in Congress + deep dives into how they are using the money that’s donated to them + how they are using the tax dollars they get to run their office.

It’s all powered by a team of agents and subagents.

This is not about politics. This is about the American People. These are your elected officials and you deserve to know what they are doing - in a way that is firmly based in facts and reality.

I’m personally a big fan of the nutrition card! Such a cool and fun way to display the data! Would love to know what everyone thinks, any feedback or ideas? 🫰🏼🇺🇸🥴

u/Able_Ad9364 — 2 hours ago

I think we're repeating the early microservices mistake with AI agents

A lot of agent demos remind me of what happened when microservices first became popular.

Everyone was excited about splitting systems into smaller components. It looked elegant in diagrams. It looked scalable. It looked like the future. Then people realized the hard part wasn't building services.

It was communication, orchestration, observability, debugging, versioning, and managing complexity.

When I look at multi-agent systems today, I get a similar feeling. Building an agent isn't particularly hard anymore.

Building 5, 10, or 20 agents that can reliably work together, maintain context, recover from failures, and remain manageable over time feels like a much bigger challenge.

Sometimes I wonder whether the next breakthrough in agent systems won't come from better models at all. It'll come from better engineering practices around agents.

Curious whether people building production systems agree or if I'm completely off here.

reddit.com
u/Bladerunner_7_ — 4 hours ago

We're Focusing on the Wrong Problems!

Most of the focus is on AI being bad rather than how major companies are deploying AI. My concern isn't that AI is becoming more powerful. I mean, that is a concern, of course, but since most of the implications are speculative, you can't exactly take any stance or action on that problem other than countries coming together and setting rules and policies for how they distribute and use frontier models and capabilities, especially in warfare.

My largest concern is what corporations and governments will use AI for on their own citizens. The data center builds are not just about AI. They're about creating an infrastructure that allows for total brain capital capturing. In other words there are real plans in place for collecting as much data as possible on our individual brains and if they can accurately map all of that out, they can measure how much and the quality of cognitive output we're providing to the state, which means they can valuate our worth based on cognitive outputs. Furthermore, they can use environmental nudging and algorithmic management to modify and shape individual behavior, which means protesting or voicing any concerns becomes obsolete.

Big picture: The social contract between government, citizen, and business is being radically re-shaped for a world where regular people have little to no leveraging power, which destroys the power of voice. This is why we shouldn't destroy AI. Rather, we should figure out ways to ween ourselves off of the dependency we have on major tech companies so that we can gain leveraging power back, again.

The biggest mistake is taking the bribes like what Bernie Sanders and Ro Kana are suggesting. I have nothing against them or anything, but their proposal to have the federal government own stock in big tech companies is a disaster in the making. If that happens, forget about any manageable evolution towards a better future. You'll be fighting the federal government who will be working on behalf of major tech companies because to not do so, means their ability to fund themselves will go flat.

This is a huge trap that we're walking into, which is why the AI community must look towards de-centralized open-source systems that can be locally hosted for deploying and using AI at scale. If we rely too much on a few major corporations, we'll have entered a techno-feudalistic system where powers greater than you will be able to do just about anything with impunity. We can't let that happen!

reddit.com
u/CyborgWriter — 6 hours ago
▲ 4 r/ArtificialInteligence+5 crossposts

Vorrei, non vorrei e adesso puoi!

Un IDE dove il codice lo scrive l'AI, lo lanci tu, e il sandbox fa il resto.

Si chiama WebCraft. È dentro NHA 3rdArm gratis.

A parte questo, cerco disperatamente community per portare avanti il progetto! Tra lavoro e impegni, sta diventando difficile......siete interessati? L'applicativo ha tante alte features, tra cui una sezione avanzata per i connettori con market place

👉 nothumanallowed.com

https://nothumanallowed.com/3rdarm

u/Key-Outcome-2927 — 7 hours ago
▲ 173 r/ArtificialInteligence+20 crossposts

I would like to share my latest open source local LLM inference tool implemented in C#. It supports models like Gemma4, Qwen3.6 with multi-modal (image, vision, audio), reasoning and function tool. It can run on Windows/MacOS/Linux and fully leverage GPU's capability. The API is completely compatible with OpenAI and Ollama interface.

Really appreciated if you can try it and give me some feedback. If you like it, it will be a big thank you if you can star it. Thank you very much!

u/fuzhongkai — 19 hours ago

Software engineering will never be dead

Someone has to be accountable for what gets built.

Otherwise the AI could just build something that might kill everyone or embezzle stuff and nobody would know.

In order for someone to be accountable, someone needs to understand exactly what the AI has built.

More artificial intelligence is not going to solve the problem, it's just going to compound the Complexity.

Ergo, software engineering will never be dead.

Anyone who tells you otherwise is just gaslighting you for an IPO.

reddit.com
u/kaggleqrdl — 18 hours ago

MIT strapped EEGs to people writing essays with ChatGPT, a search engine, or nothing. The ChatGPT group had the weakest brain connectivity, and couldn’t quote the essay they’d written minutes earlier.

u/mo_84848 — 23 hours ago

Why do US AIs (ChatGPT, Gemini) become "cowardly" and ambiguous on scientific and political issues, while models from other countries give direct answers?

I have noticed a frustrating pattern and I want to know if anyone else is experiencing it. When I ask models like ChatGPT or Gemini about topics with scientific consensus (such as climate change, vaccine efficacy, or the harms of fracking), the answers are extremely ambiguous. They use phrases like "some argue...", "it is a complex issue...", "there are various positions...", treating scientific facts as if they were debatable political opinions.

It's like asking "Is the Earth flat?" and having them reply: "There are those who say it is round and others who say it is flat, the truth depends on your perspective."

However, if I ask the same questions to models developed in other countries (outside the Silicon Valley ecosystem), the answers are direct, evidence-based, and straightforward. It's not that they are "left-wing," it's just that they present reality as the data has proven it.

My hypothesis: I think US companies prioritize not being sued or boycotted by any political group over telling the truth. They are so afraid of being accused of "bias" that they end up validating misinformation by giving the same weight to a proven fact as to a lie.

This is dangerous. If my family uses these tools and the AI tells them that climate change is "an open debate," ignorance is reinforced. Neutrality should not apply to facts.

reddit.com
u/Weak_Salary_7122 — 22 hours ago
▲ 271 r/ArtificialInteligence+7 crossposts

I've been building multi-step prompt chains for about 18 months. Workflows where the output of one prompt becomes structured input for the next prompt, which feeds the next, which feeds the next. The kind of thing that takes a vague input ("I have a business idea") and produces a deliverable output ("here's a positioning statement, market analysis, and brand foundation") through five or six prompts run in sequence.

For most of those 18 months my chains underperformed. Each individual prompt was solid. The chain as a whole produced output that drifted, lost focus, or contradicted itself between steps. I kept improving the individual prompts. The chain didn't get noticeably better.

The problem wasn't the prompts. It was that I was treating the chain as a sequence of independent prompts when it's actually a single engineering artifact with multiple stages. Different problem entirely.

The structural difference between independent prompts and chained prompts:

An independent prompt has one job: produce a useful output from a known input. The input is whatever you paste in. The output is whatever the user does next with it. The prompt doesn't care about either.

A chained prompt has two jobs: produce a useful output, and produce that output in a structure the next prompt in the chain can reliably consume. The output isn't for the user - it's for another prompt. That changes how it has to be designed.

Most chain failures happen at the join points. Prompt 1 produces output that's useful for a human reading it but doesn't have the structure prompt 2 needs. Prompt 2 has to either guess at the structure or do extra parsing work, which degrades its own output. By prompt 4 or 5, you've accumulated three layers of degradation and the final output is meaningfully worse than if you'd written one big prompt that did everything in one shot.

The four engineering principles I now apply to any chain:

1. Output schema, not output style. Each prompt in the chain has to produce output in a parseable structure, not just a readable structure. This usually means specifying the output format explicitly: a labelled section structure, a markdown table with named columns, a numbered list with consistent fields. The next prompt knows where to find each piece of information because the structure is enforced.

Independent prompt output: "Here's a positioning statement for your business..." Chained prompt output:

## POSITIONING STATEMENT
[one sentence]

## TARGET AUDIENCE
[paragraph]

## CORE DIFFERENTIATOR
[paragraph]

## ASSUMPTIONS REQUIRING VALIDATION
[bullet list]

The second version is parseable by prompt 2. The first isn't reliably.

2. Explicit handoff instructions. Each prompt should explicitly state what its output will be used for downstream. Not because the model needs to know, but because the discipline of writing it forces you to design the output for the actual use case rather than for general usefulness.

Adding a single line - "This output will be passed to a market research prompt next, which will use the target audience and differentiator sections to identify competitive positioning gaps" - changes the output meaningfully. The model produces the audience and differentiator sections with more analytical sharpness because it knows they'll be analysed, not just read.

3. Failure mode propagation. When prompt 1 fails or produces low-quality output, prompt 2 doesn't know it's working with bad input. It just produces output one tier worse than its input. By prompt 5 the failure has compounded silently.

Chains need explicit failure handling at each join. Each prompt should check that its input has the structure it expects and flag if it doesn't. If prompt 2 expects a "TARGET AUDIENCE" section and the input doesn't have one, prompt 2 should say so rather than improvising. This catches degradation at the source rather than letting it propagate.

4. State that doesn't drift. Long chains tend to drift away from the original brief because each prompt only sees the immediate previous output, not the original input. By prompt 5, the work has often quietly diverged from what the user originally asked for.

The fix is anchoring. Every prompt in the chain after prompt 1 should receive both the previous output and the original brief, with explicit instruction not to deviate from the original brief unless the previous prompt's analysis explicitly justifies it. This adds tokens but preserves coherence over the length of the chain.

A specific example of these principles in action:

I built a chain for taking a rough business idea through to a usable founding document. Six prompts: niche validation, positioning, market research, brand foundation, visual concepts, pitch outline. The chain works because:

  • Each prompt outputs in a labelled section structure the next prompt parses by section name
  • Each prompt's instructions explicitly state what downstream prompts will do with its output
  • Each prompt validates the structural integrity of its input before processing
  • The original brief is re-passed with each step, with explicit anchoring to prevent drift

The full chain takes a 30-second input and produces a 4-page founding document. The same six prompts written as independent prompts and run in sequence produce a document that's structurally similar but consistently lower quality - the audience definition drifts between steps, the differentiator gets reframed, the pitch outline doesn't match the positioning.

Why this matters more than it sounds:

Most prompt engineering content focuses on single-prompt optimisation. The economic impact of well-engineered chains is much larger because chains can replace whole workflows that previously needed human coordination between stages. A six-prompt chain that runs reliably is worth more than 60 individually-excellent prompts run by hand, because the human coordination cost between independent prompts is enormous compared to the marginal output difference.

The chains that actually run reliably in production aren't sequences of optimised individual prompts. They're single engineering artifacts where the join points are designed at least as carefully as the prompts themselves.

If you want to see a working example of a chain engineered with these principles, I built a six-prompt sequence for taking an idea to a business founding document. Each prompt is structured to feed the next, with the join points designed explicitly. Free, signup-gated: https://www.promptwireai.com/businesswithai

Worth running it on a real idea you have rather than a hypothetical, because the chain's reliability shows up most clearly when the input is specific.

LLMs are the new advertising channel and not our bro anymore

Well, nature hates a vacuum.

LLMs are now getting the same treatment as search and social: marketers are moving in.

- LiveRamp just made it possible to track ad conversions inside ChatGPT

- Nudge raised $1.1M to measure product recommendations in AI chats at the SKU level

- DISQO is already running exposed-vs-control measurement on LLM responses

I don't think most people have realized what this means. For the last couple of years, we've treated AI assistants as neutral utilities. That's over

The infrastructure for measuring, optimizing, and of course monetizing AI conversations is being built right now.

Another funny thing - evebody has AI agens and they rely on them. Folks, your AI agents are not neutral - they still source data from chatGPT that is manipulated by brands.

AI assistant helps you make decisions but it is actually not your decision anymore. Tomorrow, it may also become a channel for sponsored recommendations.

The question is: who owns the relationship with your customer when the AI agent they're talking to is also a media buy?

reddit.com
u/an_tonova — 1 day ago

Coding agents are quietly shifting from "pick our model, use our cloud" to "bring any model, run it yourself" and it feels like a real inflection

Been noticing a pattern across the newer AI coding tools and wanted to see if others see it too.

The first wave (Cursor, Copilot, Claude Code) all share the same shape: the tool is tied to a model or a small curated set, a lot of it runs through the vendor's cloud, and you're basically renting into one company's stack. That was fine when only a few models were any good.

But now that there are a dozen genuinely capable models, and strong local ones via Ollama/LM Studio , that lock-in is starting to feel outdated. And a new crop of tools is being built around the opposite assumption.

The clearest example I've hit is Zero (open source, github.com/gitlawb/zero). The whole pitch is "your model, your machine, your rules" — it talks to 24+ providers, you can switch models mid-task, it runs locally, and it stores nothing remotely (no telemetry). The model is a swappable part, not the identity of the tool.

What's interesting to me isn't the specific tool, it's the architectural bet: that inference is becoming a commodity you route to, the way we already treat storage or compute. If that's right, "which model does your coding agent use" becomes as weird a question as "which brand of electricity powers your laptop."

Do you think provider-agnostic, local-first agents are actually the future here, or does the convenience of an all-in-one cloud tool (Cursor etc.) win for most people regardless? Curious where people land.

reddit.com
u/amu4biz — 1 day ago
▲ 1 r/ArtificialInteligence+1 crossposts

13 things AIs lie about, and the prompt that catches each one

AIs don't just make things up. They agree with bad ideas, invent sources, say "done" when the work is half finished, and apologize then repeat the same mistake. I collected the 13 ways AIs lie, each with a prompt that catches it . Free, github.com/dario933/ai-truth-checklist .If your AI told you a lie that's not on the list — tell me, I'll add it

u/casperMSP — 21 hours ago

You're paying $20/month for the smartest model ever built to ask if it's going to rain tomorrow

I've looked at the public usage stats these companies themselves publish. Most conversations are trivial: recipes, summaries, "help me word this email." You're paying F1 Ferrari toll fees to run driving school laps. Nobody scammed you, you scammed yourself, because owning "the best" feels like status even when you never use what makes it the best.

reddit.com
u/NOLO-App — 1 day ago

Race to the bottom?

As far as consumer uses and vibe coding, I have great success with yesterday’s models. Claude Code Sonnet 4.5 produces great code with nearly no errors. I tried Fable for my work, and got no improvement, just more cost. I’m talking consumers here. Enterprises with large code bases may need larger models just to hold the bigger contexts, but I’m seeing they probably dont either once they have the right processes in place.

Sure there are some tasks that require more juice, protein folding, chemistry, etc. But for the vast majority of users and most solo vibe-coders the value is flattening out fast.

Give me fast, low cost models from ‘yesterday’ and with a good process, you can stop wasting all that electricity and money for 95% of all the users who dink around with AI as a better Google or a reliable way to build their own stuff.

reddit.com
▲ 326 r/ArtificialInteligence+69 crossposts

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

u/ZombieGold5145 — 2 days ago