u/ProfessionalJackals

DeepSeek v4 Pro 75% off is now permanent.

https://x.com/deepseek_ai/status/2057854261699195173

DeepSeek just made the 1/4 discounted price for v4 Pro permanent.

Attribute                                 deepseek-v4-flash   deepseek-v4-pro
Pricing, 1M input tokens (cache hit)(2)   $0.0028             $0.003625 (75% off(3)) / $0.0145
Pricing, 1M input tokens (cache miss)     $0.14               $0.435 (75% off(3)) / $1.74
Pricing, 1M output tokens                 $0.28               $0.87 (75% off(3)) / $3.48

This widens the price gap with the frontier models (Sonnet/GPT 5.4) to 12 to 17x. And that is before counting cache hits, where DeepSeek is easily 60 to 80x cheaper. DS models are also very good at hitting those caches.

That is how you draw in customers, Microsoft!
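
As a rough sanity check on the pricing table, here is a minimal Python sketch that recomputes the discounted Pro prices from the list prices and blends the cache-hit and cache-miss input prices for a given hit rate. The 90% hit rate is only an illustrative assumption, and the function names are made up for this example.

```python
# Prices in USD per 1M tokens, copied from the pricing table in this post.
PRO_LIST = {"cache_hit": 0.0145, "cache_miss": 1.74, "output": 3.48}
PRO_NOW = {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87}
FLASH = {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28}


def discounted(list_price, discount=0.75):
    """Price after taking `discount` (a fraction) off the list price."""
    return list_price * (1.0 - discount)


def blended_input_price(prices, hit_rate):
    """Average input price per 1M tokens when `hit_rate` of them hit the cache."""
    return hit_rate * prices["cache_hit"] + (1.0 - hit_rate) * prices["cache_miss"]


# Check that 75% off the list price matches the listed Pro price.
for key, list_price in PRO_LIST.items():
    print(f"pro {key}: computed {discounted(list_price):.6f}, listed {PRO_NOW[key]}")

# Assumed 90% hit rate, for illustration only.
for name, prices in (("flash", FLASH), ("pro", PRO_NOW)):
    print(f"{name} input at 90% hits: ${blended_input_price(prices, 0.90):.4f} per 1M")
```

The higher the hit rate, the closer the effective input price gets to the cache-hit row, which is why the cache behaviour matters so much for the real bill.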

reddit.com
u/ProfessionalJackals — 11 hours ago

DeepSeek Cache reuse between models?

OpenCode Go has a rather good log where you can see your cache usage, input (= the part not covered by the cache), and output. That gives you a more detailed view of what is going on across different providers and models.

DS4 Flash is not only insanely cheap and very good in general; what really shines is the cache handling. The excellent cache handling within a single model, with 90%+ hit rates, is well known. But there is another small detail that gets exposed via the OpenCode Go logs.

Example Pro > Flash

  • Use DS4 Pro to Plan something.
  • Use DS4 Flash to have the plan implemented.

You would expect Flash to rebuild its own cache. All the other models lose their cache when you switch between the lite/normal and pro versions.

Yet ... you see cache reuse going on, as Flash kept over 70% of Pro's cache. It does not happen every time, but I have now seen it multiple times, with Flash keeping 10 to 70% of the cache.

Example Flash > Pro

  • Use DS4 Pro to Plan something.
  • Use DS4 Flash to have the plan implemented.
  • Use DS4 Pro to verify the work.

Here it happened again ... DS4 Pro kept over 70% of the DS4 Flash cache, reducing the cost of the initial cache buildup by around that same percentage...

It seems that when you switch models and they are hosted on the same servers, you can end up recycling your existing cache layer. I have never seen this with the other models.
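
To put a number on that saving, here is a small Python sketch. It assumes the log shows cached and non-cached input tokens for the first request after a switch; the function names and the 100k-token context are hypothetical, and the prices are the discounted DS4 Pro input prices per 1M tokens ($0.003625 cache hit, $0.435 cache miss).

```python
PRO_HIT = 0.003625   # USD per 1M input tokens, cache hit
PRO_MISS = 0.435     # USD per 1M input tokens, cache miss


def retained_fraction(cached_tokens, uncached_tokens):
    """Share of the prompt served from cache on the first request after a switch."""
    total = cached_tokens + uncached_tokens
    return cached_tokens / total if total else 0.0


def first_request_cost(prompt_tokens, retained, hit_price, miss_price):
    """Input cost in USD for one request; prices are per 1M tokens."""
    hit = prompt_tokens * retained
    miss = prompt_tokens - hit
    return (hit * hit_price + miss * miss_price) / 1_000_000


def saving(retained, hit_price, miss_price):
    """Fraction of the cold-cache input cost saved thanks to the retained cache."""
    return retained * (miss_price - hit_price) / miss_price


# Hypothetical 100k-token context, with the 10% to 70% retention range seen in the logs.
for retained in (0.0, 0.1, 0.7):
    cost = first_request_cost(100_000, retained, PRO_HIT, PRO_MISS)
    print(f"retained {retained:.0%}: ${cost:.4f}, saving {saving(retained, PRO_HIT, PRO_MISS):.1%}")
```

Because a cache hit costs under 1% of a cache miss, the saving on that first request is almost exactly the retained share of the cache.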

Kimi 2.6 had no cache survive a model switch. Then again, it did not even maintain its own cache within the same prompt, making the price skyrocket by a factor of 6 on some prompts (as if it is being reset in the background or you are jumping between servers).

MiMo 2.5 / Pro had no cache survive a model switch either.

One interesting thing to notice... It makes you wonder whether you can fill up the cache with the Flash model and then switch to Pro. I tried it a few times, but it is rather random. Sometimes it works with a large cache, sometimes it remembers part of the cache, and other times nothing survives. So you are not always landing on the same server nodes.

Anyway, this just reinforces that DeepSeek is insanely cheap for what they are offering.

u/ProfessionalJackals — 6 days ago

Opus 4.6/4.7 vs GPT 5.5 ... Is 5.5 insanely cheap for GH/Copilot?

There is some interesting data in the monthly April report. Extracted from my report and reverse calculated...

  • Opus 4.6: 855 PRU > 64,745.546 ACIS > +$579.49
  • GPT 5.5: 338 PRU > 1,349.24 ACIS > +$13.49

PRU = Premium Requests * Multiplier. ACIS = Processing used by Microsoft?

  • Opus 4.6: 855 PRU / 3 = 285 Premium Requests
  • GPT 5.5: 338 PRU / 7.5 ≈ 45 Premium Requests

Average:

  • Opus 4.6: $579.49 / 285 = $2.033 per Premium Request
  • GPT 5.5: $13.49 / 45 ≈ $0.30 per Premium Request
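
For anyone who wants to run the same reverse calculation on their own report, here is a minimal Python sketch of the steps above. The figures are the ones from this post; the dictionary layout and function names are just assumptions for the example.

```python
# PRU, multiplier, and billed cost per model, as listed in this post.
REPORT = {
    "Opus 4.6": {"pru": 855, "multiplier": 3.0, "cost_usd": 579.49},
    "GPT 5.5": {"pru": 338, "multiplier": 7.5, "cost_usd": 13.49},
}


def premium_requests(pru, multiplier):
    """Undo the multiplier to get the number of actual premium requests."""
    return pru / multiplier


def cost_per_request(cost_usd, pru, multiplier):
    """Average billed cost in USD per premium request."""
    return cost_usd / premium_requests(pru, multiplier)


per_request = {}
for model, row in REPORT.items():
    per_request[model] = cost_per_request(row["cost_usd"], row["pru"], row["multiplier"])
    requests = premium_requests(row["pru"], row["multiplier"])
    print(f"{model}: {requests:.1f} requests, ${per_request[model]:.3f} per request")

ratio = per_request["Opus 4.6"] / per_request["GPT 5.5"]
print(f"Opus 4.6 costs about {ratio:.1f}x more per request than GPT 5.5")
```

Swap in the PRU, multiplier, and cost lines from your own report to compare.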

Has anybody else gotten this strange result in the cost calculation? I understand that people will give me the "but you do not know how many tokens you used". Sure, but my style of programming did not suddenly change.

What we are missing is what the token-based plan actually means for the specific models being used, because there seem to be large ACIS differences per model.

Even GPT-5.3-Codex vs GPT-5.3-Codex (Auto) shows some strange differences. My sample size is too small, but Auto is calculated as insanely cheap. My difference is around 30x, but that could be down to task differences.

Anthropic models are calculated with very high ACIS; even Haiku is extremely expensive, whereas GPT models are cheap.

But even then I see GPT 5.5 being extremely cheap vs GPT 5.3 Codex, almost as if some models are run in different locations, which influences the cost. As in, is Microsoft running GPT 5.5 on its own infrastructure but using GPT 5.3 Codex externally?

Opus 4.7 is calculated about 2.5x as expensive as Opus 4.6.

While we have gotten the new subscription prices, we are still missing the actual ACIS cost for each model. And it feels like Microsoft wants to be able to adjust the "token cost" of each model more or less on the fly.

Can people take a look at their actual cost breakdowns?

u/ProfessionalJackals — 10 days ago

The future of Copilot Credit system?

I suspect that we will see some kind of tiered credit system. Probably something like Xiaomi's subscriptions.

Where you get a subscription:

Month

  • $10: 10 million credits / Per month
  • $40: 50 million credits / Per month
  • $200: 300 million credits / Per month
  • ....

Year

Maybe 10 to 20% discount on the price itself, for people who buy a year.

  • $8*12: 10 million credits / Per month
  • $35*12: 50 million credits / Per month
  • $150*12: 300 million credits / Per month
  • ....

Again: These are made up numbers ... But you get the idea where they can go with this.
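
To see what these made-up tiers imply, here is a quick Python sketch that works out the price per million credits and the yearly discount for each tier. The tuple layout and function names are assumptions for the example; the numbers are the invented ones from the lists above.

```python
# (monthly price USD, yearly plan price per month USD, million credits per month)
# Made-up numbers from this post, not real pricing.
TIERS = [(10, 8, 10), (40, 35, 50), (200, 150, 300)]


def usd_per_million(price_usd, million_credits):
    """Price of one million credits on a given tier."""
    return price_usd / million_credits


def yearly_discount(monthly_usd, yearly_usd_per_month):
    """Discount of the yearly plan relative to paying month by month."""
    return 1.0 - yearly_usd_per_month / monthly_usd


for monthly, yearly, credits in TIERS:
    print(
        f"${monthly}/month: ${usd_per_million(monthly, credits):.2f} per 1M credits, "
        f"yearly discount {yearly_discount(monthly, yearly):.1%}"
    )
```

The larger tiers get cheaper per credit, which is the pull toward bigger packages. With these example numbers the top yearly tier lands a bit above the 10 to 20% range, which does not matter much since they are invented anyway.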

Overcharge

If you consume more than your subscription's allowed credits, you are charged at the token/credit price of the model.

As in very expensive, which automatically pushes people into getting larger packages than they may need. That in turn means more "unused" credits across the bulk of the client base.

Premium Models

Probably some kind of credit modifiers like we currently have, where more expensive models get different multipliers.

  • GPT 5.3 = 1.0:1
  • GPT 5.4 = 1.5:1
  • GPT 5.5 = 2.0:1
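
A minimal Python sketch of how such multipliers would eat into a credit budget. The multipliers are the made-up ones above; the 10 million credit budget matches the $10 example tier, and the 100,000 credits per request is purely a hypothetical figure.

```python
# Made-up multipliers from this post.
MULTIPLIERS = {"GPT 5.3": 1.0, "GPT 5.4": 1.5, "GPT 5.5": 2.0}


def credits_charged(raw_credits, model):
    """Credits deducted from the subscription for a request on `model`."""
    return raw_credits * MULTIPLIERS[model]


def requests_in_budget(budget, raw_per_request, model):
    """How many equally sized requests fit in a monthly credit budget."""
    return int(budget // credits_charged(raw_per_request, model))


# Hypothetical: 10M credits per month, 100k raw credits per request.
for model in MULTIPLIERS:
    print(f"{model}: {requests_in_budget(10_000_000, 100_000, model)} requests per month")
```

Same budget, but half as many requests on the most expensive model, which is the nudge toward cheaper models.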

It kind of naturally forces people into using more efficient and cheaper models by itself but also keeps the door open so people do not run away to OpenAI/Anthropic directly.

Other models

And with MS being able to run Chinese models on its own servers, that opens another avenue: cheap models it can offer from self-run servers.

Enterprise

Even Enterprise customers may look for the door if they see the real price of a pure 1:1 billing system, even with the rumored discounts. But if they can combine seats with shared credits, which benefit even more from the higher-paying seats... You see where the natural effect comes into play. You have 1000 seats, but only 100 people really hammer the system. Sure, you can keep 900 $10 accounts, get 100 $200 ones, and pay the EXPENSIVE overcharge. Or you can upgrade everybody to 1000 $40 accounts and share the load.
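
The seat example can be worked through with a small Python sketch. It reuses the made-up monthly tiers from earlier in this post and assumes credits are pooled across all seats in both setups; the function names are hypothetical.

```python
# Monthly plan price in USD -> included credits per month (made-up tiers from this post).
PLANS = {10: 10_000_000, 40: 50_000_000, 200: 300_000_000}


def fleet(seats):
    """`seats` maps plan price -> seat count; returns (monthly USD, pooled credits)."""
    cost = sum(price * count for price, count in seats.items())
    credits = sum(PLANS[price] * count for price, count in seats.items())
    return cost, credits


def break_even_overage(cheaper, pricier):
    """Overage price (USD per 1M credits) at which buying the credit gap as
    overage on the cheaper setup costs the same as the pricier setup."""
    extra_cost = pricier[0] - cheaper[0]
    extra_millions = (pricier[1] - cheaper[1]) / 1_000_000
    return extra_cost / extra_millions


mixed = fleet({10: 900, 200: 100})   # 900 light seats plus 100 heavy seats
uniform = fleet({40: 1000})          # everybody on the middle tier

print(f"mixed:   ${mixed[0]:,} for {mixed[1]:,} credits")
print(f"uniform: ${uniform[0]:,} for {uniform[1]:,} credits")
print(f"break-even overage: ${break_even_overage(mixed, uniform):.2f} per 1M credits")
```

With these invented numbers the uniform setup costs more up front but buys a bigger pool. It wins once the heavy users would burn through the gap and overage costs more than the break-even price per million credits.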

Visual Studio Code Agent window

The fact that we see Visual Studio Code with the new Agent window literally screams for a system like this.

Because any pure 1:1 credit/token system will be so insanely expensive that MS will have wasted billions subsidizing the growth of a client base that it will then lose to its "competitors".

Pure speculation, but that is the only way the whole Copilot system will not collapse in on itself. It also removes the misuse angle of the premium prompt system, while pushing for larger payments without killing off the entire non-Enterprise customer base.

Ironically, I suspect that Anthropic and OpenAI will also move to a system like this.

| And yes folks, before people complain (love you) about my numbers, they are made up. But they show how they can pull off something like this.

u/ProfessionalJackals — 12 days ago

Here are some basic test results to see how useful OpenAI Codex Plus is at $20.

  • OpenAI Codex plugin for Visual Studio Code
  • 6 prompts with GPT 5.5 High
  • 5m to 12m for each
  • Session limit: 21/100%
  • Week limit: 88/100%

Codex Plus gives you about:

  • 1h to 1h20 out of a 5h session.
  • 5 to 6h out of a week session.

So my conclusion is that you need the $100 Pro to get a work week (5 workdays) out of Codex.

Issues:

  • The diff compare is much less integrated into Visual Studio Code
  • OpenAI's GPT 5.5 felt a lot slower than Microsoft Copilot GPT 5.5. Maybe it's less visual feedback, or the lack of using sub-agents?
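
The extrapolation behind those estimates can be sketched in Python as below. It assumes the 21% and 88% readings are the quota still remaining after the 6 prompts and that usage scales linearly with working time; both are assumptions, and the function name is made up.

```python
def full_quota_minutes(active_minutes, remaining_pct):
    """Minutes of work a full quota would cover, scaled linearly from the share used."""
    used = 1.0 - remaining_pct / 100.0
    if used <= 0:
        raise ValueError("no quota used yet, nothing to extrapolate from")
    return active_minutes / used


PROMPTS = 6
for minutes_per_prompt in (5, 12):  # shortest and longest prompt from the test
    active = PROMPTS * minutes_per_prompt
    session = full_quota_minutes(active, 21)   # assumed: 21% of the session left
    week = full_quota_minutes(active, 88)      # assumed: 88% of the week left
    print(f"{minutes_per_prompt} min/prompt: session ~{session:.0f} min, week ~{week / 60:.1f} h")
```

This gives a fairly wide range, roughly 38 to 91 minutes per session and about 4 to 10 hours per week, with the estimates above sitting inside it.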

I don’t need to use GPT 5.5, but due to the Copilot monthly reset and using up prompts, I’ve been using a lot of Copilot GPT 5.5, so my experience with that model is still fresh. This makes it easier to get a better baseline across both platforms.

u/ProfessionalJackals — 21 days ago

Most discussion is heavily focused on the Student/Pro/Pro+ accounts, but what is the response from the Business/Enterprise managers to these new changes?

There was some feedback that businesses are being offered a 45% discount for the new Credits system.

  • Is this discount going to keep the bean counters happy? Or are companies looking at the alternatives?
  • Does Microsoft give you a clue about your actual token usage, so that managers can estimate the impact?

It will be interesting to see the actual response from the other side: whether companies are going to move away, or wait to see the impact on their billing first.

u/ProfessionalJackals — 25 days ago