Back in February, advertising on ChatGPT was a rich kid's club. You needed a $200K monthly minimum just to get in the door, roughly $2.4M a year to find out if the channel even worked for you. Dentsu, Omnicom, WPP and their enterprise clients got to play. Everyone else got to watch.

Then OpenAI did something interesting. They dropped the minimum to $50K in April. On May 5 they dropped it to zero. Anyone with a credit card can now log into ads.openai.com and run ads inside ChatGPT conversations. No agency, no invitation, no six-figure handshake.

The early money is absurd. ChatGPT crossed $100M in annualized ad revenue within six weeks, and that was with less than 20% of eligible users even seeing ads on a given day. That's a fraction of capacity. OpenAI is openly targeting $2.5B this year and $100B by 2030. They are not treating this as an experiment.

Now the part that actually changes the job. This isn't Google Ads with a new logo. There are no keywords. You write "context hints," plain language descriptions of the conversations you want to appear in, and OpenAI's system decides where you match. Early advertisers who pasted in keyword lists instead of writing natural descriptions burned their budgets on loosely matched conversations.

Think about what the impression itself is, too. Nobody scrolls ChatGPT. The person seeing your ad just typed "best project management software for a 10 person team" into the box. They're mid-decision, not mid-doomscroll. Roughly one in five queries on the platform already carries commercial intent, across 900M weekly users.

Before anyone maxes out a card, the honest caveats:

Ads only show to Free and Go tier users. If your buyers live on Plus, Pro or Enterprise, your audience is capped.
You get conversion data after the click, but zero visibility into what the person was chatting about before it.

AI search is still around 0.7% of US search ad spend. It's projected to hit 13.6% by 2029, but that's a projection, not a promise.

Every ad channel that mattered had a brief weird window where access was open, competition was thin, and nobody knew the rules. Google in 2002, Facebook in 2007. The people who showed up during the confusion didn't win because they were smarter. They won because they were early and paid attention while everyone else waited for best practices to be written.

The best practices for this channel don't exist yet. Somebody in this sub is going to end up writing them.

GLM-5.2 dropped on HuggingFace under MIT license, and multiple practitioners are calling it the strongest open-weight text model available. Simon Willison called it "probably the most powerful text-only open weights LLM." That framing is mostly fair, but the architecture details matter a lot before you size hardware or drop it into a tool loop.

What the architecture actually does

GLM-5.2 is 753B total parameters, but only ~40B activate per token. Each incoming token wakes up the relevant ~40B parameters and ignores the rest. That's MoE (Mixture of Experts), and it means cheaper compute per token at inference time. The catch most people miss: the full 753B still has to sit in GPU memory. People hear "40B active", size for 40B, and nothing loads.
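If the "only some parameters wake up" idea feels abstract, here's a minimal top-k routing sketch in plain Python. To be clear, this is a toy: the four experts, the router scores, and `top_k=2` are all made up for illustration, and none of it is GLM-5.2's actual configuration. It just shows the general MoE pattern where a router scores every expert but only the top few do any work for a given token.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, top_k):
    """Return {expert_index: weight} for the top_k experts, renormalized to sum to 1."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_forward(x, experts, router_logits, top_k):
    """Run only the chosen experts on x and mix their outputs by router weight."""
    weights = route(router_logits, top_k)
    output = sum(w * experts[i](x) for i, w in weights.items())
    return output, sorted(weights)

# Four toy "experts" (hypothetical): expert i just multiplies its input by i + 1.
experts = [lambda x, scale=i + 1: x * scale for i in range(4)]

output, used = moe_forward(1.0, experts, router_logits=[0.0, 2.0, 1.0, -1.0], top_k=2)
print(used)  # experts 1 and 2 ran; experts 0 and 3 stayed loaded but idle
```

Notice that `experts` holds all four functions the whole time even though only two get called. That's the memory catch in miniature: compute scales with the active experts, storage scales with all of them.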

For long context, GLM-5.2 uses two stacked tricks:

DSA (DeepSeek Sparse Attention, borrowed from DeepSeek-V3.2): normally every token attends to every other token, so cost grows with the square of sequence length. DSA runs a cheap scan first to pick the ~2048 most relevant tokens, then does full attention only on those. About 50% cheaper at 128K context with minimal accuracy loss.
IndexShare (GLM's actual new contribution): DSA still re-runs that cheap scan every layer. IndexShare runs the scan once every 4 layers and reuses the result in between. That cuts the indexer's own cost ~75% and per-token FLOPs by 2.9x at 1M context. This is the part that's genuinely new.
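The two tricks above can be sketched roughly like this. It's a toy under stated assumptions, not Z.ai's implementation: `toy_scores` is a hypothetical stand-in for the indexer, and 8 tokens with `k=3` stand in for a long context with ~2048 picks. The only thing it demonstrates is the bookkeeping: pick the top-k tokens with a cheap scan, then either redo that scan every layer or refresh it every 4 layers and reuse it.

```python
def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens, returned in original order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)

def plan_layers(num_layers, share_every, score_tokens, k):
    """Return each layer's token selection and how many times the indexer ran."""
    selections, indexer_runs, cached = [], 0, None
    for layer in range(num_layers):
        if layer % share_every == 0:  # refresh the selection on layers 0, 4, 8, ...
            cached = top_k_indices(score_tokens(layer), k)
            indexer_runs += 1
        selections.append(cached)     # other layers reuse the cached selection
    return selections, indexer_runs

def toy_scores(layer):
    # Hypothetical indexer: fixed relevance scores for 8 tokens in the context.
    return [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]

per_layer, runs_every_layer = plan_layers(8, 1, toy_scores, k=3)  # scan every layer
shared, runs_shared = plan_layers(8, 4, toy_scores, k=3)          # scan every 4th layer

print(shared[0])                          # tokens that get full attention
print(runs_every_layer, runs_shared)      # indexer invocations in each scheme
print(1 - runs_shared / runs_every_layer) # fraction of indexer work saved
```

Running the scan on 1 layer in 4 is where the ~75% indexer saving comes from. The sketch says nothing about accuracy, which depends on how similar the relevant tokens really are across neighboring layers.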

The benchmark picture

Z.ai's own numbers (no independent re-runs yet): SWE-bench Pro 62.1 vs GPT-5.5's 58.6, both at 400K context. On the Terminal-Bench 2.1 harness, it scores 81.0 vs Claude Opus 4.8's 85.0, a 4-point gap. Jeremy Howard called it at least as good as Opus 4.8 and GPT-5.5 for his text use.

Real catches before you commit

It's free to download, but not free to run. The whole model has to fit in GPU memory, and at full quality that's about 8 high-end datacenter GPUs (8x H200). The cheaper 8x H100 setup doesn't have enough memory and won't load it. The only realistic home option is a maxed-out Mac Studio (256GB+), and even then it's slow, a few words a second.
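Here's a back-of-envelope version of that sizing check. The assumptions are mine, not Z.ai's: weights stored at 1 byte per parameter (FP8), plus a hypothetical 10% headroom for KV cache and activations. The per-GPU capacities are the standard 80 GB H100 and 141 GB H200.

```python
GPU_MEMORY_GB = {"H100": 80, "H200": 141}  # per-GPU HBM capacity

def weight_memory_gb(total_params_billion, bytes_per_param):
    """Memory for the weights alone: 1B params at 1 byte each is about 1 GB."""
    return total_params_billion * bytes_per_param

def fits(total_params_billion, bytes_per_param, gpu, num_gpus, overhead_fraction=0.10):
    """True if weights plus an assumed overhead fit in the node's combined GPU memory."""
    needed = weight_memory_gb(total_params_billion, bytes_per_param) * (1 + overhead_fraction)
    available = GPU_MEMORY_GB[gpu] * num_gpus
    return needed <= available

# Size for the TOTAL parameter count (753B), assuming 1 byte per parameter.
print(fits(753, 1, "H200", 8))  # 8 x 141 GB = 1128 GB available
print(fits(753, 1, "H100", 8))  # 8 x 80 GB  = 640 GB available

# The common mistake: sizing for the ~40B active parameters instead.
print(fits(40, 1, "H100", 1))
```

Under these assumptions the first check passes, the second fails, and the third passes even though it describes a deployment that would never load. At 2 bytes per parameter the weights alone come to about 1506 GB, which is more than 8x H200 holds, so the 8x H200 figure only works at roughly 1 byte per parameter.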

Two more things the benchmark scores hide. First, it's chatty: Simon Willison measured it spending ~43k words of output per task where rivals use 24-37k. You pay per word out, so in an agent that calls itself in a loop, that adds up to a real bill.
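To put rough numbers on the chattiness, here's a tiny cost sketch. The per-task output figures are the ones quoted above. The price and the task count are hypothetical placeholders, so plug in whatever your provider actually bills per unit of output.

```python
def loop_output_cost(output_per_task, tasks, price_per_million):
    """Output-side cost of running `tasks` agent tasks at a flat price per million units."""
    return output_per_task * tasks * price_per_million / 1_000_000

PRICE_PER_MILLION = 2.0  # hypothetical price, not a real quote
TASKS = 1_000            # hypothetical workload

glm = loop_output_cost(43_000, TASKS, PRICE_PER_MILLION)
rival_low = loop_output_cost(24_000, TASKS, PRICE_PER_MILLION)
rival_high = loop_output_cost(37_000, TASKS, PRICE_PER_MILLION)

print(glm, rival_low, rival_high)
print(round(glm / rival_low, 2), round(glm / rival_high, 2))
```

The ratios hold whatever price you plug in, as long as it's the same for both models: roughly 1.2x to 1.8x the output spend for the same number of tasks.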

Second, Zvi Mowshowitz noticed it often says it *is* Claude, which hints it may have been trained on Claude's output. If that's true, the benchmark scores might be flattering. Nobody's confirmed it either way yet.

No vision support at all, which is a hard blocker for multimodal agent pipelines.

Long-context serving cost is genuinely lower here than on a dense model of comparable capability, but the hardware bar is higher than the "40B active" framing suggests, and the verbosity issue will bite you in agentic loops where output tokens add up fast.

If you've run MoE models in production tool loops before, how did you handle the verbosity problem, prompt engineering or post-processing the outputs?

r/HowToAIAgent

OpenAI quietly killed the $200K entry fee for ChatGPT ads. Six weeks later they'd made $100M.

GLM-5.2 is 753B params but only uses ~40B per token. Here's what that actually means for agent builders
