Most of the Economy Won't Run on the Best Model

Jun 04, 2026

Hey everyone,

I want to share why I think a significant change is coming to the AI industry, one the market is blind to. It’s a thesis about where the money in AI actually goes once we transition to scaled-up AI workloads, and how that might look different from today’s expectations.

Let me start with an analogy that I keep coming back to.

When a company hires an accountant, it does not go out and hire a PhD in pure mathematics to reconcile the ledgers. Not because the PhD couldn’t do it — they obviously could, and probably faster — but because it makes no economic sense. The PhD is overqualified, which is just another way of saying they are too expensive for the value the task produces. The economic output of bookkeeping is capped. There is only so much upside in getting the books done. So you hire the cheapest person who clears the quality bar, and you pocket the difference. And you can take this analogy and apply it to multiple other jobs.

Now flip it. If you are running a drug-discovery program, you absolutely want the PhD — in fact you want five of them, plus a Nobel laureate consulting on the side. Why? Because the economic output of a single discovery is enormous, almost unbounded. The expected value of a breakthrough is measured in tens of billions, so the cost of the smartest possible person working on it rounds to zero against the prize. Here, intelligence is the only thing that matters, and cost is an afterthought.

This is, I think, exactly how the AI model market is going to bifurcate. And we are right at the inflection point where it starts to happen.

The metric is not intelligence. It’s intelligence per dollar

Today, essentially everyone uses the state-of-the-art (SOTA) model for everything. You want to summarize an email? SOTA model. Classify a support ticket? SOTA model. Extract three fields from an invoice? SOTA model. We do this for one simple reason: the frontier models have only just crossed the threshold of being truly impactful across knowledge work, and when something has only just started working, you reach for the best version of it you can find. You don’t optimize cost on a capability you weren’t sure you had last quarter.

But I believe this is a transitional behavior, not a stable equilibrium. And there are a few reasons for that.

The Stanford HAI AI Index found that the inference cost for a system performing at GPT-3.5 level dropped more than 280-fold between November 2022 and October 2024, from roughly $20 per million tokens to about $0.07. Andreessen Horowitz, looking at the same phenomenon across the whole performance spectrum, concluded that for a model of equivalent performance, cost falls by roughly 10x every year: faster than compute fell during the PC revolution, faster than bandwidth fell during the dotcom build-out. Epoch AI, slicing it by benchmark, found the price to hit GPT-4-level performance on PhD-level science questions fell by about 40x per year, with the range across benchmarks running anywhere from 9x to 900x annually. Most recently, Sarah Friar, OpenAI’s CFO, said the following at the All-In conference:

“The good news on compute is that there is a massive deflationary curve on cost, right? From ChatGPT... uh, [GPT] 4 to 5.4, I think the deprecation of cost was something like 97%. It’s like kind of an amazing curve, actually... but that happened in like two years.”

Pick whichever number you find least aggressive. They all say the same thing: the capability you are paying a premium for today becomes nearly free in about a year.
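To put the quoted figures on a common per-year footing, here is a minimal sketch. It assumes each decline compounds at a constant rate over its window, which is a simplification. The function name and the 23-month window (November 2022 to October 2024) are my own framing, not something the sources publish.

```python
def annualized_factor(total_factor: float, months: float) -> float:
    """Convert a total price decline (e.g. 280x) over `months` into a per-year factor.

    Assumes the decline compounds at a constant rate, which is a simplification.
    """
    return total_factor ** (12.0 / months)


# Stanford HAI figure: more than 280-fold over roughly 23 months.
hai_per_year = annualized_factor(280, 23)

# Friar's quote: ~97% cost reduction in about two years, i.e. ~3% of the cost remains.
friar_total = 1 / (1 - 0.97)
friar_per_year = annualized_factor(friar_total, 24)

print(f"Stanford HAI: ~{hai_per_year:.0f}x cheaper per year")
print(f"Friar quote:  ~{friar_per_year:.1f}x cheaper per year")
```

Under these assumptions the Stanford HAI figure works out to somewhere near 19x per year and the Friar quote to a little under 6x per year, so even the least aggressive reading is a steep annual decline.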

On top of that, many companies and enterprises are starting to burn through their planned annual token budgets in just a few months. This trend is accelerating, and I have been hearing about it all across the industry. Yesterday, a comment from Sam Altman was published saying:

“Probably the second biggest theme is around cost. People are really saying, that’s kind of become a meme now, but ‘my company spent my entire 2026 budget in Q1. Can you make this more efficient?’... that went from at the beginning of this year, an issue that never came up - I know people were totally happy with the amount they were spending - to all of a sudden a huge issue.”

What this means is that companies have no choice but to optimize costs, and that will soon mean using models other than the SOTA model for specific tasks.

The second force is that the frontier itself is getting smaller, not just cheaper. Epoch AI has pointed out that frontier models are now roughly an order of magnitude smaller in parameter count than GPT-4 was, because once inference becomes the dominant cost, you stop training huge models and start over-training small ones on far more data. Distillation compounds this: a teacher model’s capability gets compressed into a student a fraction of its size. This is exactly what Meta is doing internally when using its AI models to power its ads and content platform: the student model, which distilled its knowledge from the teacher model, is the one applied at scale.

So, putting all of this together: we have a rapidly falling price for any given level of capability, a frontier whose deployed models are already shrinking in size, and companies burning through their annual token budgets in a matter of months.

As such, I believe that for the overwhelming majority of economically valuable knowledge work, the correct model is not the SOTA model. It’s the cheapest model that clears the task’s quality bar. And as pilots move into full production (which is the stage we are in today) — where you’re suddenly paying for millions or billions of tokens a day instead of running a demo — intelligence-per-dollar becomes the only metric that survives contact with a CFO.
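The selection rule described above ("the cheapest model that clears the task’s quality bar") can be sketched in a few lines. This is only an illustration: the model names, prices, and quality scores below are invented, and in practice the quality score would come from an evaluation on the specific task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price
    quality: float                 # hypothetical eval score on this task, 0 to 1


def cheapest_that_clears(options: list[ModelOption], quality_bar: float) -> Optional[ModelOption]:
    """Return the lowest-priced model whose score meets the bar, or None if none does."""
    eligible = [m for m in options if m.quality >= quality_bar]
    return min(eligible, key=lambda m: m.usd_per_million_tokens, default=None)


# Illustrative catalogue; names, prices, and scores are invented.
CATALOGUE = [
    ModelOption("frontier", 15.00, 0.97),
    ModelOption("mid-tier", 1.00, 0.93),
    ModelOption("small-distilled", 0.10, 0.88),
]

# A capped-value task with a modest bar versus a task that needs the best model.
invoice_extraction = cheapest_that_clears(CATALOGUE, quality_bar=0.85)
hard_reasoning = cheapest_that_clears(CATALOGUE, quality_bar=0.95)
print(invoice_extraction.name, hard_reasoning.name)
```

With these made-up numbers, the low-bar task lands on the smallest model and the high-bar task on the frontier model, which is the bifurcation the accountant and drug-discovery analogy describes.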

At the same time, the SOTA model and its use cases don’t disappear. They go where the economic ceiling is unbounded: frontier R&D, drug discovery, novel mathematics, and the hardest agentic reasoning chains. But that is a smaller slice of token volume in today’s economy. The accountant’s quadrant (classification, extraction, summarization, routine code, customer support, the boring profitable middle of the economy) is where the majority of tokens actually are, and that quadrant is going to run on cheaper, distilled, often fine-tuned, frequently “older” models.

The investment angle: the money moves to the owners of installed compute

If the thesis above is right, where does the capital go?

The intuitive answer — the one the market is currently screaming — is “buy the picks and shovels.” Buy Nvidia, buy Broadcom, buy the ASIC co-designers, buy memory, anything that sells new compute. But my view is that while that sector still might do well, there is a different part of the tech stack that will benefit even more, looking at the rate of change from the current state.

The sellers of new compute (semis) are only winners in a world of continued high-cadence spending on new compute. And my thesis specifically questions whether that cadence is necessary. So let me lay out the two states the world can be in, because the asymmetry between them is the whole argument.

Scenario 1: Capex falls or stabilizes. If you can squeeze an order of magnitude more useful tokens out of the hardware you already own — because models got smaller, cheaper, more efficient and verticalized — then you no longer need to spend $100bn+ every single year just to stay relevant. In this world, the owners of the installed base win and the sellers of new compute lose. Hyperscaler free cash flow inflects sharply upward, because capex was the one thing suppressing it. Multiples re-rate higher as the cloud business converts from a capex incinerator into a cash machine running largely paid-for, partly-depreciated hardware. And the semis de-rate, because the market finally realizes the upgrade treadmill has slowed.

Scenario 2: Capex stays high — and revenue explodes. This is the Jevons-paradox-on-steroids case. Demand is so strong that hyperscalers do both: they extract enormous output from cheap, long-lived existing hardware and keep buying new gear. Here everyone wins at once — but the hyperscalers win more, because their incremental revenue now lands on a cost base that is partly depreciated and dramatically more efficient per token. Operating leverage goes vertical.
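The asymmetry between the two scenarios can be made concrete with a toy free-cash-flow calculation. Every figure below is invented (in $bn, for one hypothetical hyperscaler) and the formula is deliberately crude: cash generated by operations minus capex. It is a sketch of the mechanism, not a forecast or a model of any real company.

```python
def free_cash_flow(revenue: float, operating_margin_before_capex: float, capex: float) -> float:
    """Toy FCF: cash generated by operations minus capital expenditure."""
    return revenue * operating_margin_before_capex - capex


# Hypothetical starting point: heavy capex suppresses free cash flow.
baseline = free_cash_flow(revenue=300, operating_margin_before_capex=0.45, capex=100)

# Scenario 1: revenue flat, capex falls because existing hardware serves more tokens.
scenario_1 = free_cash_flow(revenue=300, operating_margin_before_capex=0.45, capex=60)

# Scenario 2: capex stays high, but revenue grows on a more efficient cost base
# (modelled here, as an assumption, by a slightly wider margin).
scenario_2 = free_cash_flow(revenue=400, operating_margin_before_capex=0.50, capex=100)

for label, value in [("baseline", baseline), ("scenario 1", scenario_1), ("scenario 2", scenario_2)]:
    print(f"{label}: {value:.0f}")
```

In both scenarios the owner of the installed base ends up with more free cash flow than in the baseline, which is the point of the argument; the size of the gap depends entirely on the assumed inputs.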

The interesting thing is that the market is currently priced for neither. It is pricing only a future in which capex keeps growing for the foreseeable future and the semiconductor industry benefits, while treating the hyperscalers’ spending on that capex as a losing bet, because it questions the return on that spend.

I think this market premise is very wrong, because we are actively transitioning to production-scale AI workloads, where the economics differ from those of the pilot world we have mostly been living in for the last few months.

As always, I hope you found this article valuable. I would appreciate it if you could share it with people you know who might find it interesting. I also invite you to become a paid subscriber, as paid subscribers get additional articles covering both big tech companies in more detail, as well as mid-cap and small-cap companies that I find interesting.


Thank you!

Disclaimer:

I own stock in the hyperscalers Meta (META), Amazon (AMZN), Microsoft (MSFT), and Google (GOOGL).

Nothing contained in this website and newsletter should be understood as investment or financial advice. All investment strategies and investments involve the risk of loss. Past performance does not guarantee future results. Everything written and expressed in this newsletter is only the writer’s opinion and should not be considered investment advice. Before investing in anything, know your risk profile and if needed, consult a professional. Nothing on this site should ever be considered advice, research, or an invitation to buy or sell any securities.

Karl K

Jun 30

Seems like they’ll need new benchmarks for non-SOTA models so companies can “pick” a task from x model (and the benchmark will validate the model’s ability to complete said task)

Matt McDonagh

Great read! We have similar takes!

https://lifeinthesingularity.com/p/glm-52-proves-ai-comes-for-all-moats


UncoverAlpha
