UncoverAlpha

The Forgotten Chip: CPUs, the New Bottleneck of the Agentic AI Era

UncoverAlpha
Feb 23, 2026

Hey everyone,

For three years, GPUs have been the only chip that mattered in AI. Every investor pitch, every earnings call, every CapEx headline was about who could get more Nvidia GPUs.

CPUs? An afterthought. The boring, commodity chip that just sat next to the GPU and passed data along. Nobody cared. That’s changing fast. And if you’re not paying attention to the “CPU renaissance” happening right now, you’re missing what I believe is one of the more important infrastructure shifts in this AI cycle.

In this article, I will break down exactly why agentic AI is changing CPU demand, how CPUs are used in agentic AI, how big the CPU market can become because of AI agents, and which public companies stand to benefit. I’ll also discuss whether we’re heading into a genuine CPU bottleneck and how long it could last.

Why Agentic AI Changes Everything for CPUs

To understand why CPUs suddenly matter, you need to first understand how agentic AI workloads are fundamentally different from the “classic” chatbot-style AI we’ve been running for the past three years.

The old workflow — chatbot:

When you use ChatGPT or any standard AI chatbot, the process is straightforward. You type a question, the CPU tokenizes it (converts your text into numerical tokens the model can process), ships it over to the GPU, the GPU runs the tokens through the model and generates a response, then ships the output back to the CPU, which de-tokenizes it and delivers the answer. In this workflow, the CPU does very little. Maybe 5-10% of the total compute. The GPU is doing all the heavy lifting with its matrix multiplications, attention calculations, and token generation. This is why, for three years, the entire industry was laser-focused on GPUs.
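The chatbot request path above can be sketched as a simple three-stage pipeline. All function names here are illustrative stand-ins, not a real tokenizer or model API; the point is only where the CPU and GPU work sits.

```python
# Illustrative sketch of the chatbot-era request path.
# tokenize/detokenize run on the CPU; run_model stands in for the GPU step.

def tokenize(text: str) -> list[int]:
    # CPU: map text to token ids (toy stand-in for a real tokenizer)
    return [ord(c) for c in text]

def run_model(tokens: list[int]) -> list[int]:
    # GPU: matrix multiplications, attention, token generation (simulated here)
    return list(reversed(tokens))

def detokenize(tokens: list[int]) -> str:
    # CPU: map token ids back to text
    return "".join(chr(t) for t in tokens)

def answer(question: str) -> str:
    tokens = tokenize(question)   # CPU (cheap)
    output = run_model(tokens)    # GPU (the heavy lifting)
    return detokenize(output)     # CPU (cheap)

print(answer("hi"))  # one CPU -> GPU -> CPU round trip
```

One request, one round trip, and the CPU touches the data only at the edges. That is the workload the last three years of infrastructure were built for.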

The new workflow — agentic AI:

Agentic AI is fundamentally different. Instead of a simple question-answer loop, you’re dealing with autonomous systems that plan, execute, use tools, browse the web, query databases, make API calls, write and run code, and then reflect on whether they did a good job before deciding what to do next. A single user request can spin off dozens or even hundreds of sub-agents, each running their own loops of reasoning and action in parallel.

All of that orchestration, tool calling, API handling, memory management, and coordination between sub-agents happens on the CPU, not the GPU. The GPU still handles the inference (the “thinking” part), but between each inference call, the CPU is doing an enormous amount of work. It’s parsing responses, deciding which tool to call next, managing the execution plan, handling file I/O, running code, making network requests, and coordinating which sub-agents depend on which other sub-agents’ results.
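That interleaving can be sketched as a loop. Everything here is hypothetical (no real model or tool API): the GPU runs only inside `call_model()`, and every other line, including parsing, tool dispatch, and prompt construction, is the CPU work the paragraph describes.

```python
# Hypothetical agent loop: the GPU only runs inside call_model();
# every other line (parsing, tool execution, state management) is CPU work.

def call_model(prompt: str) -> dict:
    # Stand-in for a GPU inference call; returns a tool request or a final answer.
    if "result:" in prompt:
        return {"type": "answer", "text": "done:" + prompt.split("result:")[-1]}
    return {"type": "tool_call", "tool": "search", "args": "CPU demand"}

def run_tool(tool: str, args: str) -> str:
    # CPU: network I/O, JSON parsing, and file handling would all live here.
    return f"{tool}({args}) -> 3 documents"

def agent(task: str, max_steps: int = 5) -> str:
    prompt = task
    for _ in range(max_steps):                                # CPU: loop control
        reply = call_model(prompt)                            # GPU: inference
        if reply["type"] == "answer":
            return reply["text"]
        observation = run_tool(reply["tool"], reply["args"])  # CPU: tool execution
        prompt = f"{task}\nresult: {observation}"             # CPU: build next prompt
    return "gave up"

print(agent("research CPU demand"))
```

Note the ratio: in a multi-step run, one line of GPU work is surrounded by CPU work on every iteration, and real agents multiply this loop across dozens of sub-agents.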

In an interview, a VP at Intel explained:

“Agentic AI is nothing but a combination of independent agents... If there are in workflow, say, 10, 20, 30, 40, 100 agents, and they all need to talk to them, then they need different locations to operate. When I say location, I talk about CPUs.”

source: AlphaSense

A Georgia Tech and Intel research paper from November 2025 quantified this, and the findings are striking: tool processing on CPUs accounts for between 50% and 90% of total latency in agentic workloads. In many agentic workflows, the CPU is responsible for the majority of the wait time, not the GPU. The GPU sits idle, waiting for the CPU to finish its work before it gets the next batch of tokens to process.
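A back-of-the-envelope implication of that range (my own arithmetic, not from the paper), under the simplifying assumption that CPU and GPU phases don't overlap:

```python
def gpu_busy_share(cpu_share: float) -> float:
    # If CPU-side tool processing takes a fraction cpu_share of end-to-end
    # latency, and CPU and GPU phases run back-to-back without overlap,
    # the GPU can be busy for at most the remainder of the request.
    return 1.0 - cpu_share

# The paper's 50%-90% range implies the GPU is busy at most 10%-50% of the time.
for s in (0.5, 0.9):
    print(f"CPU share {s:.0%} -> GPU busy at most {gpu_busy_share(s):.0%}")
```

Real systems batch requests across users to keep GPUs fed, but per-request, this is why expensive accelerators end up waiting on the "boring" chip.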

This completely inverts the infrastructure economics we’ve been operating under. In the chatbot era, you needed a small number of high-end CPUs paired with massive GPU clusters. In the agentic era, you potentially need more CPUs than GPUs, and the CPU-to-GPU ratio in a rack or cluster needs to go up significantly.

“For every GPU workload, there is a supporting CPU demand. The CPU is going to handle the data processing, the orchestration, the API layers, post processing.”

Source: AWS employee on AlphaSense

Breaking Down the CPU Workload in Agent Systems

Let me walk through what the CPU actually does in an agentic workflow, because I think understanding the details here is important for appreciating why this demand is structural and not a temporary blip.

Step 1: Planning: The user gives a broad instruction (e.g., “Research the competitive landscape of the DRAM industry and write me a report”). The CPU tokenizes this and sends it to the GPU for an initial inference call. The GPU generates a plan of execution, not a final answer. That plan comes back to the CPU.

Step 2: Orchestration: The CPU now breaks that plan into sub-tasks and assigns them to multiple agents. This is pure CPU work. It’s managing a directed acyclic graph of tasks, determining which ones can run in parallel, which depend on others, and in what order they should execute. If you have 10 research sub-topics, you might have 10 sub-agents that can all run simultaneously.

Step 3: Tool execution: Each sub-agent starts working. This is where CPUs get extremely busy. Sub-agent 1 might make a web search API call, wait for results, parse the JSON response, extract relevant text, and package it for another inference call. Sub-agent 2 might query a database, run a SQL query, and process the results. Sub-agent 3 might open a file, read its contents, and prepare them for analysis. All of this — the API calls, network I/O, file handling, data parsing, JSON processing — is CPU work. The GPU is idle during these operations.

Step 4: Inference loops: Each sub-agent may also run its own chain-of-thought reasoning, sending multiple inference requests to the GPU. Between each inference call, the CPU processes the output, decides if the agent is done, and either feeds the next prompt or moves to the next step.

Step 5: Reflection: Once all sub-agents complete, the CPU gathers all their outputs and sends them to the GPU for a reflection inference loop — essentially asking the model, “did we answer the original question well enough?” If not, the whole cycle restarts.

The key characteristics a CPU needs for this kind of workload are: high single-core clock speed (to minimize orchestration latency), high core count (to run many agents in parallel), fast memory access and large caches (to manage all the context and intermediate state), and strong I/O connectivity (PCIe lanes for network and storage, because agents are constantly hitting APIs and databases).
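The five steps above can be sketched end to end. Everything here is illustrative (no real model, tools, or agent framework); `plan()` and `reflect()` stand in for GPU inference calls, while the orchestration and parallel fan-out are the CPU-side work.

```python
import asyncio

# Illustrative sketch of the plan -> orchestrate -> execute -> reflect cycle.
# plan() and reflect() stand in for GPU inference; everything else is CPU work.

def plan(task: str) -> list[str]:
    # Step 1 (GPU): turn a broad instruction into an execution plan of sub-tasks.
    return [f"{task}: subtopic {i}" for i in range(1, 4)]

async def sub_agent(subtask: str) -> str:
    # Steps 3-4 (CPU + GPU loops): tool execution - API calls, parsing, file I/O.
    await asyncio.sleep(0)  # stands in for network/disk waits
    return f"findings for '{subtask}'"

def reflect(results: list[str]) -> bool:
    # Step 5 (GPU): "did we answer the original question well enough?"
    return len(results) >= 3

async def orchestrate(task: str) -> list[str]:
    subtasks = plan(task)                                          # Step 1
    # Step 2 (CPU): independent sub-tasks fan out and run in parallel.
    results = list(await asyncio.gather(*(sub_agent(s) for s in subtasks)))
    if not reflect(results):                                       # Step 5
        return await orchestrate(task)                             # restart the cycle
    return results

results = asyncio.run(orchestrate("DRAM competitive landscape"))
print(len(results), "sub-agent reports merged")  # → 3 sub-agent reports merged
```

The `asyncio.gather` fan-out is where the core-count requirement comes from: with hundreds of sub-agents instead of three, the scheduler, the I/O waits, and the result merging all land on the CPU at once.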

The AI server factories sitting above your general-purpose compute infrastructure don’t replace those traditional CPU servers. They create more demand for them. Because now, instead of one human slowly browsing the web and running a few apps, you have hundreds of AI agents aggressively consuming CPU resources at machine speed.

The demand for CPUs is already showing up in earnings calls

This new CPU demand is already visible in recent earnings calls.

On AMD’s Q4 earnings call, the data center segment posted record revenue of $5.4 billion in Q4 2025, up 39% year-over-year and 24% sequentially.

But the key wasn’t the GPUs but the CPUs. Lisa Su explicitly called out CPUs as a major growth driver, stating:

“demand for EPYC CPUs is surging as agentic and emerging AI workloads require high-performance CPUs to power head nodes and run parallel tasks alongside GPUs.”

AMD’s 5th Gen EPYC Turin CPUs accounted for more than half of total server CPU revenue by the end of Q4, and the number of EPYC cloud instances grew more than 50% year-over-year to nearly 1,600 instances. The number of large enterprises deploying EPYC on-premises more than doubled in 2025. Su specifically highlighted that in agentic workflows, when AI agents spin off work in an enterprise, “they’re actually going to a lot of traditional CPU tasks.” She expects the server CPU market to grow by “strong double digits” in 2026.

Su also noted that “x86 processors have a particular edge in agentic workloads where AI agents spin off work to traditional CPU tasks, with the vast majority of such tasks running on x86 today.”

Looking ahead, Su guided for data center segment revenue to grow more than 60% annually over the next three to five years and for AMD’s AI business to scale to tens of billions in annual revenue by 2027. CPUs are a meaningful piece of that equation, not just GPUs.

And it’s not just earnings calls; you can also see it in multiple conversations with industry experts.

A former CTO of an HP competitor highlights that infrastructure is moving from static policy-based routing to “inference-based” routing. An AI-powered controller layer, running on CPUs, dynamically analyzes incoming workloads to determine whether they require expensive GPU cycles or can be offloaded to traditional x86 CPUs, optimizing resource allocation.

Agentic AI often involves deterministic tasks—such as following a specific rule set or executing a defined API call—that do not require the probabilistic power of a GPU. A Director at a Global Consultancy notes that these deterministic aspects of agentic workflows are most efficiently executed by CPUs, reinforcing the need for a balanced infrastructure where GPUs handle the “thinking” and CPUs handle the “doing.”

The CPU demand was a shock for Intel

If AMD saw the CPU demand wave coming, Intel was genuinely surprised by it. Intel’s Q4 revenue came in at $13.7 billion, above guidance, with data center and AI revenue rising 15% sequentially — the fastest sequential growth this decade. But here’s the key: Intel admitted it couldn’t meet all the demand.

CEO Lip-Bu Tan said the company “delivered these results despite supply constraints, which meaningfully limited our ability to capture all of the strength in our underlying markets.” CFO David Zinsner was even more direct, admitting that Intel “misjudged” the pace of data center CPU demand and that the company is now “shifting as much as we can over to the data center” by reallocating wafer capacity from client (PC) CPUs to server CPUs.

Zinsner acknowledged that Intel is “absolutely constrained” and is deprioritizing the low-end client market to push capacity into data center products. Intel expects its supply to hit a low point in Q1 2026 before improving in Q2, but in the meantime, revenue “would have been higher if we had more supply.” Management explicitly positioned CPUs as “central to AI orchestration and scaling inference.”

The AWS-OpenAI Deal was the tell

The most interesting data point on CPU demand came not from a chip company but from a cloud infrastructure deal back in November 2025. AWS and OpenAI announced a $38 billion, seven-year strategic partnership. The press release stated that OpenAI would access “hundreds of thousands of state-of-the-art NVIDIA GPUs, with the ability to expand to tens of millions of CPUs to rapidly scale agentic workloads.”

People wrongly focused on the Nvidia GPU part, but the CPU part is far more interesting. Tens of millions of CPUs. For agentic workloads. They didn’t have to include that detail. The fact that it’s in the official announcement tells you how seriously the frontier AI labs are thinking about CPU compute as a scaling requirement. All capacity under this agreement was targeted for deployment before the end of 2026, with options to expand into 2027.

Nvidia — The Vera CPU

Nvidia itself is making a big bet on the CPU side. Its upcoming Vera CPU, part of the Rubin platform announced at CES 2026, is specifically designed for agentic reasoning workloads. Vera delivers up to 2x the performance of the previous Grace CPU, with 88 cores per die and significant uplifts in memory and chip-to-chip bandwidth.

What’s particularly notable is that Nvidia announced Vera can be deployed as a standalone platform for agentic processing, separate from the GPU. CoreWeave is set to use standalone Vera CPUs, and Jensen hinted in a Bloomberg interview that “there are going to be many more” standalone CPU deployments. And it didn’t take long: the Meta & Nvidia deal was announced a few days ago:

“This partnership will enable the large-scale deployment of NVIDIA CPUs and millions of NVIDIA Blackwell and Rubin GPUs, as well as the integration of NVIDIA Spectrum-X™ Ethernet switches for Meta’s Facebook Open Switching System platform…The collaboration represents the first large-scale NVIDIA Grace-only deployment.”

This is Nvidia essentially confirming the thesis: in agentic AI, the CPU-to-GPU ratio needs to go up, and some workloads may be purely CPU-bound.

Are We Heading Into a CPU Bottleneck?

We’re already in one. The server CPU supply chain is under significant stress, and the constraints are coming from multiple directions simultaneously.

Intel is struggling with yield issues at some of its fabs, slowing the production ramp for newer Xeon parts. The company has admitted it cannot meet demand and is reallocating capacity from PC CPUs to server CPUs, meaning the PC segment will take a hit. Intel expects supply to improve starting Q2 2026, but the situation remains “acute” in Q1.

TSMC is prioritizing AI accelerators, which means less capacity for CPUs. AMD’s server CPUs are manufactured by TSMC, but TSMC is aggressively prioritizing its advanced node capacity for higher-margin AI accelerator chips (GPUs and custom ASICs). TSMC chairman C.C. Wei publicly stated that advanced-node capacity is “about three times short” of what major customers plan to consume. When TSMC’s 3nm process is running at 160,000 wafers per month and that’s still not enough, and when CoWoS advanced packaging capacity is sold out through 2026, CPU wafer allocation gets squeezed as a collateral effect.

Intel has also already warned Chinese customers of delivery lead times of up to six months for certain server CPUs. AMD’s lead times have stretched to 8-10 weeks for some products. Intel server chip prices in China have risen more than 10%. China represents over 20% of Intel’s total revenue, and major customers like Alibaba and Tencent are affected.

An additional supply problem is the memory-driven pull-forward. The severe global memory shortage is creating a rush effect on CPU purchases. When memory prices started rising in China in late 2025, customers accelerated CPU purchases to lock in system-level pricing before costs spiraled further. This pull-forward exacerbated the existing supply tightness.

A cloud computing materials manager reports:

“Our supply chain was a constraining factor... GPU, CPU, and RAM were the top three drivers for us being constrained” as customers convert to “more powerful CPUs that can run higher AI workloads.”

Source: AlphaSense

A global IT distributor reports CPU shortages are “directly driving a 30% increase in average selling prices (ASPs) during the fourth quarter of 2025” with “increased backlogs” as order intake exceeds expectations.

So the CPU bottleneck is already here; the question now is how long it will last.

In the next section, I analyze how many CPUs we will need in this agentic AI era and give a timeline for when supply could meet demand, as well as which companies stand to benefit most from this trend:

This post is for paid subscribers.

© 2026 Rihard Jarc