UncoverAlpha

Q1 2026 Channel Checks & Alternative Data: Cloud is on Fire

UncoverAlpha — Fri, 24 Apr 2026 12:02:33 GMT

Hey everyone,

I am posting my regular channel check & other alternative data report before we start the big tech earnings.

For this report, I covered cloud providers Google, Microsoft, and Amazon in terms of their cloud business, and some insightful signals on Microsoft Copilot, which is a pressure point for Microsoft.

Let’s dive in.

Cloud is on fire

Let’s start with my alt data on the most relevant channel-check interviews from clients, cloud consultants, integrators, and former employees.

Analyzing these interviews in bulk, demand is high for Q1 2026 across all three hyperscalers: AWS, Azure, and GCP. Looking at the demand pipeline for the next 3-6 months, the results are even more impressive: almost 60% of these experts see demand exceeding their expectations, 26% see it meeting expectations, and 15% see it below expectations. Keep in mind that expectations were already high going into this year and quarter, so having roughly 60% of experts report higher-than-expected demand is a very strong signal. The driver of demand, as expected, is AI workloads and the move from test to production environments, especially with agentic AI starting to roll out.

Going deeper, let’s look at the individual hyperscalers and what the data shows. As always, I broke down the % of experts who think AWS, Azure, or GCP is accelerating the fastest. Here are the results:

62% think GCP is growing the fastest, 41% think Azure is growing the fastest, and 27% think AWS is growing the fastest (important note: the sum is greater than 100% because some experts named two platforms as growing faster than the third).
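To see why multi-select answers push the total above 100%, here is a minimal Python sketch. The five responses in it are hypothetical and only illustrate the counting; they are not the survey data behind the figures above.

```python
from collections import Counter


def mention_shares(responses):
    """Share of respondents naming each platform (each respondent may name several)."""
    counts = Counter(platform for picks in responses for platform in set(picks))
    total = len(responses)
    return {platform: count / total for platform, count in counts.items()}


# Hypothetical responses: two of the five experts name two platforms.
responses = [
    {"GCP"},
    {"GCP", "Azure"},
    {"Azure"},
    {"AWS"},
    {"GCP", "AWS"},
]

shares = mention_shares(responses)
for platform, share in sorted(shares.items()):
    print(f"{platform}: {share:.0%}")
print(f"sum of shares: {sum(shares.values()):.0%}")
```

Because each share is measured against the number of experts rather than the number of mentions, the shares add up to more than 100% whenever anyone names more than one platform.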

Now, this data doesn’t add much value until we compare it to my historical data from past quarters, as we did in the last reports, to truly understand whether anything shifted significantly in Q1. Here is the data:

Left for Dead on AI, Meta and Amazon Are About to Have the Last Laugh

UncoverAlpha — Fri, 17 Apr 2026 13:01:57 GMT

Hi everyone,

In this article, I break down some significant fundamental shifts when it comes to AI efforts from Amazon and Meta, and why I think both are the next two big AI beneficiaries. Based on what we are seeing, both companies are on a path to reaccelerating their efforts, while still perceived by the market as “AI laggards”. We believe this premise will be proven wrong in the coming months.

Let’s start.

The Era of Subsidized AI Model Usage is Over, the IPOs are coming

UncoverAlpha — Fri, 10 Apr 2026 14:38:41 GMT

Hey everyone,

The AI industry is approaching an inflection point that will reshape the priorities of AI model companies and the entire space. There are four interconnected themes I want to walk through today, and they all converge on a single conclusion: the era of subsidizing AI model usage is coming to an end.

Here’s what I cover in this article:

Anthropic is taking over the enterprise — but the curse of the best model is real
OpenAI is losing the enterprise race to Anthropic and facing structural problems heading into its IPO
The era of subsidized AI model usage is ending as both companies prepare for public markets
The IPO race: who lists first matters more than most people realize

Let’s get into it.

Anthropic Is Taking Over Enterprise: And the Curse of the Best Model Is Coming for Them

Anthropic announced that its run-rate revenue had surpassed $30 billion, up from $9 billion at the end of 2025. That’s more than tripling in roughly four months. Anthropic has now reportedly surpassed OpenAI’s run-rate revenue of approximately $25B.

The enterprise composition is what separates Anthropic from the rest. Approximately 80% of revenue comes from business customers. The number of customers spending over $1 million annually has doubled to more than 1,000, up from 500+ just two months ago. Business subscriptions to Claude Code have quadrupled since the start of 2026.

And then there’s Mythos. Just a few days ago, Anthropic announced Claude Mythos Preview, a new general-purpose model that sits in an entirely new tier above Opus. The draft described it as “by far the most powerful AI model we’ve ever developed” and said it is “very expensive for us to serve, and will be very expensive for our customers to use.”

The benchmarks are quite telling: 93.9% on SWE-bench Verified (vs. Opus 4.6’s ~80.9%), 77.8% on SWE-bench Pro, 82% on Terminal-Bench 2.0, 97.6% on USAMO 2026, and 83.1% on CyberGym vs. Opus 4.6’s 66.6% — a 16.5 percentage-point jump on cybersecurity tasks. On Anthropic’s internal zero-day exploit benchmark, Opus 4.6 had a near 0% success rate at autonomous exploit development. Mythos succeeded 181 times out of several hundred attempts on the same Firefox vulnerability task. It also found thousands of zero-day vulnerabilities across every major operating system and browser, including a 27-year-old bug in OpenBSD and a 16-year-old bug in FFmpeg that automated testing had missed across 5 million test runs.

Mythos is not being made generally available: citing cybersecurity risks, Anthropic wants to roll it out to selected companies first. It is deployed through Project Glasswing to 12 partner organizations (Amazon, Apple, Broadcom, Cisco, CrowdStrike, Linux Foundation, Microsoft, Palo Alto Networks, and others) plus about 40 additional organizations, with Anthropic committing $100 million in usage credits. While I am not dismissing the cyber risks that a model like this could bring, it is also convenient for model providers to “release” these models to a small group of companies. The reality is that with their current compute, they can’t even serve Opus 4.6 to their user base, let alone Mythos, which is even more expensive to run. Based on Anthropic’s pricing, the current Mythos Preview that these companies got access to is around 5x more expensive than Opus 4.6 after the initial $100M credit commitment from Anthropic. It is also rumored that OpenAI will “release” its newest model in a similar fashion, again citing cybersecurity risks.

There has been a lot of frustration among Claude users lately: many have started hitting the rate limits in their subscription plans much faster, because Anthropic has to manage this surge in demand with the compute it has.

This is what I call the Inference Trap, and both OpenAI and Anthropic have now been caught in it.

The pattern is simple: build the best model → users surge → inference compute explodes → you either throttle users, raise prices, or cannibalize training compute. OpenAI experienced it during the Ghibli moment in March 2025, when ChatGPT gained 1 million new users in a single hour and 100 million signups in a week. Sam Altman admitted they were “forced to do a lot of unnatural things,” specifically borrowing compute capacity from OpenAI’s research division and slowing down the release of new features.

Anthropic is living through its own version right now. In March 2026, the company experienced five major platform outages in a single month. Claude Code users reported burning through 5-hour sessions in under 90 minutes. The problem for Anthropic is that if you are a model provider in this AI race, you don’t want to cannibalize training compute, as it means that you can lose the race for the next model.

This brings me to another point I want to make: pricing increases on frontier AI models are inevitable. When Anthropic eventually deploys Mythos-class models at scale, the inference cost per query will be higher than Opus. And they already can’t serve Opus at current demand levels without throttling. The math only works if prices go up, or if the compute infrastructure grows fast enough to meet demand, which it can’t in the short term.

Moving to OpenAI.

OpenAI Seems to Be Losing the Enterprise Race and Heading Into an IPO With Some Headwinds

While Anthropic is sprinting ahead on enterprise revenue, OpenAI is dealing with a set of problems that are becoming hard to ignore.

The revenue gap has flipped. A year ago, OpenAI was at roughly $6 billion ARR, and Anthropic was at $1 billion. The gap looked huge. Today, Anthropic is at $30 billion and OpenAI is at roughly $25 billion, so the two are in a similar range, but OpenAI’s pace of growth is slower. Anthropic added roughly $21 billion in net new annualized revenue in a little over three months. OpenAI’s enterprise business now makes up 40% of revenue (up from ~30% last year) and is “on track to reach parity with consumer by the end of 2026”, but Anthropic has been enterprise-first from the start, with 80% enterprise revenue and structurally higher retention.

To add to this, SensorTower data now show that ChatGPT’s monthly active users in the US have started to fall slightly, adding to the headwinds.

Even without this data, we have been waiting for quite some time for OpenAI to announce that it has reached 1 billion weekly active users, as the 800M mark was announced 6 months ago. This suggests that user growth has slowed.

The funding structure also isn’t perfect. OpenAI closed a $122 billion round at an $852 billion valuation at the end of March, the largest private funding round in history. But the structure is telling. Amazon committed $50 billion, but only $15 billion arrived as upfront cash. The remaining $35 billion is conditional, tied to milestones that, according to some reports, may include achieving certain AI capability thresholds or pursuing an initial public offering by the end of 2026.

SoftBank pledged $30 billion structured in three equal tranches of $10 billion each, arriving in April, July, and October. SoftBank’s structure essentially assumes a liquidity event within that window. About $3 billion came from retail investors through bank channels, and OpenAI was included in ARK Invest ETFs.

In other words, a significant portion of OpenAI’s headline $122 billion raise is conditional on an IPO actually happening. This means OpenAI is under pressure to go public regardless of whether the timing is optimal. There are now reports from The Information that Altman and OpenAI’s CFO are on different sides over the IPO, as Altman is pushing for an IPO this year, while the CFO believes OpenAI is not ready yet. OpenAI has already denied this, but it's not expected that any company would confirm such rumors, even if they were true.

The profitability picture is the key thing. According to different reports, OpenAI’s gross margins sit at approximately 40%, constrained by variable compute costs. The company is generating +$2-3 billion per month but losing +$14 billion per year. Reports of internal documents project that compute costs will reach $121 billion by 2028, with a cumulative loss trajectory that doesn’t reach breakeven until 2029-2030. Compare this to Anthropic, which projects positive free cash flow by 2027-2028 while spending roughly 4x less on training.

OpenAI also has an alternative to Claude Code called Codex, but adoption there, although growing, doesn’t seem to be at the same pace as Claude Code. It’s telling that OpenAI is even offering users more token usage, while Anthropic is limiting it.

As this data from Ramp shows, the AI model share of first-time enterprise customers has heavily tilted towards Anthropic in the last few months:

The key moat that OpenAI has is the ChatGPT brand, which, as a first mover, became a verb for consumers, much like Google did. In the last months, however, Claude and Anthropic have become “the verb” when it comes to enterprises and AI use cases for work. On the consumer end, OpenAI will probably have to shift hard towards an ad-supported business model to get some revenue from its big free user base. Building an efficient ad platform is much more complex than most think and requires time.

At the same time, OpenAI is trying to stop Anthropic in the enterprise market, but so far, it doesn’t seem to be working, as Anthropic is capturing the market at a faster pace. The question is whether OpenAI’s strategy of trying to capture both markets at once and “doing everything” is really the right one. I would argue that it is not.

Now you even have Meta entering the AI arena again, with its first AI model since the formation of Meta's superintelligence unit. While their model is not SOTA, there are specific use cases where it is very competitive. Meta focused on use cases like health, social media, games, and shopping. This pattern will become more dominant in the coming years as the AI model market matures and you see model specialization rather than just general models. In this environment, the importance of having a narrow focus becomes even bigger.

According to The Information, OpenAI is now communicating to investors that they believe one of their important advantages going forward vs Anthropic is in the availability of compute. OpenAI said it believes Anthropic had 1.4 GW of capacity at the end of last year, while OpenAI had 1.9 GW. But OpenAI said it plans to ramp its capacity more steeply, with total gigawatts in the mid-single digit range at the end of this year and more than 10 GW in 2027.

In contrast, OpenAI believes Anthropic will have 3 to 4 GW in 2026 and 7 to 8 GW by the end of 2027. While I agree that availability of compute is a big factor going forward, I believe the even more important one is inference economics: the cost of serving the models to your clients. As users shift workloads to production, availability and reliability become key. Nobody wants a product that is unstable, working great at some times and being unavailable at others.

The End of Subsidized AI Model Usage

Both Anthropic and OpenAI are preparing for IPOs. And an IPO changes everything about how an AI company thinks about compute costs.

When you’re private and burning venture capital, you can subsidize inference. You can run models at a loss. You can offer $20/month unlimited plans that cost you +$100/month to serve. You can double the rate limits as promotions. You can hand out credit packages. The goal is growth at all costs, because the next funding round values you on revenue, not margins.

When you’re public, the scrutiny shifts to unit economics. Gross margins, operating margins, cash burn trajectory, path to profitability — these become the metrics that determine your stock price. An S-1 filing forces you to disclose all of this in audited detail.

Both OpenAI and Anthropic are probably operating at approximately 40% gross margins, constrained by the variable cost of running inference. Neither company will show a net profit at its IPO, since profitability is only projected around 2030, so gross margin, and especially gross margin on inference, is what investors will watch most closely. While the margin math works on API usage, subscription packages are still often subsidized by the model companies, and this is something they will look to tweak in the coming months before they file their S-1s.

This has a ripple effect across the entire supply chain industry. For the past three years, the question has been: “Which chip delivers the most FLOPS?” The answer was always Nvidia, and companies paid whatever Nvidia charged because performance was the bottleneck, and money was abundant.

Going forward, the question becomes: “Which chip gives me the cheapest tokens and the best total cost of ownership (TCO)?” It’s not even about watts anymore, as you can see from recent podcast comments from Nvidia’s Jensen Huang and Google’s Sundar Pichai. It’s about cost-per-token, because that directly determines your gross margin as a public company, and that is what investors will be laser-focused on. This shift also means the end of subsidized usage in these subscription packages, through either usage limits or higher prices. We will also see even more resources being focused on software optimizations for running the models. Savings on memory and getting more from existing hardware will be the focus in the coming months for both labs, as they are constrained by compute.
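As a rough illustration of the cost-per-token framing, the sketch below compares two hypothetical accelerators. Every chip number in it (hourly TCO, throughput, token price) is an assumption made up for the example, not vendor data. The only figures taken from the article are the $20 plan and its roughly $100 monthly serving cost.

```python
def cost_per_million_tokens(hourly_tco_usd, tokens_per_second):
    """All-in hourly cost of an accelerator divided by the tokens it serves."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_tco_usd / tokens_per_hour * 1_000_000


def gross_margin(price, cost):
    """Gross margin as a fraction of price."""
    return (price - cost) / price


# Hypothetical chips: A is faster, B is cheaper to own and run.
cost_a = cost_per_million_tokens(hourly_tco_usd=4.00, tokens_per_second=2500)
cost_b = cost_per_million_tokens(hourly_tco_usd=2.50, tokens_per_second=2000)

PRICE_PER_MILLION = 1.00  # hypothetical API price in USD

for name, cost in (("A", cost_a), ("B", cost_b)):
    margin = gross_margin(PRICE_PER_MILLION, cost)
    print(f"chip {name}: ${cost:.3f} per 1M tokens, gross margin {margin:.1%}")

# The subsidized subscription from the article: $20 of revenue, about $100 to serve.
subscription_margin = gross_margin(price=20.0, cost=100.0)
print(f"subsidized plan gross margin: {subscription_margin:.0%}")
```

Under these assumptions the slower chip B ends up with the better margin because it delivers cheaper tokens, which is the point of asking about TCO rather than peak performance.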

Both of these companies also need to do the hard math of treating the IPO as the “last” funding round: after the IPO, they need enough capital to support their growth and cash burn for the coming years. Issuing additional stock to raise capital once you are a public company is never viewed positively by the market, so nobody wants to go down that route.

Anthropic is uniquely well-positioned here because it runs Claude on a diversified hardware stack across three suppliers: Nvidia GPUs, Google TPUs, and Amazon Trainium. This gives it real negotiating leverage and the ability to route workloads to whichever chip offers the best price-performance for each model tier. The company just announced a deal with Google and Broadcom for approximately 3.5 gigawatts of next-generation TPU capacity starting in 2027. This is on top of its existing AWS Trainium partnership and Nvidia GPU deployments. It is worth noting that AWS’s CEO just mentioned in an interview on CNBC yesterday that all of Anthropic’s AI models were trained on Amazon Trainium (even Mythos).

OpenAI, by contrast, has been more dependent on Nvidia through its Azure partnership with Microsoft, though it has been diversifying toward custom silicon.

The IPO Race: Who Lists First Gets the Biggest Check

There’s one final dynamic that ties all of this together: both Anthropic and OpenAI know that whoever goes public first has a significant advantage, and the window for both is narrowing. On top of that, SpaceX (which now includes xAI) is also racing towards a +$1T IPO. OpenAI is targeting a +$1T IPO. Anthropic just closed a funding round at a $380 billion valuation, but because of the surge in usage and revenue it is already valued at around $500-$700 billion on secondary markets, so its IPO could be in the $800B-$1T range as well.

Between these three companies alone, we’re looking at potentially +$200 billion in capital being raised from public markets within a 6-12 month window. That’s an enormous liquidity event. For context, the entire US IPO market raised approximately $33 billion in 2024. Even in the hot 2021 market, total US IPO proceeds were around $140 billion.
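To put that comparison in numbers, here is a small Python sketch that uses only the figures quoted above (in billions of US dollars). The variable and function names are illustrative.

```python
# Figures quoted in the text, in billions of US dollars.
EXPECTED_AI_IPO_RAISE = 200
US_IPO_PROCEEDS = {2021: 140, 2024: 33}


def raise_vs_market(expected_raise, yearly_proceeds):
    """Expected raise as a multiple of each year's total US IPO proceeds."""
    return {year: expected_raise / total for year, total in yearly_proceeds.items()}


ratios = raise_vs_market(EXPECTED_AI_IPO_RAISE, US_IPO_PROCEEDS)
for year, ratio in sorted(ratios.items()):
    print(f"vs. {year} US IPO proceeds: {ratio:.1f}x")
```

On these figures, the three listings together would be roughly six times the entire 2024 US IPO market and well above even the 2021 total.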

This is why the race to go first matters so much. The first to market captures the freshest investor capital and sets the valuation benchmark. The second has to compete for the same institutional allocation. The third might struggle if the market has indigestion from the first two.

OpenAI’s board is reportedly concerned that if Anthropic lists first, it could set a valuation benchmark that makes OpenAI’s $1 trillion target look stretched — especially now that Anthropic has higher revenue, better enterprise concentration, and a more credible path to profitability. On the other hand, if OpenAI lists first, it establishes itself as the “AI category-defining IPO” and benefits from a first-mover premium in public market pricing.

Both companies know this. Both are preparing in parallel. And both are racing against time, because every month that passes, compute costs pile up, margins need to improve, and the public market window could shift with macro conditions.

Summary

We are entering a new key period in AI where unit economics take center stage. At the same time, I expect a rapid pace of software optimizations for serving these models more efficiently in the coming months, as AI labs put their best talent on this task, because it has now become the most important factor limiting growth and profitability. The software optimization will focus on easing key bottlenecks, such as memory (KV cache, context window) and wafer availability. Model distillation and the trend toward smaller models will also grow faster than before because of this.

If I were to speculate, the hardware companies might have a “less golden” time than the era they have had so far, while cloud providers might benefit the most: these software optimizations mean that they get more juice out of their existing infrastructure, while demand for compute keeps surging because of wider adoption of AI.

Until next time,

Next week, we are publishing an article on some key developments in the AI space when it comes to Meta and Amazon, exclusive for paid subscribers. If you are not yet a paid subscriber, consider signing up.

Thank you!

Disclaimer:

I own Google (GOOGL), Amazon (AMZN), Microsoft (MSFT), Meta (META) stock.

Nothing contained in this website and newsletter should be understood as investment or financial advice. All investment strategies and investments involve the risk of loss. Past performance does not guarantee future results. Everything written and expressed in this newsletter is only the writer’s opinion and should not be considered investment advice. Before investing in anything, know your risk profile and if needed, consult a professional. Nothing on this site should ever be considered advice, research, or an invitation to buy or sell any securities.

Every Memory Cycle Ends the Same. Until It Doesn't.

UncoverAlpha — Thu, 12 Mar 2026 12:19:35 GMT

Hey everyone,

For three decades, the memory semiconductor industry has followed a brutal and predictable pattern: prices boom, manufacturers over-invest, supply floods in, prices crash, everyone bleeds red ink, and then the whole thing starts over. It’s been one of the most reliably cyclical businesses in all of technology. The cycle has destroyed shareholder value, bankrupted companies, and taught every investor the same lesson: never trust the words “this time is different” when it comes to DRAM.

And yet, here I am, writing an article arguing exactly that.

Let me be clear, I know the history. I’ve studied every major memory cycle of the last 30 years. In this article, we look at them and the numbers. But then I am going to make a case for why the AI era may fundamentally break that pattern, not because demand will be infinite (it won’t), but because the nature of what memory serves has changed in a way that most investors haven’t fully internalized.

Memory is no longer just a component inside your gadget. Memory is becoming a raw input for intelligence. And the demand curve for intelligence looks a lot more like the demand curve for energy, such as electricity, than it does the demand curve for smartphones.

Let’s start.

The history of memory economics

For those less familiar with the space, the memory semiconductor market is dominated by three players: Samsung Electronics (South Korea), SK Hynix (South Korea), and Micron Technology (United States). Together, these three companies control approximately 95% of global DRAM production. This is an oligopoly, but not one that has historically behaved like one. Unlike OPEC, these companies can’t (legally) coordinate output. And unlike logic chips, memory is essentially a commodity—a bit is a bit. The differentiation comes from process technology, cost structure, and increasingly, product mix (more on HBM later).

The fundamental problem with memory economics is the mismatch between demand elasticity and supply inelasticity. Building a new DRAM fab costs $15-20 billion and takes 2-3 years. Once built, the economics favor running it at maximum utilization because fixed costs are enormous. So when demand rises, prices spike because supply can’t respond quickly. When manufacturers finally bring new capacity online, they tend to overshoot, because everyone is building at the same time based on the same rosy demand signals. Prices crash, margins collapse. Some companies go bankrupt or get acquired. The survivors cut capex, and the cycle begins anew.

This is the pattern. And it has repeated with remarkable consistency.

Cycle 1: The Windows PC supercycle (1993-1996)

The first modern memory supercycle was driven by the explosion of Windows PCs and graphical operating systems. Average DRAM content per PC jumped from roughly 1-2MB to 4-8MB—a 4x increase per device—while PC unit shipments were growing at double-digit rates.

During 1993 and 1994, DRAM demand outpaced supply despite most fabs running at full utilization. Spot and contract prices for 4Mb and 16Mb DRAM rose sharply, and gross margins for leading suppliers surged well above 50%. Korean memory makers like Samsung and Hyundai (now SK Hynix) posted record profits. Semiconductors accounted for 13.4% of Korea’s total exports. It was hailed as the greatest boom in Korean industrial history.

Then reality hit. Roughly 50 fab construction plans were announced during 1995-1996 alone. Capex as a percentage of semiconductor production exceeded 30%. The inevitable happened: DRAM prices peaked in late 1995 and then collapsed—falling 51% in 1996 and another 65% in 1997. Korea’s Big Three chipmakers suffered from overexpansion, and the resulting shock contributed to the Asian Financial Crisis that pushed Korea into a deep recession. Stock prices of memory companies fell 60-80% from peak to trough.

Cycle duration (peak to trough): around 2 years
Price decline: 51% in year one, another 65% in year two
Stock decline: around 60-80%
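Two consecutive yearly declines compound rather than add, so the total peak-to-trough price drop is larger than either figure alone. A minimal Python sketch of that arithmetic, using the two yearly declines from the text (the function name is illustrative):

```python
def cumulative_decline(yearly_declines):
    """Compound a sequence of fractional declines into one peak-to-trough decline."""
    remaining = 1.0
    for decline in yearly_declines:
        remaining *= 1.0 - decline
    return 1.0 - remaining


# Figures from the text: -51% in 1996, then another -65% in 1997.
total = cumulative_decline([0.51, 0.65])
print(f"Cumulative DRAM price decline: {total:.1%}")
```

With these inputs, prices end at about 17% of their peak level, a cumulative decline of roughly 83%.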

Cycle 2: The cloud and smartphone era (2016-2019)

Fast forward two decades, and the cast of characters had changed, but the script was the same. By 2016, the DRAM market had consolidated from roughly 20 players to just three. This was supposed to introduce discipline. And for a while, it seemed like it did.

The 2016-2018 “supercycle” was driven by a convergence of factors: smartphone storage capacity upgrades, the early cloud buildout, and a supply-side twist where manufacturers were shifting capacity to 3D NAND production, which temporarily constrained conventional DRAM output.

The numbers were spectacular, especially for Micron, the only publicly traded pure-play memory company in the U.S.:

Micron 2016: Revenue of $12.4 billion, gross margin of 20.2%, operating income of just $168 million (1.4% operating margin). The company was barely above breakeven.

Micron 2017: Revenue surged 64% to $20.3 billion. Gross margin expanded to 41.5%. Operating income hit $5.87 billion (28.9% margin).

Micron 2018: Revenue jumped another 50% to $30.4 billion. Gross margin peaked at 58.9%. Operating income reached an astonishing $15.0 billion—a 49.3% operating margin. From barely profitable to printing nearly 50 cents of operating profit on every dollar of revenue in two years.

SK Hynix followed a similar trajectory. At its Q3 2018 peak, SK Hynix posted an operating profit of 6.47 trillion Korean won, which at the time was a record.

DDR4 retail RAM prices doubled over the course of 2017 into early 2018. Industry inventories fell to 3-4 weeks, well below the normal 8-week average.

Micron’s stock peaked at roughly $64 in May 2018. But notice: revenue and margins didn’t peak until Q4 of calendar 2018. The stock topped out approximately two quarters before the fundamental peak. This is a classic pattern in cyclical stocks: the market discounts the turn before it shows up in the numbers.

Then came the crash:

Micron 2019: Revenue fell to $23.4 billion (-23%). Gross margin compressed to 45.7%.

Micron 2020: Revenue dropped further to $21.4 billion. Gross margin fell to 30.6%. Operating income was $3.0 billion, down 80% from the 2018 peak.

By December 2018, Micron’s stock had fallen to approximately $28—a 56% decline from the May high. The stock was pricing in the downturn even as the company was still reporting near-peak earnings.

Cycle duration (peak to trough in fundamentals): ~6-7 quarters
Revenue decline (peak to trough): ~30%
Gross margin decline: from 59% to 27% (at the Q1 FY2020 low)
Stock decline (peak to trough): ~56%
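The growth rates and margins quoted for Micron in this cycle can be reproduced from the revenue and operating income figures given above. The Python sketch below does that; it uses only numbers from the text (in billions of US dollars), and the helper names are illustrative.

```python
# Fiscal-year figures quoted in the text, in billions of US dollars.
revenue = {2016: 12.4, 2017: 20.3, 2018: 30.4, 2019: 23.4, 2020: 21.4}
operating_income = {2016: 0.168, 2017: 5.87, 2018: 15.0, 2020: 3.0}


def yoy_growth(series, year):
    """Year-over-year change as a fraction."""
    return series[year] / series[year - 1] - 1.0


def operating_margin(year):
    """Operating income as a fraction of revenue."""
    return operating_income[year] / revenue[year]


def peak_to_trough_decline(series):
    """Decline from the peak year to the lowest value at or after the peak."""
    peak_year = max(series, key=series.get)
    trough = min(value for year, value in series.items() if year >= peak_year)
    return 1.0 - trough / series[peak_year]


for year in (2017, 2018, 2019):
    print(f"{year} revenue growth: {yoy_growth(revenue, year):+.0%}")
for year in (2016, 2017, 2018):
    print(f"{year} operating margin: {operating_margin(year):.1%}")
print(f"revenue peak to trough: {peak_to_trough_decline(revenue):.0%}")
```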

Cycle 3: The COVID cycle (2020-2023)

The pandemic created an unexpected demand surge. PC shipments exploded as the world went remote. Server demand spiked as cloud usage accelerated. 5G phones launched with higher per-device memory content. The upcycle lasted approximately 14 months before the familiar reversal kicked in.

By 2022-2023, the downturn was severe. Bloated inventories from pandemic over-ordering met weakening consumer demand. SK Hynix posted a full-year 2023 net margin of approximately negative 28%. Micron’s fiscal 2023 revenue fell to roughly $15.5 billion, about half its fiscal 2022 level, and gross margin turned negative; even in the fiscal 2024 recovery, revenue was only around $25 billion with gross margin in the low 20s.

Memory stocks cratered. Micron fell from around $98 in early 2022 to roughly $49 by late 2022—a 50% haircut. SK Hynix fell similarly.

Cycle duration (peak to trough): 6-8 quarters of margin compression
Operating margin: from 30%+ to deeply negative for SK Hynix
Stock decline: ~50%

The pattern across all three cycles is strikingly consistent: a demand-driven boom lasting 4-7 quarters, followed by an oversupply-driven bust lasting 4-8 quarters, with revenue declines of 25-40%, margin compression from peak levels above 50% to the low 20s or even negative, and stock price declines of 50-80% that lead the fundamental downturn by 1-2 quarters.

The history is clear, but now let me tell you why I think this cycle might be structurally different.

From gadget component to intelligence input

In every previous memory cycle, the demand driver was the same: humans buying devices. PCs in the 1990s. Smartphones in the 2010s. Laptops during COVID. The demand function was ultimately capped by the number of humans and the number of devices each human needs. One person buys one phone. Maybe one laptop. Perhaps a tablet. The DRAM content per device grows, but the number of endpoints is bounded.

This meant that once the initial adoption or upgrade wave passed—once everyone who needed a new PC had bought one, or every smartphone had been upgraded to the latest generation—demand would flatten. Supply, which was ramped during the boom, would overshoot. Prices would crash.

In the AI era, the demand function for memory has fundamentally changed. Memory is no longer predominantly serving a fixed number of “human endpoints”. Memory, especially HBM, is now a critical input for generating intelligence.

Think about what HBM (High Bandwidth Memory) actually does inside an AI accelerator. When you ask ChatGPT a question or run an inference on a large language model, the model’s parameters (billions or trillions of numerical weights) need to be loaded from memory into the GPU’s compute cores. The KV cache, which stores the context of your conversation, grows linearly with context length; even with Grouped Query Attention (GQA), it consumes roughly 0.06-0.12 MB per token in a 7B-parameter model. A model with 70 billion parameters requires more than a single 80GB GPU’s worth of HBM for the weights alone.
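These memory figures can be sanity-checked with back-of-the-envelope arithmetic. The Python sketch below assumes a 7B-class GQA configuration of 32 layers, 8 KV heads, and a head dimension of 128; that configuration is an assumption chosen for illustration, not something stated in the article. With it, the per-token KV cache comes out at 0.125 MiB at 16-bit precision and 0.0625 MiB at 8-bit, in line with the range quoted above.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache added per token: one key and one value vector
    per layer and per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def weight_bytes(n_params, bytes_per_param):
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


MIB = 1024 ** 2
GB = 10 ** 9

# Assumed 7B-class GQA configuration (hypothetical, for illustration).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for label, nbytes in (("16-bit", 2), ("8-bit", 1)):
    per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, nbytes)
    print(f"{label} KV cache: {per_token / MIB:.4f} MiB per token")

# The KV cache grows linearly with context length.
context_tokens = 128_000
cache_gb = context_tokens * kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2) / GB
print(f"{context_tokens} tokens of 16-bit context: {cache_gb:.1f} GB")

# 70B parameters stored at 16 bits each.
weights_gb = weight_bytes(70 * 10 ** 9, 2) / GB
print(f"70B weights at 16-bit: {weights_gb:.0f} GB vs. 80 GB on one GPU")
```

The weights figure is for 16-bit storage; lower-precision formats would shrink it proportionally.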

Here’s the simplified version: More memory = the ability to run larger models, with longer context, serving more users simultaneously. Memory is not a peripheral component in AI—it is the binding constraint. The so-called “memory wall” is the single biggest bottleneck limiting AI inference performance today. GPUs often sit idle, waiting for data to be fetched from memory. More bandwidth, more capacity means more intelligence output per second.

This is where the analogy to energy becomes powerful. Think about oil. When oil prices drop, what happens? Demand for oil increases because cheaper energy enables more economic activity. The demand curve for energy is downward-sloping: lower prices stimulate consumption. There’s always more work that could be done, more goods that could be transported, more heat that could be generated, if only energy were cheaper.

I believe AI inference demand behaves similarly. If memory costs drop and inference becomes cheaper, that doesn’t mean demand for inference drops. It means more applications become economically viable. More AI agents get deployed. More models get served. More context windows get extended. The demand for intelligence, like the demand for energy, is essentially elastic in response to price declines. Cheaper intelligence leads to more consumption of intelligence, not less.

This is the polar opposite of the gadget cycle. When DRAM prices dropped after the 2018 boom, it didn’t cause people to go buy a second smartphone. The number of endpoints was fixed. But when the cost of running an AI inference call drops by 50%, you can bet that the number of inference calls per day will more than compensate. Every enterprise that was waiting on the sidelines because of cost will deploy its AI project. Every startup that couldn’t afford the compute will spin up their service.
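The contrast between the gadget cycle and inference demand can be sketched with a simple constant-elasticity demand curve. The elasticity values below are hypothetical and chosen only to illustrate the two regimes. They are not estimates.

```python
# Constant-elasticity demand sketch. The elasticity values are hypothetical
# and chosen only to contrast the two regimes described above.


def quantity_multiplier(price_multiplier, elasticity):
    """Change in quantity demanded for a given change in price.

    Uses Q1/Q0 = (P1/P0) ** (-elasticity), with elasticity as a positive number.
    """
    return price_multiplier ** (-elasticity)


def spend_multiplier(price_multiplier, elasticity):
    """Change in total spend (price times quantity)."""
    return price_multiplier * quantity_multiplier(price_multiplier, elasticity)


price_cut = 0.5  # cost per inference call falls by 50%

for label, elasticity in [("fixed endpoints", 0.0), ("unit elastic", 1.0), ("elastic", 1.5)]:
    q = quantity_multiplier(price_cut, elasticity)
    s = spend_multiplier(price_cut, elasticity)
    print(f"{label:>15}: volume x{q:.2f}, total spend x{s:.2f}")
```

With fixed endpoints (elasticity near zero), a 50% price cut simply halves total spend. Once elasticity is above 1, volume more than doubles and total spend rises, which is the "more than compensate" case.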

Here’s a human analogy I think captures this well. Imagine two people: one is a genius with poor memory, and the other is of average intelligence but has extraordinary memory and recall. In many real-world tasks—medicine, law, engineering, customer service—the person with superior memory will outperform the genius. Why? Because most practical work isn’t about raw reasoning power. It’s about retrieving the right piece of information at the right time. An AI model with more memory (longer context, more parameters accessible, faster retrieval) will outperform a theoretically smarter model that is memory-constrained. Memory is intelligence in many practical applications.

This is not a theoretical argument. The industry data supports it. HBM capacity per GPU has been scaling aggressively: NVIDIA’s A100 had 80GB of HBM2e. The H200 moved to 141GB of HBM3e. The upcoming Blackwell Ultra configurations push toward 288GB. And the Rubin Ultra platform is targeting 288GB - 576GB of HBM4E per GPU. The trajectory is exponential, and every generation of GPU is constrained by memory, not compute.

Where we are today

The current memory cycle is already historic in scale.

DRAM prices have surged dramatically. By Q4 2025, DRAM spot prices were nearly triple their level from a year earlier. DDR5 prices jumped 30-50% per quarter through H2 2025. Samsung raised memory prices by up to 60% since September 2025. DRAM inventories at major suppliers fell to just 3.3 weeks by the end of Q3 2025—matching the 2018 supercycle lows. SK Hynix and Micron had roughly 2 weeks of inventory each.

AI is expected to consume nearly 20% of global DRAM wafer capacity in 2026 when adjusted for HBM’s 4x wafer intensity.
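The wafer-intensity adjustment works roughly like this. The HBM bit shares below are hypothetical inputs used only to show the mechanics, and the sketch counts HBM only, while the 20% figure may also include other AI-related DRAM.

```python
# How a small share of DRAM bits can turn into a much larger share of wafers.
# The bit shares passed in below are hypothetical, for illustration only.


def wafer_share(hbm_bit_share, wafer_intensity):
    """Share of wafer capacity used by HBM, given its share of bits shipped.

    wafer_intensity is how many times more wafer area one HBM bit needs
    compared with one standard DRAM bit (about 4x per the text).
    """
    hbm_wafers = hbm_bit_share * wafer_intensity
    other_wafers = 1.0 - hbm_bit_share
    return hbm_wafers / (hbm_wafers + other_wafers)


for bit_share in (0.03, 0.06, 0.10):
    share = wafer_share(bit_share, wafer_intensity=4.0)
    print(f"HBM at {bit_share:.0%} of bits -> {share:.1%} of wafers")
```

The point of the formula is that a mid-single-digit share of bits is enough to absorb around a fifth of wafer capacity once the 4x intensity is applied.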

The valuation: The market doesn’t believe in the durability of this cycle

Here’s where it gets really interesting from an investment perspective.

Despite the strongest fundamental setup the memory industry has ever seen—sold-out HBM capacity through 2026, record margins, structural demand from AI, and a three-player oligopoly with pricing discipline—the market is still pricing these stocks as if a classic downturn is imminent.

Micron trades at a forward P/E of about 10x, SK Hynix trades at approximately 5.2x forward P/E, and Samsung trades at a forward P/E of roughly 5x-7x—although this includes the total company, which includes much more than just memory.

The PEG ratio makes the mismatch even clearer. Micron’s PEG is approximately 0.16x, Samsung is at 0.17, and SK Hynix is at 0.10—meaning the market is pricing almost zero growth premium into the stocks.
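One way to read those PEG figures is to back out the growth rate they imply. This sketch assumes the common definition, PEG = forward P/E divided by expected annual EPS growth in percent. Data providers use different growth horizons, so treat the output as a rough cross-check rather than a precise figure.

```python
# Back out the EPS growth implied by a P/E and PEG pair.
# Assumes PEG = forward P/E / expected annual EPS growth (in percent).


def implied_growth_pct(forward_pe, peg):
    """Expected annual EPS growth (in %) implied by a forward P/E and a PEG."""
    return forward_pe / peg


# Figures taken from the text above. Samsung is shown at both ends of its P/E range.
cases = [
    ("Micron", 10.0, 0.16),
    ("SK Hynix", 5.2, 0.10),
    ("Samsung (low P/E)", 5.0, 0.17),
    ("Samsung (high P/E)", 7.0, 0.17),
]

for name, pe, peg in cases:
    print(f"{name:>20}: P/E {pe:.1f}, PEG {peg:.2f} -> ~{implied_growth_pct(pe, peg):.0f}% growth")
```

A PEG of 1.0 is the usual rule-of-thumb for a fairly priced growth stock, so values of 0.10 to 0.17 mean the multiple is a small fraction of the growth rate analysts expect.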

But at these valuation levels, the question is not whether these companies will continue to grow; it’s more about how long the current demand signals will last. If these memory demand levels and margins stay here for a few more years, that would be a scenario that markets are not pricing in.

Why? Because the market has been burned by memory cyclicality before. Investors remember that in the 2017-2018 supercycle, Micron stock peaked at ~$64 with a forward P/E of about 4-5x at the top, and then the stock fell 56% even though earnings were still rising. The conditioned response is “memory is peaking, get out before the crash.”

But this framing assumes the old cycle repeats. It assumes that the demand driver (AI infrastructure buildout and inference scaling) behaves like the demand driver in previous cycles (consumer device upgrades). And I believe that assumption could be wrong.

Why the downturn, when it comes, might be shallower

I’m not arguing that memory prices will never decline. They will. At some point, new fab capacity from current investment plans will come online. At some point, HBM4 yields will improve, and supply will catch up. The 2017-2018 cycle teaches us that supply response is inevitable.

But I believe the depth and duration of the downturn will be structurally different this time (dangerous words, I know):

1. The end market is not bounded by human endpoints. In the PC cycle, once every household had a PC, demand plateaued. In the smartphone cycle, once penetration hit saturation, annual unit growth went to zero. But the number of AI inference calls per day is growing exponentially and is nowhere near saturation. Every enterprise, every consumer app, every autonomous vehicle, every AI agent is an incremental consumer of memory bandwidth.

This view is also shared by many industry experts. Here is a former high-ranking employee from ASML on this topic:

»The current conditions actually have made us move away from cyclicality simply because the ratio of the chips that go into laptops and cell phones and other personal-use devices is getting lower each day as the capacity gets transferred to AI-related infrastructure. We may not be able to predict the condition or state of these memory manufacturers based on cyclicality anymore.«

Source: AlphaSense

2. Memory content per AI unit is growing exponentially, not linearly. DRAM content per PC grew from maybe 4GB to 16GB over a decade, a 4x increase. HBM content per GPU is going from 80GB (A100) to 288GB - 576GB (Rubin Ultra) in just a few years, a 3.6x to 7x increase. And the number of GPUs being deployed is also growing at 30-40% annually. The compounding effect of more units × more memory per unit is producing demand growth rates the industry has never seen.

3. HBM is structurally supply-constrained. One gigabyte of HBM consumes approximately 4x the wafer capacity of standard DRAM. HBM also requires advanced packaging (CoWoS or its equivalents), which has its own supply bottleneck. You can’t just flip a switch and convert commodity DRAM lines to HBM production. The manufacturing complexity acts as a natural supply governor that didn’t exist in previous cycles.

4. Long-term contracts are dampening volatility. In a major shift from past cycles, memory companies are increasingly locking in multi-year supply agreements with hyperscalers. SK Hynix has finalized its 2026 HBM supply plan with major clients and expects supply to remain tight through 2027. Micron has sold out its 2026 HBM capacity and has pricing agreements already in place. These contracts reduce the spot market’s influence and provide revenue visibility that the memory industry has never had before.

On top of the long-term contracts, the memory providers are much more careful about investing in new capacity this time, as the scars of past cycles are a strong reminder. Here is a comment from a current Microsoft employee on what they expect in terms of memory supply coming online:

» I don’t think anyone on the buying side assumes memory suppliers will automatically rush to add unlimited supply just because demand is strong. The history of boom-bust cycles is very real, and suppliers remember that just as well as buyers do.

From my perspective, the expectation isn’t that all suppliers aggressively overbuild, but that they add capacity in much more controlled stages way than in the past cycles. What is different this time is the nature of demand. A lot of AI-driven demand is tied to long-lived infrastructure programs rather than short consumer cycles, which gives the suppliers more confidence but not enough to blindly overspend.«

Source: AlphaSense

Perhaps an even more telling comment is this one, made by a former high-ranking Micron employee, on the internal cultural scars that the memory cycles have left:

»Micron has always positioned themselves as not the cheapest. Like I said, in the past, yes, when it was under Steve Appleton, Mark Durcan, Mark Adams, they’ve been trying to gain market share by reducing prices, but with the new CEO Sanjay, he is more focused on profitability rather than market share. Market share also is important, but if you were to choose between market share and profitability, he chooses profitability.«

Source: AlphaSense

5. The price elasticity of AI demand works in memory’s favor. If DRAM prices decline 20-30% (as they inevitably will at some point), the cost of running AI inference drops proportionally. This makes AI deployment cheaper, expanding the addressable market, which in turn supports memory demand. The demand floor is higher than in past cycles because cheaper memory creates new demand, rather than simply being absorbed by a fixed number of devices.

At some point, we will see a correction, but one that looks more like a 15-25% revenue decline and margins compressing to the 35-40% range, rather than the historic 30-40% revenue declines and sub-25% margins of previous busts. And crucially, I think the trough will be shorter, because AI inference demand will continue growing even during the cyclical correction, providing a demand floor that didn’t exist in the consumer device era.
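The compounding described in point 2 (more units × more memory per unit) can be sketched as follows. The four-year window is an assumption standing in for "a few years". The per-GPU capacities and the 30-40% unit growth come from the text.

```python
# Compounding of HBM per GPU and GPU unit growth.
# The 4-year window is an assumption. Other inputs are from the text.


def hbm_demand_multiplier(gb_start, gb_end, unit_growth, years):
    """Total HBM demand growth = growth in GB per GPU x growth in GPU units."""
    per_gpu = gb_end / gb_start
    units = (1 + unit_growth) ** years
    return per_gpu * units


YEARS = 4  # hypothetical window for "a few years"

low = hbm_demand_multiplier(80, 288, unit_growth=0.30, years=YEARS)
high = hbm_demand_multiplier(80, 576, unit_growth=0.40, years=YEARS)

print(f"Low case:  {288 / 80:.1f}x per GPU * {1.30 ** YEARS:.2f}x units = {low:.1f}x")
print(f"High case: {576 / 80:.1f}x per GPU * {1.40 ** YEARS:.2f}x units = {high:.1f}x")
```

Even the low case implies roughly a tenfold increase in HBM demand over the window, compared with the 4x per-PC increase that took a decade in the PC era.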

The bottom line

The memory industry has spent 30 years teaching investors the same lesson: the cycle always turns, the crash always comes, and “this time is different” are the four most expensive words in investing. I respect that history deeply, and I’ve laid out the data to show you exactly how brutal those turns have been.

But I’m willing to bet against that lesson—partially—because the underlying demand driver has genuinely changed. That is why I also own stakes in SK Hynix and Samsung. Memory was a component in your gadget. Now it’s a substrate for intelligence. And the demand for intelligence—like the demand for energy, for computing, for connectivity—doesn’t follow the same saturation dynamics as consumer electronics.

The real risk for the memory cycle at the current stage is a technical breakthrough that would require orders-of-magnitude less memory and HBM, or a change that would bypass memory altogether. The chances of that happening today are low, but it is something to keep a close eye on all the time.

In the next section of this article for paid subscribers, I analyzed in detail how long I think this memory shortage and cycle will last, the timing of memory supply coming online for memory makers, including Chinese memory providers, and their possible effect on the market. Here is my take:

Amazon's value in the Age of AI Agents

UncoverAlpha — Thu, 05 Mar 2026 13:51:58 GMT

Hi everyone,

In this article, I’m breaking down my current thinking on Amazon. My goal here is to explain in detail the changes caused by AI on three pillars of the business: E-Commerce, AWS, and Advertising, and specifically how valuable each looks in a world where AI agents increasingly sit between humans and the services they use. At the end, I’ll do a sum-of-parts valuation that I think gives a useful anchor for where the stock sits today.

Let’s start.

E-Commerce - the agentic threat and the logistics moat

Amazon captures roughly 40% of all U.S. e-commerce spending. It has 240+ million Prime subscribers globally (analyst estimates; Amazon last officially disclosed “over 200 million” in 2021), of which approximately 180–185 million are in the United States, representing penetration in about 80% of U.S. households. The Prime flywheel is well-documented: members spend on average $1,400/year, compared with $600 for non-Prime customers, and the retention rate after the first year is 99%, according to CIRP data. Amazon delivered over 8 billion items same or next day to U.S. Prime members in 2025, a 30%+ increase year-over-year.

This is the business everyone knows. But here’s the question that matters for the next 3–5 years: what happens when AI agents start shopping for consumers?

The agentic shopping risk

I strongly believe that in the future, most e-commerce shopping will be done through AI agents acting as personal assistants to consumers, rather than by consumers directly. I am not alone in those expectations. McKinsey projects agentic commerce could generate $1 trillion in U.S. retail revenue by 2030. Morgan Stanley expects nearly 50% of American shoppers will use AI agents by then, potentially adding $115 billion in e-commerce spending. Bain research shows that 30–45% of U.S. consumers already use GenAI for product research and comparison. During Cyber Week 2025, roughly 1 in 5 orders on Shopify involved an AI agent. AI-driven traffic to retailer sites has surged 7x since January 2025, according to Shopify data, with AI-driven orders up 11x.

What’s happening is this: instead of opening the Amazon app, a consumer tells ChatGPT, Claude, or Gemini what they need. The agent searches across retailers, compares prices, checks reviews, and either completes the purchase or presents a shortlist. OpenAI has already embedded checkout directly into ChatGPT. Perplexity launched its Comet browser agent. Google is rolling out agentic AI shopping tools.

This is a major shift in consumer behaviour, and Amazon knows it. In November 2025, Amazon sued Perplexity for its AI browser agent making purchases on Amazon’s marketplace. The company has blocked 47 AI bots from crawling its site. But at the same time, CEO Andy Jassy acknowledged on their recent earnings call that agentic commerce “has a chance to be really good for e-commerce.” Amazon recently even posted a job for a principal corporate development officer specifically for “agentic commerce” partnerships.

Forrester retail analyst Sucharita Kodali captured the tension perfectly:

“With an agent on ChatGPT, retailers risk relinquishing transactions on their site to pay a toll on someone else’s highway.”

Amazon’s shot at owning the application layer

That said, Amazon isn’t conceding the front-end. They have several assets that give them a legitimate shot at being a surface where agentic shopping enters:

Rufus — Amazon’s AI shopping assistant, used by more than 300 million customers in 2025. Customers using Rufus complete purchases at a 60% higher rate. It can now auto-purchase items when prices hit thresholds.

The most interesting recent project is the »Buy For Me« project. This is Amazon’s experimental agent that can purchase from other retailers within the Amazon app. This is a smart flip from Amazon: instead of being the store that other agents shop, Amazon becomes the agent that shops everywhere else. Amazon does have some unique assets that make it valuable as the front-end touchpoint, and the key is around Prime Subscriptions.

Prime Video — 315 million ad-supported viewers globally, up from 200 million in early 2024. This is a massive surface for product discovery and agentic commerce integration, especially through interactive shoppable ads during live sports (Thursday Night Football averaged 15.3 million viewers, +16% YoY).

Twitch — 105+ million monthly users, heavily Gen Z. An engaged, commerce-friendly audience.

Alexa — still the most widely deployed voice assistant in smart home devices. If agentic commerce moves to a voice-first or ambient-first paradigm, Alexa has a head start.

The risk here is that those surfaces might not be enough and that Amazon might not be aggressive enough at this early stage. From today’s vantage point, the dominant surfaces, if I had to choose, would still be the smartphone assistant or a standalone AI app (similar to ChatGPT, Gemini, Claude), and later on AI glasses with the personal assistant supplied by that provider. Prime Video and Twitch will still serve as important discovery platforms and could turn out to be much more valuable for ads in a world where it becomes increasingly hard to reach a human via digital channels, as internet usage will be dominated by AI agents instead of humans. Still, that doesn’t change the fact that the application layer, where most e-commerce starts, moves to other providers. Even if Amazon were to launch an independent AI shopping assistant app, I don’t think it would be »moaty« enough in the long term. My view is that the dominant provider will be the one that can offer a full AI personal assistant, with shopping as one of its features, not the only or main one. For that to be Amazon, they would need to make an aggressive pivot from current levels and possibly a strong shift into consumer hardware, which I don’t think is their plan.

With all that said, my base case is that Amazon will not be the application layer of agentic shopping and that its e-commerce business will move to the backend part of the shopping experience (still being important). Even in this scenario, Amazon still makes a decent margin given the logistics, payment, and fulfillment infrastructure that it offers at scale.

Advertising

Amazon’s advertising revenue hit $68.6B in 2025, growing 22% YoY in Q4. This is now 9.6% of Amazon’s total revenue, up from 5.9% in 2021. To put it in context, Amazon’s ad business alone is larger than the total revenue of companies like Netflix, Uber, or Salesforce.

But here’s the nuance that most analysts don’t discuss: Amazon’s ad business is really two very different businesses glued together.

Search ads

The vast majority of Amazon’s advertising revenue comes from Sponsored Products: essentially search ads within Amazon’s marketplace. When you search for “wireless headphones” on Amazon, the first several results are paid placements. Amazon doesn’t break this out precisely, but based on WARC data, the retail media component (primarily search ads) accounts for roughly $60.6B of the estimated total, with Prime Video and other upper-funnel formats making up the incremental portion.

Here is my concern: search ads on Amazon are fundamentally tied to humans browsing Amazon’s website and app. If an AI agent shops for you, it doesn’t look at sponsored listings. It doesn’t scroll past display ads. It skips right to the product that best matches your criteria and places the order. As Bain research noted, about 65% of retail media spending still occurs onsite, and that entire bucket is at risk if product discovery shifts to AI-driven search.

This is why I think the search ad portion of Amazon’s advertising business is on a disruption clock. Not tomorrow, not next quarter, but over a 3–5 year horizon, the economics of Sponsored Products face a structural headwind as agentic interfaces capture more of the purchase journey. As discussed in the previous section, I assign a low probability to Amazon capturing the AI assistant application layer, so the eyeballs shift from Amazon’s site and apps toward the owners of the AI assistants.

Prime Video ads

The other side of Amazon’s ad business is Prime Video advertising, and this is the piece I think is defensible. Amazon introduced ads on Prime Video in January 2024. S&P Global Market Intelligence Kagan estimated Prime Video’s ad revenue at $433M in 2024 and forecast it to reach $806M in 2025. This is still a small fraction of total ad revenue, but it’s growing fast and serves a different function: brand advertising through streaming video is not susceptible to agentic disintermediation the same way search ads are.

Prime Video reaches 315 million monthly ad-supported viewers globally. That’s larger than Netflix’s ad-supported tier at 190 million. Thursday Night Football alone averaged 15.3 million viewers with 16% growth YoY, and the Packers-Bears wild-card playoff game drew 31.6 million viewers, the most-streamed NFL game in history. Amazon has also integrated Netflix and Spotify inventory into its Amazon DSP, giving advertisers a broader programmatic buying platform.

My estimate is that by 2027–2028, Prime Video ads could reasonably be a $3–5B annual revenue stream, growing at 40%+ rates as ad loads increase and live sports inventory expands (NBA deal kicks in, international sports expansion). This business is much more structurally defensible because people watch content — AI agents don’t.
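As a quick sanity check on that range, the sketch below compounds the Kagan 2025 forecast ($806M) forward and backs out the growth rate needed to reach $3B or $5B. Using the Kagan figure as the base is an assumption. The estimate above may rest on a different starting point.

```python
# Growth arithmetic for Prime Video ads, using the Kagan 2025 forecast as the
# base. The choice of base and the target year are assumptions.


def project(base, growth, years):
    """Compound a base value at a constant annual growth rate."""
    return base * (1 + growth) ** years


def required_cagr(base, target, years):
    """Annual growth rate needed to get from base to target in `years`."""
    return (target / base) ** (1 / years) - 1


BASE_2025 = 806.0  # $M, Kagan forecast cited above

for year in (2027, 2028):
    value = project(BASE_2025, 0.40, year - 2025)
    print(f"{year} at 40% growth: ${value / 1000:.2f}B")

for target in (3000.0, 5000.0):
    rate = required_cagr(BASE_2025, target, years=3)
    print(f"Reaching ${target / 1000:.0f}B by 2028 needs ~{rate:.0%} annual growth")
```

On this base, the "+" in "40%+" is doing real work: the $3–5B range requires growth comfortably above 40% per year, which is where higher ad loads and new live sports inventory would have to come in.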

But even that revenue doesn’t materially change my thesis that the majority of Amazon’s ad business is at risk of serious disruption.

For the sum-of-parts analysis in the last part of this article, I’m splitting the ad business into two buckets. For the search/retail media portion (~$60–63B), I’m valuing it as if profits last only 4 more years, with zero terminal value after that. That’s deliberately punitive: I’m assuming this revenue stream is structurally impaired. For Prime Video ads, I’ll fold it into the e-commerce/subscription ecosystem, where it has long-term durability.
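To show how punitive a finite-life treatment is, here is a minimal sketch comparing four years of discounted profit with a simple perpetuity. The operating margin and discount rate are placeholders for illustration and are not the inputs used in the valuation. Only the ~$60B revenue figure comes from the text.

```python
# Finite-life valuation vs. a perpetuity. Margin and discount rate are
# hypothetical placeholders, not the inputs used in the article's valuation.


def finite_life_value(annual_profit, discount_rate, years):
    """Present value of a flat profit stream that stops after `years`."""
    return sum(annual_profit / (1 + discount_rate) ** t for t in range(1, years + 1))


def perpetuity_value(annual_profit, discount_rate):
    """Present value of the same flat profit stream lasting forever."""
    return annual_profit / discount_rate


REVENUE_B = 60.0        # $B, low end of the search/retail media range
ASSUMED_MARGIN = 0.40   # hypothetical operating margin
ASSUMED_RATE = 0.10     # hypothetical discount rate

profit = REVENUE_B * ASSUMED_MARGIN
four_year = finite_life_value(profit, ASSUMED_RATE, years=4)
forever = perpetuity_value(profit, ASSUMED_RATE)

print(f"4 years, no terminal value: ${four_year:.1f}B")
print(f"Flat perpetuity:            ${forever:.1f}B")
print(f"Finite-life value is {four_year / forever:.0%} of the perpetuity")
```

Whatever margin and discount rate you plug in, cutting the stream off after four years keeps only about a third of the value of a no-growth perpetuity at a 10% discount rate, which is why this treatment counts as deliberately harsh.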

AWS - the cloud business

AWS is the most important reason why I own Amazon stock and why it has now become my biggest portfolio position.

The biggest fear around AWS has been that AI-related capital expenditures would permanently compress margins. And yes, there was a dip: AWS’s operating margin fell to 32.9% in Q2 2025 as the company ramped up spending aggressively. But by Q4, it had recovered to 35.0%, and the full-year margin was 35.4%.

Here is my core argument: we are severely compute-constrained for the foreseeable future. Amazon has invested $131.8B in capex for 2025 and has guided to approximately $200B for 2026, predominantly for AWS infrastructure. The company added more than 1 gigawatt of data center capacity in Q4 alone and 3.9 gigawatts in the trailing 12 months, which is double what AWS had in total in 2022. And Andy Jassy expects to double power capacity again by the end of 2027.

Despite this massive buildout, demand continues to outstrip supply. Jassy noted on the Q1 call that GPU and motherboard shortages were limiting the pace of AI workload onboarding. Bedrock (Amazon’s managed AI service) reached a multi-billion-dollar annualized run rate with customer spend growing 60% quarter-over-quarter to a base of over 100,000 customers. Trainium2 is fully subscribed with 1.4 million chips deployed.

In this environment, there is no incentive for hyperscalers to engage in a pricing war. When every chip you install is immediately monetized, you don’t cut prices — you add capacity. Until compute supply catches up with demand (which I don’t expect before 2029 at the earliest), AWS can maintain mid-30%+ operating margins without sacrificing growth. The margin should hold around pre-AI era levels (AWS operated in the 28–35% range historically, with 2024 averaging 37%) because the scarcity dynamic supports pricing power.

Trainium and Custom Silicon are key for long-term margins

This is a point I don’t think gets enough attention. NVIDIA’s gross margin sits at roughly 73–75%. Every cloud provider that is 100% dependent on NVIDIA for AI compute is paying that tax on every GPU. That cost flows through to the cloud provider’s cost of revenue and structurally limits the margin they can earn on AI workloads.

Amazon, through its Annapurna Labs subsidiary, has developed Trainium and Inferentia custom ASICs, as well as Graviton CPUs for general compute. Combined, these custom chips have surpassed a $10B annualized revenue run rate, growing at triple-digit percentages YoY. According to Amazon, Graviton provides 40% better price-performance than x86 processors and is adopted by 90% of AWS’s top 1,000 customers.

Trainium2 powers Project Rainier, the world’s largest operational AI compute cluster with 500,000+ Trainium2 chips, which Anthropic uses to train its Claude models. Trainium3 is in preview with broader volumes expected in early 2026, and Trainium4 is targeted for 2027.

I am sharing here the chart that we made some months ago in our detailed Amazon Trainium piece, where we calculated the manufacturing costs of Amazon Trainium, Google TPUs, and Nvidia’s Blackwell B200:

You can see the most significant difference: if it costs Amazon $3,000-$3,500 to produce a Trainium3 chip, it costs them $35k-$40k to buy an Nvidia B200 chip. Even though B200 is much more performant, from a cost-of-ownership perspective Trainium3 gives B200 a run for its money.

The margin math is straightforward. When you design and manufacture your own silicon (using TSMC for fabrication and a design partner like Broadcom, Marvell, MediaTek), your cost per unit of compute is significantly lower than buying merchant silicon from NVIDIA at a 73–75% gross margin. This gives AWS a structural margin advantage for AI workloads vs. a competitor that sources 100% from NVIDIA. It doesn’t mean AWS abandons NVIDIA (it still offers NVIDIA instances), but having an alternative lets AWS capture more of the AI value chain and maintain margin in ways that someone who is entirely dependent on NVIDIA simply cannot.
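The margin math can be put in rough numbers. This sketch applies NVIDIA's company-wide gross margin to the B200 purchase price, which is an assumption since product-level margins are not disclosed, and it compares purchase price with manufacturing cost only, ignoring performance differences, R&D, and design-partner fees.

```python
# Rough margin math on merchant vs. custom silicon. Applying a company-wide
# gross margin to a single product is an assumption made for illustration.


def implied_vendor_cost(price, gross_margin):
    """Cost of goods implied by a selling price and a gross margin."""
    return price * (1 - gross_margin)


def vendor_margin_dollars(price, gross_margin):
    """Gross profit the vendor keeps per chip at that margin."""
    return price * gross_margin


B200_PRICE = (35_000, 40_000)      # $ per chip, from the text
TRAINIUM3_COST = (3_000, 3_500)    # $ per chip, from the text
NVIDIA_MARGIN = (0.73, 0.75)       # company-wide gross margin, from the text

cost_low = implied_vendor_cost(B200_PRICE[0], NVIDIA_MARGIN[1])
cost_high = implied_vendor_cost(B200_PRICE[1], NVIDIA_MARGIN[0])
print(f"Implied B200 cost of goods: ${cost_low:,.0f} to ${cost_high:,.0f}")

tax_low = vendor_margin_dollars(B200_PRICE[0], NVIDIA_MARGIN[0])
tax_high = vendor_margin_dollars(B200_PRICE[1], NVIDIA_MARGIN[1])
print(f"Margin paid to the vendor per chip: ${tax_low:,.0f} to ${tax_high:,.0f}")

ratio_low = B200_PRICE[0] / TRAINIUM3_COST[1]
ratio_high = B200_PRICE[1] / TRAINIUM3_COST[0]
print(f"B200 price vs. Trainium3 cost: {ratio_low:.0f}x to {ratio_high:.1f}x")
```

On these figures, roughly three quarters of every B200 purchase is vendor margin, which is the "tax" that in-house silicon avoids even after accounting for a large performance gap.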

This key difference will prove even more important in the coming years, especially once demand/supply for compute is more in balance and the hyperscalers’ focus shifts from capturing revenue growth to profitability and customer optimization.

Traditional cloud demand is actually accelerating because of AI agents

There’s a narrative that AI is all that matters for AWS growth. That misses something important: AI agents themselves create enormous demand for traditional cloud services, as we already discussed in part in our article The Forgotten Chip: CPUs the New Bottleneck of the Agentic AI Era. Every AI agent needs storage (S3), compute (EC2, powered increasingly by Graviton), databases, networking, and monitoring. The more AI agents there are in production, the more traditional cloud infrastructure gets consumed.

The number of AI agents and their deployment is rapidly surging right now. Here is data from an alt-data provider that tracks the Model Context Protocol (MCP), an open-source standard for connecting AI applications to external systems.

source: Bloomberry

The number of MCP servers being set up every month is growing exponentially, and the MoM pace is accelerating. The market is still in very early stages, as the current number of MCP servers is probably less than 1% of the API market. But the interesting part was comparing where these MCP servers are deployed with where APIs are deployed. Azure’s and GCP’s shares of MCP server deployments were both lower than their shares of API deployments, while AWS’s share of MCP deployments was higher than its share of API deployments:

source: Bloomberry

While this data is not large enough yet, it could indicate that more smaller companies are on AWS than on the other two hyperscalers and that they are early adopters. The data in some form also shows the importance of AWS’s »legacy« cloud infrastructure, which is very much needed in the Agentic AI phase.

The demand for traditional cloud infrastructure is skyrocketing. You can see it in comments from the CEOs of AMD and Intel, who say they are basically sold out of CPUs, and you can hear it when talking to other industry experts.

This is a former Amazon employee talking about the usage of the AWS S3 service (storage):

»Right now, S3, nobody is thinking about how S3 is exploding. It’s quite an explosion because what are AI systems doing? They’re generating embeddings, they’re storing the prompts and the responses. Where are they storing this? S3. They’re logging every interaction for auditing, tuning, safety. All of this is going into S3. Remember when we used to think of S3 as just their cheap storage? The storage is still cheap, but you’re using more of it«.

source: AlphaSense

The key takeaway from this comment is the need to store this data for audit and safety purposes. Companies running these AI agents need clear, auditable trails of what an AI agent has done, so they can track and monitor problems as they arise and fix them. Nobody wants to give an AI agent full permission to run freely across the company's stack, making changes and performing tasks that nobody has visibility into. This need for human visibility means services like storage see even more usage.

A big tell was also the November OpenAI-AWS deal. The press release stated that OpenAI would access “hundreds of thousands of state-of-the-art NVIDIA GPUs, with the ability to expand to tens of millions of CPUs to rapidly scale agentic workloads.”

The GPU part is known, but the CPU part is the most interesting one. We need »legacy« cloud workloads and CPUs to enable the AI agent economy; that is just the way it is, and this is a big uplift for AWS, which has the largest fleet of optimized cloud services out there.

Amazon noted that more of the top 500 U.S. startups use AWS as their primary cloud provider than the next two providers combined. That startup and scale-up cohort is building AI-native applications that are heavily cloud-intensive.

The on-prem fallacy and SMB lock-in

Some investors argue that AI inference will eventually move to the edge or on-prem, killing the cloud growth story. Let me push back on this.

First, even if 90% of personal AI assistant use cases eventually run on edge devices (phones, laptops, local hardware), the remaining 10% that stays on cloud or on-prem infrastructure is still an enormous market. These are the “god-like AI” use cases: complex enterprise reasoning, multi-step agentic workflows, financial modeling, drug discovery, code generation at scale. These require the kind of compute density and model size that doesn’t fit on a phone. And these use-cases are the most profitable, as their outputs are the most valuable.

Second, on-prem AI infrastructure is radically more complex than anything businesses have managed before. Running an AI inference cluster on-prem means managing GPU and CPU servers, networking fabric, cooling systems, model deployment pipelines, and monitoring at a level of sophistication that most IT departments have never dealt with. For any small or medium-sized business, the cost and complexity of running your own AI infrastructure to have your “AI accountant” or “AI customer service agent” simply doesn’t make sense when you can rent it from AWS for a fraction of the upfront cost with zero operational hassle.

The cloud is the natural home for AI workloads for the vast majority of companies, and that reality isn’t changing anytime soon. If anything, as AI becomes more central to knowledge work, more companies will move to the cloud specifically to access AI capabilities they can’t build or run themselves.

The revenue trajectory of AWS

With all that in mind, I believe AWS, with its power capacity availability, which I already discussed in my previous articles, is well-positioned for multiple quarters of accelerating growth. I believe that, despite AWS’s size, we will soon see the segment grow by 30%+ YoY. AWS also exited Q4 2025 with a backlog of $244B with a weighted average remaining life of 4.1 years. Capacity is being installed and monetized as fast as it comes online.

If AI agents truly absorb a meaningful portion of knowledge work over the next 5–10 years, and companies like Anthropic ($19B ARR, up from $9B just two months ago) and OpenAI are building the models to do exactly that, then the total demand for cloud inference is going to be multiples of what it is today. Every AI-powered accountant, lawyer, engineer, customer service agent, and analyst running in the cloud creates recurring compute demand.

The Anthropic and OpenAI stakes are hedges

Besides the already mentioned segments, Amazon also has other important aspects such as Project Kuiper, the subscription business, and stakes in Anthropic and OpenAI, which are now becoming increasingly important.

Amazon has invested approximately $8B in Anthropic (capped below 33% ownership) and recently announced a strategic partnership with OpenAI that includes an investment of up to $50B (starting with an initial $15B commitment, with the remainder tied to milestones and a potential OpenAI IPO).

Anthropic just closed a $30B funding round at a $380B post-money valuation in February 2026. If Amazon holds roughly 20% of Anthropic (estimates vary given the cap structure), that stake is worth $76B on paper. But in the last few weeks, Anthropic has accelerated its adoption and revenue growth so much that a $500B valuation would be nothing extraordinary for a company that will probably exit 2026 at a $50B ARR, growing 5x YoY and disrupting the whole knowledge work economy. At that valuation, a 20% stake would be worth $100B, or almost 5% of Amazon’s current market cap.

For OpenAI, the proposed $100B funding round would value the company at approximately $830B. Amazon’s $50B investment at those terms would represent roughly a 6% stake.
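To make the stake math above easy to check, here is a minimal sketch in Python. The 20% Anthropic ownership is the rough estimate used in the text, and treating the $830B OpenAI figure as the valuation the $50B is measured against is my assumption; the function names are just illustrative.

```python
def stake_value_bn(ownership: float, valuation_bn: float) -> float:
    """Paper value of an equity stake, in $B."""
    return ownership * valuation_bn


def implied_ownership(investment_bn: float, valuation_bn: float) -> float:
    """Share of the company an investment would represent at a given valuation."""
    return investment_bn / valuation_bn


# Anthropic: assumed ~20% ownership (estimates vary given the cap structure)
anthropic_at_380 = stake_value_bn(0.20, 380)   # stake at the February 2026 round
anthropic_at_500 = stake_value_bn(0.20, 500)   # stake at a hypothetical $500B valuation

# OpenAI: full $50B investment measured against an ~$830B valuation (assumption)
openai_share = implied_ownership(50, 830)

print(f"Anthropic stake at $380B: ${anthropic_at_380:.0f}B")
print(f"Anthropic stake at $500B: ${anthropic_at_500:.0f}B")
print(f"OpenAI implied ownership: {openai_share:.1%}")
```

Changing the ownership assumption is the fastest way to see how sensitive the paper value is to the unknown cap structure.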

Combined, these stakes could be worth +$145B. And here’s the real value: in a world where Anthropic, OpenAI, and Gemini become the application layer, significant stakes in two of those companies aren’t just financial investments. They are Amazon’s guarantee that the biggest AI consumers remain AWS customers. OpenAI has committed to spending $100B on AWS over the next eight years. Anthropic is using Project Rainier (500,000+ Trainium2 chips) for training. Both are locked in as massive cloud customers.

Valuation

Now let’s put the numbers together. I’m deliberately being conservative in places and factoring in serious disruption risk. Here are my numbers:

The Forgotten Chip: CPUs the New Bottleneck of the Agentic AI Era

UncoverAlpha — Mon, 23 Feb 2026 13:50:17 GMT

Hey everyone,

For three years, GPUs have been the only chip that mattered in AI. Every investor pitch, every earnings call, every CapEx headline was about who could get more Nvidia GPUs.

CPUs? An afterthought. The boring, commodity chip that just sat next to the GPU and passed data along. Nobody cared. That’s changing fast. And if you’re not paying attention to the “CPU renaissance” happening right now, you’re missing what I believe is one of the more important infrastructure shifts in this AI cycle.

In this article, I will break down exactly why agentic AI is changing CPU demand, how CPUs are used in agentic AI, how big the CPU market can become because of AI agents, and which public companies stand to benefit. I’ll also discuss whether we’re heading into a genuine CPU bottleneck and how long it could last.

Why Agentic AI Changes Everything for CPUs

To understand why CPUs suddenly matter, you need to first understand how agentic AI workloads are fundamentally different from the »classic« chatbot-style AI we’ve been running for the past three years.

The old workflow — chatbot:

When you use ChatGPT or any standard AI chatbot, the process is straightforward. You type a question, the CPU tokenizes it (converts your text into numerical tokens the model can process), ships it over to the GPU, the GPU runs the tokens through the model and generates a response, then ships the output back to the CPU, which de-tokenizes it and delivers the answer. In this workflow, the CPU does very little. Maybe 5-10% of the total compute. The GPU is doing all the heavy lifting with its matrix multiplications, attention calculations, and token generation. This is why, for three years, the entire industry was laser-focused on GPUs.

The new workflow — agentic AI:

Agentic AI is fundamentally different. Instead of a simple question-answer loop, you’re dealing with autonomous systems that plan, execute, use tools, browse the web, query databases, make API calls, write and run code, and then reflect on whether they did a good job before deciding what to do next. A single user request can spin off dozens or even hundreds of sub-agents, each running their own loops of reasoning and action in parallel.

All of that orchestration, tool calling, API handling, memory management, and coordination between sub-agents happens on the CPU, not the GPU. The GPU still handles the inference (the “thinking” part), but between each inference call, the CPU is doing an enormous amount of work. It’s parsing responses, deciding which tool to call next, managing the execution plan, handling file I/O, running code, making network requests, and coordinating which sub-agents depend on which other sub-agents’ results.

In an interview, a VP at Intel explained:

“Agentic AI is nothing but a combination of independent agents... If there are in workflow, say, 10, 20, 30, 40, 100 agents, and they all need to talk to them, then they need different locations to operate. When I say location, I talk about CPUs.”

source: AlphaSense

A Georgia Tech and Intel research paper from November 2025 quantified this, and the findings are striking: tool processing on CPUs accounts for between 50% and 90% of total latency in agentic workloads. In many agentic workflows, the CPU is responsible for the majority of the wait time, not the GPU. The GPU sits idle, waiting for the CPU to finish its work before it gets the next batch of tokens to process.
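To put the latency split in concrete terms, here is a tiny sketch. The timings are hypothetical numbers I picked to land at the two ends of the 50% to 90% range, not measurements from the paper.

```python
def cpu_latency_share(tool_seconds: float, inference_seconds: float) -> float:
    """Fraction of end-to-end latency spent in CPU-side tool processing."""
    total = tool_seconds + inference_seconds
    if total <= 0:
        raise ValueError("total latency must be positive")
    return tool_seconds / total


# Hypothetical request: 5s of tool work (API calls, parsing, file I/O), 5s of GPU inference
balanced = cpu_latency_share(tool_seconds=5.0, inference_seconds=5.0)

# Hypothetical tool-heavy request: 9s of tool work, 1s of GPU inference
tool_heavy = cpu_latency_share(tool_seconds=9.0, inference_seconds=1.0)

print(f"balanced request:   {balanced:.0%} of latency on the CPU")
print(f"tool-heavy request: {tool_heavy:.0%} of latency on the CPU")
```

In the tool-heavy case the GPU is only busy for a tenth of the wall-clock time, which is the idle-GPU problem described above.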

This completely inverts the infrastructure economics we’ve been operating under. In the chatbot era, you needed a small number of high-end CPUs paired with massive GPU clusters. In the agentic era, you potentially need more CPUs than GPUs, and the CPU-to-GPU ratio in a rack or cluster needs to go up significantly.

“For every GPU workload, there is a supporting CPU demand. The CPU is going to handle the data processing, the orchestration, the API layers, post processing.”

Source: AWS employee on AlphaSense

Breaking Down the CPU Workload in Agent Systems

Let me walk through what the CPU actually does in an agentic workflow, because I think understanding the details here is important for appreciating why this demand is structural and not a temporary blip.

Step 1: Planning: The user gives a broad instruction (e.g., “Research the competitive landscape of the DRAM industry and write me a report”). The CPU tokenizes this and sends it to the GPU for an initial inference call. The GPU generates a plan of execution, not a final answer. That plan comes back to the CPU.

Step 2: Orchestration: The CPU now breaks that plan into sub-tasks and assigns them to multiple agents. This is pure CPU work. It’s managing a directed acyclic graph of tasks, determining which ones can run in parallel, which depend on others, and in what order they should execute. If you have 10 research sub-topics, you might have 10 sub-agents that can all run simultaneously.

Step 3: Tool execution: Each sub-agent starts working. This is where CPUs get extremely busy. Sub-agent 1 might make a web search API call, wait for results, parse the JSON response, extract relevant text, and package it for another inference call. Sub-agent 2 might query a database, run a SQL query, and process the results. Sub-agent 3 might open a file, read its contents, and prepare them for analysis. All of this — the API calls, network I/O, file handling, data parsing, JSON processing — is CPU work. The GPU is idle during these operations.

Step 4: Inference loops: Each sub-agent may also run its own chain-of-thought reasoning, sending multiple inference requests to the GPU. Between each inference call, the CPU processes the output, decides if the agent is done, and either feeds the next prompt or moves to the next step.

Step 5: Reflection: Once all sub-agents complete, the CPU gathers all their outputs and sends them to the GPU for a reflection inference loop, essentially asking the model, “did we answer the original question well enough?” If not, the whole cycle restarts.

The key characteristics a CPU needs for this kind of workload are: high single-core clock speed (to minimize orchestration latency), high core count (to run many agents in parallel), fast memory access and large caches (to manage all the context and intermediate state), and strong I/O connectivity (PCIe lanes for network and storage, because agents are constantly hitting APIs and databases).
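The orchestration in Step 2 can be sketched with Python's standard-library graphlib module. This is only an illustration of the idea of a task DAG, not how any particular agent framework does it, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task within a wave can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also raises CycleError if the plan is not a valid DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave finished, unlocking the next
    return waves


# Hypothetical plan: each task maps to the set of tasks it depends on
plan = {
    "search_web": set(),
    "query_db": set(),
    "read_files": set(),
    "draft_report": {"search_web", "query_db", "read_files"},
    "reflect": {"draft_report"},
}

waves = parallel_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

The first wave holds the three independent research tasks, which is where a high core count pays off. The later waves are sequential, which is where single-core speed matters.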

The AI server factories sitting above your general-purpose compute infrastructure don’t replace those traditional CPU servers. They create more demand for them. Because now, instead of one human slowly browsing the web and running a few apps, you have hundreds of AI agents aggressively consuming CPU resources at machine speed.

The demand for CPUs is already showing up in earnings calls

This new CPU demand is already showing up in recent earnings calls.

In its Q4 2025 earnings, AMD’s data center segment posted record revenue of $5.4 billion, up 39% year-over-year and 24% sequentially.

But the key wasn’t the GPUs but the CPUs. Lisa Su explicitly called out CPUs as a major growth driver, stating:

“demand for EPYC CPUs is surging as agentic and emerging AI workloads require high-performance CPUs to power head nodes and run parallel tasks alongside GPUs.”

AMD’s 5th Gen EPYC Turin CPUs accounted for more than half of total server CPU revenue by the end of Q4, and the number of EPYC cloud instances grew more than 50% year-over-year to nearly 1,600 instances. The number of large enterprises deploying EPYC on-premises more than doubled in 2025. Su specifically highlighted that in agentic workflows, when AI agents spin off work in an enterprise, “they’re actually going to a lot of traditional CPU tasks.” She expects the server CPU market to grow by “strong double digits” in 2026.

Su also noted that “x86 processors have a particular edge in agentic workloads where AI agents spin off work to traditional CPU tasks, with the vast majority of such tasks running on x86 today.”

Looking ahead, Su guided for data center segment revenue to grow more than 60% annually over the next three to five years and for AMD’s AI business to scale to tens of billions in annual revenue by 2027. CPUs are a meaningful piece of that equation, not just GPUs.

And it’s not just the earnings calls; you can also see it in multiple conversations with industry experts.

A former CTO of an HP competitor highlights that infrastructure is moving from static policy-based routing to “inference-based” routing. An AI-powered controller layer, running on CPUs, dynamically analyzes incoming workloads to determine whether they require expensive GPU cycles or can be offloaded to traditional x86 CPUs, optimizing resource allocation.

Agentic AI often involves deterministic tasks (such as following a specific rule set or executing a defined API call) that do not require the probabilistic power of a GPU. A Director at a Global Consultancy notes that these deterministic aspects of agentic workflows are most efficiently executed by CPUs, reinforcing the need for a balanced infrastructure where GPUs handle the “thinking” and CPUs handle the “doing.”
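As a rough illustration of the “thinking” versus “doing” split, here is a toy dispatcher. The controllers the experts describe are AI-powered; this sketch swaps that for a fixed lookup table, and the task kinds are hypothetical labels I made up.

```python
from collections import Counter

# Hypothetical task kinds treated as deterministic "doing" work (an assumption)
DETERMINISTIC_KINDS = {"api_call", "rule_check", "sql_query", "file_io"}


def route(task_kind: str) -> str:
    """Send deterministic work to the CPU pool and everything else to the GPU pool."""
    return "cpu" if task_kind in DETERMINISTIC_KINDS else "gpu"


# Hypothetical slice of one agent's workload
workload = [
    "llm_inference",
    "api_call",
    "sql_query",
    "llm_inference",
    "rule_check",
    "file_io",
]

placement = Counter(route(kind) for kind in workload)
print(dict(placement))
```

Even in this made-up mix, most of the individual steps land on the CPU pool, which is the point the experts are making about balanced infrastructure.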

The CPU demand was a shock for Intel

If AMD saw the CPU demand wave coming, Intel was genuinely surprised by it. Intel’s Q4 revenue came in at $13.7 billion, above guidance, with data center and AI revenue rising 15% sequentially — the fastest sequential growth this decade. But here’s the key: Intel admitted it couldn’t meet all the demand.

CEO Lip-Bu Tan said the company “delivered these results despite supply constraints, which meaningfully limited our ability to capture all of the strengths in our underlying markets.” CFO David Zinsner was even more direct, admitting that Intel “misjudged” the pace of data center CPU demand and that the company is now “shifting as much as we can over to the data center” by reallocating wafer capacity from client (PC) CPUs to server CPUs.

Zinsner acknowledged that Intel is “absolutely constrained” and is deprioritizing the low-end client market to push capacity into data center products. Intel expects its supply to hit a low point in Q1 2026 before improving in Q2, but in the meantime, revenue “would have been higher if we had more supply.” Management explicitly positioned CPUs as “central to AI orchestration and scaling inference.”

The AWS-OpenAI Deal was the tell

The most interesting data point on CPU demand came not from a chip company but from a cloud infrastructure deal back in November 2025. AWS and OpenAI announced a $38 billion, seven-year strategic partnership. The press release stated that OpenAI would access “hundreds of thousands of state-of-the-art NVIDIA GPUs, with the ability to expand to tens of millions of CPUs to rapidly scale agentic workloads.”

People wrongly focused on the Nvidia GPU part, but the CPU part is far more interesting. Tens of millions of CPUs. For agentic workloads. They didn’t have to include that detail. The fact that it’s in the official announcement tells you how seriously the frontier AI labs are thinking about CPU compute as a scaling requirement. All capacity under this agreement was targeted for deployment before the end of 2026, with options to expand into 2027.

Nvidia — The Vera CPU

Nvidia itself is making a big bet on the CPU side. Its upcoming Vera CPU, part of the Rubin platform announced at CES 2026, is specifically designed for agentic reasoning workloads. Vera delivers up to 2x the performance of the previous Grace CPU, with 88 cores per die and significant uplifts in memory and chip-to-chip bandwidth.

What’s particularly notable is that Nvidia announced Vera can be deployed as a standalone platform for agentic processing, separate from the GPU. CoreWeave is set to use standalone Vera CPUs, and Jensen hinted in a Bloomberg interview that “there are going to be many more” standalone CPU deployments. And it didn’t take long. The Meta & Nvidia deal was announced a few days ago:

»This partnership will enable the large-scale deployment of NVIDIA CPUs and millions of NVIDIA Blackwell and Rubin GPUs, as well as the integration of NVIDIA Spectrum-X™ Ethernet switches for Meta’s Facebook Open Switching System platform…The collaboration represents the first large-scale NVIDIA Grace-only deployment.«

This is Nvidia essentially confirming the thesis: in agentic AI, the CPU-to-GPU ratio needs to go up, and some workloads may be purely CPU-bound.

Are We Heading Into a CPU Bottleneck?

We’re already in one. The server CPU supply chain is under significant stress, and the constraints are coming from multiple directions simultaneously.

Intel is struggling with yield issues at some of its fabs, slowing the production ramp for newer Xeon parts. The company has admitted it cannot meet demand and is reallocating capacity from PC CPUs to server CPUs, meaning the PC segment will take a hit. Intel expects supply to improve starting Q2 2026, but the situation remains “acute” in Q1.

TSMC is prioritizing AI accelerators, which means less capacity for CPUs. AMD’s server CPUs are manufactured by TSMC, but TSMC is aggressively prioritizing its advanced node capacity for higher-margin AI accelerator chips (GPUs and custom ASICs). TSMC chairman C.C. Wei publicly stated that advanced-node capacity is “about three times short” of what major customers plan to consume. When TSMC’s 3nm process is running at 160,000 wafers per month and that’s still not enough, and when CoWoS advanced packaging capacity is sold out through 2026, CPU wafer allocation gets squeezed as a collateral effect.

Intel has also already warned Chinese customers of delivery lead times of up to six months for certain server CPUs. AMD’s lead times have stretched to 8-10 weeks for some products. Intel server chip prices in China have risen more than 10%. China represents over 20% of Intel’s total revenue, and major customers like Alibaba and Tencent are affected.

An additional supply problem is the memory-driven pull-forward. The severe global memory shortage is creating a rush effect on CPU purchases. When memory prices started rising in China in late 2025, customers accelerated CPU purchases to lock in system-level pricing before costs spiraled further. This pull-forward exacerbated the existing supply tightness.

A cloud computing materials manager reports:

“Our supply chain was a constraining factor... GPU, CPU, and RAM were the top three drivers for us being constrained” as customers convert to “more powerful CPUs that can run higher AI workloads.”

Source: AlphaSense

A global IT distributor reports CPU shortages are “directly driving a 30% increase in average selling prices (ASPs) during the fourth quarter of 2025” with “increased backlogs” as order intake exceeds expectations.

So the CPU bottleneck is already here; the question now is how long it will last.

In the next section, I analyze how many CPUs we will need in the agentic AI era and give a timeline for when supply could meet demand, on top of which companies stand to benefit most from this trend:

The Market Hates Big Cloud Spending. The Data Says The Market Is Wrong.

UncoverAlpha — Wed, 11 Feb 2026 14:17:23 GMT

Hey everyone,

Because fear around AI CapEx spending has emerged after big tech earnings, I decided to share my views on this topic and why I believe those fears are wrong at this point.

We had earnings from Meta, Microsoft, Google, and Amazon, and all of them increased AI CapEx substantially. Microsoft CapEx went from $63 billion in 2025 to a guide of over $100 billion for 2026; Google (Alphabet) went from $91 billion to a range of $175 billion to $185 billion; Meta went from $72 billion to a range of $115 billion to $135 billion; and Amazon went from $131 billion to a $200 billion guidance for 2026.

At this point, given the massive increases in these CapEx numbers, as an investor you are either on the side of those who don’t believe the companies will be able to deliver revenues and profits on these investments, or on the side of those who do believe it and, based on that, see very high future revenue and profit outlooks. Given the stocks mostly sold off on this CapEx news, the »bears« took over, but I don’t agree with them, and this article explains why. In the last part of the article, I also break down which hyperscaler looks best positioned to further accelerate its growth in 2026 and 2027.

The CEOs of all these businesses are not only telling you they believe the profits will be there; they are already showing you growth that came from 2023 AI investments (which, keep in mind, were often also criticized for being outlandish), and more importantly, they are showing you PROFITS on these AI investments.

Here is what the market has missed.

The Q4 2025 AI Earnings profits were overshadowed by future CapEx numbers

Before we go into the actual numbers, the fact is that we got really strong commentary from nearly all the big tech CEOs on AI revenue and returns from these investments. One might think that they are saying this because it is in their interest, but that is not really true. For most big tech companies like Google and Microsoft, it is actually in their interest for AI progress to grow at a more gradual rate than the exponential one it has today. The reason is that a lot of their business lines face disruption risk (Google Search, Microsoft software business, etc.). So the strong commentary from these CEOs should be read differently from comments by startups like OpenAI, Anthropic, and xAI, which need to raise new capital rounds and so naturally have to project a fast AI growth curve and confidence in both their own companies and the market in general.

We got some interesting comments from Amazon, which has a history of being very strict and efficient in its data center business. Andy Jassy confirmed his confidence in the return on invested capital multiple times:

»We have deep experience understanding demand signals in the AWS business and then turning that capacity into strong return on invested capital. We’re confident this will be the case here as well.«

»We have, I think, a fair bit of experience over the years in AWS of forecasting demand signals and doing it in such a way that we don’t have a lot of wasted capacity and that we also have enough capacity to serve the demand that’s there.«

»And I think we’ve also proven with AWS over the years in how we build data centers and how we run them and how we invent in there, if you think about our chips and our hardware and our networking gear and how we’ve invented in power that this isn’t some sort of quixotic top line grab, we have confidence that we -- that these investments will yield strong returns on invested capital. We’ve done that with our core AWS business. I think that will very much be true here as well.«

Jassy even confirmed that as soon as they bring new capacity online, it’s essentially sold out:

»And what we’re continuing to see is as fast as we install this capacity, this AI capacity, we are monetizing it. And so it’s just a very unusual opportunity. And so we see that following the same sorts of patterns we saw in the early days of our core AWS investment. I’m very confident we’re going to have strong return on invested capital here.«

Historically, Amazon tends to underhype in its wording, so a comment like this was very telling:

»I think this is an extraordinarily unusual opportunity to forever change the size of AWS and Amazon as a whole.«

Remember, just before AI, there was a big trend of companies moving workloads from the cloud back to on-prem because they thought many cloud workloads were too expensive. Now companies are realizing that AI workloads will need to be on cloud, because companies don’t have the resources or even the possibility to manage complex data centers with liquid cooling requirements (most data centers don’t have the option of liquid cooling), GPU utilization rates, and multiple AI accelerators (Nvidia GPUs, AMD GPUs, ASICs like TPUs, Trainium). Because of this, they have also started moving non-AI workloads to the cloud, as the data needs to be close to the AI workloads for them to run properly.

»We’re continuing to see strong growth in core non-AI workloads as enterprises return to focusing on moving infrastructure from on-premises to the cloud«

Because of AI, the cloud providers have increased their cloud »lock-in« and are growing even non-AI workloads.

I already talked about this trend just a few weeks ago in my Q4 alternative report article, where we showed this chart confirming that companies are going to move to the cloud at an accelerated pace again over the next 2 years:

But it wasn’t just Amazon talking about AI returns; the other Big Tech companies were, too. Meta gave a lot of color on how AI investments are already showing up in their business results:

»In Q4, we doubled the number of GPUs we used to train our GEM model for ads ranking. We also adopted a new sequence learning model architecture, which is capable of using longer sequences of user behavior and processing much richer information about each piece of content. The GEM and sequence learning improvements together drove a 3.5% lift in ad clicks on Facebook and a more than 1% gain in conversions on Instagram in Q4.«

»Instagram Reels had another strong quarter with watch time up more than 30% year-over-year in the U.S. Engagement is benefiting from several optimizations we made to improve the quality of recommendations including simplifying our ranking architecture to enable more efficient model scaling.

On Facebook, video time continued to grow double digits year-over-year in the U.S., and we’re seeing strong results from our ranking and product efforts on both feed and video surfaces.«

»The optimizations we made in Q4 drove a 7% lift in views of organic feed and video posts on Facebook, resulting in the largest quarterly revenue impact from Facebook product launches in the past two years.«

Meta is seeing results from AI in both better ad targeting and engagement trends. The results actually »revived« Meta’s core and oldest platform, Facebook, which is seeing growth rates it hasn’t seen in years. But AI is opening up other avenues of growth at Meta:

»Another area we’re deploying AI to improve performance is ad creative. The combined revenue run rate of video generation tools hit $10 billion in Q4, with quarter-over-quarter growth outpacing the increase in overall ads revenue by nearly 3x.«

The returns are not only affecting their revenue but also the productivity of their teams:

»Since the beginning of 2025, we’ve seen a 30% increase in output per engineer with the majority of that growth coming from the adoption of agentic coding, which saw a big jump in Q4. We’re seeing even stronger gains with power users of AI coding tools, whose output has increased 80% year-over-year. We expect this growth to accelerate through the next half.«

But despite these gains, Meta is telling us that it’s still very early as they are still using a limited amount of LLMs, as they have to either optimize them with SLMs because of compute limitations, or are still in the early stages of deploying these LLMs through their product stack:

»We’re also working on merging LLMs with the recommendation systems that power Facebook, Instagram, Threads and our ad system. Our world-class recommendation systems are already driving meaningful growth across our apps and ads business, but we think that the current systems are primitive compared to what will be possible soon.«

»We don’t typically use our larger model architectures like GEM for inference because their size and complexity would make it too cost prohibitive. So the way that we drive performance from those models is by using them to transfer knowledge to smaller lightweight models used at run time. But I would say that we think that there is room for our larger models to benefit from having more compute.«

All of this resulted in Meta giving the highest revenue growth guide in almost 5 years. And despite the higher CapEx guide and costs stemming from both OpEx (new AI team costs + compute costs on public cloud providers) and higher amortization costs, Meta confirmed that they expect 2026 to deliver operating income above 2025.

In terms of Google, Google Cloud grew 48% YoY, one of the highest growth rates among businesses of this scale. Google Search actually grew 17% YoY, which is another growth rate for Search that hasn’t been seen for quite some time. On the call, management even commented that Search saw more usage in Q4 than ever before, as »AI continues to drive an expansionary moment for Search«.

But even ignoring all the commentary from these companies’ management, let’s look at the hard numbers.

First, starting with revenue. All three hyperscalers are essentially selling all the compute they have available; if they had more, they would grow revenue even faster.

AWS grew 24% YoY, Azure grew 39% YoY, and Google Cloud grew 48% YoY. Their backlogs are growing even faster.

From the current revenue growth on top of the backlogs, we can clearly see that the hyperscalers are again growing significantly due to AI workloads. The standout in the quarter, as we correctly pointed out in our alternative data report before earnings, was Google Cloud. It is clear that the AI spend in the past is translating to real revenue growth. So the notion that these companies are spending only on CapEx and we can’t see revenue from it is false. Now, the questions and the narrative in the market are that the profits won’t come from this revenue stream.

The main argument for this thesis is that AI workloads will have a lower long-term margin profile; secondly, people are calculating returns based on projected CapEx guides and comparing them to current revenues.

Let’s first tackle the CapEx argument. It is important to understand that the CapEx a hyperscaler spends on a data center this year will only be utilized with a lag of about 2 years, as it takes around 2 years to build and operationalize a data center. So when people look at 2025 revenue growth for the hyperscalers, they should translate that into CapEx spent in 2023, not in 2024 or 2025. When we are in a period like today, when YoY CapEx growth (estimates for 2026) is +53% (AWS), +93% (Google Cloud), and +59% (Microsoft Cloud), the math doesn’t make much sense when compared to 2025 revenues, because we should really be comparing 2023 CapEx to 2025 revenue growth.
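The growth percentages above can be roughly reproduced from the company-level CapEx figures cited at the start of this article. A quick sketch follows; where a range was guided I use the low end, which is my assumption, so the Google figure can land a point or so away depending on which point in the range you pick.

```python
def yoy_growth_pct(prior_bn: float, current_bn: float) -> float:
    """Year-over-year growth in percent."""
    return (current_bn - prior_bn) / prior_bn * 100.0


# (2025 CapEx, 2026 guide) in $B, company-level; low end of any guided range (assumption)
capex_bn = {
    "Amazon": (131, 200),
    "Google": (91, 175),
    "Microsoft": (63, 100),
    "Meta": (72, 115),
}

growth = {
    name: yoy_growth_pct(prior, guide)
    for name, (prior, guide) in capex_bn.items()
}

for name, pct in growth.items():
    print(f"{name}: {pct:+.0f}% YoY")
```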

If we look at 2023 CapEx numbers, we can see that both Microsoft and Google increased CapEx in 2023 by 17.5% vs 2022 levels, to $32.3B and $28.1B respectively, while Amazon reduced CapEx by 17% YoY to $52.7B, although, based on my calculations, AWS CapEx fell only 10% to $24.8B. Now, if we compare those CapEx numbers to the revenues generated by hyperscalers in 2025, the math makes a lot of sense, as yearly revenue additions are outpacing CapEx spending.

Even for the most conservative investors out there, we can take the example of Google Cloud and even take the 2024 CapEx and compare it to the Q4 2025 results:

Google’s 2024 CapEx was $52.5 billion, with roughly $42 billion going to technical infrastructure (cloud/AI). Google Cloud grew from $48 billion (2024) to $70.8 billion (2025)—a $22.8 billion increase.

At the new 30.1% operating margins:

$6.9 billion in first-year operating income from 2024 CapEx

Add depreciation (as operating margin already includes that): +$7.0 billion (6-year schedule at Google)

Total first-year cash: $13.9 billion

First-year return: 33%
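For anyone who wants to rerun the first-year math with their own inputs, here is the calculation as a short sketch. All inputs are the figures listed above; straight-line depreciation over six years is the simplification used in the text.

```python
# Inputs from the text, in $B
capex_bn = 42.0                       # 2024 technical infrastructure CapEx
revenue_added_bn = 70.8 - 48.0        # Google Cloud revenue, 2025 vs 2024
operating_margin = 0.301              # Q4 2025 Google Cloud operating margin
depreciation_years = 6                # straight-line schedule (simplification)

# Operating income earned on the incremental revenue
operating_income_bn = revenue_added_bn * operating_margin

# Depreciation is a non-cash cost already inside operating income, so add it back
depreciation_bn = capex_bn / depreciation_years

first_year_cash_bn = operating_income_bn + depreciation_bn
first_year_return = first_year_cash_bn / capex_bn

print(f"Operating income: ${operating_income_bn:.1f}B")
print(f"Depreciation add-back: ${depreciation_bn:.1f}B")
print(f"First-year cash: ${first_year_cash_bn:.1f}B")
print(f"First-year return: {first_year_return:.0%}")
```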

But here’s where Google’s trajectory gets interesting. They went from 5% margins (2023) to 17.5% (Q4 2024) to 30.1% (Q4 2025). If margins stabilize at 30% (which I actually think will grow even further) and they run that 2024 infrastructure for five years:

Cumulative OI: around $45 billion

Add depreciation: +$42 billion

Residual value (data center shell): +$8 billion

Total: $95 billion on $42 billion invested

ROI: 126% over 5 years, or an 18% IRR

And that still assumes growth moderates significantly from the current 48% YoY pace, while the margin stays at the 30% level and doesn’t improve.
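The five-year totals can be checked the same way. The three components are taken as listed above. The annualized figure here is computed as a simple compound rate on the end multiple, which treats all the value as arriving at the end of year five; that is my assumption about how the roughly 18% is derived, and a true IRR on year-by-year cash flows would come out differently.

```python
# Components from the text, in $B
cumulative_operating_income_bn = 45.0
depreciation_addback_bn = 42.0
residual_value_bn = 8.0
invested_bn = 42.0
years = 5

total_bn = (
    cumulative_operating_income_bn
    + depreciation_addback_bn
    + residual_value_bn
)

# Net gain over the invested amount
roi = total_bn / invested_bn - 1

# Compound annual rate implied by the end multiple (assumption, see note above)
annualized = (total_bn / invested_bn) ** (1 / years) - 1

print(f"Total: ${total_bn:.0f}B on ${invested_bn:.0f}B invested")
print(f"ROI over {years} years: {roi:.0%}")
print(f"Annualized: {annualized:.1%}")
```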

With the increased pace of 2026 CapEx growth, the hyperscalers are essentially telling us what the revenue additions and, with it, growth rates will be for 2028.

Moving to the argument that the long-term margin on AI workloads will not be good compared to the pre-AI period. The numbers so far do not suggest this at all. Here is a look at AWS and Google Cloud’s operating margins over the last quarters, where AI workloads accounted for the majority of growth.

The operating margin either held up in the same range at AWS (where the share of AI workloads is still smaller) or increased significantly at Google Cloud, where AI workloads are a bigger piece of the pie. Here, we have to acknowledge that Google Cloud is not only GCP, but nonetheless, commentary from management in all the latest quarters has been that GCP growth rates are even higher than total Google Cloud growth rates, so we should have seen a trend of lower operating margin, not higher, if AI workloads carried a low margin profile. An additional point to consider is that a lot of the AI workload spend at GCP in this period came from Anthropic, a single client with much more pricing power than the many smaller clients that cloud providers are now shifting toward as inference workloads take up more space and companies move their AI use cases to production. Also important in the context of margin is the statement Google made on its last earnings call:

“We were able to lower Gemini serving unit cost by 78% over 2025 through model optimizations, efficiency and utilization improvements.”

What this tells us is that as these hyperscalers get even larger, they can optimize and squeeze more out of existing infrastructure. While some of those cost optimizations will be passed on to the cloud client, it is very clear that the ones with the most scale will also be able to use them to further expand their margin profile. Scale, but also custom ASICs play a key role here.

Custom ASIC is the key

Another strong argument that I already laid out in many of my previous articles is the custom silicon that cloud providers are designing. I continue to believe that this will be a critical element for any cloud provider to maintain healthy margins in the long term and avoid becoming overly dependent on a provider like Nvidia, which now has gross margins of almost 75%. In terms of custom ASICs, Google is best positioned with its TPUs, as we already laid out in the TPU article, followed by Amazon with Trainium. While Microsoft’s efforts here lag those of the other two, it is important to note that Microsoft also owns full IP rights to the custom ASICs that OpenAI will develop.

No surprise that on the Amazon earnings call, Trainium was mentioned 27 times, while Nvidia was not mentioned at all. It went so far that the CEO specifically called out Amazon’s chip business and broke out its revenue for us as a separate category:

»I think people know about our chips capability and our chips business, but I’m not sure folks realize how strong a chips company we’ve become over the last 10 years.

If you look at what we’ve done with Trainium, if you look at what we’ve done with Graviton, which is our CPU chip, which is about 40% better price performance than comparable x86 processors, 90% of the top 1,000 AWS customers are using Graviton very expansively. If you combine Trainium and Graviton, it’s well over a $10 billion annualized run rate business, and it’s still very early there.«

Even though they lag from a product perspective, Microsoft also talked about its custom ASIC business very early in the call:

»Earlier this week, we brought online our Maia 200 accelerator. Maia 200 delivers 10-plus petaFLOPS at FP4 precision with over 30% improved TCO compared to the latest generation hardware in our fleet. We will be scaling this starting with inferencing and synthetic data gen for our Superintelligence Team as well as doing inferencing for Copilot and Foundry.«

Custom silicon is what ensures hyperscalers can control their margin profile and market share, even in a more heated market where neoclouds and companies like Oracle have entered.

Investors are questioning the AI compute demand, but in reality, we are just getting started

A lot of investors are looking at the $600+ billion in combined hyperscaler CapEx projected for 2026 and questioning whether this is too much. What most people are missing is that we are still in the very early innings of AI compute demand, and the data backs this up. Right now, coding and developer tools have emerged as the single breakout vertical for AI. For those who don’t follow the industry closely or took a break in January, the difference in usage in one month is staggering. Daily install counts on VS Code more than doubled in just one month, and usage is growing even faster. Here is the VS Code data for Anthropic’s Claude Code and OpenAI Codex. Demand is going off the charts because developers no longer use these LLMs as tools, but as junior to mid-level programmers whose code they only review:

Here’s the thing, though: coding is essentially one vertical. And it’s already consuming an enormous share of the available inference compute. Now think about what happens when finance, legal, healthcare, customer operations, and other enterprise verticals start scaling their AI workloads to the same degree. According to Menlo Ventures, enterprise AI investment tripled from $11.5 billion to $37 billion in just one year, yet only 16% of enterprise deployments today qualify as true AI agents—most are still fixed-sequence workflows. We are nowhere close to saturation. McKinsey’s data shows 78% of organizations are now using AI in at least one business function, but the actual conversion to heavy inference workloads across non-coding departments is still nascent. These numbers are tiny compared to where coding already is.

The market is pricing in CapEx as if coding-level adoption is the ceiling, when in reality, it is the floor.

These aren’t businesses lighting money on fire. These are businesses generating 30-35% operating margins on the largest infrastructure buildout in corporate history.

The custom chip businesses (Trainium, Graviton, TPUs) are growing triple-digits and creating structural moats that compound over time.

The market is treating this like the 2000 fiber glut. That was infrastructure built for demand that didn’t exist.

This is infrastructure being absorbed as fast as it’s deployed. Hyperscaler CapEx isn’t irrational exuberance. It’s the most rational investment decision these companies can make. Amazon, Microsoft, and Google aren’t hoping for AI to work out. They’re reporting the P&L that shows it already has.

Not all hyperscalers will be able to capture market share this year, though. The limiting factor is availability.

Based on past capacity commitments, I calculated which cloud provider should grow the fastest in 2026 and beyond, and here are the numbers:

The Great SaaS Unbundling: Why AI Will Destroy Half the Industry and Supercharge the Other Half

UncoverAlpha — Mon, 02 Feb 2026 14:46:20 GMT

Hey everyone,

I’ve been thinking a lot about the AI disruption narrative in SaaS. Everyone’s talking about how AI will “transform” software, but I think most people are getting it wrong. The real story isn’t about transformation—it’s about bifurcation. Some SaaS companies are about to get absolutely demolished, while others will emerge stronger than ever. It’s not about looking at valuation levels for some of these SaaS companies and buying what is cheap on a valuation metric.

The determining factor for survival isn’t the brand, or even the data that the SaaS companies have—it’s whether their core system is deterministic or probabilistic.

Let me explain what I mean and why this matters for investors.

The Core Thesis: Deterministic vs. Probabilistic Systems

Deterministic systems are those where precision is critical, state management is complex, and errors cascade into serious consequences. Think accounting software, ERP systems, compliance platforms, healthcare systems, payment processors, and sophisticated workflow engines. These systems need to be right 100% of the time: not 95%, not 99%, but 100%. When you’re reconciling a billion-dollar balance sheet or processing payroll for 50,000 employees, “close enough” isn’t acceptable.

“Traditional enterprise functions, such as HR, are inherently deterministic; decisions like employee termination are binary and require rigid logic where specific inputs trigger precise, unvarying sequences. In contrast, LLMs are inherently probabilistic, determining the confidence level of the next token rather than following a hard-coded decision tree.”

Source: Employee at Rippling (AlphaSense)

Probabilistic systems are those where the core value proposition is pattern recognition, content generation, basic automation, or simple decision-making. Think chatbots, content recommendation engines, basic customer support automation, simple workflow tools, and generic productivity software. These systems can tolerate errors and are often based on “good enough” outputs.

More likely, AI is going to eat the probabilistic category, while some deterministic systems will become more valuable by integrating AI as a complementary layer and start expanding into other layers.

Why Deterministic Systems Are Actually Strengthened by AI

This might seem counterintuitive. If AI is so powerful, why wouldn’t it disrupt the complex systems?

“AI succeeds when autonomy is constrained, execution is owned, and determinism is treated as an asset rather than a limitation.”

Jens Eriksvik

When you look at how enterprises are actually deploying AI agents in 2025/2026, they’re not replacing their systems of record—they’re building orchestration layers on top of them. As a Former Microsoft Manager put it:

»A ‘reality check’ is occurring among CIOs as they realize LLMs lack the deterministic consistency required for critical industries like financial services. For use cases such as underwriting, a system that provides a correct answer ‘six out of ten times’ is insufficient; these processes demand 100% consistency, which current probabilistic models struggle to guarantee without extensive re-engineering.«

LLMs interpret human intent; deterministic systems execute the actual work. The deterministic systems are not being disrupted; the operator is (you). This is the architecture that’s winning in production environments.

Why does this matter? Because the companies that own these deterministic platforms become more valuable in an AI world, not less. They become the essential execution layer that AI needs to actually accomplish tasks.

The use of these deterministic platforms should rise substantially as more people gain valuable information from them with the help of AI. Usage goes up, but only with platforms that integrate these AI tools well into their deterministic platform cores.

But even with deterministic systems, there are challenges. The seat-based pricing must be converted to usage pricing. SaaS companies right now have to aggressively cut costs, specifically labour costs, SBC, etc., and get in front of the curve. As a deterministic platform, you can charge a premium for your deterministic core offering. On top of that, you will be able to offer probabilistic tools that complement the deterministic core. Here, the pricing logic is simple: you price it at inference cost + 30% margin. Over time, as you build out your sticky offering as a platform, you can gradually try to expand that margin once again, but right now, that time is not there yet.
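The "inference cost + 30% margin" rule can be read two ways, and the gap matters once volumes get large. Here is a minimal sketch, assuming a hypothetical inference cost of $1.00 per task (an illustrative number, not a figure from any company), that compares a 30% markup on cost with a 30% margin on the selling price. The function names are illustrative.

```python
def price_with_markup(inference_cost: float, markup: float = 0.30) -> float:
    """Price when 30% is added on top of the inference cost (cost-plus)."""
    return inference_cost * (1 + markup)


def price_with_margin(inference_cost: float, margin: float = 0.30) -> float:
    """Price when 30% of the selling price is kept as gross margin."""
    return inference_cost / (1 - margin)


cost = 1.00  # hypothetical inference cost per task, in dollars
print(round(price_with_markup(cost), 2))  # 1.3
print(round(price_with_margin(cost), 2))  # 1.43
```

Under the markup reading, the platform keeps about 23% of revenue as gross margin. Under the margin reading, it charges roughly 43% over cost.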

As a deterministic platform provider, the goals are clear: provide a clear deterministic core, execute great probabilistic offerings that enhance the core, and cut cost AGGRESSIVELY in terms of labour as you increase OpEx spend on cloud infrastructure to reach mass scale and then negotiate better inference costs because of that scale.

The companies that do this will come out as big winners, as they will be able to consolidate and offer probabilistic features on top of their offering, at inference +30% margins, and, with it, expand their TAM.

The Probabilistic SaaS Bloodbath

Now let’s talk about the other side of this equation—the SaaS companies that are in trouble.

If your core value proposition can be replicated by an LLM with 90% of the quality at 1% of the cost, and you provide a probabilistic product, you don’t have a sound business model anymore. The problem arises when your core value proposition is pattern matching, content generation, recommendations, or simple automation. Foundation models have gotten so good at these exact tasks that they can replicate your entire product in a few lines of code. Cost is a big part of the problem, but it also extends to the user interface, data, integration, and brand moats.

Having a “great UX” as a SaaS provider is irrelevant when natural language becomes the interface. Users would rather type “generate 10 marketing emails for our Q1 launch” into ChatGPT than navigate through HubSpot’s 47-screen workflow builder. While some call out proprietary data as the strong moat for these kinds of businesses, I would argue otherwise. Modern LLMs can learn from a small example set and perform as well as a model that has thousands of examples. The accelerating nature of LLMs and the emergence of synthetic data also hurt the incumbent data holders. Research from Meta in 2024 showed that models trained on synthetic data generated by GPT-4 perform within 2% of models trained on real data for most classification tasks. And that was in 2024; since then, this has only gotten better. Even if the proprietary data gives you your own model with 2-3% better accuracy, because of the probabilistic nature of your business, customers are not willing to pay 100x premiums for 2-3% better outcomes. They might do that if you had a deterministic system, however.

Moving to the »integration moat«. The key emphasis here is that these SaaS solutions are integrated with thousands of other apps and that this ecosystem is hard to replicate. Most SaaS products have well-documented APIs. AI excels as an integration layer without the need for pre-built connectors. With AI agents, these integrations and connections will become even more seamless as adoption accelerates and the SaaS companies want to stay “useful” in the age of agentic AI, making their APIs even more open and clear.

Now to the moat called the brand. There is some merit to the idea that enterprises are, to some extent, loyal to brands as they build trust in them. But with probabilistic systems, that trust and loyalty are weaker than with deterministic systems, where you know the results are 100% accurate. Enterprises are loyal to a degree until the cost gap becomes too big. If the discount is 20-30%, most won’t switch, but if that discount grows to 50-70% or more, switching starts. The trust factor is also very fluid. AI startups with low-cost probabilistic solutions gain trust via media coverage, raising billions in new VC funds, and hiring high-profile people from the incumbents.

The cutting of probabilistic SaaS is already underway, and this is more than just cutting seats.

Publicis Sapient reports actively reducing traditional SaaS licenses by approximately 50%—including major platforms like Adobe—by substituting them with generative AI tools and chatbots. An executive at the firm in an expert interview explains that AI agents are “10x faster, 100x smarter” than junior staff, creating a redundancy that directly cannibalizes the seat-based revenue underpinning commercial SaaS models.

For probabilistic SaaS, the only viable model is to cut costs to a minimum and price your product with a 30%+ margin on inference, but even that might not be sticky enough, especially if you don’t have any deterministic offering and if your clients are primarily SMBs. These companies will not be disrupted directly by AI, but by deterministic systems competing with them, offering AI-generated probabilistic offerings and bundling them into a single offering with a core deterministic system holding it together. If your ERP provider starts offering you a customer service system that works flawlessly with your ERP and uses your inference credits for both use-cases, you will likely switch over rather than have a separate customer service offering even if it is at the same cost.

Valuation Compression is Already Here but it’s across the board

Right now, the market is hitting SaaS across the board as it sees the risk of AI disruption. As of December 2025, the median EV/Revenue multiple for public SaaS companies stands at 5.1x, down from the pandemic peak of 18-19x and much lower than the historic average.

The thing the market hasn’t fully priced in yet is the deterministic and probabilistic platform differences that I laid out here, so the opportunity to own a deterministic SaaS platform at reasonable prices is definitely here.

Based on the criteria laid out in this article, I made a list of public SaaS companies and ranked them in deterministic/probabilistic order, and some of the ones I would highlight as being the least at risk of AI disruption:

Anthropic's Claude Code is having its "ChatGPT" moment

UncoverAlpha — Mon, 26 Jan 2026 16:34:14 GMT

Hey everyone,

I am posting an article on Anthropic Claude Code, which has been growing very significantly lately and, I believe, has developed an important product fit in its category.

Claude Code is going from just another AI coding assistant to a fundamental new architecture that developers need to stay competitive.

In the final months of 2025 and opening weeks of 2026, Claude Code reached a $1 billion annualized run rate just six months after launch—a velocity that even ChatGPT didn’t match. Based on my analysis and data, which I will share in this article, I believe that Claude Code is today closer to $2B ARR than $1B, as it has accelerated significantly in January.

At the same time, Anthropic’s overall annualized revenue jumped from approximately $1 billion at the start of 2025 to $5 billion by August—a 5x increase in eight months—with projections reaching $9 billion by year-end 2025.

But raw revenue growth, while impressive, misses the deeper structural shift. Claude Code has achieved what competitors couldn’t: it’s become the tool developers reach for when facing their hardest problems. At a Seattle meetup in mid-January 2026, over 150 engineers packed the house to trade use cases. One Google principal engineer publicly acknowledged that Claude reproduced a year of architectural work in one hour. Microsoft—which sells GitHub Copilot—has widely adopted Claude Code internally across major engineering teams, with even non-developers reportedly encouraged to use it.

Let’s dive in.

Anthropic is building a defensible moat in enterprise AI.

Anthropic reached 300,000+ business customers by August 2025, up from fewer than 1,000 businesses two years prior. According to Thunderbit, Claude’s enterprise AI assistant market share rose from 18% in 2024 to 29% in 2025—a 61% year-over-year increase—closing the gap with ChatGPT.

Anthropic just recently signed a term sheet for a $10 billion funding round at a $350 billion valuation—nearly double the $183 billion valuation from September 2025. That September round itself represented a massive step up from the $61.5 billion valuation in March 2025. The valuation has grown nearly six-fold in ten months—a trajectory that few technology companies have ever achieved, and the success is mostly tied to their developer clients.
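As a quick check on the step-ups quoted above, here is a minimal sketch that computes the multiple between each pair of rounds using only the valuations from the text. The helper name `step_ups` is illustrative.

```python
# Valuations in billions of dollars, as quoted in the text.
rounds = [
    ("March 2025", 61.5),
    ("September 2025", 183.0),
    ("latest term sheet", 350.0),
]


def step_ups(rounds):
    """Return (label, multiple over the previous round) for every round after the first."""
    return [
        (label, value / rounds[i][1])
        for i, (label, value) in enumerate(rounds[1:])
    ]


for label, multiple in step_ups(rounds):
    print(f"{label}: {multiple:.2f}x the previous round")

# Multiple from the first round to the latest one.
print(f"overall: {rounds[-1][1] / rounds[0][1]:.2f}x")
```

The overall multiple comes out at roughly 5.7x, which matches the "nearly six-fold" description.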

Why are developers choosing Claude?

The market is littered with AI coding tools—GitHub Copilot, Cursor, Amazon CodeWhisperer, Tabnine, Codex, and dozens more. Yet Claude Code captured the developer community in ways its competitors haven’t.

The Architecture!

Claude Code’s distinguishing characteristic isn’t its AI model—though Claude 4’s coding capabilities are state-of-the-art. It’s the architectural decision to operate directly in the terminal with full file system and command-line access. This matters because it changes the fundamental relationship between developer and AI.

Traditional coding assistants like GitHub Copilot work as IDE extensions, offering autocomplete suggestions and chat interfaces. They’re stateless—every interaction starts fresh, with limited context beyond the current file. Claude Code operates differently. It reads and writes files directly, executes bash commands, maintains state across sessions, and coordinates multi-step processes spanning days.

As Noah Brier, an early LLM adopter who discussed the tool on Bloomberg’s Odd Lots podcast explained:

“it’s more like hiring a junior developer than using autocomplete.”

The terminal-native design solves two problems that plague competing tools. First, it enables persistent state management. Claude Code stores information in files, building up context and knowledge over time. When working on a multi-day refactor, it remembers architectural decisions, maintains to-do lists, and tracks completed work—capabilities that chat-based assistants simply can’t match. Second, it leverages composable Unix commands. Instead of reinventing wheels, Claude Code chains together grep, sed, git, and other standard tools that developers already trust.

This architectural choice has profound implications for adoption. Developers don’t need to learn new interfaces or workflows. They work in the environment they already use—the terminal—with a tool that speaks their language. And because Claude Code operates as a true agent rather than an assistant, it can handle entire projects autonomously while developers focus on architecture and business logic.

The model advantage: Claude 4 and Sonnet 4.5

Of course, the underlying AI models matter enormously as well. Anthropic released Claude 4 (Opus and Sonnet) in May 2025, introducing what the company called “the world’s best coding model.” The benchmarks backed up the claim:

• Claude Opus 4: 72.5% on SWE-bench (measuring ability to solve real GitHub issues), 43.2% on Terminal-bench (command-line tasks)

• Claude Sonnet 4: 72.7% on SWE-bench, balancing performance with cost-efficiency

• Extended thinking with tool use: Models can now alternate between reasoning and tool use (like web search) during extended thinking sessions

• Memory capabilities: When given file access, Claude 4 creates and maintains ‘memory files’ to store key information, dramatically improving performance on long-running agent tasks

Then in September 2025, Anthropic released Claude Sonnet 4.5, which became their most powerful model to date. The improvements were dramatic:

• 77.2% on SWE-bench Verified (82.0% with parallel compute)

• Code editing error rate: Dropped from 9% to 0% on Anthropic’s internal benchmarks

• Long-horizon task performance: Maintains focus for more than 30 hours on complex, multi-step tasks (vs. ~7 hours for Opus 4)

• 61.4% on OSWorld (desktop/browser interaction), up from 42.2% just four months prior

In November 2025, Anthropic released Claude Opus 4.5, which achieved 80.9% on SWE-bench Verified while using up to 65% fewer tokens than previous models. This efficiency translates directly to cost savings for developers running complex workflows.

Critically, these weren’t just benchmark improvements—they showed up in production. GitHub integrated Claude Sonnet 4 to power GitHub Copilot’s new coding agent. Cursor called Opus 4 “state-of-the-art for coding and a leap forward in complex codebase understanding.” Replit reported “dramatic advancements for complex changes across multiple files.” Block noted it was “the first model to boost code quality during editing and debugging.”

Bloomberry conducted research on over 45k companies, and the results are very insightful into which industries Anthropic dominates vs OpenAI.

source: Bloomberry

The software development vertical is especially interesting as companies are 2.3 times more likely to be Claude only than OpenAI only.

On the other hand, the industries where OpenAI dominates Anthropic are Marketing services, real estate, advertising, and business consulting.

In another developer survey of 99 professional developers, conducted by UC San Diego and Cornell University in January, Claude Code (58 respondents) appeared alongside GitHub Copilot (53) and Cursor (51) as one of the three most widely adopted platforms, with 29 respondents using multiple agents simultaneously.

In 2026, Claude is accelerating even faster with the launch of Cowork. The Cowork launch proved particularly significant. Users had been using Claude Code for non-coding tasks (vacation research, spreadsheet work via Slack, oven control). By launching Cowork, Anthropic showed that Claude Code’s total addressable market extends far beyond the 28 million professional developers globally.

Now, in addition to Cowork, we have a new trend of a personal assistant called Clawd bot. While Clawd bot is not owned by Anthropic but rather an open-source project, it has become the »ChatGPT« moment for personal intelligence, and for most users, Clawd works best when used with Claude, causing a surge in usage of Claude Code.

This is the most eye-opening chart from this article. It shows the daily install counts of AI coding assistants in Visual Studio Code. For non-technical readers, VS Code is the industry-standard code editor and the primary host of AI coding agents:

Since the start of 2026, Claude Code has been surging! It went from 17.7M daily installs (30-day moving average), similar to where OpenAI’s Codex was, to 29M, and it continues to rise exponentially. This really shows that Claude Code is having its own »ChatGPT« moment TODAY.

Why does coding matter so much as an AI vertical?

In short, because the results are measurable, and companies can put serious investment behind these productivity gains. The academic research and enterprise case studies paint a consistent picture: AI coding tools deliver 26-55% productivity improvements, with experienced developers seeing the largest gains.

GitHub Copilot baseline: A 2022 controlled experiment found that developers using GitHub Copilot completed tasks 55.8% faster (95% confidence interval: 21-89%) than control groups. Subsequent enterprise deployments confirmed these gains:

• GitHub’s own research: Developers code up to 51% faster for certain tasks

• Accenture randomized trial: 8.69% increase in pull requests per developer, 11% increase in merge rates, 84% increase in successful builds

• Developer satisfaction: Up to 75% higher job satisfaction, 88% code retention rate (developers keep nearly all AI-generated suggestions)

• Success rates: 78% of developers complete tasks using Copilot vs. 70% without it, with 53.2% more likely to pass all unit tests

Claude Code’s reported gains exceed Copilot’s: Internal data from Anthropic and partner companies suggests even stronger performance for complex, long-horizon tasks:

• Developers report running 5-15 Claude Code instances concurrently—multiple in terminals, plus additional browser sessions

• Rakuten validated capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance

• Boris Cherny (Head of Claude Code at Anthropic): “Claude Code generated roughly 80% of its own code” (with human direction, review, and architectural decisions)

A software engineer in the US costs $200,000-$400,000+ annually. If AI coding tools deliver even conservative 20-30% productivity gains, that translates to $40,000-$90,000 in annual value per developer. For a company with 1,000 engineers, we’re talking $40-90 million in annual productivity gains, justifying substantial spending on AI coding infrastructure.
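The per-developer range above can be reproduced with a few lines. This is a minimal sketch using only the salary and gain figures from the text; the $300,000 midpoint salary is an assumption added to show where the $90,000 figure lands, and the function name is illustrative.

```python
def annual_value(salary: int, gain_pct: int) -> int:
    """Annual value of a productivity gain, in dollars (integer arithmetic)."""
    return salary * gain_pct // 100


engineers = 1_000

low = annual_value(200_000, 20)   # low salary, low gain
mid = annual_value(300_000, 30)   # assumed midpoint salary, high gain
high = annual_value(400_000, 30)  # top salary, high gain

for label, per_dev in (("low", low), ("mid", mid), ("high", high)):
    print(f"{label}: ${per_dev:,} per developer, ${per_dev * engineers:,} per 1,000 engineers")
```

The $40,000 floor is 20% of a $200,000 salary, and the $90,000 figure corresponds to 30% of a $300,000 salary. Applying 30% to the very top of the salary range would give $120,000 per developer, so the quoted range errs on the conservative side.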

Anthropic’s Business Momentum

The AI industry’s narrative has fixated on OpenAI’s consumer dominance—ChatGPT’s 800 million weekly active users, 2.5-3 billion daily prompts, and $500 billion valuation. But an important story for investors is playing out in enterprise adoption, where Anthropic is systematically outmaneuvering its larger rival.

This growth trajectory is unprecedented. For context, OpenAI’s 2025 revenue is estimated at $10-12 billion—larger in absolute terms but growing more slowly from a higher base. More critically, Anthropic is projected to break even by 2028, while OpenAI isn’t expected to turn a profit until 2030, according to November 2025 WSJ reporting. OpenAI faces approximately $74 billion in projected losses in 2028 due to massive compute costs, while Anthropic’s enterprise focus and efficiency gains position it for profitability much sooner.

While ChatGPT dominates consumer attention, Anthropic systematically captured the enterprise market where switching costs are high, and revenue is sticky.

According to Views4You, Claude has high penetration rates in different industries:

• Healthcare: 61% usage growth in early 2025, with Claude assisting in medical documentation and patient communication

• Legal: 18% of AI-enhanced litigation tools rely on Claude

• Finance: 24% of major banks use Claude, with 34% of enterprise AI research teams integrating it

• Retail/E-commerce: 38% of chatbots employ Claude

• Real Estate: 25% of listing analysis tools powered by Claude

This enterprise penetration is what separates Anthropic from consumer-focused competitors. Enterprise customers sign multi-year contracts, integrate deeply into workflows, and face high switching costs. Revenue from these customers is predictable, recurring, and premium-priced.

Anthropic with Claude Code is having its own ChatGPT moment, and it’s important, as coding is a big part of the economy and the job market, especially given the salaries. If there are 36M developers worldwide and their average salary is $48k per year, that would translate to $1.75T in developer salaries each year. If we only take the 20-30% productivity gains, we are talking about $350B to $525B of value created each year from these tools, and I would argue that the productivity gains are much higher than the 20-30%.
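For readers who want to rerun the global estimate with their own inputs, here is a minimal sketch using the developer count, average salary, and gain range from the paragraph above. The variable names are illustrative.

```python
developers = 36_000_000
avg_salary = 48_000  # dollars per year

total_salaries = developers * avg_salary
low_value = total_salaries * 20 // 100
high_value = total_salaries * 30 // 100

print(f"total salaries: ${total_salaries / 1e12:.3f}T")
print(f"value at 20%: ${low_value / 1e9:.1f}B")
print(f"value at 30%: ${high_value / 1e9:.1f}B")
```

The unrounded product is about $1.73T, which the text rounds up to $1.75T. On the unrounded base the range is roughly $346B to $518B, close to the $350B to $525B quoted.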

Anthropic’s TAM is bigger than many imagine, and its narrow focus on the enterprise and coding markets could prove to be a great strategy as things become more specialized, and it has built a strong head start and developer brand.

If you enjoyed this article, please consider subscribing to the paid subscription, where I share more in-depth analysis of AI companies and industry trends that I am seeing.

Until next time,

I hope you found this article valuable. I would appreciate it if you could share it with people you know who might find it interesting.

Thank you!

Disclaimer:

I own Google (GOOGL), Amazon (AMZN), and Microsoft (MSFT) stock, all of which have stakes in Anthropic.

Q4 2025 Channel checks & alternative data: The memory crunch is getting worse & one hyperscaler stands out

UncoverAlpha — Wed, 21 Jan 2026 14:49:00 GMT

Hey everyone,

I am posting my regular channel check & other alternative data report before we start earnings of big tech and semiconductor names.

For this report, I got the most interesting data on cloud providers Google, Microsoft, and Amazon, as well as semiconductor memory providers SK Hynix, Samsung, and Micron.

Let’s dive in.

Memory is in a historic crunch

The memory market crunch has spread not only to HBM but also to DRAM and NAND. We can see from this Ornn chart that spot DRAM prices for DDR5 16GB have risen by 366% from the start of Q4 to today. Even since the start of this year, they are up 20.5%.

source: Ornn

Flash MLC 64GB is also up more than 15% since the start of Q4, and SLC2G is up 59% in the same period.

When analyzing relevant expert interviews on AlphaSense in bulk for Q4, the findings confirm that industry demand is skyrocketing.

Memory demand growth accelerated substantially in Q4 2025, with consensus growth ranges expanding from 5%-10% in Q3 to 12%-17% in Q4.

High-bandwidth memory demand maintained strong momentum with 65%-80% year-over-year growth and extended lead times of 20-30 weeks, with zero inventory availability, compared to a 14-35% growth range and 12-24 week lead times in the prior quarter.

Lead times for DRAM, according to the analysis of expert interviews, expanded from 16 to 40 weeks quarter-over-quarter! Many experts also noted that customers are now placing long-term orders, compared with the more short-term demand orders seen just the quarter before.

HBM pricing premiums are reaching 5x over DDR5, even accelerating from the prior quarter.

When looking forward to the guidance comments from these experts, things look even tighter:

Many mention customers panic buying and inventory hoarding due to anticipated supply shortages. Customer orders are shifting to 9-12 month advance commitments with redundant orders across suppliers, compared to standard quarterly planning cycles in the previous quarter.

Lead times deteriorated substantially across memory categories, with DRAM now extending to 52-56 weeks, driven by overwhelming AI data center infrastructure demand and capacity shifts toward advanced memory products.

In terms of pricing, rebate programs largely disappeared in Q4 compared to the prior quarter, when suppliers like SK Hynix and Micron were most aggressive with incentives and rebates, offering 20-25% discounts. Price increases are +30-100% across product categories, accelerating from the prior quarter’s +5-10% general price increases.

All 2026 capacity is sold out to hyperscalers, with suppliers moving to long-term agreements only, compared to prior quarter mentions of full bookings through 2026, because customers are locking in supply early due to fear of shortages and each accelerator requiring multiple HBM stacks.

I am expecting skyrocketing earnings results from all three (SK Hynix, Samsung, and Micron), in terms of revenue but especially in terms of profitability, as customer negotiating power is essentially zero at this point.

Moving now to the cloud industry

Cloud is accelerating, but one cloud provider stands out in Q4…

2026 AI landscape who benefits the most?

UncoverAlpha — Thu, 08 Jan 2026 16:02:43 GMT

Hey everyone,

Here are UncoverAlpha’s 2026 top forecasts in the AI sector, including which companies stand to benefit most from these trends, and the biggest risk pressure points we are monitoring in the AI market for 2026.

The power in AI shifts from Nvidia to HBM suppliers and advanced packaging, as both bottlenecks will last longer than most people expect

While many recognize bottlenecks in both HBM and advanced packaging, we believe both will persist longer than expected. Let’s start with HBM.

Memory has historically been an industry with big demand/supply cycles. Because of that, investors are very wary and are not fully »bought in« when a bottleneck in memory forms, as history shows that buying a memory company with a low P/E was often a bad strategy (frequently at the top of the cycle), and buying a memory company when the P/E was high was better. I am not saying this time is different, but I do believe we are still early in the HBM bottleneck cycle. As ASICs like Google TPUs and Amazon Trainium gain steam, their need for HBM is growing bigger and bigger, similar to Nvidia. 2026 is also the year when we will get a »full AI system« from AMD with their MI400 series. The success of TPUv7 in performance per cost, along with its delivery of a frontier model (Gemini), is driving many other companies to continue investing heavily in this space. HBM providers Micron, Samsung, and SK Hynix are receiving calls from the big tech companies seeking to secure their HBM supply. As HBM production is not increasing substantially, the bottleneck is getting tighter and tighter, and Micron, Samsung, and SK Hynix can now get better prices out of everyone, including Nvidia, as they no longer have only one big buyer (Nvidia). HBM is already sold out for 2026.

According to Korean media outlets, big tech companies like Microsoft, Google, and Meta are practically stationed in Korea, pleading for any additional capacity from SK Hynix or Samsung. The problem escalated to the point that Google’s management dismissed the procurement personnel responsible, holding them accountable for creating supply-chain risk by failing to sign long-term agreements in advance. However, the HBM problem will get even worse as we transition to HBM4.

Nvidia’s Vera Rubin utilizes an 8-stack HBM4 configuration with a memory bandwidth of 22TB/s and a per-pin Fmax of around 10.7Gbps. AMD’s MI455X opts for a 12-stack HBM4 configuration (so even more than Vera Rubin), but at a lower bandwidth of 19.6TB/s, with a per-pin Fmax of around 6.4Gbps. AMD is betting on using less performant HBM4 and stacking more of it together. Nvidia’s Vera Rubin NVL72 will have 1.5x the HBM capacity of Blackwell and 2.8x HBM4 Bandwidth. But the Vera Rubin is just the appetizer when it comes to HBM capacity. In 2027, Nvidia plans to launch the Rubin Ultra with an enhanced HBM4 version, HBM4e, which will enable 12- or 16-high stacks, potentially reaching up to 1TB of memory per GPU (with an NVL576 system).
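As a rough sanity check, the bandwidth figures above can be approximated from the stack count and the per-pin speed. The sketch below assumes a 2048-bit interface per HBM4 stack, which is my assumption based on the published HBM4 standard and is not stated in the figures above; the function name is purely illustrative.

```python
# Back-of-envelope check of aggregate HBM4 bandwidth.
# Assumption (not from the article): each HBM4 stack exposes a 2048-bit interface.
HBM4_PINS_PER_STACK = 2048


def hbm_bandwidth_tb_s(stacks: int, gbps_per_pin: float,
                       pins_per_stack: int = HBM4_PINS_PER_STACK) -> float:
    """Aggregate bandwidth in TB/s: stacks * pins * Gbit/s per pin, converted to bytes."""
    total_gbit_s = stacks * pins_per_stack * gbps_per_pin
    return total_gbit_s / 8 / 1000  # Gbit/s -> GB/s -> TB/s


vera_rubin = hbm_bandwidth_tb_s(stacks=8, gbps_per_pin=10.7)
mi455x = hbm_bandwidth_tb_s(stacks=12, gbps_per_pin=6.4)
print(f"Vera Rubin (8 stacks):  {vera_rubin:.1f} TB/s")
print(f"MI455X (12 stacks):     {mi455x:.1f} TB/s")
```

Under that assumption, both results land close to the quoted 22TB/s and 19.6TB/s, which shows how AMD gets near Nvidia's bandwidth with slower pins simply by using more stacks.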

Keep in mind that ASICs, to remain competitive, will need to follow similar HBM patterns, which will worsen the crunch.

On top of it all, because Nvidia can now sell its H200 into China, Chinese demand for the H200 is putting additional pressure on memory makers, as the H200 uses HBM3e. All three memory providers are building some new fabs for HBM4, but are also converting some HBM3 or even DDR4 and DDR5 memory lines to HBM4.

The problem is that HBM4 requires about 3x more wafer space than standard DRAM for the same amount of memory. As these manufacturers are reorganizing these product lines towards HBM, the supply of traditional RAM decreases, and now, even here, we have a bottleneck. The AI industry is in its early stages, and adoption isn’t at a point where we have humanoids, edge AI, AR smart glasses, or AVs in mass use, all of which require massive memory.
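To make the wafer trade-off concrete, here is a minimal sketch of how total memory output shrinks when wafers are shifted from standard DRAM to HBM. It uses the ~3x wafer-space figure from above; the 30% share in the example is a hypothetical number chosen only for illustration.

```python
def relative_bit_output(share_shifted_to_hbm: float,
                        hbm_wafer_penalty: float = 3.0) -> float:
    """Total memory bits produced, relative to an all-DRAM baseline on the same wafers.

    share_shifted_to_hbm: fraction of wafer capacity converted to HBM (0..1).
    hbm_wafer_penalty: wafer area HBM needs per bit versus standard DRAM (~3x per the text).
    """
    if not 0.0 <= share_shifted_to_hbm <= 1.0:
        raise ValueError("share_shifted_to_hbm must be between 0 and 1")
    dram_bits = 1.0 - share_shifted_to_hbm
    hbm_bits = share_shifted_to_hbm / hbm_wafer_penalty
    return dram_bits + hbm_bits


# Hypothetical example: 30% of wafer capacity moves to HBM.
print(f"Relative bit output: {relative_bit_output(0.30):.2f}")
```

In this hypothetical case, total bit output falls by about a fifth even though no wafer capacity was removed, which is why the squeeze spills over into traditional RAM.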

Moving now to the second huge bottleneck, advanced packaging.

The Advanced Packaging Bottleneck will get worse

Similar to the HBM bottleneck, I expect conditions will only worsen here. Since we moved to chiplets rather than monolithic SoCs, much more advanced packaging is required. You can think of advanced packaging as stitching together different components of an AI accelerator to make them work as one. The goal is also to »stitch« them together as densely as possible to reduce latency and energy losses. Advanced packaging is required for Nvidia GPUs, AMD GPUs, Google TPUs, Amazon Trainiums, etc.

So naturally, with Nvidia GPU demand and, on top of that, ASIC programs reaching scale, the bottleneck is severe. The biggest and most important advanced packaging program is TSMC’s CoWoS.

According to Samsung Securities, TSMC’s CoWoS production capacity (converted to wafers) increased from 35,000 sheets per month in 2024 to about 70,000 sheets last year, and is expected to rise to about 110,000 sheets this year. However, evaluations indicate this remains insufficient. Given that TSMC’s CoWoS allocation to NVIDIA is approximately 55%, the calculation indicates that only 8.91 million “Blackwell” AI accelerators can be produced this year. This volume can support data centers with a maximum capacity of 18 gigawatts (GW), representing only 50% of global data center investment capacity this year. Samsung Securities analyzed, “There is a possibility that TSMC will not be able to meet even NVIDIA’s demand this year.”
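The Samsung Securities numbers can be unpacked with a quick back-of-envelope calculation. The sketch below assumes, as a simplification of mine, that the ~110,000 wafers per month apply flat across the whole year; the implied per-wafer and per-accelerator figures are derived values, not numbers from the report.

```python
# Inputs quoted from the Samsung Securities estimate above.
WAFERS_PER_MONTH = 110_000
NVIDIA_SHARE = 0.55
ACCELERATORS = 8_910_000
DATA_CENTER_GW = 18


def implied_figures(wafers_per_month, nvidia_share, accelerators, data_center_gw):
    """Return (Nvidia wafers per year, accelerators per wafer, kW per accelerator).

    Simplifying assumption: monthly capacity is constant for all 12 months.
    """
    nvidia_wafers_per_year = wafers_per_month * 12 * nvidia_share
    accelerators_per_wafer = accelerators / nvidia_wafers_per_year
    kw_per_accelerator = data_center_gw * 1_000_000 / accelerators  # 1 GW = 1,000,000 kW
    return nvidia_wafers_per_year, accelerators_per_wafer, kw_per_accelerator


wafers, per_wafer, kw_each = implied_figures(
    WAFERS_PER_MONTH, NVIDIA_SHARE, ACCELERATORS, DATA_CENTER_GW
)
print(f"Nvidia CoWoS wafers per year: {wafers:,.0f}")
print(f"Implied accelerators per wafer: {per_wafer:.1f}")
print(f"Implied all-in power per accelerator: {kw_each:.2f} kW")
```

Under that flat-capacity assumption, the estimate implies roughly a dozen accelerators per CoWoS wafer and about 2 kW of data center power per accelerator.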

Here are Goldman’s estimates for annual CoWoS. Even with capacity doubling in 2026 relative to 2025, it is still not enough to meet demand, as 2026 TSMC CoWoS capacity is already essentially sold out.

The crunch has gotten so bad that there are now rumours that Meta has handed some of the CoWoS allocation for its own ASIC chip to Google for its TPUs, as it appears Meta is preparing to start using Google TPUs.

The problem for all ASIC programs is that Nvidia is the dominant client and has most of the capacity reserved. If you are Google, Amazon, or Meta and your ASIC program can’t secure sufficient CoWoS capacity because Nvidia and AMD control most of the capacity, you are considering advanced packaging alternatives to CoWoS, as you don’t have time to wait. I believe this year, other players with advanced packaging will benefit (I will explain more in the company section of this article, including company names).

The transition to Co-Packaged Optics (CPO) from Pluggable Modules

2026 will also mark an important year of transition, as the industry shifts in networking from the pluggable era to CPO. We are seeing this transition as AI models grow in size, resulting in AI clusters with over 100k GPUs connected. With clusters of that size, you have a lot of pluggable transceivers. This is a problem, as that many transceivers can consume 15-20% of the total power in an AI data center. Switching to CPO can significantly reduce energy consumption, and the signal is more reliable.

The problem with CPO is that it requires advanced packaging, which is already a bottleneck in chip manufacturing. While TSMC is establishing a dedicated zone for this type of packaging through its COUPE platform, the bottleneck remains the same, and both affect each other. More on this in the company section, where I explain which companies benefit from the surge in interest in CPO.

Nvidia’s acquisition of Groq opens up a new path for additional AI chips for specific AI use-cases that open up the SRAM supply chain or combinations of SRAM and other memories

Nvidia’s acquisition of Groq is a big signal to the market, as I wrote in this article. With the move, Nvidia confirms that HBM and advanced packaging bottlenecks will likely persist and seeks to secure growth beyond them. This doesn’t mean that people will stop using HBM. Nvidia’s move signals that they expect the HBM bottleneck to be prevalent and long-lasting, so the industry will sell all available HBM over the coming period. At the same time, they need to consider other memory options to continue growing and address the compute gap.

Groq asset acquisition won’t impact core business, could spark something new

Jensen Huang

The main shift here is SRAM. You can fit an AI model on SRAM without HBM, but the model is then 100x smaller. This means SRAM use cases will be limited, but they do exist.

There are many workloads where latency matters a lot, but the model doesn’t need to be »god like« (like serving ad copy). With agentic work, an agent could decide, based on the task, which model quality it needs: if the question can be answered by a fast, small model, it can use SRAM, and only if it needs more does it go to HBM.
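The routing idea can be sketched in a few lines. This is a purely hypothetical illustration of mine: the `Task` fields, the backend names, and the 500-character threshold are all made up and do not describe any real Nvidia or Groq product.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    needs_deep_reasoning: bool  # hypothetical flag set by the agent's planner


def pick_backend(task: Task, max_fast_prompt_chars: int = 500) -> str:
    """Route short, simple tasks to a small SRAM-served model, everything else to HBM."""
    if task.needs_deep_reasoning or len(task.prompt) > max_fast_prompt_chars:
        return "hbm_large_model"
    return "sram_small_model"


print(pick_backend(Task("Write a 10-word ad headline for running shoes", False)))
print(pick_backend(Task("Plan a multi-step migration of our billing system", True)))
```

The point of the sketch is only that the decision happens per task, so the expensive HBM-backed capacity is used only when the cheaper, faster path is not good enough.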

Robotics also needs more SRAM, as it is a low-latency workload that doesn’t need the big model. And if you give the humanoid a complex task, it can go to the cloud and use HBM-based compute to get a more complex answer.

The point I am making is that we will see a mix of new memory variants emerge as everyone, including Nvidia, seeks ways to move beyond HBM. With this new trend, there is a new supply chain of companies that will benefit and have caught my attention; they will be shared in the company section of this report.

Google Gemini will continue to take market share from OpenAI

Since Gemini 3, Google has gained significant momentum in Gemini adoption. That momentum only accelerated after December 17 as Google launched Gemini 3 Flash. Gemini 3 Flash is very affordable and arguably the best current intelligence-per-cost model for many use cases. Not surprisingly, we received data from Similarweb showing that Gemini’s web market share increased to 21.5% from 13.7% three months ago and 5.7% 12 months ago. In the same period, OpenAI’s market share went from 86.7% 12 months ago to 64.5% today.

I expect Gemini to continue gaining market share, as the performance gap with ChatGPT over the past 2 years has now closed, and Gemini has taken the lead. Google’s product execution also improved significantly in the fall of last year, as it appears management has learned that product shipment matters more than benchmark evaluations. On top of that, Google’s unique infrastructure advantage, derived from its TPU ASIC unit, gives it cost and scale advantages that it can leverage to put pricing pressure on the whole market. OpenAI acknowledged that they needed to allocate more compute for inference as the user base grew, and that they had to reallocate some of that capacity from their research (training) operations. Google’s DeepMind AI research unit also has an advantage over OpenAI: it is backed by a strong, free-cash-flow-generating business, so it doesn’t depend on raising external capital.

The application layer of AI is becoming »investible«.

I expect that in 2026 we will see renewed interest in companies that are characterized as the application layer of AI. Meta’s recent acquisition of Manus, a fast-growing AI agent company, is a strong signal to the market. While it has so far been hard to invest in this layer, as the perception was always that »you are one core model upgrade away from being irrelevant,« things are changing. Manus and many other »AI wrapper« companies are showing that there is value in acquiring users and the user behavior patterns and data from their usage. The rise of methods such as fine-tuning, RAG, and RLHF can strengthen your moat on top of a foundation model, especially as we see frequent improvements in the post-training phase. I think this trend will accelerate in 2026, and opportunities in the application layer will finally emerge.

On top of the application layer, I also believe there will be more distribution deals, partnerships, or revenue M&A done. In distribution, the prime example is Snap partnering with Perplexity to offer Perplexity within Snapchat and, in return, receiving payment from Perplexity. I think the market will begin to view distribution companies more favorably. The most significant pressure on distribution deals will come from the broader ecosystem outside Google, as Google has the most distribution points available with Chrome, Gmail, Workspace, Maps, YouTube, and many others. I expect them to rely more heavily on those distribution points to support Gemini, which will put additional pressure on others.

Key market pressure points for me for both the AI market and macro

There are a few things that are important for this AI trend to sustain and continue, which I will be monitoring very closely in 2026. On the industry-specific side, number one is funding rounds for AI labs and startups, especially OpenAI and Anthropic. If either of those companies fails to raise the amount of funds it targets, or fails to raise at the valuation levels it set, I will view that as a very dangerous signal and may significantly reduce my exposure to the market. Currently, a lot of the ecosystem still hinges on those two companies continuing their usage and spending.

The second thing is architectural modifications to how models are trained and served (inference). This ranges from model sizes to pre- and post-training methods, and includes memory requirements. If there is any significant change here, one needs to be very careful and reassess the factors, as it might shift needs and supply chains to others.

There is another pressure point at the macro level: the latest developments in Venezuela and their impact on the US-China relationship, especially regarding Taiwan. If anything happens there, even something like a blockade, everything changes.

Companies for 2026

Here are the companies I am invested in or on my watchlist that are aligned with these 2026 trends:

The $20 Billion Admission: Why NVIDIA Just Bought Into the ASIC Revolution with Groq

UncoverAlpha — Fri, 26 Dec 2025 12:33:14 GMT

Hey everyone,

As the AI industry never sleeps, yesterday we got news that Nvidia was »acquiring« (more of an acquihire) the ASIC chip startup Groq for around $20B. If you have been a reader of our publication for some time, you know we have mentioned Groq multiple times. A little more than a year ago, I also did an exclusive interview with my friend Sunny Madra, Groq’s General Manager, which you can go back and check out.

While many people are speculating on why Nvidia would essentially buy (license: the formal term used) a $20B ASIC startup, I wanted to add my thoughts to the mix, as I believe the Groq acquisition is highly strategic for Nvidia and sends an important signal to the market.

How is the Groq chip different than a GPU/TPU?

First, let’s dismiss the argument that Nvidia bought Groq because its CEO, Jonathan Ross, is one of Google’s TPU founders. Groq’s chip, also called the Language Processing Unit (LPU), is very different from a TPU or a GPU.

Let me quickly explain the GPU, the TPU, and the LPU in terms of how they differ:

The GPU

The GPU architecture was originally designed for graphics—calculating thousands of pixels at once. For AI, it treats a Large Language Model (LLM) as a massive parallel processing job.

The Bottleneck: GPUs rely on HBM (High Bandwidth Memory), which sits outside the processing core. Every time the GPU needs to calculate a word (token), it has to “fetch” the model weights from that external memory. This creates a “memory wall” where the processor is often sitting idle, waiting for data to arrive.

The Logic: It uses a “hub and spoke” model. It is incredibly versatile and can do everything from training to gaming, but it isn’t “perfectly” efficient for the specific sequential nature of generating text.
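The “memory wall” can be put into numbers with a simple bound: if every generated token requires streaming all model weights from HBM once, tokens per second for a single request cannot exceed memory bandwidth divided by model size. The sketch below is idealized (it ignores the KV cache, batching, and multi-GPU splits), and the 4.8 TB/s H200 bandwidth is taken from Nvidia's published spec rather than from this article.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    """Upper bound on single-request decode speed when weights are read once per token."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes


# Illustrative case: a 70B-parameter model in 16-bit weights (2 bytes per parameter)
# on one GPU with 4.8 TB/s of HBM bandwidth (assumed H200 spec).
bound = max_decode_tokens_per_s(params_billion=70, bytes_per_param=2, bandwidth_tb_s=4.8)
print(f"Bandwidth-bound ceiling: {bound:.0f} tokens/s")
```

The ceiling is set by memory bandwidth rather than by the processor's math throughput, which is exactly why the cores end up waiting for data.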

The TPU

You can read my piece on the Google TPU for a detailed understanding, but to summarize the key points from that article: the TPU is an ASIC (Application-Specific Integrated Circuit) designed specifically for tensor math (linear algebra). It uses a systolic array. Imagine a “heart” that pumps data through a grid of processors. Once a piece of data enters the grid, it is passed from one processor to the next without needing to go back to main memory.

The Logic: TPUs are much more efficient than GPUs for massive batches of data. This makes them very effective in Training and complex inference (similar to the GPU)—where you are feeding the machine billions of data points at once. However, for a single user asking a question (Inference), they often still face latency issues.
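For readers who want to see the “pumping” idea in action, here is a toy timing model of a systolic array doing a matrix multiply. It uses an output-stationary layout, which I picked for simplicity; it is not a claim about how Google actually wires the TPU. Rows of A flow in from the left and columns of B from the top, each skewed by one cycle, so every operand is read from memory only once and then handed from neighbor to neighbor.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b cycle by cycle.

    Processing element (i, j) keeps the running sum for output[i][j]. Because row i
    of `a` is delayed by i cycles and column j of `b` by j cycles, at cycle t the
    element sees a[i][k] and b[k][j] with k = t - i - j.
    """
    n, inner, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    total_cycles = n + m + inner - 2  # last PE receives its final operands at cycle n+m+inner-3
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j
                if 0 <= k < inner:  # operands have reached this PE in this cycle
                    acc[i][j] += a[i][k] * b[k][j]
    return acc


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(systolic_matmul(a, b))
```

Each multiply-accumulate uses operands that arrived from a neighboring element, so no processor goes back to main memory for data it already saw pass through the grid.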

The Groq LPU

Groq’s LPU is a complete departure from the other two. It doesn’t use HBM (External Memory) at all. Instead, it uses SRAM (Static Random Access Memory), which is built directly into the silicon of the chip.

The biggest differentiator is speed. SRAM is up to 100x faster than the HBM found in GPUs. Because the data is right there on the chip, there is zero “fetch time.”

In a GPU, the hardware decides when to process data (probabilistic). In an LPU, the software/compiler decides exactly where every piece of data will be at every billionth of a second (deterministic). It’s like a perfectly timed assembly line where no one ever has to wait for a part. The unique part of the LPU is that Groq first designed an automated compiler, and only then designed the chip. The reason is that Jonathan, who worked at Google on the TPU, knew the software was the biggest pain and that the Groq startup couldn’t compete with 10k Nvidia software engineers who write low-level assembly routines (kernels) all day. Because of that automated compiler, you don’t write any manual kernel optimizations for LPUs, as every token’s path is predetermined.

So where does the LPU excel? LLMs generate text one word at a time. The LPU is designed to stream these words through its “conveyor belt” architecture, which is why you see Groq generating hundreds of tokens per second while GPUs struggle to hit 50.

But the LPU is not the »GPU killer« some might think.

The LPU’s tiny memory capacity is a strength for some use cases but a weakness for others. Even an Nvidia H200 GPU has 141GB of HBM3e memory. A single Groq LPU chip has only 230MB of SRAM. Because 230MB isn’t enough to hold even a small AI model, you have to link hundreds of LPU chips together just to run one model. For example, to run Llama-3 70B at full speed, you might need hundreds of LPUs (multiple server racks), whereas you can fit that same model onto just two or four Nvidia GPUs in a single small box. Because you need so many LPU chips to handle the memory requirements of modern models, the initial hardware investment can be big and the data center footprint much larger than with GPUs.
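The “hundreds of LPUs” figure follows directly from dividing model size by on-chip SRAM. Here is a minimal sketch that counts only the model weights; the KV cache and activations are ignored, and the bytes-per-parameter choices are my illustrative assumptions.

```python
import math


def chips_needed(params_billion: float, bytes_per_param: float,
                 sram_mb_per_chip: float = 230) -> int:
    """Minimum number of chips whose combined SRAM can hold the model weights."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return math.ceil(weight_bytes / (sram_mb_per_chip * 1e6))


# A 70B-parameter model at 16-bit (2 bytes) and 8-bit (1 byte) weights.
print("16-bit weights:", chips_needed(70, 2), "chips")
print("8-bit weights: ", chips_needed(70, 1), "chips")
```

Either way the count stays in the hundreds, compared with the two to four GPUs mentioned above.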

Because the LPU is also deterministic, as the software must map out every single calculation before it starts, it is more difficult to handle dynamic workloads or changing underlying architecture (from Transformer to something else).

But there is upside to the LPU. Even though a single Groq LPU system (a GroqRack) is more expensive to buy than a single Nvidia server, it can be significantly cheaper to run if you have high-volume traffic.

To get ultra-low latency on a GPU, you have to use a “Batch Size of 1” (meaning you process only one user’s request at a time). This makes the GPU incredibly expensive per token because most of its processing power is sitting idle while it waits for memory to move. But the LPU is designed for a Batch Size of 1. It achieves 300–500 tokens per second while keeping its internal “assembly line” nearly 100% full.

And then there is the very important energy aspect.

Because the LPU doesn’t have to power external HBM (High Bandwidth Memory), it is fundamentally more energy-efficient for the actual math it performs. Moving data from external HBM to a GPU core costs about 6 picojoules per bit. Retrieving it from Groq’s local SRAM costs only 0.3 picojoules per bit. On an architectural level, Groq is roughly 10x more energy-efficient per token than a GPU for inference.
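To put the picojoule figures into perspective, here is a rough sketch of the energy spent just moving model weights for one generated token. It assumes the full set of weights is read once per token and uses a hypothetical 70B-parameter model with 16-bit weights; compute energy and chip-to-chip networking are deliberately left out.

```python
def weight_movement_joules_per_token(params_billion: float, bytes_per_param: float,
                                     picojoules_per_bit: float) -> float:
    """Energy (joules) to read every weight once, at a given cost per bit moved."""
    bits_moved = params_billion * 1e9 * bytes_per_param * 8
    return bits_moved * picojoules_per_bit * 1e-12


hbm_joules = weight_movement_joules_per_token(70, 2, picojoules_per_bit=6.0)
sram_joules = weight_movement_joules_per_token(70, 2, picojoules_per_bit=0.3)
print(f"HBM:  {hbm_joules:.2f} J per token")
print(f"SRAM: {sram_joules:.3f} J per token")
print(f"Ratio: {hbm_joules / sram_joules:.0f}x")
```

The data-movement gap alone is about 20x. The overall figure of roughly 10x per token is lower, presumably because compute and other overheads that this sketch does not model also consume energy.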

But as we talked about before, the downside is that while LPUs are cheaper to run, you are paying more for floor space, networking cables, and physical maintenance. So why did Nvidia decide to buy Groq?

The Groq strategic play from Nvidia

There are five main reasons Nvidia bought Groq: Energy Bottleneck, HBM Bottleneck, CoWoS Bottleneck, liquid-cooled Data Center Bottleneck, and the competition aspect.

While we already discussed the energy benefits of LPU vs. GPU in the previous section, we are now in an age where energy is the limiting factor for Nvidia’s growth. Having a second option that is more energy-efficient, especially for simpler inference workloads, is important. To add context, Groq’s LPUs don’t require liquid cooling, which is an important aspect of the whole deal. Worldwide, there are far more air-cooled data centers than liquid-cooled ones. Nvidia’s latest Blackwell, as well as its future products, will be mostly liquid-cooled, as they are meant for maximum performance. In the cloud industry, many air-cooled data centers that can’t be repurposed for liquid cooling are being left vacant. In fact, in a recent interview, Groq CEO Ross mentioned that Groq has just landed a big European data center project where LPUs will be hosted. The data center had been left vacant by a hyperscaler that didn’t want to extend the lease, as the site couldn’t be converted to liquid cooling.

While in Nvidia’s perfect world all DCs would be liquid-cooled, the reality is different, as securing a reliable water source is often a problem and will take time. Nvidia’s reliance on liquid-cooled DCs could also lead to growth problems, as liquid cooling adds complexity that many DC operators struggle with (the latest CoreWeave delay is just one example). So Groq adds an air-cooled option for Nvidia to sell in the future and capture more short-term revenue. The fact that Groq LPUs take up more data center footprint is therefore not a problem, as they can be used in air-cooled DCs that are underutilized. In my view, Nvidia’s air-cooled option is also important because many competitors’ chips are air-cooled, such as AWS’s Trainium, which is a strong alternative, as I discussed in the last article.

Moving to another key aspect of this deal: the HBM bottleneck. HBM has been a bottleneck for some time now, and with Google TPUs, AMD MI400s, and AWS Trainium 3 and 4 becoming more competitive and »eating« more and more HBM, availability has only gotten worse. HBM for 2026 is sold out, and a real question is how long it will take for 2027 to sell out, too. The three players, SK Hynix, Samsung, and Micron, are also not eager to expand capacity too much in the future, as they know their industry is cyclical and has recently seen major overbuilds. Now that more chip design companies are competing strongly for HBM capacity, the negotiating power of Micron, SK Hynix, and Samsung will only increase. For Nvidia, securing a viable option for non-complex inference workloads like LPUs is a big positive, as they don’t use any HBM. Again, the play for Nvidia here is to continue its revenue growth and sales of compute units, without being 100% constrained by available HBM.

Another strategic advantage is that Groq’s chips perform well even when fabbed on older nodes. The reason for this is SRAM: since they don’t have external memory, they don’t need the densest transistors to achieve high speed. Groq’s latest generation of LPUs is, in fact, fabbed at a 14nm node at GlobalFoundries. While they are transitioning to newer nodes at Samsung, the fact that you can produce capable chips on an older node, not at TSMC, is another big advantage for someone like Nvidia, as it bypasses another bottleneck: TSMC and CoWoS. The chances of a Groq state-of-the-art chip being fabbed outside of TSMC are much higher than a B300 or Vera Rubin. So, again, with this move, Nvidia is opening a new avenue for growth that doesn’t face the same bottlenecks as Blackwell or Vera Rubin.

Now, to the last point: competition. Nvidia knows that if the HBM-energy-liquid cooling-CoWoS bottlenecks squeeze the market and cause a significant shortage of compute, customers and competitors will start looking for alternatives to bypass those bottlenecks, and Groq, with a supply chain not bottlenecked by the same factors, is a prime candidate for that. Groq, going into this »acquisition«, was growing fast, and more importantly, its capacity was growing fast.

Groq CEO 4 months ago:

»18 months ago, we had 1/10000 of the token capacity. Today we have about 20M tokens per second capacity a month and a half ago we had 10M«

So, rather than Meta or Microsoft buying Groq and opening an alternative path beyond the limited GPU path, Nvidia decided to pull the trigger itself.

What does this mean for Nvidia?

Did Nvidia acknowledge that GPUs are not the best hardware for every AI workload? Yes. At the same time, Nvidia is signaling that they expect their GPUs to be completely sold out for years and that they want to grow outside of their bottlenecks.

More inference revenue will also mean a different margin. Inference margins for Nvidia will not be as high, as even the Groq CEO acknowledged recently:

»Inference is going to be a high-volume, low-margin market. Nvidia is going to build every single GPU that they can physically manufacture this year, AMD is going to do the same thing; they are limited by the HBM, and they are going to sell every single GPU that they build. The thing is that it is not enough. On top of that, every time they sell for inference, when you are paying the 70-80% margin on a GPU, you have to charge that to your end users. Inference is a high-volume, low-margin business. Now, when we start deploying a large number of inference chips, Nvidia, AMD, they can sell their chips for training, which they are really good at, and they can keep that margin high as you can amortize that over 10-20x more compute that you are going to need for inference.«

What does this mean for you as an investor? In the next few days, I will publish my 2026 outlook and the most interesting names I am investing in or watching. Nvidia's Groq move definitely added a new subsector to my list, as a new supply chain is opening up. If you have not yet, consider becoming a paid subscriber, as most of that list of names will be for paid subs only.

Until next time, happy holidays!

As always, I hope you found this article valuable. I would appreciate it if you could share it with people you know who might find it interesting.

Thank you!

Disclaimer:

I own Meta (META), Google (GOOGL), Amazon (AMZN), Microsoft (MSFT), TSMC (TSM), Intel (INTC) stock.

Amazon Trainium: Scaling AI Without Breaking the Bank

UncoverAlpha — Thu, 18 Dec 2025 14:04:20 GMT

Hey everyone,

In this article, I am publishing a comprehensive deep dive into Amazon’s custom ASIC chip, Trainium. I will cover the technical details, as well as the performance, costs, and strategic factors of this unit, and what they mean for Amazon and the broader semiconductor ecosystem.

Topics covered:

How Amazon’s custom chip businesses started
How does Trainium work
The software optimization layer
Performance of Trainium
Why is Trainium this cheap?
The biggest thing holding Trainium back
AWS’s Trainium Business Strategy and Competitive Positioning

Let’s dive into it.

How Amazon’s custom chip businesses started

Amazon Web Services (AWS) entered custom silicon development after acquiring the Israeli chip startup Annapurna Labs in 2015. This acquisition paved the way for AWS’s in-house chips, such as the Graviton CPU family and Nitro virtualization cards, and later for its machine learning accelerators. AWS’s machine learning chips comprise Inferentia (for ML inference) and Trainium (for large-scale model training), names which directly reflect their intended use cases. AWS’s recent Trainium generations, both 2 and 3, can handle high-end inference use cases in addition to training.

The first-generation AWS Trainium was unveiled at re:Invent 2020 as Amazon’s first in-house training accelerator. Built on a 7 nm process with roughly 55 billion transistors, Trainium1 began powering EC2 Trn1 instances by 2022. AWS then launched Trainium2 (second generation) in late 2023, fabricated at 5 nm and featuring a new NeuronCore-v3 architecture. Trainium2 dramatically scaled up the core count – quadrupling the number of compute cores per chip – and introduced support for structured sparsity, achieving about 3.5× higher throughput than Trainium1 despite slightly lower per-core frequencies.

By early 2024, Trainium2 was available via EC2 Trn2 instances and UltraServer systems, delivering 30–40% better price-performance than contemporary GPU-based instances (such as Nvidia A100/H100 instances) according to AWS.

Most recently, at AWS re:Invent 2025, Amazon announced Trainium3, its third-generation AI training chip built on an advanced 3 nm node. The new chip powers the EC2 Trn3 UltraServer – a 144-chip rack-scale system – and offers up to 4.4× more compute performance, ~4× higher energy efficiency, and nearly 4× more memory bandwidth compared to the prior Trainium2 generation.

How does Trainium work

Under the hood, Trainium chips are highly specialized ASICs focused on matrix math and parallelism. Each chip contains multiple NeuronCores, AWS’s term for its AI-optimized compute cores. Notably, starting with Trainium, Annapurna Labs added dedicated Collective Communication cores alongside the scalar, vector, and tensor engines in each NeuronCore (The Next Platform published a good technical breakdown of this). These communication engines accelerate distributed training operations (e.g., all-reduce for gradients), reflecting a “system-first” design that tightly couples compute with networking. As one AWS architect explained, they »first designed the full system and [worked] backwards... to specify the most optimal chip« rather than treating the chip in isolation. This co-design philosophy (developing silicon alongside software and systems) enables AWS to tailor Trainium’s architecture to improve large-scale training efficiency.
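For readers unfamiliar with all-reduce: it is the step where every chip ends up with the sum of all chips' gradients. Below is a minimal pure-Python simulation of ring all-reduce, one common way to implement it. I am not claiming this is the algorithm Trainium's communication cores use; it only shows the kind of chunk-passing traffic that dedicated communication hardware is meant to speed up.

```python
def ring_all_reduce(grads):
    """Simulate ring all-reduce: every worker ends up with the element-wise sum.

    grads: one gradient vector per worker; the length must be divisible by the
    number of workers so each vector splits into equal chunks.
    """
    n = len(grads)
    size = len(grads[0])
    if size % n != 0:
        raise ValueError("vector length must be divisible by the number of workers")
    chunk_len = size // n
    data = [list(g) for g in grads]  # each worker's local buffer

    def chunk(idx):
        return range(idx * chunk_len, (idx + 1) * chunk_len)

    # Phase 1, reduce-scatter: after n-1 steps worker w holds the full sum of chunk (w+1) % n.
    for step in range(n - 1):
        sends = [(w, (w - step) % n) for w in range(n)]
        payloads = [[data[w][i] for i in chunk(idx)] for w, idx in sends]
        for (w, idx), payload in zip(sends, payloads):
            dst = (w + 1) % n  # pass to the right-hand neighbor
            for offset, i in enumerate(chunk(idx)):
                data[dst][i] += payload[offset]

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(w, (w + 1 - step) % n) for w in range(n)]
        payloads = [[data[w][i] for i in chunk(idx)] for w, idx in sends]
        for (w, idx), payload in zip(sends, payloads):
            dst = (w + 1) % n
            for offset, i in enumerate(chunk(idx)):
                data[dst][i] = payload[offset]
    return data


worker_grads = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
print(ring_all_reduce(worker_grads)[0])
```

Each worker only ever talks to its neighbor and sends one chunk per step, so the traffic per chip stays roughly constant as the ring grows, which is why this pattern is popular for gradient synchronization.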

Data Types and Throughput:

Trainium supports a range of numeric formats commonly used in AI: FP32, BF16/FP16, and, notably, a configurable FP8 format designed to boost throughput.

It is essential to understand the term FP (floating point) because, later in the article, we will compare the performance of Nvidia’s Blackwell, Trainium3, and Google’s TPUv7 at specific FP formats. For less technical readers, FP formats are the »resolution« of AI math. Just as a 4K video requires more data and a faster internet connection than a 720p video, FP32 requires more power and time than FP4. By moving to lower FP formats, chip makers are effectively reducing the ‘data weight’ of AI, allowing it to run faster with less electricity. Still, the trade-off can be lower accuracy (increased risk of errors).
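The »resolution« trade-off is easy to see in a few lines of Python. The standard library can only pack FP32 and FP16 (not FP8 or FP4), so I use FP16 as a stand-in for “lower precision”; the same effect gets stronger as the format shrinks.

```python
import struct

# struct format codes: "f" is IEEE 754 single precision (FP32), "e" is half precision (FP16).
FORMATS = {"FP32": "<f", "FP16": "<e"}


def round_trip(value: float, fmt: str) -> float:
    """Store a number in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def precision_report(value: float) -> dict:
    """For each format: bytes used, the value actually stored, and the absolute error."""
    report = {}
    for name, fmt in FORMATS.items():
        stored = round_trip(value, fmt)
        report[name] = {
            "bytes": struct.calcsize(fmt),
            "stored": stored,
            "abs_error": abs(stored - value),
        }
    return report


for name, row in precision_report(0.1).items():
    print(name, row)
```

FP16 needs half the bytes of FP32 for every number, which means half the memory traffic, but the stored value drifts further from the original.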

Trainium2 and later chips also implement 4:1 structured sparsity (i.e., keeping only 4 out of every 16 weights, or similar patterns, and skipping the rest) to exploit model sparsity for additional speedups. According to analysis by The Register, Trainium3’s hardware can leverage 16:4 sparsity to quadruple effective throughput on supported workloads. This means a single Trainium3 chip, which delivers about 2.5 petaFLOPS of dense FP8 performance, can reach roughly 10 petaFLOPS of effective throughput on sparse models. For higher-precision tasks (such as BF16 training), Trainium still offers competitive performance while focusing on FP8/FP16 for maximum speed where acceptable.
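Here is a small sketch of what 16:4 structured sparsity means in practice: in every block of 16 weights, only 4 are kept and the rest are zeroed, so the hardware can skip them. Keeping the largest-magnitude weights is a common pruning heuristic that I am assuming for illustration; it is not necessarily what AWS's tooling does.

```python
def prune_block_sparse(weights, block=16, keep=4):
    """Zero all but the `keep` largest-magnitude weights in each block of `block`."""
    pruned = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        order = sorted(range(len(chunk)), key=lambda i: abs(chunk[i]), reverse=True)
        kept = set(order[:keep])
        pruned.extend(w if i in kept else 0.0 for i, w in enumerate(chunk))
    return pruned


def effective_pflops(dense_pflops, block=16, keep=4):
    """Effective throughput if only `keep` of every `block` weights must be computed."""
    return dense_pflops * block / keep


example = [float(v) for v in range(1, 33)]  # 32 toy weights, i.e. two blocks of 16
sparse = prune_block_sparse(example)
print("non-zero weights left:", sum(1 for w in sparse if w != 0.0))
print("effective PFLOPS from 2.5 dense:", effective_pflops(2.5))
```

The 4x multiplier only applies to models that tolerate this pruning pattern, which is why the headline sparse number should be read as a best case.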

Memory and Interconnect:

Each Trainium generation has pushed memory limits to handle ever-larger models. Trainium2 packed 16 GB HBM stacks (HBM3) per chip (total ~96 GB/chip), whereas Trainium3 uses faster HBM3E with 12‐high stacks, giving 144 GB per chip at 4.9 TB/s bandwidth. This 50% increase in memory capacity (and a ~70% increase in bandwidth) enables Trainium3 to feed its compute units efficiently for training massive models. AWS also engineered a proprietary high-speed interconnect called NeuronLink (chip-to-chip links) and a switching fabric (NeuronSwitch) for scaling out. For networking, they also left the door open to other options, as they want to optimize for maximum efficiency and vendor flexibility, even on the networking layer.

Trainium2-based systems used a 3D torus topology. Trainium3 introduces an all-to-all switched fabric with NeuronSwitch-v1, which roughly doubles intra-node bandwidth and reduces latency between chips. Thanks to this fabric, a single Trn3 UltraServer can unite 144 Trainium3 chips into a single coherent system, and AWS’s UltraCluster 3.0 can further connect “up to 1 million Trainium chips” across multiple racks to scale the cluster. AWS reported that, in its testing, these improvements enable 4× faster model training and lower inference latency when comparing Trainium3 UltraServers to the previous generation.

The software optimization layer

As most of you know by now, any software optimization layer not called CUDA has its hurdles. AWS has tried to mitigate this by integrating its Neuron SDK with popular ML frameworks (TensorFlow, PyTorch, JAX, Hugging Face libraries, etc.) to ease porting. Given recent moves, AWS is increasingly leaning into opening up the software ecosystem to the open-source community and accelerating adoption. Anthropic is key to maturing the Neuron software stack for broader external adoption. A high-ranking Amazon employee made an interesting comment regarding the strategy here:

»To answer your question, for us in five years, we hope on inference size we can at least address more than 50% of the pure play external customers. That’s the reason we are trying so hard to attract those leading companies or investing leading companies like Anthropic to work on accelerator because they are invest by Google and us. They are training their model on both TPUs and Trainium.

I think, basically, they are the trailblazer for all other external customers. Once they test out everything, they develop all the SDKs, those things then future other customer adoption will come in the next five years. This, I think, our conviction. I think this will be a great success going forward. This is what we think.«

source: AlphaSense

So Amazon is betting heavily on Anthropic and its engineers, who have become highly proficient in optimizing Trainium to help build the software library base for broader adoption of Trainium chips. Having Anthropic on top of embracing the open-source ecosystem seems like a clever approach.

Given that CUDA is entrenched in engineers’ mindshare, this strategy, combined with a full embrace of the open-source community, seems to be the only viable option for the software stack.

Performance of Trainium

A Trainium3 chip provides ~2.5 PFLOPS of dense FP8 compute (10 PFLOPS sparse) and 144 GB of memory. Hence, a fully populated UltraServer with 144 chips delivers on the order of 360 PFLOPS (dense FP8) or more than 1.4 exaFLOPS (with sparsity) of compute and over 700 TB/s of aggregate memory bandwidth. This puts the Trainium3 UltraServer in the same class as the largest GPU-based systems.
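The UltraServer totals above are simple per-chip multiplications. A minimal sketch using only the per-chip figures quoted in this article (the constant names are mine):

```python
CHIPS_PER_ULTRASERVER = 144   # Trainium3 chips in one Trn3 UltraServer
DENSE_FP8_PFLOPS = 2.5        # per chip, dense FP8
SPARSE_FP8_PFLOPS = 10.0      # per chip, with 16:4 sparsity
HBM_BANDWIDTH_TBPS = 4.9      # per chip, TB/s
HBM_CAPACITY_GB = 144         # per chip

dense_total_pflops = CHIPS_PER_ULTRASERVER * DENSE_FP8_PFLOPS
sparse_total_exaflops = CHIPS_PER_ULTRASERVER * SPARSE_FP8_PFLOPS / 1000
bandwidth_total_tbps = CHIPS_PER_ULTRASERVER * HBM_BANDWIDTH_TBPS
memory_total_tb = CHIPS_PER_ULTRASERVER * HBM_CAPACITY_GB / 1000

print(f"Dense FP8:  {dense_total_pflops:.0f} PFLOPS")
print(f"Sparse FP8: {sparse_total_exaflops:.2f} exaFLOPS")
print(f"HBM bandwidth: {bandwidth_total_tbps:.1f} TB/s")
print(f"HBM capacity:  {memory_total_tb:.1f} TB")
```

These are peak, marketed numbers added up across chips; real workloads will land below them.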

According to AWS, early customers have reported substantial performance and cost benefits. For instance, Amazon’s Bedrock service (which offers foundation models) is already running production workloads on Trainium3, and others, such as Anthropic, have achieved 50% cost reductions and multi-fold throughput gains by switching from GPUs to Trainium hardware. This claim is from AWS, so take it with a grain of salt, as we don’t know which specific workloads these numbers were measured on.

The real number we are looking for is the total cost of ownership (TCO) per performance.

Before we go to Trainium 3 (Trn3), I did find some interesting information on Trainium 2:

An Amazon employee in May mentioned the following:

» We offer as a price per FLOPS. It’s probably 30%-40% cheaper equivalent to their leading NVIDIA instance in our data center. The Trainium2 is about 30% cheaper than upper H200. We sell those. We incentivize customers to use it. Our cost perspective, because NVIDIA enjoys such a hefty margin, we all know it.«

source: AlphaSense

Similar takes are found from different customers.

A customer in February noted that cost-conscious startups using TPUs or Trainium can reduce costs to 1/5 of those of NVIDIA clusters if longer, less time-critical training runs are allowed.

In April, an executive provided granular hourly pricing data, reporting that while NVIDIA H100 chips cost approximately $3 per hour per chip (via providers like CoreWeave), Trainium chips were available for roughly $1 per hour. They further noted that AWS offered potential discounts for long-term contracts that could bring the effective price down to $0.50 per hour, representing roughly 1/6 to 1/7 of the cost of an H100.

Another customer mentioned in August that Amazon is offering »massive discounts« on Trainium processors even within their own cloud instances to undercut NVIDIA GPU spot pricing.

A director at Tenstorrent also highlighted the benefits of ASIC utilization, noting that GPU utilization for training often sits at only 30-40% due to data movement bottlenecks. In contrast, AI accelerators (ASICs) like Trainium can achieve near 100% utilization because they are explicitly architected for these workloads.

Most experts consistently cite a 30-50% cost advantage for Trainium over comparable NVIDIA instances, driven by lower unit costs and aggressive pricing strategies.

Now moving to the performance of Trainium3. At FP8 precision, a Trn3 UltraServer is roughly on par with Nvidia’s latest 72-GPU “Blackwell Ultra” system in total throughput. However, at ultra-low precision FP4 for inference, Nvidia’s system still leads by ~3×.

SemiAnalysis also did their numbers on the TCO/performance of Trn3.

Similarly, they found that the TCO per marketed performance of Trainium3 is 30% better than that of the GB300 NVL72 on FP8, but much worse on FP4.

What does this mean? To put that in perspective, currently, for training workloads, FP4 is harder to use because “low precision” can cause the model to “diverge” (basically, the AI becomes confused during learning).

However, NVIDIA has recently proven with NVFP4 that you can train with 4-bit precision by using clever scaling. This could potentially reduce training costs by an additional 30–40% over FP8. At least for the next few months, it isn’t expected that the big AI labs will switch to FP4 for training.

In inference, the story is slightly different, as AI labs are aggressively adopting FP4. FP4 enables massive models (such as a 1-trillion-parameter MoE) to fit within the memory of fewer chips. If a model that used to require 16 chips now fits on 8 chips due to FP4, your cost per token drops by roughly half. No surprise that Amazon has already announced that, for Trainium4, FP4 performance should be 6x that of Trainium3.
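To see why halving the bits roughly halves the chip count, here is a back-of-the-envelope sketch. It counts model weights only and ignores the KV cache, activations, and any replication, so real deployments need more chips than this; the 144 GB per chip is the Trainium3 figure quoted earlier, used purely as an illustrative capacity, and `chips_for_weights` is a hypothetical helper.

```python
import math

def chips_for_weights(params: float, bits_per_weight: int, hbm_gb_per_chip: float) -> int:
    """Minimum chips needed just to hold the weights in HBM."""
    weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
    return math.ceil(weight_gb / hbm_gb_per_chip)

PARAMS = 1e12           # 1-trillion-parameter model
HBM_GB_PER_CHIP = 144   # illustrative per-chip capacity

chips_fp8 = chips_for_weights(PARAMS, 8, HBM_GB_PER_CHIP)
chips_fp4 = chips_for_weights(PARAMS, 4, HBM_GB_PER_CHIP)

print(f"FP8: {chips_fp8} chips, FP4: {chips_fp4} chips")
```

Under these assumptions the weights take about 1,000 GB at FP8 and 500 GB at FP4, so the minimum chip count falls from 7 to 4; it is not exactly half only because chips come in whole units.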

This data suggests that Trainium3 can be a very good alternative to Nvidia for training workloads if you know how to use the Trainium software stack.

Trainium3’s operating cost also matters in this calculation, as Trn3 runs at ~1,000W per chip, while Nvidia’s GB300 runs at ~1,400W, so it’s not just about the upfront CapEx.

Why is Trainium this cheap?

This is a calculation I derived from multiple sources, including BOM and industry expert interviews on the manufacturing costs of Trainium3, TPUv7, and Nvidia B200:

source: own estimates

A Trainium3 chip costs about half as much as an Nvidia B200 when looking at the pure manufacturing cost. But when looking at the price that Amazon would have to pay, which includes the Nvidia margin ($35k-$40k), the difference is staggering, and the reason why Trainium has the TCO/performance advantage starts to make sense. While Amazon also applies a margin to those costs for external clients, it is nowhere close to Nvidia’s margin. These cost estimates are somewhat confirmed by an Amazon executive’s comment back in May:

»Typically, in our internal chip, without considering all these R&D investments, because that’s going to be spread over chip, just from a manufacturing cost perspective, our Trainium chips are typically 1/3 cheaper than NVIDIA. Of course, [half of the] NVIDIA’s price is margined. Just from an acquisition of price perspective, we are around 1/3 of the cost when we buy NVIDIA chip, similar generation.«

source: AlphaSense

Amazon’s significant cost advantage is also a result of how it optimizes chips and manages its supply chain. As an employee of Alchip Technologies (an Amazon supplier) explained:

»It’s become so important that I think at this point, Annapurna has a lot of their own design team. They feel they can pretty much do not use a design partner for front-end design, and they can license a high-speed SerDes IP from a third party like Synopsys, Cadence and achieve lower cost. That’s why I think now in the third generation onward, they’re more focusing on the cost instead of the other aspect«

source: AlphaSense

The biggest thing holding Trainium back

The most significant factor holding back Amazon’s Trainium is…

The chip made for the AI inference era – the Google TPU

UncoverAlpha — Mon, 24 Nov 2025 13:54:19 GMT

Hey everyone,

As I find the topic of Google TPUs extremely important, I am publishing a comprehensive deep dive, not just a technical overview, but also strategic and financial coverage of the Google TPU.

Topics covered:

The history of the TPU and why it all even started?
The difference between a TPU and a GPU?
Performance numbers TPU vs GPU?
Where are the problems for the wider adoption of TPUs
Google’s TPU is the biggest competitive advantage of its cloud business for the next 10 years
How many TPUs does Google produce today, and how big can that get?
Gemini 3 and the aftermath of Gemini 3 on the whole chip industry

Let’s dive into it.

The history of the TPU and why it all even started?

The story of the Google Tensor Processing Unit (TPU) begins not with a breakthrough in chip manufacturing, but with a realization about math and logistics. Around 2013, Google’s leadership—specifically Jeff Dean, Jonathan Ross (the CEO of Groq), and the Google Brain team—ran a projection that alarmed them. They calculated that if every Android user utilized Google’s new voice search feature for just three minutes a day, the company would need to double its global data center capacity just to handle the compute load.

At the time, Google was relying on standard CPUs and GPUs for these tasks. While powerful, these general-purpose chips were inefficient for the specific heavy lifting required by Deep Learning: massive matrix multiplications. Scaling up with existing hardware would have been a financial and logistical nightmare.

This sparked a new project. Google decided to do something rare for a software company: build its own custom silicon. The goal was to create an ASIC (Application-Specific Integrated Circuit) designed for one job only: running TensorFlow neural networks.

Key Historical Milestones:

2013-2014: The project moved really fast as Google both hired a very capable team and, to be honest, had some luck in their first steps. The team went from design concept to deploying silicon in data centers in just 15 months—a very short cycle for hardware engineering.
2015: Before the world knew they existed, TPUs were already powering Google’s most popular products. They were silently accelerating Google Maps navigation, Google Photos, and Google Translate.
2016: Google officially unveiled the TPU at Google I/O 2016.

This urgency to solve the “data center doubling” problem is why the TPU exists. It wasn’t built to sell to gamers or render video; it was built to save Google from its own AI success. With that in mind, Google has been thinking about the »costly« AI inference problems for over a decade now. This is also one of the main reasons why the TPU is so good today compared to other ASIC projects.

The difference between a TPU and a GPU?

To understand the difference, it helps to look at what each chip was originally built to do. A GPU is a “general-purpose” parallel processor, while a TPU is a “domain-specific” architecture.

GPUs were originally designed for graphics. They excel at parallel processing (doing many things at once), which is great for AI. However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” They spend significant energy and chip area on complex tasks like caching, branch prediction, and managing independent threads.

A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping. Instead, it uses a unique architecture called a Systolic Array.

The “Systolic Array” is the key differentiator. In a standard CPU or GPU, the chip moves data back and forth between the memory and the computing units for every calculation. This constant shuffling creates a bottleneck (the Von Neumann bottleneck).

In a TPU’s systolic array, data flows through the chip like blood through a heart (hence “systolic”).

It loads data (weights) once.
It passes inputs through a massive grid of multipliers.
The data is passed directly to the next unit in the array without writing back to memory.

What this means, in essence, is that a TPU, because of its systolic array, drastically reduces the number of memory reads and writes required from HBM. As a result, the TPU can spend its cycles computing rather than waiting for data.
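The three steps above can be sketched in a few lines of Python. This is a simplified functional model of a weight-stationary array, not a cycle-accurate simulation and not Google’s actual design: the function name `systolic_matmul` and the load counter are my own, and each list of partial sums stands in for values that would move from one processing element (PE) to the next in hardware.

```python
def systolic_matmul(X, W):
    """Multiply X (m x k) by W (k x n) on a simulated weight-stationary grid.

    Returns the result and the number of weight loads from memory.
    """
    k, n = len(W), len(W[0])
    # Step 1: load every weight into its PE exactly once.
    pe_weights = [row[:] for row in W]
    weight_loads = k * n

    outputs = []
    for x in X:                      # Step 2: stream input rows through the grid
        partial = [0] * n            # partial sums entering the first PE row
        for r in range(k):
            # Step 3: each PE row adds its product and hands the partial
            # sums straight to the next row, without writing to memory.
            partial = [partial[c] + x[r] * pe_weights[r][c] for c in range(n)]
        outputs.append(partial)
    return outputs, weight_loads

X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
result, weight_loads = systolic_matmul(X, W)

# A naive loop would fetch a weight for every multiply-accumulate.
naive_weight_reads = len(X) * len(W) * len(W[0])

print(result)                            # [[19, 22], [43, 50]]
print(weight_loads, naive_weight_reads)  # 4 8
```

The gap between the two counters grows with the number of input rows: the weights are loaded once no matter how much data flows through, which is the memory-traffic saving the text describes.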

Google’s new TPU design, called Ironwood, also addressed some of the key areas where the TPU was lacking:

It enhanced the SparseCore for efficiently handling large embeddings (good for recommendation systems and LLMs).
It increased HBM capacity and bandwidth (up to 192 GB per chip). For comparison, Nvidia’s Blackwell B200 has 192 GB per chip, while Blackwell Ultra, also known as the B300, has 288 GB per chip.
It improved the Inter-Chip Interconnect (ICI) for linking thousands of chips into massive clusters, also called TPU Pods (needed for AI training as well as some test-time compute inference workloads). ICI is very performant, with a peak bandwidth of 1.2 TB/s vs. 1.8 TB/s for Blackwell’s NVLink 5. Although NVLink has the higher raw number, Google’s ICI, together with its specialized compiler and software stack, still delivers superior performance on some specific AI tasks.

The key thing to understand is that because the TPU doesn’t need to decode complex instructions or constantly access memory, it can deliver significantly higher Operations Per Joule.

For scale-out, Google uses Optical Circuit Switch (OCS) and its 3D torus network, which compete with Nvidia’s InfiniBand and Spectrum-X Ethernet. The main difference is that OCS is extremely cost-effective and power-efficient as it eliminates electrical switches and O-E-O conversions, but because of this, it is not as flexible as the other two. So again, the Google stack is extremely specialized for the task at hand and doesn’t offer the flexibility that GPUs do.

Performance numbers TPU vs GPU?

As we defined the differences, let’s look at real numbers showing how the TPU performs compared to the GPU. Since Google isn’t revealing these numbers, it is really hard to get details on performance. I studied many articles and alternative data sources, including interviews with industry insiders, and here are some of the key takeaways.

The first important thing is that there is very limited information on Google’s newest TPUv7 (Ironwood), as Google introduced it in April 2025 and it is only now becoming available to external clients (internally, Google is said to have been using Ironwood since April, possibly even for Gemini 3.0). This matters because the generational gap is large, as we can see if we compare TPUv7 with the older but still widely used TPUv5p, based on SemiAnalysis data:

TPUv7 produces 4,614 TFLOPS (BF16) vs 459 TFLOPS for TPUv5p
TPUv7 has 192 GB of memory capacity vs 96 GB for TPUv5p
TPUv7 memory bandwidth is 7,370 GB/s vs 2,765 GB/s for TPUv5p

We can see that the performance leaps between v5 and v7 are very significant. To put that in context, most of the comments that we will look at are more focused on TPUv6 or TPUv5 than v7.
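For a quick sense of the size of that leap, the ratios can be computed directly from the three figures listed above (the dictionary keys are my own labels):

```python
tpu_v7 = {"peak_tflops": 4614, "hbm_gb": 192, "hbm_bandwidth_gbps": 7370}
tpu_v5p = {"peak_tflops": 459, "hbm_gb": 96, "hbm_bandwidth_gbps": 2765}

# Generation-over-generation multiple for each metric.
ratios = {metric: tpu_v7[metric] / tpu_v5p[metric] for metric in tpu_v7}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x")
```

That works out to roughly 10x the peak compute, 2x the memory, and about 2.7x the memory bandwidth, so expert comments based on v5 or v6 hardware likely understate what v7 can do.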

Based on analyzing a ton of interviews with Former Google employees, customers, and competitors (people from AMD, NVDA & others), the summary of the results is as follows.

Most agree that TPUs are more cost-effective compared to Nvidia GPUs, and most agree that the performance per watt for TPUs is better. This view does not apply across all use cases, though.

A Former Google Cloud employee:

»If it is the right application, then they can deliver much better performance per dollar compared to GPUs. They also require much lesser energy and produces less heat compared to GPUs. They’re also more energy efficient and have a smaller environmental footprint, which is what makes them a desired outcome.

The use cases are slightly limited to a GPU, they’re not as generic, but for a specific application, they can offer as much as 1.4X better performance per dollar, which is pretty significant saving for a customer that might be trying to use GPU versus TPUs.«

source: AlphaSense

Similarly, a very insightful comment from a Former Unit Head at Google around TPUs materially lowering AI-search cost per query vs GPUs:

»TPU v6 is 60-65% more efficient than GPUs, prior generations 40-45%«

This interview was in November 2024, so the expert is probably comparing the v6 TPU with the Nvidia Hopper. Today, we already have Blackwell vs. v7.

Many experts also mention the speed benefit that TPUs offer, with a Former Google Head saying that TPUs are 5x faster than GPUs for training dynamic models (like search-like workloads).

There was also a very eye-opening interview with a client who used both Nvidia GPUs and Google TPUs and who describes the economics in great detail:

»If I were to use eight H100s versus using one v5e pod, I would spend a lot less money on one v5e pod. In terms of price point money, performance per dollar, you will get more bang for TPU. If I already have a code, because of Google’s help or because of our own work, if I know it already is going to work on a TPU, then at that point it is beneficial for me to just stick with the TPU usage.

In the long run, if I am thinking I need to write a new code base, I need to do a lot more work, then it depends on how long I’m going to train. I would say there is still some, for example, of the workload we have already done on TPUs that in the future because as Google will add newer generation of TPU, they make older ones much cheaper.

For example, when they came out with v4, I remember the price of v2 came down so low that it was practically free to use compared to any NVIDIA GPUs.

Google has got a good promise so they keep supporting older TPUs and they’re making it a lot cheaper. If you don’t really need your model trained right away, if you’re willing to say, “I can wait one week,” even though the training is only three days, then you can reduce your cost 1/5.«

source: AlphaSense

Another valuable interview was with a current AMD employee, acknowledging the benefits of ASICs:

»I would expect that an AI accelerator could do about probably typically what we see in the industry. I’m using my experience at FPGAs. I could see a 30% reduction in size and maybe a 50% reduction in power vs a GPU.«

We also got some numbers from a Former Google employee who worked in the chip segment:

»When I look at the published numbers, they (TPUs) are anywhere from 25%-30% better to close to 2x better, depending on the use cases compared to Nvidia. Essentially, there’s a difference between a very custom design built to do one task perfectly versus a more general purpose design.«

What is also known is that the real edge of TPUs lies not in the hardware but in the software and in the way Google has optimized its ecosystem for the TPU.

A lot of people mention the problem that every Nvidia »competitor« like the TPU faces, which is the fast development of Nvidia and the constant »catching up« to Nvidia problem. This month a former Google Cloud employee addressed that concern head-on as he believes the rate at which TPUs are improving is faster than the rate at Nvidia:

»The amount of performance per dollar that a TPU can generate from a new generation versus the old generation is a much significant jump than Nvidia«

In addition, the recent data from Google’s presentation at the Hot Chips 2025 event backs that up, as Google stated that the TPUv7 is 100% better in performance per watt than their TPUv6e (Trillium).

Even for hard Nvidia advocates, TPUs are not to be shrugged off easily, as even Jensen thinks very highly of Google’s TPUs. In a podcast with Brad Gerstner, he mentioned that when it comes to ASICs, Google with TPUs is a »special case«. A few months ago, we also got an article from the WSJ saying that after the news publication The Information published a report that stated that OpenAI had begun renting Google TPUs for ChatGPT, Jensen called Altman, asking him if it was true, and signaled that he was open to getting the talks back on track (investment talks). Also worth noting was that Nvidia’s official X account posted a screenshot of an article in which OpenAI denied plans to use Google’s in-house chips. To say the least, Nvidia is watching TPUs very closely.

Ok, but after looking at some of these numbers, one might think, why aren’t more clients using TPUs?

Where are the problems for the wider adoption of TPUs

The main problem for TPU adoption is the ecosystem. Nvidia’s CUDA is ingrained in the minds of most AI engineers, as they have been learning CUDA since university. Google has developed its ecosystem internally but not externally, as it has used TPUs only for its internal workloads until now. TPUs use a combination of JAX and TensorFlow, while the industry skews to CUDA and PyTorch (although TPUs also support PyTorch now). While Google is working hard to make its ecosystem more supportive of and compatible with other stacks, it is also a matter of libraries and ecosystem formation, which takes years to develop.

It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well, with Gemini 3 being the prime example).

The fact that most clients are multi-cloud also poses a challenge for TPU adoption, as AI workloads are closely tied to data and its location (cloud data transfer is costly). Nvidia is accessible via all three hyperscalers, while TPUs are available only at GCP so far. A client who uses TPUs and Nvidia GPUs explains it well:

»Right now, the one biggest advantage of NVIDIA, and this has been true for past three companies I worked on is because AWS, Google Cloud and Microsoft Azure, these are the three major cloud companies.

Every company, every corporate, every customer we have will have data in one of these three. All these three clouds have NVIDIA GPUs. Sometimes the data is so big and in a different cloud that it is a lot cheaper to run our workload in whatever cloud the customer has data in.

I don’t know if you know about the egress cost that is moving data out of one cloud is one of the bigger cost. In that case, if you have NVIDIA workload, if you have a CUDA workload, we can just go to Microsoft Azure, get a VM that has NVIDIA GPU, same GPU in fact, no code change is required and just run it there.

With TPUs, once you are all relied on TPU and Google says, “You know what? Now you have to pay 10X more,” then we would be screwed, because then we’ll have to go back and rewrite everything. That’s why. That’s the only reason people are afraid of committing too much on TPUs. The same reason is for Amazon’s Trainium and Inferentia.«

source: AlphaSense

These problems are well known at Google, so it is no surprise that internally, the debate over keeping TPUs inside Google or starting to sell them externally is a constant topic. Keeping them internal enhances the GCP moat, but at the same time, many former Google employees believe that at some point, Google will start offering TPUs externally as well, maybe through some neoclouds, not necessarily with the biggest two competitors, Microsoft and Amazon. Opening up the ecosystem, providing support, and making it more widely usable are the first steps toward making that possible.

A former Google employee also mentioned that Google last year formed a more sales-oriented team to push and sell TPUs, so it’s not like they have been pushing hard to sell TPUs for years; it is a fairly new dynamic in the organization.

Google’s TPU is the biggest competitive advantage of its cloud business for the next 10 years

The most valuable thing for me about TPUs is their impact on GCP. As we witness the transformation of cloud businesses from the pre-AI era to the AI era, the biggest takeaway is that the industry has gone from an oligopoly of AWS, Azure, and GCP to a more commoditized landscape, with Oracle, CoreWeave, and many other neoclouds competing for AI workloads. The problem with AI workloads is the competition and Nvidia’s 75% gross margin, which together result in low margins for AI workloads. The cloud industry is moving from a 50-70% gross margin industry to a 20-35% gross margin industry. For cloud investors, this should be concerning, as the future profile of some of these companies is more like that of a utility than an attractive, high-margin business. But there is a solution to avoiding that future and returning to a normal margin: the ASIC.

The cloud providers who can control the hardware and are not beholden to Nvidia and its 75% gross margin will be able to return to the world of 50% gross margins. And it is no surprise that all three (AWS, Azure, and GCP) are developing their own ASICs. The most mature by far is Google’s TPU, followed by Amazon’s Trainium, and lastly Microsoft’s MAIA (although Microsoft owns the full IP of OpenAI’s custom ASICs, which could help them in the future).

While even with ASICs you are not 100% independent, as you still have to work with someone like Broadcom or Marvell, whose margins are lower than Nvidia’s but still not negligible, Google is again in a very good position. Over the years of developing TPUs, Google has managed to control much of the chip design process in-house. According to a current AMD employee, Broadcom no longer knows everything about the chip. At this point, Google is the front-end designer (the actual RTL of the design) while Broadcom is only the backend physical design partner. Google, on top of that, also, of course, owns the entire software optimization stack for the chip, which makes it as performant as it is. According to the AMD employee, based on this work split, he thinks Broadcom is lucky if it gets a 50-point gross margin on its part.
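To see why the supplier’s gross margin matters so much, recall that gross margin is (price − cost) / price, so the price a buyer pays is cost / (1 − margin). The sketch below applies that identity to a hypothetical unit cost of 100; the cost figure is a placeholder rather than an estimate of any real chip, and it ignores the fact that Broadcom’s margin applies only to its part of the work.

```python
def price_at_gross_margin(unit_cost: float, gross_margin: float) -> float:
    """Selling price that yields the given gross margin on a unit cost."""
    if not 0 <= gross_margin < 1:
        raise ValueError("gross_margin must be in [0, 1)")
    return unit_cost / (1 - gross_margin)

HYPOTHETICAL_UNIT_COST = 100.0  # placeholder cost, not a real chip estimate

price_at_75 = price_at_gross_margin(HYPOTHETICAL_UNIT_COST, 0.75)  # Nvidia-like margin
price_at_50 = price_at_gross_margin(HYPOTHETICAL_UNIT_COST, 0.50)  # design-partner-like margin

print(price_at_75, price_at_50, price_at_75 / price_at_50)  # 400.0 200.0 2.0
```

For the same underlying cost, a 75% margin means paying four times cost while a 50% margin means paying two times cost, which is the gap a cloud provider can either keep as margin or pass on in lower prices.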

Without having to pay Nvidia for the accelerator, a cloud provider can either price its compute similarly to others and maintain a better margin profile or lower costs and gain market share. Of course, all of this depends on having a very capable ASIC that can compete with Nvidia. Unfortunately, it looks like Google is the only one that has achieved that, as the number one-performing model is Gemini 3 trained on TPUs. According to some former Google employees, internally, Google is also using TPUs for inference across its entire AI stack, including Gemini and models like Veo. Google buys Nvidia GPUs for GCP, as clients want them because they are familiar with them and the ecosystem, but internally, Google is full-on with TPUs.

As the complexity of each generation of ASICs increases, similar to the complexity and pace of Nvidia, I predict that not all ASIC programs will make it. I believe outside of TPUs, the only real hyperscaler shot right now is AWS Trainium, but even that faces much bigger uncertainties than the TPU. With that in mind, Google and its cloud business can come out of this AI era as a major beneficiary and market-share gainer.

Recently, we even got comments from the SemiAnalysis team praising the TPU:

»Google’s silicon supremacy among hyperscalers is unmatched, with their TPU 7th Gen arguably on par with Nvidia Blackwell. TPU powers the Gemini family of models which are improving in capability and sit close to the pareto frontier of $ per intelligence in some tasks«

source: SemiAnalysis

How many TPUs does Google produce today, and how big can that get?

Here are the numbers that I researched:

Q3 earnings: Google's AI muscle, Meta Goes All in, Microsoft shows its cards

UncoverAlpha — Thu, 30 Oct 2025 12:00:43 GMT

Hey everyone,

I wanted to share my thoughts and the key highlights from yesterday’s earnings calls from Google, Microsoft, and Meta, because I believe we got some very important signals for these companies and the industry at large.

Google earnings

I would summarize the key findings from the call as follows:

Google Search is much better than expected, and the future of AI Search monetization is getting clearer
GCP continues to impress and win over deals
Google is executing on AI

Search

Google Search delivered impressive results, generating $56.6B in revenue, up 14.6% YoY.

Sundar explained:

»AI is driving an expansionary moment for Search. As people learn what they can do with our new AI experiences, they’re increasingly coming back to Search more.«

Google already said on its Q2 earnings call that overall queries and commercial queries continue to grow YoY on Search. Yesterday, they added even more color:

»During the Q2 call, we shared that overall queries and commercial queries continue to grow year-over-year. This growth rate increased in Q3, largely driven by our AI investments in Search, most notably AI Overviews and AI Mode.«

So AI Overviews in particular are prompting people to search more, which is enhancing the Search experience and giving us a first glimpse of what future AI search will look like. Google also said that AI Mode is resonating well with users and that it now has 75M DAUs.

Ad clicks in the quarter were up 7% YoY, and CPCs were up 7% YoY.

A very interesting segment of the call was when Google discussed the monetization of AI Search.

»And as I’ve shared before, for AI Overviews, even at our current baseline of ads below and within the AI’s response, overall, we see the monetization at approximately the same rate.«

But the bigger AHA moment for me on the call was when Google hinted at a path where AI Search can also make traditionally non-monetizable search queries monetizable:

»You could also argue that on queries, that historically have not been well monetized. We think there is a potential opportunity here where you can obviously imagine that we can build this out with smart AI integration.«

»there’s an opportunity to actually take, let’s say, queries that are not fully commercial but could have an adjacent commercial relationship to basically expand this into more attractive ads offerings without -- while really creating a really interesting user experience at the same time.«

This is a very important piece for investors, as only about 20% of traditional Search queries are commercial. If AI Search can unlock more of that pie while the monetization rate of AI Overviews stays similar to that of traditional Search, the ads TAM for AI Search could be even bigger than the traditional Search TAM, which most investors don’t expect right now.

GCP

As expected, and as I shared in my last article, which explained what alt data and my expectations were for GCP, AWS, and Azure, GCP delivered a really strong quarter.

Google Cloud revenue was $15.2B, up 34% YoY, but within Google Cloud, GCP continued to grow at a rate that was much higher than Cloud’s overall revenue-growth rate, as mentioned by management. My guess is that, based on this comment and other data that I am seeing, we are talking about a +40% YoY growth rate in GCP.

The impressive stat was also Google Cloud’s backlog, given that they don’t have a prime customer such as OAI, and the usage of Anthropic is split between AWS and GCP:

»Google Cloud’s backlog increased 46% sequentially (quarter over quarter) and 82% year-over-year, reaching $155 billion at the end of the third quarter.«
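As a quick sanity check on what those growth rates imply, here is a minimal sketch that backs out the earlier backlog levels from the quoted figures. It assumes the $155 billion, 46%, and 82% numbers are exact, which they are not (they are rounded), so treat the outputs as approximations. The function name is my own.

```python
def implied_prior_level(current, growth_pct):
    """Back out the earlier level from a current level and a growth rate in %."""
    return current / (1 + growth_pct / 100)

BACKLOG_NOW_BN = 155.0   # backlog at the end of Q3, as quoted ($B)
QOQ_GROWTH_PCT = 46.0    # sequential growth, as quoted
YOY_GROWTH_PCT = 82.0    # year-over-year growth, as quoted

prior_quarter_bn = implied_prior_level(BACKLOG_NOW_BN, QOQ_GROWTH_PCT)
prior_year_bn = implied_prior_level(BACKLOG_NOW_BN, YOY_GROWTH_PCT)
added_in_quarter_bn = BACKLOG_NOW_BN - prior_quarter_bn

print(f"Implied backlog one quarter ago: ${prior_quarter_bn:.1f}B")
print(f"Implied backlog one year ago:    ${prior_year_bn:.1f}B")
print(f"Backlog added in the quarter:    ${added_in_quarter_bn:.1f}B")
```

On these rounded inputs, most of the year-over-year backlog increase landed in a single quarter.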

As I already shared and now confirmed by management, GCP is winning big with new clients:

»The number of new GCP customers increased by nearly 34% year-over-year. Two, we are signing larger deals. We have signed more deals over $1 billion through Q3 this year than we did in the previous 2 years combined.«

The diversification of clients also seems very healthy, especially as I am more and more concerned about the industry’s concentration on two clients:

»Over the past 12 months, nearly 150 Google Cloud customers each processed approximately 1 trillion tokens with our models for a wide range of applications.«

AI execution

A big, important data point was that the Gemini app now has over 650M MAUs, and that queries increased by 3x from Q2 of this year. Just to put it in context, ChatGPT probably has around 1B MAUs, so Gemini has made significant gains in the last quarter.

Gemini adoption is also present with enterprises, not just end-users:

»Our first-party models, like Gemini, now process 7 billion tokens per minute via direct API used by our customers.«

On an important question of models advancing at a slower pace, Sundar acknowledged that, but at the same time, hinted that this is the reason why Gemini 3.0 is coming out a few months later than expected:

»I’m incredibly impressed by the pace at which the teams are executing and the pace at which we are improving these models. But it also is true at the same time that each of the prior model you’re trying to get better over is now getting more and more capable. So I think both the pace is increasing, but sometimes we are taking the time to put out a notably improved model, so I think -- and that may take slightly longer.«

And yes, we got confirmation, Gemini 3.0 is coming out THIS year.

Google Summary

All in all, a great quarter by Google, showing not only that GCP is taking the most market share right now and is uniquely positioned with its TPU offering, but also Google Search showing what the future is going to look like, and that future might be even better than the past, which is a big message.

Microsoft earnings

With Microsoft, my main focus is on what's most important: Azure. We got a strong Azure number, with Azure growing 40% (39% in constant currency), but the questions and concerns were focused on customer concentration and the relationship with OpenAI.

And we got some really great insights on that topic and how Satya and Microsoft are thinking about this going forward.

First of all, as you might expect just from having OAI as a client, forward bookings are off the charts:

»Commercial bookings increased 112% and 111% in constant currency and were significantly ahead of expectations, driven by Azure commitments from OpenAI as well as continued growth in the number of $100 million-plus contracts for both Azure and M365. These results do not include any impact from the incremental $250 billion Azure commitments from OpenAI announced yesterday. Commercial remaining performance obligation increased to $392 billion and was up 51% year-over-year.«

But the question is not just what your bookings are, but whether the most important client, OAI, can pay for those orders, and what if Microsoft overbuilds because of that one client?

Microsoft gave us an answer to that and how they are viewing things.

Throughout the call, management was very careful to send the following message: Yes, we will expand and build new data centers at a high pace over the next 2 years, but we are doing so based on the short-term demand we already have, meaning roughly that same 2-year window.

So they are saying that Microsoft is not taking much risk because it is matching its buildout to short-term demand, since the ability of OAI to fulfill short-term orders is much easier to see than the long-term projections.

It was also clear from the call that Microsoft's biggest risk is the short life expectancy of GPUs. In a way, they actually hinted that they see them as 2-year duration assets. Here is an important segment on that from their CFO:

»Let me talk a little bit about maybe connecting a couple of the dots because with $400 billion of RPO, that’s sort of short-dated as we talked about, our needs to continue to build out the infrastructure is very high. And that’s for booked business today. That is not any new booked business we started trying to accomplish on October 1, right?

And so the way to think about that, and you saw it this quarter in particular, and as we talked about ‘26, the remainder, number one, we’re pivoting toward -- increasingly, we talked about this short-lived assets, both GPUs and CPUs, Again, we talk about all these workloads are burning both in terms of app building. Now when that happens, short-lived assets generally are done to match sort of the duration of the contracts or the duration of your expectation of those contracts. And so I sometimes think when people think about risk, they’re not realizing that most of the lifetimes of these and the lifetime of the contracts are very similar.

And so when you think about having revenue and the bookings and coming on the balance sheet, the depreciation of short-lived assets, they’re actually quite matched, Mark. And as you know, we’ve spent the past few years not actually being short GPUs and CPUs per se, we were short the space or the power is the language we used to put them in. So we spent a lot of time building out that infrastructure. Now we’re continuing to do that also using leases. Those are very long-lived assets, as we’ve talked about 15 to 20 years.

And over that period of time, do I have confidence that we’ll need to use all of that, it is very high.

And so when I think about sort of balancing those things, seeing the pivot to GPU, CPU short-lived, seeing the pivot in terms of how those are being utilized, we are -- and I said this now, we’ve been short now for many quarters. I thought we were going to catch up, we are not. Demand is increasing. It is not increasing in just one place. It is increasing across many places.

We’re seeing usage increases in products. We are seeing new products launch that are getting increasing usage, and increasing usage very quickly. When people see real value, they actually commit real usage.

And I sometimes think this is where this cycle needs to be thought through completely is that when you see these kind of demand signals and we know we’re behind, we do need to spend. But we’re spending with a different amount of confidence in usage patterns and in bookings, and I feel very good about that. I have said we are now likely to be short capacity to serve the most important things we need to do, which is Azure, our first-party applications. We need to invest in product R&D and we’re doing end-of-life replacements in the fleet. So we’re going to spend to make sure that happens.

It’s about modernization. It’s about high quality. It’s about service delivery, and it’s about meeting demand.«

And Satya added to this with this comment, confirming the worry about the useful life of GPUs when you get much more capable GPUs every year:

»The second thing that we’re also doing is continually modernizing the fleet. It’s not like we buy one version of, say, NVIDIA and load up for all the gigawatts we have. Each year, you buy, you write the Moore’s Law, you continuously modernize and depreciate it. And that means you also use software to grow efficiency.«

Satya also communicated between the lines that they don’t want to customize their data centers just for OpenAI, as they want to hedge:

»But it’s great to have the hit first-party apps in the beginning because you can build scale that then if it’s a fungible and that’s where the key is. You don’t want to build for a digital native in -- as if you’re just doing hosting for them. You want to build. That’s where -- I think some of the decision-making of ours is probably getting better understood. What do we say yes to, what do we say no to.«

On Azure’s 40% growth, we also got information that a lot of that growth actually comes from OAI:

»Results were ahead of expectations, driven by better-than-expected growth in our core infrastructure business, primarily from our largest customers.«

An important piece of information Microsoft laid out on the call was also how they will expand capacity in the next 2 years and what this means for Azure revenue:

»We will increase our total AI capacity by over 80% this year and roughly double our total data center footprint over the next 2 years, reflecting the demand signals we see.«

The last comment I would highlight was Satya’s, saying that in the end, Microsoft’s success depends not on OpenAI but on their own model, which I found quite interesting:

»And then we have to fund our own R&D and model capability because in the long run, that’s what’s going to differentiate us.«

Microsoft Summary

It was a great quarter for Microsoft and Azure based on the numbers, but there is growing concern about the concentration on one client. It would be interesting to see what Azure growth would be like ex-OAI.

Meta earnings

I can summarize the earnings exactly as I laid out in this post a few days ago. The core business of the family of apps is on fire, but Zuck has his eyes set on AI and wants to have an OpenAI inside of Meta. What this brings, at least in the short term, is heavy pressure on Meta’s profits and FCFs as OpenAI’s business model is a heavy cash burn one.

The core family of apps

Let’s first look at the core business. Revenue was $51.2B, up 26% YoY.

»Across Facebook, Instagram and Threads, our AI recommendation systems are delivering higher quality and more relevant content, which led to 5% more time spent on Facebook in Q3 and 10% on Threads.«

AI is having a profound effect on Meta across both ad targeting and engagement trends.

Reels now has an annual run rate of over $50 billion.

»And now the annual run rate going through our completely end-to-end AI-powered ad tools has passed $60 billion.«

The big WOW moment for me on the call, when it comes to the core, was:

»In the U.S., overall time spent on Facebook and Instagram grew double digits year-over-year, driven by continued video strength as well as healthy growth in non-video time on Facebook.«

Time spent on both Facebook and Instagram is accelerating in Q3!

You also had the continuation of the great growth trend of both direct subscriptions and WhatsApp messaging, as well as click-to-WhatsApp ads:

»Family of Apps other revenue was $690 million, up 59%, driven by WhatsApp paid messaging revenue growth as well as meta verified subscriptions«

»We’re seeing strong growth across our portfolio of solutions, including with click-to-WhatsApp ads, which grew revenue 60% year-over-year in Q3.«

We also got word that Meta’s Ray-Ban Display glasses are sold out.

CapEx bomb

But then we moved to the portion where Zuck is going all in and ready to burn a ton of cash. The 2026 »soft« guidance was the key for many investors and their fears:

»As we have begun to plan for next year, it’s become clear that our compute needs have continued to expand meaningfully, including versus our own expectations last quarter. We are still working through our capacity plans for next year, but we expect to invest aggressively to meet these needs, both by building our own infrastructure and contracting with third-party cloud providers. We anticipate this will provide further upward pressure on our CapEx and expense plans next year. As a result, our current expectation is that CapEx dollar growth will be notably larger in 2026 than 2025. We also anticipate total expenses will grow at a significantly faster percentage rate in 2026 than 2025, with growth primarily driven by infrastructure costs, including incremental cloud expenses and depreciation.«

Zuck added things like:

»We’re also building what we expect to be an industry-leading amount of compute.«

»I think that it’s the right strategy to aggressively frontload building capacity so that way we’re prepared for the most optimistic cases.«

»If it takes longer, then we’ll use the extra compute to accelerate our core business which continues to be able to profitably use much more compute than we’ve been able to throw at it.«

»But any compute that we don’t need for that we feel pretty good that we’re going to be able to absorb a very large amount of that to just convert into more intelligence and better recommendations in our family of apps and ads in a profitable way.«

Meta is saying: in the best-case scenario, we have the compute and are the next OAI; in the worst case, we are frontloading some of the CapEx that we will need in the future for our core:

»So we’re really trying to plan ahead not only to ensure that we have the capacity we need in 2026, but also to give ourselves the sort of flexibility and option value to have the capacity that we think we could need in ‘27 and ‘28«

This strategy could be risky: a data center investment is fine because it is a long-duration asset, but frontloading too many GPUs could be dangerous, as Microsoft is doing the opposite. It all comes down to the fact that Meta wants to be OpenAI, and when you want to be OpenAI, you also have to have a similar P&L profile in the coming years.

A comment that made me jump was that Mark wants to calm down investors, so he even hinted at the fact that if Meta overbuilds, they are open to becoming a compute provider to others:

»Now I mean, it’s, of course, possible to overshoot that, right? And if we do, I mean, this is what I mentioned in my comments, then we see that there’s just a lot of demand for other new things that we build internally, externally, like almost every week, people come to us from outside the company asking us to stand up an API service or asking if we have different compute that they could get from us and we haven’t done that yet. But obviously, if you got to a point where you overbuilt, you could have that as an option.«

Meta Summary

For me, this quarter shows just how strong Meta's core business is and how AI is a huge enabler of it. If Zuck hadn’t been ambitious with the LLM model builder, the stock would probably be up this quarter, but the fact is, he has. The reasons are quite obvious: if Meta delivers and is the frontier LLM provider, its long-term margin and growth profiles are much better than if it were not; but in the short term, this means Meta has to risk current profits and cash flow to even have a chance at becoming that.

As I said before, this outcome was expected for me, and my position in Meta is as small as it has ever been coming into this quarter. But as investors digest this new short-term reality for Meta over the next 2-3 years, where profits and FCF will shrink drastically, I will be looking for chances to build up my position again and make it a core holding of mine.

As always, I hope you found this article valuable. I would appreciate it if you could share it with people you know who might find it interesting. I also invite you to become a paid subscriber, as paid subscribers get additional articles covering both big tech companies in more detail, as well as mid-cap and small-cap companies that I find interesting.

Thank you!

Disclaimer:

I own Meta (META), Google (GOOGL), and Microsoft (MSFT) stock.

Q3 2025 Channel checks & other alternative data: AWS, Azure, GCP, Meta, OpenAI, Anthropic

UncoverAlpha — Mon, 27 Oct 2025 13:22:33 GMT

Hey everyone,

Posting some of my expectations for the upcoming Big Tech earnings and what some of the alternative data reports are saying. The focus of this report is Microsoft, Amazon, Google, and Meta, along with some interesting data on OpenAI and Anthropic.

In this report, you will get insights into:

Channel checks from research houses, expert interviews, and other alternative data sources for 3Q25 on AWS, Azure, and GCP, along with my own expectations
ChatGPT vs Claude: Some interesting user behavior studies
Meta’s Instagram engagement soaring, and the state of the ad market as organic search is losing eyeballs

Let’s dive in.

Too Much AI, Too Soon

UncoverAlpha — Thu, 09 Oct 2025 15:00:03 GMT

I decided to share some thoughts about the current state of the market regarding AI.

I have become very cautious due to recent financing developments, the projected amount of capital to be raised, and the general valuation levels of many of these companies. As a result, some of you already know, I have trimmed or sold many of my tech/AI positions lately.

I want to start off and say that my conviction in AI in the long term has not changed one bit. I continue to think it will be the biggest transformation in history for society and the economy. That being said, I believe the stock market expectations in the short term have gotten ahead of the reality that we face. The issues I see can be segmented into the following categories:

1. We are running out of organic capital and entering the phase of »creative« deals

2. GPUs are a faster depreciating asset than what is thought

3. Valuations are factoring in a very small chance of things slowing down

We are running out of organic capital, so »creativity« has taken front stage

When funding unicorn startups, the classical setup used to be: big VC firms at $1B-$10B, someone like SoftBank at $10B-$30B, and then an IPO. But AI labs like OpenAI and Anthropic don’t want the IPO route for now, as going that way means your business model and economics get dissected, and analysts dive deeper into what makes sense and what doesn’t. Even if they did an IPO, it wouldn’t raise nearly enough capital, as we are now entering a stage where AI labs need over $100 billion of new investments on an annual basis. OpenAI, with its deals with Nvidia and AMD on top of its Stargate datacenter, plans to build a total of 26GW of data centers in the next few years. And this is just the number that we know so far. One GW of an AI data center costs around $60B, so we are talking about raising more than $1.5 trillion. To put that number in perspective, the most profitable business models from the big tech companies, Amazon, Google, Meta, Microsoft, and Apple, produced a total of $1.4T in Free Cash Flow in the last 5 years. And this was in a pandemic environment where usage and profits soared from the increased demand. So now we are talking about a single company needing to raise more than the combined 5-year Free Cash Flow of Big Tech.
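The arithmetic behind that comparison fits in a few lines. This is only a back-of-the-envelope sketch using the round numbers from the paragraph above (26GW, roughly $60B per GW, $1.4T of combined Free Cash Flow); the variable and function names are mine, and the real cost per GW will vary by site and chip generation.

```python
PLANNED_GW = 26              # data center capacity OpenAI plans to build, as cited above
COST_PER_GW_BN = 60          # approximate cost of 1 GW of AI data center, in $B
BIG_TECH_5Y_FCF_BN = 1_400   # combined 5-year Free Cash Flow of the five companies, in $B

def total_buildout_cost_bn(gigawatts, cost_per_gw_bn):
    """Total build-out cost in $B, assuming a flat cost per GW."""
    return gigawatts * cost_per_gw_bn

total_bn = total_buildout_cost_bn(PLANNED_GW, COST_PER_GW_BN)
ratio_to_fcf = total_bn / BIG_TECH_5Y_FCF_BN

print(f"Implied build-out cost: ${total_bn / 1000:.2f}T")
print(f"Multiple of Big Tech 5-year FCF: {ratio_to_fcf:.2f}x")
```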

OpenAI is on track to make around $15-$20B in revenue this year. Even if that number doubles or triples next year, it is not even remotely enough to justify the investment size, so OpenAI will, of course, have to continue raising capital and possibly debt. On top of that revenue, they are expected to lose around $9B, with losses continuing to rise to $47B in 2028. Bloomberg also reported that xAI, another AI lab, is losing around $1B per month.

Financing trillions of CapEx via their own FCF will be very hard to do, so it’s clear that OAI will need to raise capital and debt, but who is big enough to put down over $100 billion?

Nvidia decided to potentially invest $100B in OAI, structured as $10B for each GW of power that OAI brings. To me, this deal is concerning, especially as we are now entering a phase where Nvidia is the only possible financier for these types of deals, as the FCF of everyone else is already depleted from investing in heavy CapEx to build datacenters and buy Nvidia chips. In my view, the main reason Nvidia did this deal is that, as we will discuss later in this article, OpenAI is key to the entire AI sector right now. Dylan Patel from Semianalysis recently said in a podcast that OpenAI and Anthropic are the end buyers of 1/3 of all Nvidia GPUs right now.

The problem is not just the circular type of deal itself; the problem is that, at a size of over $100B, the only possible investor is a company like Nvidia or perhaps Apple. The hyperscalers have all of their cash already committed to their own CapEx, which is hitting the $70-$100B annual range, and they are at their limits when it comes to spending, as CapEx is rising much faster than revenues and FCF. On top of that, even with Nvidia’s $100B, OAI still needs around $1.4T. Who will finance that?

Another concerning sign for me is that debt financing has started to show up in these deals. We just got information on $20B financing for xAI, where the $20B is provided by an SPV. Of that $20B, $12.5B is debt and $7.5B is equity, with Nvidia contributing $2B of the $7.5B. xAI will then rent those chips from the SPV for 5 years, with the GPUs acting as collateral. Meta has also raised $29B for a data center recently, with $26B of that $29B being debt, and that data center is expected to be the collateral. Oracle has also completed a $38B debt raise. Hyperscalers like Microsoft and others are going to the neoclouds. Nebius has signed a $17.4B (potentially expandable to $19.4B) deal with Microsoft. Looking at the deal, debt is in play again:

»Nebius expects to finance the capital expenditure associated with the contract through a combination of cash flow coming from the deal and the issuance of debt secured against the contract in the near term, at terms enhanced by the credit quality of the counterparty.«

So we have entered a market stage where debt and a company like Nvidia, with its own motivations, act as the lender of last resort.

The creative deals where chips are collateral are also a big problem, as I will explain further here. I expect to see more of these GPU collateral deals until the market finally figures out the problems with those.

GPUs are a faster depreciating asset than what is thought

The life of the current generation of GPUs is shorter than most think, and shorter than what many companies are projecting in their amortization plans. We are entering the inference phase of the AI cycle, where we are running out of data centers and energy. The most important metric has become tokens per watt. Nvidia has also moved to a 1-year upgrade cycle, which means that each year you will get a much more capable and energy-efficient accelerator than the previous generation. And this is on a scale unlike anything we have seen in the history of Moore’s law and chips. Jensen said it himself: between Hopper and Blackwell, they are driving the cost of tokens down 10 to 20x. Moore’s law would have improved that by just 20%, so this is much faster, and the amortization of these GPUs should be much, much faster than what the neoclouds and hyperscalers are modeling.

On a recent podcast, Jonathan Ross, the CEO of Groq and one of the founders of Google TPUs, said that at Groq, they are using a 1-year cycle in terms of amortization, and that the people who are using 3-5 year amortization cycles are wrong. With chips, you don’t just have an upfront investment in CapEx; you also incur the OpEx of running that chip, along with the electricity and water costs that come with it. Not to forget, electricity costs are going up because these AI factories require a lot of electricity.

Looking at the statements and financials of neoclouds and hyperscalers, you can see that their numbers differ. The hyperscalers follow a 3-4 year amortization cycle for GPUs, whereas CoreWeave and some neoclouds follow a 6-year depreciation of Nvidia GPUs, as stated by their leadership. The losses of these neoclouds would have been much, much bigger if the amortization cycle were 1-2 years instead of 6, which is another concerning pressure point in the whole ecosystem.

But some might say, well, you still see people renting Nvidia H100s, chips that Nvidia started selling 3 years ago. Yes, but there are two factors behind that. The first is that two clients are pushing demand sky-high by subsidizing end users: the compute needed to deliver their services costs much more than the price they charge. This works only as long as investors are willing to keep funding it. The second, even more important point is that the H100 is still useful despite being 3 years old because NVDA only switched to a 1-year product cycle between H100 and Blackwell, in late 2024. Before that, the cycle was 18-24 months. So, measured in generations rather than years, the chip isn’t that old. However, with Nvidia now on a one-year product cycle, this changes significantly. In my view, the real amortization period for these chips should be 1-2 years.

For the sake of math, let’s take CoreWeave’s amortization of 6 years. This means that when Nvidia Vera Rubin comes out in late 2026, people will still want to rent Ampere A100, which started shipping in late 2020. That is crazy and not going to happen. Even the hyperscalers’ amortization periods of 3-4 years are a stretch in my view, especially as we move to a world where there are no »free« AI data centers waiting around. New ones have to be built and supplied with new power, which takes time, so companies that want to scale will have to swap the old GPUs in the data centers they already run for new GPUs to get more tokens per watt, since their power budget is limited.

The problem with extending your amortization cycle is that it shows higher profits today than they really are. So here is another concern of mine, as the profits of all the hyperscalers in the cloud space are going to come under pressure as the true amortization rate shows up in the coming years. It becomes a broader industry problem when investors start to focus more on this and see that the neoclouds are losing even more money than they state. Again, for the AI circle to continue, investors need to pour money into these neoclouds as well…
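To make the effect on reported profits concrete, here is a minimal sketch of how the annual depreciation charge changes with the assumed useful life. It uses plain straight-line depreciation with zero salvage value, which is a simplification, and the $12B fleet size is a purely hypothetical number I picked for illustration, not something taken from any company’s filings.

```python
def annual_straight_line_depreciation(capex_bn, useful_life_years):
    """Annual straight-line depreciation expense in $B, assuming zero salvage value."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return capex_bn / useful_life_years

FLEET_CAPEX_BN = 12.0  # hypothetical GPU purchase in $B, for illustration only

# Useful-life assumptions discussed in the text, in years.
schedules = {"6-year": 6, "4-year": 4, "2-year": 2, "1-year": 1}

expense_bn = {
    label: annual_straight_line_depreciation(FLEET_CAPEX_BN, years)
    for label, years in schedules.items()
}

for label, value in expense_bn.items():
    print(f"{label} schedule: ${value:.1f}B of depreciation per year")

# Extra annual expense that would appear when moving from a 6-year to a 2-year life.
extra_expense_bn = expense_bn["2-year"] - expense_bn["6-year"]
print(f"Extra annual expense on a 2-year life vs 6-year: ${extra_expense_bn:.1f}B")
```

On the same hypothetical fleet, the 2-year schedule charges three times as much per year as the 6-year one, and that difference is exactly the profit being pulled forward today.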

Nvidia is well aware of this problem, so this comment from an NVDA employee doesn’t come as a surprise:

»…taking out the old ones and put in the new ones, and those old ones we’ll actually buy back. If a customer has A100s and they want to go to H100s, we’ll buy back the servers and the chips and then resell them overseas.«

Source: AlphaSense

My speculation is that overseas means China, but now that they can’t sell to the Chinese market, the question is, who will buy these old chips? In the end, someone has to take those useless chips on their books. And Nvidia is already committed to taking on some of these potential problems if they arise. As recently reported in a CoreWeave deal, Nvidia is obligated to pay the company up to $6.3 billion through 2032 if the cloud provider has unsold capacity. The agreement was actually signed in 2023, but was only publicly revealed in an SEC filing this month. So Nvidia is already acting as a backstop to some extent, although CoreWeave’s debt by itself is much larger than the $6.3B.

Why do you think Microsoft is doing deals with Neoclouds? Because they are seeing a surge in demand for compute from their clients. Microsoft wants to maintain the client relationship and keep the client happy, but they are not confident enough in the CapEx growing even further, so they would rather offload some of the risk to someone else. The client doesn’t know or care that Microsoft doesn’t own the physical infrastructure, and when the hype fades, Microsoft doesn’t have to write those old chips off as a loss, since the neoclouds have taken over that risk. It’s a win for Microsoft as they keep the client, and if the demand turns out to be durable in the long term, they have more than enough time to build out their own data center and switch back to their own infrastructure. In the meantime, in the frenzy cycle we are in right now, they can offload the risk of chips becoming obsolete faster than expected. One of the main reasons Microsoft wants to work with neoclouds is that they are uncertain about CapEx and prefer to take OpEx.

On top of everything already stated about these creative deals, we are now even doing deals where GPUs are in SPVs that serve as collateral. As already stated before, if the real GPU depreciation rate is 1-2 years, which I believe is correct, then the collateral on many of these deals will be a problem.
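Here is a rough sketch of why the collateral becomes a problem, using the xAI SPV figures mentioned earlier ($20B of chips, $12.5B of debt, a 5-year lease). Everything else is my own assumption, labeled in the code: the chips lose value in a straight line down to zero over their economic life, and the debt principal is repaid in equal annual installments over the 5 years. The actual terms of that deal are not public in this detail, so read this as an illustration of the mechanism, not a model of the deal.

```python
CHIP_VALUE_BN = 20.0   # GPUs held by the SPV, in $B (figure cited in the text)
DEBT_BN = 12.5         # debt portion of the financing, in $B (figure cited in the text)
TERM_YEARS = 5         # lease term in years (figure cited in the text)

def collateral_value_bn(year, economic_life_years):
    """Chip value after `year` years. ASSUMPTION: straight-line decline to zero."""
    remaining_years = max(economic_life_years - year, 0)
    return CHIP_VALUE_BN * remaining_years / economic_life_years

def outstanding_debt_bn(year):
    """Debt left after `year` years. ASSUMPTION: equal annual principal repayments."""
    remaining_years = max(TERM_YEARS - year, 0)
    return DEBT_BN * remaining_years / TERM_YEARS

def first_uncovered_year(economic_life_years):
    """First year in which outstanding debt exceeds collateral value, or None."""
    for year in range(TERM_YEARS + 1):
        if outstanding_debt_bn(year) > collateral_value_bn(year, economic_life_years):
            return year
    return None

for life in (1, 2, 5):
    year = first_uncovered_year(life)
    status = "never under water" if year is None else f"under water from year {year}"
    print(f"Economic life of {life} year(s): {status}")
```

Under these assumptions, a 5-year economic life keeps the loan covered throughout, while a 1-2 year life leaves lenders holding debt backed by chips that are worth less than what is owed.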

Valuations are factoring in a very small chance of things slowing down

Current valuations of many of the technology companies are factoring in very little risk. First is the customer concentration risk. Groq’s CEO said that 35-36 companies are currently responsible for 99% of token spending in AI. And even among those 35 companies, 2 are by far the most significant spenders: OpenAI and Anthropic. We already mentioned the stat from Patel that 1/3 of Nvidia GPUs end up going to OpenAI or Anthropic. The demand from these two companies is reflected not only in Nvidia but also throughout the semiconductor chain and in the revenues of hyperscalers and neoclouds. This means that a big chunk of the market is dependent on the success and progress of these two companies.

Both OAI and Anthropic need to continue growing at a very high clip, in terms of users, user engagement, and model performance. In addition, both of these companies (OAI & Anthropic) have to continue to raise enormous amounts of new capital at +$500B valuations, which we already talked about, and it is going to be very challenging, to say the least. We haven’t even mentioned the rate of progress of these models. I am not an AI tech skeptic, but I believe that, as with anything, there is a risk of things not working out. Right now, the market is pricing in a perfect execution of future roadmaps. It is also telling that Microsoft, which had complete access to OAI (even their IP), chose not to fulfill OAI’s future compute demand needs at the rate OAI wanted. Keep in mind, Microsoft has rights of first refusal, meaning they could have the Oracle cloud purchase order if they wanted it, but they didn’t. One has to at least think about why that is. Microsoft’s Satya has, over the years, proven to us that he is one of the best CEOs out there.

Also, I don’t see a future where 5 companies are spending on $100B training runs for the next frontier AI. I believe that it will become much more narrow in the future with 3 or even fewer players forming the market, which means that a lot of the current compute spend for training is being wasted as they create similar functioning models, and in terms of the model performance layer, the moat doesn’t seem to be sticky or long-lasting.

The market is also discounting the risk of disruption for many public technology companies, in my view. When it comes to disruption, everyone thinks only of Google Search, but this potential disruption has now expanded further. The business models of companies like OAI, Anthropic, and xAI are expanding into areas such as social media, e-commerce distribution, productivity tools, and even cloud infrastructure. Information retrieval (the Google Search alternative) is only the first step.

If we just look at the cloud market, most of us, including myself, thought that we would have an oligopoly of Amazon, Microsoft, and Google just a few years ago, as it was unimaginable to expect that anyone would raise enough funds to invest +$100B to build out an AI cloud infrastructure. Well, today, if companies like OAI actually achieve at least half of what they have in plan, they will have the same, if not even more, capacity than some of those hyperscalers. The direct deals they are doing with Nvidia, AMD, and SK Hynix also mean they are skipping cloud providers. A current employee at Nvidia even said that xAI’s goal is to actually become a compute provider:

NVDA employee: »xAI, Elon’s company. They’re building up a tremendous salesforce. They’ve probably called me like 10 times in the last 6 months, and they’re building out there. They want to make a massive disruption…

Analyst: They want to become Oracle?

NVDA employee: Bingo.«

source: AlphaSense

We also have companies like Oracle, which are willing to take big risks, with debt and OAI orders to build out capacity. We have the neoclouds. So, for the three hyperscalers, if the market doesn’t soon cool down in terms of funding for these neoclouds and AI labs, they could face serious competitors down the line.

The flip side is that when the market cools down, hyperscalers with positive FCF will have opportunities to buy some of today's competitors, as they might become distressed assets. Nonetheless, the disruption risk with this technological shift is significant, affecting the entire technology industry, and the valuations currently do not reflect that, in my view.

What also doesn’t get enough attention is that much of the spending by current tech leaders is not tied to new revenue streams but to defending the moats and business models they already have. They are in a race that has gotten out of hand, but as Meta’s Zuckerberg has recently stated, the risk of overspending a few hundred billion on infrastructure is smaller than the risk of being left out. I agree with Mark on this point and understand why all of these companies have to be in this race. However, the capital markets’ job is to properly reflect that risk in valuation multiples, and right now, they are not doing so.

To be clear, I am not calling for a 2000s-like bubble drawdown of +50%, but I do believe that we are reaching financial limits that will cause the market to reevaluate some of the multiples it has given to companies today, and that we are about to enter a consolidation phase. In this phase, it will also become much clearer who has a sustainable moat and what the new business opportunities are.

For AI to reach its economic potential, we need better and more efficient hardware and more efficient software for inference and for training these models. I believe we will get that, but right now the market is in a race with itself, and short-term expectations have gone far too high, especially considering that most tech companies are going to go through heavy CapEx cycles in which profits and FCF will shrink. On top of that, you have moats being shaken all across the industry, and many will question not only the moats but also the capex-light business models, as everyone needs AI infrastructure. The trigger point for stopping this AI race is in the hands of the capital markets. Once they decide they will no longer fund this at this pace, it will signal to both private and public companies that the normalization phase has begun, and I believe we are very close to that point.

Thank you!

AI compute: Nvidia’s Grip and AMD’s Chance

UncoverAlpha — Fri, 22 Aug 2025 13:41:04 GMT

Hey everyone,

In this article, I am sharing my findings regarding Nvidia and AMD, as well as the advantages and weaknesses of both in terms of AI training and inference data center workloads. I will also share my view on the future demand and risks for both in this evolving AI landscape. For this article, I gathered multiple research reports and read & conducted interviews with former employees and industry insiders, including an interview with Jim Sangster, a former Nvidia director. The conversation with Jim is recorded and available to listen to for free on this LINK. I HIGHLY recommend listening to it as it was really packed with insights.

Now let's start with the article.

Nvidia and its moat

Over the last few years, Nvidia has formed a moat around its business when it comes to AI workloads. It has a monopoly over AI training, but with new AI models skewing more towards reasoning, inference is becoming an important use case too, as connecting multiple GPUs in a large cluster is so far the best way to serve both training and more complex inference (such as reasoning models). Nvidia's moat is formed from its full-stack approach. Still today, many view Nvidia as a GPU provider, but it is far from that; instead, it offers a full-stack AI accelerator. If we peel back the most essential layers that form the stack, we can separate them into the following groups:

Hardware: GPU/AI accelerator
Software that optimizes the hardware (in Nvidia's case, that is CUDA and the software ecosystem that supports many libraries and integrations)
Networking with scale-up and scale-out, making it possible to connect the GPUs in a unified cluster that can handle more complex tasks.

In reality, the hardware component, which is the GPU, is the least interesting from a moat perspective for Nvidia because, for example, AMD GPUs often provide better performance characteristics on an individual GPU level. The real moat for Nvidia lies in the CUDA software and Nvidia's networking, which are the two areas I want to focus on.

Nvidia CUDA vs AMD ROCm

So, starting with CUDA. The key thing when it comes to CUDA is that it has a significant head start, as Nvidia first released it nearly two decades ago. Since then, CUDA has evolved into a strong ecosystem, with great software libraries and significant mindshare among engineers. CUDA is taught at the university level, so most engineers have either been taught it or become familiar with it. Besides being the best software optimization stack for GPUs, what is even harder for competitors to break down is the ecosystem that CUDA has built. Many industry experts share the view that because CUDA knowledge is so entrenched among engineers, companies are not willing to switch to other software stacks, as they are not confident they can find employees who know those other stacks. The numbers paint the picture. A Microsoft employee who works at Azure recently said this regarding CUDA/ROCm adoption:

»…within the NVIDIA developer program, and around 50,000 people had listed machine learning and CUDA as their skills on LinkedIn. I think we're talking to tens of thousands of people who have good proficiency in CUDA, specifically around training, computer vision, reinforcement learning, and GenAI. Exact numbers, difficult to pinpoint, but you should be looking at somewhere around the 10,000-50,000 range.

ROCm, specifically from an AMD perspective, we have a lesser footprint. I think my number globally would be less than 5,000. If I look at LinkedIn, around 2,000 profiles we had found. I think you're talking about 1,000-2,000 people, max. I'm talking about engineers who have hands-on experience. I'm not talking about engineers who know theoretically about ROCm.«

source: AlphaSense

One of the key benefits of CUDA is that it continually improves your old Nvidia hardware over time with new software updates. SemiAnalysis has just conducted a benchmark comparison of running a training run on the H100 and the new Blackwell GB200 NVL72, and the results demonstrate why CUDA and its software improvements over time are so crucial, especially in AI training workloads. When they calculated the total cost of ownership (TCO) of the H100 compared to the GB200 NVL72, the GB200 NVL72 TCO came out 1.6x-1.7x higher than the H100's, so the performance gains should be at least 1.6x-1.7x for it to make sense from an investment standpoint. The first GB200 NVL72 shipments started in mid-February this year. By May 2025, the GB200 NVL72's performance per TCO had not surpassed that of the H100; however, by July, SemiAnalysis began to see the performance per TCO reach 1.5x that of the H100, which is close to the target range. That improvement is driven by software optimizations on CUDA for GB200 NVL72. By the end of the year, SemiAnalysis expects the performance per TCO to reach 2.7x that of the H100, making the GB200 NVL72 a clear choice for model training.
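
To make the break-even logic explicit, here is a minimal Python sketch. It assumes that "performance per TCO" means the raw training speedup over the H100 divided by the TCO ratio, which is one plausible reading and not a formula confirmed by SemiAnalysis. The function name and the example speedups are hypothetical and only there for illustration.

```python
def perf_per_tco(speedup_vs_h100: float, tco_ratio_vs_h100: float) -> float:
    """Relative performance per unit of TCO, with the H100 normalized to 1.0.

    Assumption: perf/TCO = raw speedup / TCO ratio (not a confirmed formula).
    """
    if tco_ratio_vs_h100 <= 0:
        raise ValueError("TCO ratio must be positive")
    return speedup_vs_h100 / tco_ratio_vs_h100


# TCO ratio range quoted in the text: GB200 NVL72 vs. H100.
TCO_RATIOS = (1.6, 1.7)

# Hypothetical raw speedups, for illustration only (not measured values).
EXAMPLE_SPEEDUPS = (1.0, 1.7, 3.0)

for tco in TCO_RATIOS:
    for speedup in EXAMPLE_SPEEDUPS:
        ratio = perf_per_tco(speedup, tco)
        verdict = "at or above break-even" if ratio >= 1.0 else "below break-even"
        print(f"TCO {tco:.1f}x, speedup {speedup:.1f}x -> perf/TCO {ratio:.2f}x ({verdict})")
```

Under this reading, a system that costs 1.6x-1.7x more only pays off once its raw speedup is at least as large as its TCO ratio, which is why software-driven speedups on unchanged hardware matter so much.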

This shows how important the CUDA software layer is for chip performance, especially when it comes to training workloads. In terms of inference, the CUDA layer is less important, but other aspects of Nvidia's moat, such as scale-up and scale-out networking, remain key, especially for more complex inference workloads like reasoning models.

»In inference, the primary strength of people who can run inference on scale is the compute capacity. How much capacity do you have and what type of scale up and scale down services you have…Unlike training, where training is very technical. It's about doing math at a very precision level. It's about running training algorithms and running neural networks. That is where CUDA really shines. That is the key difference here in inference. You're not necessarily married to the CUDA level.«

Source: Microsoft Azure employee found on AlphaSense

That being said, AMD has made significant strides in developing ROCm over the past few months. Many industry experts are reporting that ROCm, with its recent update, has become much more stable.

Even AMD has stated that ROCm 7 achieves a 3.5x improvement in inference throughput performance over ROCm 6.

An Azure employee also emphasized AMD's recent kernel and compiler improvements:

»I think from a platform perspective is the compiler and the kernel improvement. AMD's HIP programming model is coming very close to CUDA in terms of portability and syntax. That's something which they need to continue to do so that people who are using CUDA can actually easily migrate to ROCm if they want to do.«

source: AlphaSense

So, ROCm is improving significantly, becoming more stable, and is now supporting a wider range of integrations and libraries. However, the biggest challenge it faces is the mindshare and ecosystem buildout that can't be fast-forwarded and will take years to develop. While I do see some clients, such as hyperscalers, joining the ROCm train sooner, as they are motivated to have an alternative for the rest of the ecosystem, AMD has to make significant performance leaps over Nvidia or build for years to develop a similar ecosystem.

Some industry experts suggest that CUDA converters could be a viable solution. However, those who have actually tried these converters claim that they convert only around 80% of the CUDA code, while the remaining 20% must be done manually by kernel engineers, who, as you might expect, are not cheap. In the end, the calculus for using converters from CUDA to, say, ROCm often doesn't make economic sense, as the kernel engineering costs make it more expensive than simply going with Nvidia's products. Jim, the former director at Nvidia, adds that another problem with converters is that you have to stay on top of both stacks all the time. As a customer, you are betting that these converters will keep doing their job through future updates, and you take on the technical support burden of keeping them relevant.

It is also interesting to note that while other companies are forming alliances to build alternatives to parts of Nvidia's full stack, no alliance has yet been formed to compete with CUDA. Everyone is developing on their own stack. AMD has ROCm, Google's TPU runs on XLA (programmed through TensorFlow and JAX), Amazon's Trainium has the Neuron SDK, Meta's MTIA has a PyTorch/XLA backend, etc. Therefore, there is no broad coalition rallying around a single stack to compete with CUDA.

AMD's decision to invest more heavily in ROCm is the right move, especially when its full rack solution, the MI400X, is released, as it is intended to be a competitive product for training workloads; there, the software optimization stack is crucial. So far, they have made good progress, and clients like the hyperscalers, with sufficient motivation, can start developing on ROCm. Still, the CUDA moat appears to be intact for now and remains a significant driver of Nvidia's adoption.

Networking

The next big moat of Nvidia is networking. With networking, we are referring to two segments: scale-up networking and scale-out networking. Scale-up networking enables GPUs in a single »box« to be connected to each other to form a single GPU server/node and make it as powerful as possible. Scale-out networking then enables these GPU nodes to connect to other GPU nodes and, together, form a large GPU cluster. For scale-up, Nvidia uses its proprietary NVLink and NVSwitches, and for scale-out, it uses either InfiniBand (which it got from its Mellanox acquisition) or Ethernet (using RDMA over Converged Ethernet, RoCE) as a secondary »good enough« option.

To combat Nvidia's strong grip on the networking market for scale-up, a group of companies formed an alternative, known as the UALink consortium. The consortium consists of companies such as AMD, Amazon (AWS), Google, Intel, Meta, Microsoft, Cisco, Apple, Astera Labs, and many others. The goal is to establish an open networking standard as an alternative to Nvidia's, one that lets you successfully connect GPUs and custom ASICs inside »one box«. While the consortium is relatively new, it is important for AMD, as one of its most significant disadvantages compared to Nvidia is networking. And networking matters not only for training AI workloads but also for inference. As inference becomes more complex with reasoning models, having good scale-up and scale-out is key. AMD has learned from its mistake on the MI300X, where it deployed an Infinity Fabric that was much worse than NVLink. They have also recognized that they lack the hardware talent to execute on an NVSwitch equivalent, so to solve this challenge, they want to support every alternative available. That is why they have flexible I/O lanes. These flexible I/O lanes enable AMD to support different standards (Infinity Fabric, PCIe, UALink, etc.). It is clear that AMD desperately wants a performant alternative to NVLink.

While the UALink consortium is still young, it has already had a big setback. At first, Broadcom was one of the key companies involved, but it later backed off because it decided to develop its own proprietary alternative called SUE. This was a significant setback, as AMD now has to rely on Astera Labs and Marvell to produce the UALink switches, which won't be ready until 2027. That is why, while AMD's MI400X has UALink SerDes, it is not a complete UALink scale-up network; instead, AMD had to go with Broadcom's Tomahawk 6 Ethernet switches, which is why it is named »UALink over Ethernet«. Despite this setback on the scale-up side, based on AMD's specifications, the MI400X looks to be very competitive.

Nvidia is not just watching this development, though. One month after UALink 1.0 was announced (April 8th), they announced NVLink Fusion, which, on paper, opens up the NVLink ecosystem. This is a big step for Nvidia. A former high-ranking Nvidia employee explained how challenging it was to get this step approved internally: Meta wanted to use NVLink for their MTIA back when he was working there, and the answer from Nvidia was a firm »NO«.

However, there is a catch with NVLink Fusion. The former Nvidia employee mentions that Nvidia will not provide public specs; instead, they only provide the soft IP to specific vendors, but the specs remain proprietary. To implement NVLink Fusion in your accelerator, such as a TPU or MTIA, you still need to integrate both NVLink and Nvidia's C2C (chip-to-chip). Part of it remains proprietary to Nvidia as the NVLink IP communicates with the chip in a proprietary manner. With it, Nvidia forces you to use their C2C. Clients are now realizing this, as the former Nvidia employee mentions that they are concerned this will further entrench them in the Nvidia ecosystem, even with their custom ASICs, so UALink remains the alternative.

A key point for both Nvidia and UALink is the role of Astera Labs now that Broadcom has taken its own route. The consortium now depends on Astera Labs to provide the switches. At the same time, I don't view it as a coincidence that Nvidia, in its NVLink Fusion announcements, listed the following companies as the first partners to adopt the technology: MediaTek, Marvell, Alchip Technologies, Astera Labs, Synopsys, and Cadence. Nvidia knows that Astera Labs is now the key piece in the consortium and might be motivated to give it more NVLink Fusion orders, thereby limiting its capacity to serve the UALink consortium. Time will tell.

On the scale-out part, the alternative to Nvidia's InfiniBand is Ethernet with RDMA (RoCE). Nvidia also supports this alternative, but as a secondary, less performant option to its proprietary InfiniBand solution. Nvidia even has a Spectrum-X Ethernet platform, as it has Spectrum switches from its Mellanox acquisition. In addition to Ethernet with RDMA, a consortium known as the Ultra Ethernet Consortium (UEC) has also been formed. It was formed by companies like AMD, Broadcom, Arista Networks, Cisco, Intel, Meta, Microsoft, Oracle, and many others. The goal again is to extend Ethernet and reduce its weaknesses vs InfiniBand. Many hyperscalers also support Ethernet because it is cost-effective, already widely deployed in data centers, and has multiple vendors (Broadcom, Cisco, Arista, Marvell). Ethernet with RDMA has gained significant traction, as both hyperscalers and companies like Meta are willing to adopt it to reduce Nvidia's grip. While Ethernet networking still lags behind InfiniBand, many industry insiders agree that the performance gap has narrowed significantly in recent years. SemiAnalysis confirmed this recently in one of their reports, saying that »Even Nvidia recognizes the dominance of Ethernet and with the Blackwell generation, Spectrum-X Ethernet is out shipping their Quantum InfiniBand by a large amount«.

To summarize, when it comes to networking, the scale-out consortium UEC appears to be progressing well so far. For AMD, the key thing is for UEC to continue its progress, while on the scale-up side, there are more challenges. If UALink doesn't ship and start progressing faster, AMD still has a fallback in Broadcom's SUE, but that would make Broadcom the only viable scale-up alternative out there, which gives it a lot of power.

HBM design: a possible next battlefield for the »full stack« offering of Nvidia/AMD?

While we did cover the two most critical layers with CUDA and networking, there appears to be one more that is just starting to form, and it is HBM. HBM is one of the key pieces when it comes to AI accelerators. Its importance is only growing with bigger and more complex models. SK Hynix and Micron primarily supply HBM3, although Samsung is expected to finalize its certification process and join them as well. A key change is coming soon as we transition to HBM4 memory. Here, the base die will move to a modern logic process, which means SK Hynix and Micron can't manufacture it internally but must outsource it to TSMC. Memory providers will also have to partner with logic design companies or IP vendors to help with these designs. This opens up a window for custom HBM implementations, which means that both Nvidia and AMD will release their own custom HBM4 implementations. It again opens a door for one company to gain a competitive advantage as the process becomes more complex and customizable. For AMD and Nvidia, this is a key step where they have to stay on top of things, avoid missing out, and remain competitive. It also means that life for custom ASICs is becoming even harder and more complex, as they will have to handle that part as well. Most likely, they will do so by partnering with one of the existing memory providers and choosing some default plans.

AMD's chances to compete with Nvidia?

First, let's start with some of the problems that AMD has compared to Nvidia:

AMD MI350X series shipments have just started, so they haven't shown up significantly in the Q2 numbers yet. Most of the »AI GPU sales« in the Q2 report were attributed to the MI300 or MI325X, and the MI325X was a product that didn't launch at the right time. AMD wanted to ship it in Q3 2024, the same quarter Nvidia's H200 started shipping, but due to delays, it only started shipping in volume in Q2 2025. That later window meant the MI325X was competing for orders with Nvidia's B200, which is a significantly better chip than the H200.
AMD doesn't have a strong presence with neocloud players. Even though AMD's GPUs are cheaper than Nvidia's, renting AMD GPUs in the cloud costs clients more than renting Nvidia GPUs, because there are not enough AMD instances out there, and the few that exist are priced higher than Nvidia GPU instances.
ROCm is a far less developed software layer and has a much smaller ecosystem compared to Nvidia. Until recently, it also had many stability issues.
Networking has also been a problem for AMD, as we already discussed; they are now addressing those challenges in new generations of their chips with the consortia and outside partners.
AMD doesn't have a full rack-scale solution out yet. The MI400X is expected to be the first one.

Now, some of the positive developments for AMD:

The latest version of ROCm has become much more stable, and AMD is investing heavily in its ROCm stack.
AMD is addressing its lack of presence with neoclouds by trying to mimic Nvidia's approach: selling its GPUs to neoclouds and then renting them back, which makes more capacity available and helps reduce instance prices for customers.
Most of the big Nvidia clients are looking for an alternative to Nvidia as they all »feel« the Nvidia grip. Therefore, the motivation to try other options, especially those from AMD, is high, despite some of the weaknesses that AMD has compared to Nvidia.
Nvidia's launch of Project Lepton, a compute orchestration platform that sits between cloud providers, is angering a lot of neoclouds as well as hyperscalers, as it is an effort to commoditize the cloud industry. Jim, the former Nvidia director, mentioned in our talk that for hyperscalers, the moat/lock-in is still the data and the software apps and tools that you, as a customer, use and keep on these hyperscalers, but for neoclouds, that argument doesn't hold, as they are more of a pure compute provider. With neoclouds angered, the chances of them doing deals with AMD have increased.
Inference workloads are becoming more and more critical. Inference is where the chances of AMD's success are higher.
The recent surge in inference will probably cause Nvidia's chips, especially the GB200, to be sold out soon. For everyone besides Google (which has mature TPUs), AMD is the best second option. Furthermore, the limited leap forward of GPT-5 suggests that GB200 NVL72 clusters will be required for the next significant model advancements in training (GPT-5 was trained on H100 and H200), which will lead to an additional shortage of GB200 and GB300.
The MI350 chip and MI400X look promising on a few levels.

MI350 series and MI400x

With its new MI350 series, AMD is releasing two versions: the MI350X and the MI355X. The MI350X is a 1,000W air-cooled version, while the MI355X is a 1,400W version that supports both air and liquid cooling. On paper, the MI355X is 10% faster, but it consumes 1.4x the power (we will see the real performance in practice, though).
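
The trade-off in those paper specs is easier to see as performance per watt. The sketch below is a rough illustration only: it uses the rated 1,000W and 1,400W figures as a stand-in for real power draw and takes the "10% faster" claim at face value, both of which are assumptions. The function name is hypothetical.

```python
# Paper specs quoted in the text.
MI350X_WATTS = 1000
MI355X_WATTS = 1400
MI355X_RELATIVE_PERF = 1.10  # MI350X performance normalized to 1.0


def relative_perf_per_watt(rel_perf: float, watts: float, base_watts: float) -> float:
    """Performance per watt relative to a baseline chip whose performance is 1.0."""
    return rel_perf / (watts / base_watts)


ratio = relative_perf_per_watt(MI355X_RELATIVE_PERF, MI355X_WATTS, MI350X_WATTS)
print(f"MI355X perf/W vs MI350X: {ratio:.2f}x")
```

On these numbers the MI355X delivers roughly 0.79x the performance per watt of the MI350X, so the higher-power part mainly makes sense where density matters more than the power budget.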

I actually think the »air-cooled« market segment is where AMD can get a lot of traction before its MI400X release, as many customers have very limited liquid-cooled data center capacity and are increasingly filling it with NVDA's GB200 NVL72 (which requires liquid cooling). Numerous air-cooled data centers could be put to use with the MI350 series, competing directly with NVDA's HGX B200 NVL8 and HGX B300 NVL8. That probably means the MI350 series will be used most for inference by smaller LLMs and less so by frontier reasoning ones, where GB200 NVL72 will dominate.

The MI355 appears to offer a significant total cost of ownership (TCO) advantage over Nvidia’s B200/B300 series. Its all-in capital cost per chip is 45% lower than that of Nvidia’s B200 HGX, primarily due to significantly lower chip pricing and lower transceiver and switch pricing resulting from not relying solely on Nvidia-supplied components.

So far, the interest from clients, from what I've heard, is quite strong, and AMD’s CEO mentioned that on their last earnings call. Also worth noting is that OpenAI's CEO was on stage at AMD's event. Oracle has already announced that it will deploy 30,000 MI355Xs, and AWS was a sponsor at AMD's AI event. We also already know that Meta is internally using some AMD GPUs for its inference use cases. Lisa Su even said on the earnings call that they are ramping production of the MI350 series faster than expected.

All in all, the MI350 and MI355, in my view, are the products that will gain the most traction in AI workloads for AMD, especially as mentioned earlier, as a great way to use air-cooled data centers. It is still not a rack-scale solution, so for more complex reasoning inference, Nvidia's GB200 NVL72 will still be the dominant choice.

Turning now to MI400x. Based on the specs, the MI400x should be very competitive with Nvidia's Rubin, which is expected to be released at a similar time. However, considering TCOs and power consumption, it will be competitive but won't be significantly cheaper than Nvidia's Rubin, which is surprising. The power consumption might be an issue for AMD, as most customers are seeking energy-efficient solutions, as they are limited by power availability. The cost of power itself is not a problem, but building the infrastructure to support it is. Then, when MI400X comes out, the usage of power might be one of the most critical metrics. The positive aspect of MI400X is that it is the first true rack-scale solution for AMD, and clients will have a serious alternative to Nvidia at that point.

Nvidia is still king even when it comes to inference

Inference is becoming one of the most important workloads, as it is far larger than the training workloads that have dominated thus far. As Jim, the former director at Nvidia, said, it was once thought that training would be done on a massive cluster and that inference would be performed on a number of smaller devices. The old thinking was that inference would be 10x the size of training, but with reasoning models the gap has grown by another order of magnitude (some even say 1:100).

With reasoning models, inference underwent significant changes. Jim provided an example comparing two models, a traditional one and a reasoning one, on the same task (optimizing seating for a wedding). The reasoning model provided a better answer, but here is the catch: the traditional model generated 449 tokens to produce the answer, while the reasoning model generated 8,595 tokens (roughly 19x more tokens). The reasoning model also hit the cluster 150x more than the traditional model. This shows that, even for inference, reasoning models now need big clusters, which is where Nvidia shines.
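
A quick back-of-the-envelope sketch of what that token gap means at scale. The two token counts come from Jim's example; the daily request volume and the helper name are hypothetical and only serve to show how the gap compounds.

```python
# Token counts from the wedding-seating example in the text.
TRADITIONAL_TOKENS = 449
REASONING_TOKENS = 8_595

token_multiple = REASONING_TOKENS / TRADITIONAL_TOKENS
print(f"Reasoning model generated {token_multiple:.1f}x the tokens")


def total_tokens(requests: int, tokens_per_answer: int) -> int:
    """Total tokens generated for a number of identical requests."""
    return requests * tokens_per_answer


# Hypothetical load, for illustration only.
DAILY_REQUESTS = 1_000_000

extra_tokens = (
    total_tokens(DAILY_REQUESTS, REASONING_TOKENS)
    - total_tokens(DAILY_REQUESTS, TRADITIONAL_TOKENS)
)
print(f"Extra tokens per day at {DAILY_REQUESTS:,} requests: {extra_tokens:,}")
```

The multiple works out to about 19x per answer, so any shift of everyday traffic from traditional to reasoning models translates almost directly into extra inference compute.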

The catch is that reasoning models, especially now with GPT-5 and its routing feature, will be used even more, as most ChatGPT free users haven't used them until now. On top of that, AI agents only add more reasoning-model use cases.

Altman even shared some data to back that up:

Within just a few days of the GPT-5 introduction, the usage of reasoning models increased by more than 7x for free users and 3x for paid users. This, of course, is causing a rapid surge in the need for compute, especially cluster-like compute, such as Nvidia's GB200 NVL72. So Altman's later comments are not a surprise: he said that they are out of GPUs, had a big GPU crunch, and that you should expect OpenAI to spend trillions of dollars on data center construction. Just a few days ago, OpenAI's CFO appeared on CNBC and stated that we are in the very early stages of the data center and compute buildout, comparing it to the construction of railroads and energy infrastructure and describing the internet as a smaller-scale buildout.

While there will still be on-edge inference use cases in robotics and other forms, when it comes to the data center and running frontier models, which, as of right now, look as though they will be heavily dominated by reasoning models, you will need a cluster-like system to handle the inference. Nvidia's GB200 NVL72, with its 72 GPUs, is the ideal system for running that inference right now.

SemiAnalysis tested inference of Llama 70B (so not a small model) in one of their reports, and the results of GB200 with TensorRT-LLM blow every other GPU out of the water in terms of tokens generated:

source: SemiAnalysis

This again positions Nvidia very well, as they are leaders in both scale-up and scale-out networking.

There is room for AMD to win some inference workloads from smaller models or non-reasoning models, at least until its MI400X offerings are available and the networking issues are fully resolved.

Why would clients upgrade their old Nvidia clusters to new ones?

One key element that I came across in my research answered my question of why Nvidia clients would upgrade their older Nvidia servers to newer ones. The fact that CUDA improves the performance of older hardware over time adds to the argument that you might be well off with the older versions. While an obvious answer would be getting better performance from new servers, that answer is only partial. The real motivation is the limited amount of energy that clients have available for data centers, and here we reach a key point. Jim, the former Nvidia director, explained it well during our discussion:

»When you have a data center or build a data center, you work with a power company, you have a power footprint, that power footprint is limited, you can't just call and say Hey, I need 10% more. If we compare current generations and future generations of Nvidia servers limited to 100MW of data center power, how many tokens per second can I generate? H100s limited to 100MW of data center power, generate 300M tokens/seconds, GB200s NVL72 with the same 100MW limitation generate 12B tokens/second. That is the staggering change of moving forward.«
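
Jim's numbers can be restated as tokens per second per megawatt, which is the metric a power-constrained buyer actually cares about. This is a minimal sketch using only the figures from the quote; the function name is hypothetical.

```python
# Figures from the quote: both fleets are capped at the same power budget.
POWER_BUDGET_MW = 100
H100_TOKENS_PER_SEC = 300_000_000        # 300M tokens/second
GB200_TOKENS_PER_SEC = 12_000_000_000    # 12B tokens/second


def tokens_per_sec_per_mw(tokens_per_sec: float, power_mw: float) -> float:
    """Throughput normalized by the data center power budget."""
    return tokens_per_sec / power_mw


h100_density = tokens_per_sec_per_mw(H100_TOKENS_PER_SEC, POWER_BUDGET_MW)
gb200_density = tokens_per_sec_per_mw(GB200_TOKENS_PER_SEC, POWER_BUDGET_MW)
gain = gb200_density / h100_density

print(f"H100:  {h100_density:,.0f} tokens/s per MW")
print(f"GB200: {gb200_density:,.0f} tokens/s per MW")
print(f"Gain at a fixed power budget: {gain:.0f}x")
```

On these figures, the same 100MW footprint yields 40x the token throughput, which is the core of the upgrade argument when power, not capital, is the binding constraint.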

Jim is not the only one who sees energy efficiency driving upgrades. ARM's CEO recently said the following: »The Grace Blackwell platform is 25x more efficient on power than the Hopper with x86.«

So, the real main selling point of Nvidia is not necessarily more performance; it comes down to less energy consumption, which in turn means more tokens per second. There is a reason why Nvidia introduced GB300 NVL72 on social media as: »Our latest platform, the Nvidia GB300 NVL72, reduces peak grid demand by up to 30% - smoothing energy spikes, optimizing power use, boosting compute density and cutting operating costs.«

Overall, the leaps in performance and energy consumption that Nvidia has made with its Blackwell infrastructure are, in my view, enough to entice its customer base to upgrade. For training, the GB200 NVL72, with continuous CUDA improvements, has just hit thresholds where, even from a cost perspective, it makes sense for clients to upgrade. And for complex inference like reasoning models, which are on the rise, the GB200 NVL72 is now the best inference solution out there. The fact that we also see limited leaps in performance from GPT-5, which was trained on H100 and H200, suggests that AI research labs will need to leverage the GB200 NVL72 to create the next frontier models with better performance.

I expect Nvidia's Blackwell servers to be sold out for the foreseeable future.

Risks to the demand story for Nvidia and AMD

When it comes to the risks to AI data center demand for both Nvidia and AMD, there are three key risks that I look at:

AI CapEx runs out of Capital
Scaling laws ceiling & New significant architectural change in inferencing AI models
Data center limitations and energy equipment shortage

Let's take a deeper look, starting with AI CapEx running out of capital. First of all, the demand for data center GPUs is real and doesn't appear to be slowing down anytime soon. An OpenAI employee recently shared that OpenAI has grown its compute 15x from 2024 to the present. We have already mentioned the recent statements from OpenAI's CEO & CFO about experiencing a significant GPU crunch and being out of GPUs. However, it's not just OpenAI; Anthropic is also experiencing a surge in demand for coding AI assistants. Additionally, Meta signed a $10 billion cloud compute deal with Google Cloud just yesterday, despite investing over $70 billion in data center capital expenditures. From the earnings of all the hyperscalers, including Google, Amazon, Microsoft, and Oracle, we heard the same thing: demand is outstripping supply; we need more data centers and more compute.

Despite that, we have to acknowledge the concentration of clients for Nvidia and AMD. Dylan Patel recently mentioned on a podcast that OpenAI and Anthropic account for around 30% of Nvidia's chip demand. Another third or so goes to ad-based companies like Meta and ByteDance, and the rest goes to non-economic providers.

We also can't ignore the fact that Nvidia's customer concentration will remain high. Many enterprises with on-prem infrastructure are starting to figure out that the compute they need is really expensive to buy and, more importantly, that their data centers are not ready to be AI data centers, primarily because they are not liquid-cooled. As we discussed today, the new Nvidia servers are liquid-cooled, and most data centers outside of the hyperscalers do not have liquid-cooled facilities. This will accelerate the migration from on-prem to cloud even for those enterprises that have been hesitant to do so for many years. Running an AI data center with a severe thermal load is also an order of magnitude harder than running a traditional data center.

As cloud providers report high demand spikes, a significant portion of that comes from two companies: OpenAI and Anthropic. As long as these two companies, and I would add xAI at this point, can continue to raise new rounds and gather capital, the demand from them will continue to rise. Just yesterday, Anthropic was planning to raise $5 billion at a $170 billion valuation. Due to high investor interest, they decided to increase the raise to $10 billion. From this move, you can see that all of that cash is going towards compute, as they will need a lot more to continue growing on this trajectory. OpenAI is not only looking to raise its next rounds at $500B+ valuations, but it is also exploring debt instruments to gather even more capital. The fact is that while these companies have over $10 billion in revenues, they still require significantly more compute to develop the next frontier model, as well as to serve inference for the usage spikes they are experiencing (OpenAI now has over 700 million monthly active users). As long as we see these companies continue to gather capital without problems, the demand for Nvidia and AMD will be there.

The other part of the demand comes from Meta and other Big Tech companies whose core businesses generate a ton of FCF. Here, the trajectory of CapEx growth will start to hit some limits. We are now in the $70-$90B range, and given how the core businesses are growing, I think another 30-40% of growth is possible before these companies come under pressure from their shareholders over the massive investments they are making.
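
As a rough illustration of what that headroom implies in dollar terms, here is a minimal Python sketch. The $70-$90B range and the 30-40% growth figure are my own estimates from the paragraph above, not company guidance, and the names in the code are illustrative.

```python
# Assumption: both ranges are the author's estimates from the paragraph
# above, not company guidance.
capex_range_bn = (70, 90)       # current CapEx range, in $B
growth_headroom = (0.30, 0.40)  # growth before shareholder pushback


def implied_ceiling(capex_bn, growth):
    """CapEx level after applying the remaining growth headroom."""
    return capex_bn * (1 + growth)


low = implied_ceiling(capex_range_bn[0], growth_headroom[0])
high = implied_ceiling(capex_range_bn[1], growth_headroom[1])
print(f"Implied CapEx level before pushback: ${low:.0f}B to ${high:.0f}B")
```

Under these assumptions, the pressure point would sit somewhere between roughly $91B and $126B.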

The third part of demand, which is really starting to take off but remains a big question mark in terms of size (something we might soon get some clarity on), is Sovereign. These are countries, or companies connected to countries, that are building their own data centers because they understand the importance of AI for security and independence. This demand is coming from Middle Eastern countries, Europe, and many others. We are still early with Sovereigns, but they might become a significant driver of AI compute demand in the coming quarters and years.

Moving on to the second risk, which is »Scaling laws ceiling & New significant architectural change in inferencing AI models«. Regarding scaling laws hitting a wall, this is a topic on which a lot has already been written, so I don't want to delve too deeply into the details. The key thing to follow is the progress of the new frontier models. GPT5's less impressive performance is something to keep an eye on, especially as we see what comes from Google's Gemini, xAI's Grok, and others. So far, I don't think we are at that wall yet, as OpenAI's problems seem to be specific to OpenAI. With its rapid user growth, it appears that OpenAI had to make a hard compromise: give compute to inference users, or give compute to training and limit current usage. As a result, GPT5 didn't really get the amount of compute that it could have. OpenAI's success in terms of product usage is now hindering its new model development, as there simply isn't enough compute available to serve both. It is also important to note, as already mentioned, that GPT5 was trained on H100 and H200, not GB200. Nonetheless, if we see only small model progress from the other frontier labs in the next few months, that is a sign to watch out for.

The other risk that I mentioned was a significant architectural change in inference AI models. Here, I am not talking about a DeepSeek-type change, where we obtain more efficient models that still require GPUs. That is positive for Nvidia/AMD, as more efficient models unlock additional usage, as we have seen with DeepSeek. The change that would be concerning for these companies is a technical breakthrough that allowed CPUs to run inference effectively for even the frontier reasoning models. So far, there is nothing pointing to that.

The last risk on my list is data center limitations and energy equipment shortages. This is a risk that will slow down potential growth but not stop it, as we can already see that providers like Nvidia understand the importance of power consumption on these AI servers, with GB300 and many others. Given the constraints of their customers, Nvidia and AMD might find a niche, as we previously discussed, offering significant performance per watt improvements every year, which could spark a major upgrade cycle.

With all of these risks in mind, we still need to acknowledge that this article focuses on AI data center compute; we haven't covered edge compute, robotics, or AVs, which are emerging markets.

Summary

Nvidia continues to be the king when it comes to training as well as inference. For inference, GB200 NVL72 is the superior solution, especially when it comes to big reasoning models. Despite that, AMD has a chance to find its spot in the sun. AMD ROCm is progressing well in becoming more stable, and most Nvidia customers are looking for a serious alternative, especially with Nvidia's Lepton project. As the complexity of the full-stack AI server increases with HBM4 and other components, ASIC alternatives will become even more challenging to manage. Therefore, leaning on AMD as the second choice is a natural one. AMD's air-cooled MI350X series also appears to be a promising product that could be widely utilized, as numerous air-cooled data centers are waiting to be leveraged more effectively for AI workloads. AMD is also finally establishing better relationships, and the sell-and-lease-back model with Neoclouds will make its cloud instances more affordable, which should attract more usage.

There are no signs of demand for GPUs cooling down anytime soon; on the contrary, it is accelerating, with inference now becoming a critical factor. Nvidia reports its earnings in a few days. I expect strong numbers, as GB200 represents a significant leap forward, especially in inference, compared to H100 and H200. The timing is perfect, as inference is gaining momentum. The number of Blackwells sold will, in my view, surprise even the most bullish analysts in the next few months.

Thank you!

Disclaimer:

I own Meta (META), Google (GOOGL), Amazon (AMZN), Microsoft (MSFT), Nvidia (NVDA), AMD (AMD), and TSMC (TSM) stock.

Nothing contained in this website and newsletter should be understood as investment or financial advice. All investment strategies and investments involve the risk of loss. Past performance does not guarantee future results. Everything written and expressed in this newsletter is only the writer's opinion and should not be considered investment advice. Before investing in anything, know your risk profile and if needed, consult a professional. Nothing on this site should ever be considered advice, research, or an invitation to buy or sell any securities.

Meta’s AI Ambitions: What’s Really Going On

UncoverAlpha — Thu, 31 Jul 2025 12:49:48 GMT

Hi everyone,

Meta just reported its earnings yesterday, showing why AI is the company's core focus. For this article, I focus on two areas: first, why Meta is building its superintelligence group and who these individuals are, and second, what AI does for Meta in terms of current and future revenue and profit opportunities.

Let's dive into it.

Meta's Super Intelligence unit

As most of you have likely heard from various news reports, Meta has been engaged in a massive AI talent poaching spree over the last two months. Many reported that Meta is offering massive compensation packages of over $100M to get these people to join Meta's new Superintelligence unit. The reason, bluntly put, is that in the last few months, Meta has fallen behind the frontier AI labs in terms of its AI models and wants to course-correct significantly. The primary reasons for falling behind are suboptimal training decisions, inadequate post-training strategies, and ineffective leadership. SemiAnalysis already mentioned some of those mistakes:

Chunked attention – a long-context attention mechanism Meta chose, which introduced blind spots.
Expert choice routing – a Mixture-of-Experts (MoE) training strategy that was altered mid-run.
Pretraining data quality issues – problems with the scale and cleanliness of the training data.
Scaling strategy and coordination – disorganized research experiments and poor leadership decisions.
Underdeveloped internal evaluation frameworks

Due to these mistakes and the importance of AI in Mark Zuckerberg's eyes, Meta made a significant course correction. On the 12th of June, it announced that it would invest $14.3B in Scale AI for a 49% equity stake. More importantly, Scale AI's CEO, Alexandr Wang, would join Meta to lead its superintelligence unit; Meta has also taken on some other talented people from Scale AI. Scale AI, a data labeling company, is an important player in the AI model development ecosystem, and its clients included both Google and OpenAI. Following the deal, Google has reportedly ceased working with Scale AI, and OpenAI is likely also exploring alternative options. Without going into too much detail, the Scale AI acquihire started all of this, and here is the list of people Meta has poached from OpenAI, Apple, Google, and Anthropic in that two-month time frame, along with their backgrounds:

Alexandr Wang (Scale AI) - A high-profile Silicon Valley entrepreneur, Wang founded Scale AI in 2016 (after dropping out of MIT) and built it into a premier platform for training data – supplying labeled datasets to firms like OpenAI for products such as ChatGPT. Unlike many academic AI researchers, Wang is known as an adept business leader who scaled AI services and even advised U.S. policymakers on AI.
Nat Friedman - former GitHub CEO.
Lucas Beyer, Alexander Kolesnikov, Xiaohua Zhai (OpenAI Zurich). All three are renowned specialists in computer vision and large-scale machine learning – in fact, during their prior tenure at Google Brain, they co-authored influential research (such as pioneering work on the Vision Transformer model in 2020). At OpenAI, they helped stand up the Swiss branch focused on cutting-edge model training. All three are considered high-profile in the AI research community for their work advancing computer vision and model scaling techniques.
Shengjia Zhao (OpenAI) – A co-creator of ChatGPT and GPT-4, Zhao had led OpenAI’s synthetic data generation efforts (crucial for training robust models). His work underpinned ChatGPT’s development, making him a high-profile hire. In late July, Zuckerberg announced Zhao would become Chief Scientist of Meta’s Superintelligence Labs.
Jiahui Yu (OpenAI) – An expert in multimodal AI, Yu co-created several of OpenAI’s scaled-down “GPT-4 mini” models and GPT-4.1. He led OpenAI’s perception team (working on vision capabilities for models) and, before joining OpenAI, co-led multimodal model development for Google DeepMind’s Gemini project. Yu brings experience in bridging image and language AI.
Shuchao Bi (OpenAI) – A leading engineer in AI voice and multimodal training at OpenAI, Bi was the co-creator of GPT-4’s voice mode and of the smaller “GPT-4o” and “o4-mini” models. He headed OpenAI’s multimodal post-training team, fine-tuning models to handle inputs like speech and images.
Hongyu Ren (OpenAI) – An OpenAI research lead who co-developed the “O-series” internal models (such as GPT-4o and various “mini” GPT-4 prototypes). Ren led a group focused on post-training optimization at OpenAI, refining model reasoning.
Trapit Bansal (OpenAI). Bansal is known for pioneering “RL-on-chain-of-thought”, a technique combining reinforcement learning with chain-of-thought prompting to improve reasoning in AI. He was a co-creator of OpenAI’s “O-series” reasoning models (internal experimental models that contributed to GPT-4’s development).
Jack Rae (Google DeepMind). Rae was the pre-training tech lead for Google’s upcoming Gemini AI model and led the reasoning efforts for DeepMind’s Gemini 2.5 project. He had also spearheaded earlier LLM projects at DeepMind, serving as a lead on the Gopher and Chinchilla language models that helped establish scaling laws for AI.
Johan Schalkwyk (Google) – A veteran Googler, Schalkwyk is a former Google Fellow (a title for Google’s top engineers) who was an early contributor to Google’s voice-AI initiatives (codenamed “Sesame”) and the technical lead for a conversational AI project known internally as “Maya”. He brings deep expertise in speech recognition and voice-driven AI.
Huiwen Chang (Google Research & OpenAI). Chang is a researcher known for innovations in image generation AI. At Google Research, she invented MaskGIT and Muse, two novel image generation architectures. She also co-developed GPT-4’s image generation component during a stint collaborating with OpenAI.
Pei Sun (Google DeepMind/Waymo). Pei Sun joins from Google’s ranks as well; he worked on DeepMind’s Gemini project focusing on post-training, coding, and reasoning modules. Earlier, Pei Sun was at Alphabet’s Waymo, where he created the last two generations of Waymo’s perception models for self-driving cars.
Joel Pobar (Anthropic). Pobar is an engineering veteran who moved from OpenAI’s rival Anthropic. At Anthropic, he worked on model inference (optimizing how AI models run).
Ruoming Pang (Apple). Apple’s top executive for AI models. Pang had been the head of Apple’s Foundation Models team, meaning he oversaw development of Apple’s large-scale AI systems that power features like on-device Siri intelligence and multimodal capabilities. A distinguished engineer, Pang was responsible for Apple’s most advanced AI initiatives (and had previously been at Google before Apple hired him in 2021). His defection to Meta came with a massive compensation package – reportedly on the order of $200 million in total value.
Mark Lee and Tom Gunter (Apple), both senior Apple researchers in the foundation models team. At Apple, Lee and Gunter worked under Pang on developing advanced AI features – likely including multimodal AI systems and on-device machine learning capabilities for future Apple products.
Bowen Zhang (Apple). Zhang was instrumental in Apple’s internal efforts to build multimodal AI systems (combining text, vision, etc.) and was part of Pang’s group.

I probably missed some, but this list gives you a good sense of who these people are. They will be key to Meta's AI success, as Zuckerberg sees small teams as the way forward in AI:

»In terms of the shape of the effort overall, I guess I've just gotten a little bit more convinced around the ability for small talent-dense teams to be the optimal configuration for driving frontier research. And it's a bit of a different setup than we have on our other world-class machine learning system. So if you look at like what we do in Instagram or Facebook or our ad system, we can very productively have many hundreds or thousands of people basically working on improving those systems, and we have very well-developed systems for kind of individuals to run tests and be able to test a bunch of different things. You don't need every researcher there to have the whole system in their head. But I think for this -- for the leading research on superintelligence, you really want the smallest group that can hold the whole thing in their head, which drives, I think, some of the physics around the team size and how -- and the dynamics around how that works.«

Interestingly, the CEO of Nvidia was recently asked about the small-team strategy on the All-In Podcast and agreed with it, saying:

»150 AI researchers with enough funding behind them can create an OpenAI«

So the strategy of gathering top AI talent and paying them billions is actually a very good one, as a small team is an advantage in frontier AI model development. This move not only improved Meta's chances of success but also set back many of its competitors, which lost key AI talent (giving Meta more time to get back to the frontier). It also helps Meta, despite being a large tech company, move faster and more flexibly so it can compete with nimbler, smaller teams like OpenAI.

But why does Meta want to be the one developing AI so badly that it needs this Superintelligence unit? Let's dive into the areas where AI is already helping Meta and the new opportunities GenAI opens up for it.

The effect of AI on Meta

Back in April last year, I published an article and this chart around Meta and AI:

And I have to admit, despite the fast-changing pace of AI, most of it still holds true today.

But in this earnings call, Mark Zuckerberg framed it into 5 pillars:

1. Improved advertising

2. More engaging experiences

3. Business messaging

4. Meta AI

5. AI devices

6. Cost savings (added by me, not Zuck)

Let's dive into each of these

Improved advertising

Meta reported 22% YoY revenue growth this quarter, which was well above estimates. The company attributes much of this to advertising improvements unlocked by AI:

»In advertising, the strong performance this quarter is largely thanks to AI unlocking greater efficiency and gains across our ad system. This quarter, we expanded our new AI-powered recommendation model for ads to new surfaces and improved its performance by using more signals and longer context. It's driven roughly 5% more ad conversions on Instagram and 3% on Facebook.«

Given that Meta is not a high-intent platform like Google Search, it should be one of the biggest beneficiaries of AI: ads that are better targeted and more creative influence people's impulse buying decisions, which helps drive conversions. If advertisers see better conversions on their ads, they increase their ad budgets. This trend is already playing out, helped by Meta's Andromeda model architecture:

»Impression growth accelerated across all regions due primarily to engagement tailwinds on both Facebook and Instagram and to a lesser extent, ad load optimizations on Facebook. The average price per ad increased 9%, benefiting from increased advertiser demand, largely driven by improved ad performance.«

»The Andromeda model architecture we began introducing in the second half of 2024 powers the ads retrieval stage of our ad system, where we select the few thousand most relevant ads from tens of millions of potential candidates. In Q2, we made enhancements to Andromeda that enabled it to select more relevant and more personalized ads candidates while also expanding coverage to Facebook Reels. These improvements have driven nearly 4% higher conversions on Facebook Mobile Feed and Reels.«

Meta also has a new Generative Ads Recommendation system (GEM) that is showing results:

»Our new Generative Ads Recommendation system, or GEM, powers the ranking stage of our ad system, which is the part of the process after ads retrieval where we determine which ads to show someone from candidates suggested by our retrieval engine. In Q2, we improved the performance of GEM by further scaling our training capacity and adding organic and ads engagement data on Instagram. We also incorporated new advanced sequence modeling techniques that helped us double the length of event sequences we use, enabling our systems to consider a longer history of the content or ads that a person has engaged with in order to provide better ad selections. The combination of these improvements increased ad conversions by approximately 5% on Instagram and 3% on Facebook Feed and Reels in Q2.«

In addition, advertisers with ad creative tools can now

July monthly alternative data report: NVDA, AMD, ASICs, AWS, GCP, Azure

UncoverAlpha — Wed, 23 Jul 2025 11:50:53 GMT

Hey everyone,

Posting a monthly alternative data insights report. In this report, I cover two fields:

- The first one is the Nvidia, ASICs, and AMD battle, where I found numerous valuable insights that have shaped some of my future thinking.

- The second section of this report provides insights into the cloud positions of Amazon, Microsoft, and Google before their earnings announcements.

Let's start with the AI infrastructure Nvidia topic.

What is clear from the last few weeks is that AI infrastructure spending is continuing to ramp up at a rapid pace. We had Meta's CEO, Mark Zuckerberg, talking about investing »hundreds of billions« into AI compute and having the most compute per researcher of any company. Then you had OpenAI's Sam Altman posting on X that by the end of the year, OpenAI will have a million GPUs running, and that the next step for the team is to figure out how to 100x that. OpenAI also announced an expanded partnership with Oracle for its Stargate data center, where it will have 2 million GPUs. Just yesterday, Elon Musk, who runs xAI, posted about xAI's goal of having 50 million H100-equivalent units of AI compute within five years. If we look at GPT-4, which was trained on 25,000 Nvidia A100 chips, roughly equivalent to 12,500 H100s, the 50M mark shows the scale the industry is heading toward.
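
To put that comparison into a single number, here is a minimal Python sketch using only the figures from the paragraph above. The 2:1 A100-to-H100 ratio is the rough equivalence implied there, not a benchmark result, and the variable names are illustrative.

```python
# Figures from the paragraph above; the 2:1 A100-to-H100 ratio is the
# rough equivalence implied there, not a benchmark result.
gpt4_a100_count = 25_000
a100_per_h100 = 2
xai_target_h100_equiv = 50_000_000

gpt4_h100_equiv = gpt4_a100_count / a100_per_h100
scale_multiple = xai_target_h100_equiv / gpt4_h100_equiv

print(f"GPT-4 training cluster: ~{gpt4_h100_equiv:,.0f} H100 equivalents")
print(f"xAI five-year target vs GPT-4 cluster: {scale_multiple:,.0f}x")
```

On these rough numbers, the xAI target is about 4,000 times the compute footprint used to train GPT-4.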

Despite this topic being widely covered by many analysts, I have concluded that we are still underestimating the future compute needs of AI and, with it, the infrastructure spending. We are just starting to see the first uses of AI agents with OpenAI's new release, and the compute demands for these agents are significantly greater than those required for information retrieval (today's version of Search or even an LLM prompt).

With that in mind, the positioning of Nvidia versus AMD and custom ASICs developed by hyperscalers such as Amazon, Microsoft, Google, and Meta remains a crucial topic. The most interesting insights that I came across over the month are the following:

A high-ranking employee at Micron, speaking during an expert call, discussed GPU utilization rates. They have increased from the 40% range, where they were just 2-3 years ago, to now being at 70-80%. This is beneficial for both cloud providers and chip providers, as the ROI for cloud providers has improved, and according to him, the payback period for their investment has now improved to 6-9 months. With improved ROI, the cloud providers might feel more comfortable with future chip orders.

A consensus from the most valuable insights I read over the last month is that, while AMD can close the gap to Nvidia in terms of hardware, it still lags in two key areas: networking/interconnect (picture it as the way GPUs communicate with each other to form one big GPU cluster) and software optimization, where AMD has ROCm rather than CUDA.

On the networking and interconnect side, there is some hope for AMD. As a high-ranking former Nvidia employee mentions, the adoption of UALink would help AMD with its scale-out problem. The fact that both Apple and Amazon have joined the UALink consortium is a positive sign for its adoption, despite Broadcom's departure from it. In general, the consensus is also that most big clients want to have more options than just Nvidia.

»There is a strong desire to develop alternatives to Nvidia. There isn't a customer in the world that would not leap at a viable alternative to Nvidia.«

High-ranking Dell employee (source AlphaSense)

For AMD, there is a chance on the inference workload side; however, most agree that it ultimately comes down to the stability of ROCm. I have written before about ROCm's problems, including its instability and the ROCm to CUDA converters not fully doing their job: a significant portion of the conversion still requires intervention by kernel engineers, which ultimately makes it a more costly solution than simply purchasing Nvidia's stack. AMD has released an updated version of ROCm, and the first signals from customers look good:

»The word I'm getting now is that with the latest releases of ROCm, they are much more stable and they've worked through a lot of the bugs and stuff….«

High-ranking Dell employee

This was confirmed during a call with a former AI engineer at Nvidia, who mentioned that ROCm has become easier to work with compared to two years ago, but it still lags behind CUDA.

The consensus from these calls is also that ROCm, while it has become more stable, faces the biggest hurdle in the mindshare that CUDA has built over the years. The problem is that companies lack the confidence to build on ROCm because they are uncertain about the availability of skilled personnel with knowledge of ROCm. Nvidia's CUDA has been taught in universities for 15 years, so the entire talent pool and ecosystem have learned to build things on CUDA, which is a significant hurdle to overcome if you are a competitor to Nvidia. Even if you manage to do it, in the best-case scenario, it will probably take years.

The best thing AMD has going currently is that there is high motivation for an alternative, and big clients like the hyperscalers are willing to try things out, as they don't want to end up being dependent on one company having a monopoly on intelligence – Nvidia.

Nvidia's moves with its own cloud and now with Project Lepton are also angering hyperscalers, according to a Dell employee, as Nvidia might steer a significant amount of Lepton traffic to the cloud clients it prefers or in which it has an equity interest, such as CoreWeave. Nvidia knows that right now, the way it distributes GPUs is key, as companies are willing to work with most cloud providers as long as they have GPU capacity. Nvidia wants to commoditize the cloud industry and shift more of the power to itself.

Now turning to custom ASICs. Another interesting consensus from these interviews is that, despite everything said above, many of the experts still see ASICs taking a significant market share in the next 3-5 years:

»In 5 years GPU/ASIC in inference 50/50 split, in training 60/40«.

High-ranking Meta employee

A high-ranking Micron employee believes that by 2028, Nvidia's market share will be 50-60%, with the majority of the remaining pie going to hyperscaler custom ASICs.

The consensus is also that the most mature and capable ASIC out there right now is Google's TPUs, followed by Amazon's offering. Some experts even estimate that TPUs already power 30-50% of Google's inference needs.

In summary, based on the data and insights I have recently read, the consensus of industry experts is that AMD's ROCm has become more stable. AMD has started to move in the right direction, but this isn't stopping Nvidia, as its GPUs are still the most sought-after asset. In terms of ASICs, most expect them to take a significant market share in the coming years, but this will take time, as apart from Google, many of the custom ASIC development efforts are still at an early stage of maturity.

If there are no significant architectural changes to the way AI models are trained and run for inference, then ASICs have a better chance of rising in adoption. However, if we start to see major changes, the flexibility of GPUs will make them the only viable option again.

There was an interesting comment from a Meta employee confirming this thinking:

»The world up until now has not had a very good handle on what type of compute was needed for any AI/ML use-case. They were changing so fast that it takes too long to build new silicon….We're finally getting into a space where we all in the industry have a much better idea of what we need to run. While people freak out a little bit because models are constantly changing, we got a much better idea of what data movement looks like in different types of AI models, and that leans us towards wanting something different than a traditional GPU.«

The hyperscalers and the cloud market check

First, an interesting data point is that, upon examining job postings, there is an uptick in jobs and companies seeking engineers who are familiar with GCP. A slight uptick is also seen with Azure, but not with AWS.

source: Revealera

Additionally, reviewing analysts' reports and channel checks, the data appears strong for the upcoming earnings reports of the hyperscalers. Here are some comments that recently stood out to me.

On Azure:

RBC Capital Markets:

»Azure core workloads remain stable, AI-related interest is building, and Copilot is starting to show up more frequently in renewal and expansion conversations. Most customers remain in pilot or department-level usage, with internal readiness still limiting broader adoption.«

»Azure demand remains steady; GPU constraints improved late in the quarter.«

»No signs of field-level OpenAI friction. Several noted that OpenAI integration is still a differentiator in enterprise bakeoffs, and Azure continues to win share in GenAI-heavy workloads.«

TD COWEN:

»Azure checks were strong, we're expecting capacity constraints to be easing, and our new bottoms-up model gives us confidence in Azure growth trending well above Street in the qtrs ahead.«

»Separately, 3rd party data we track points to 2Q growth for the Big 3 hyperscalers tracking above year-ago levels for the 2nd qtr in a row, alongside a return to Q/Q growth«

On AWS:

Bernstein:

»AWS is back in focus, and we expect to see top-line acceleration driven by improving performance in core and modest AI-related benefits as GPU supply constraints eased intraquarter.«

Morgan Stanley:

»Expect Larger Anthropic Growth Contribution Ahead: Notably, AWS's growth, excluding our estimated Anthropic inference revenue, has remained healthy (16%-19% over the past 5 quarters)...speaking to the durability of the business even through GPU-related supply constraints that are expected to ease in 2H.«

Truist Securities (on AWS):

»…and we believe an acceleration in Gen-AI related revenue could be on the horizon in the back half of the year«

The trend I am observing is that, due to its OpenAI partnership, Microsoft is gaining more market share than its competitors. At the same time, however, the need for GPUs has accelerated the adoption of multi-cloud environments, with customers connecting to more cloud providers than before.

I found an interesting comment from a former Amazon employee on this trend:

»What we saw in 2022 is the average start-up had 1.7, 1.8 cloud, if you looked at the number of cloud providers connected per start-up customer in our start-up customer base. That accelerated to 3.5 clouds, 3.4 clouds entering 2024.«

AlphaWise and Morgan Stanley Research data confirm this trend: Azure continues to dominate, although Amazon made some progress in the last quarter.

Another insightful piece was reading the JPMorgan CIO survey.

The key takeaway is that companies are migrating more towards cloud environments, as AI workloads require specialized compute, power, and cooling, which companies often don't want or can't handle on-premises.

I also saw in various data reports, including this one, that current GPU availability is sufficient, which might indicate that, except for large clients like OpenAI and the AI research labs, other companies have enough compute power to run their current AI workloads. This might also point to still limited usage of AI in real-world cases (outside of the start-up ecosystem).

»CIOs appear to primarily agree that the public cloud providers currently offer sufficient GPU availability to meet their organizations’ AI needs (no GPU shortage among enterprises), and that they will primarily buy AI Agents from SaaS providers, much more so than they plan to build their own custom AI Agents.«

The surprise to me was how little the analysts talked about Google's GCP, which I think might surprise to the upside: the Gemini models have recently been among the best performing, and Google's internal use of TPUs leaves it with more GPUs to offer to its customers.

Thank you!

Disclaimer:

I own Meta (META), Google (GOOGL), Amazon (AMZN), Microsoft (MSFT), Nvidia (NVDA), AMD (AMD), and TSMC (TSM) stock.

Nothing contained in this website and newsletter should be understood as investment or financial advice. All investment strategies and investments involve the risk of loss. Past performance does not guarantee future results. Everything written and expressed in this newsletter is only the writer's opinion and should not be considered investment advice. Before investing in anything, know your risk profile and if needed, consult a professional. Nothing on this site should ever be considered advice, research, or an invitation to buy or sell any securities.