The $20 Billion Admission: Why NVIDIA Just Bought Into the ASIC Revolution with Groq
Hey everyone,
The AI industry never sleeps: yesterday we got news that Nvidia is »acquiring« (more of an acqui-hire) the ASIC chip startup Groq for around $20B. If you have been a reader of our publication for some time, you know we have mentioned Groq multiple times. A little more than a year ago, I also did an exclusive interview with my friend Sunny Madra, Groq's General Manager, which you can go back and check out.
While many people are speculating on why Nvidia would essentially buy (formally, license) a $20B ASIC startup, I wanted to add my thoughts to the mix, as I believe the Groq acquisition is highly strategic for Nvidia and sends an important signal to the market.
How is the Groq chip different from a GPU/TPU?
First, let’s dismiss the argument that Nvidia bought Groq because its CEO, Jonathan Ross, is one of Google’s TPU founders. Groq’s chip, also called the Language Processing Unit (LPU), is very different from a TPU or a GPU.
Let me quickly explain how the GPU, the TPU, and the LPU differ:
The GPU
The GPU architecture was originally designed for graphics—calculating thousands of pixels at once. For AI, it treats a Large Language Model (LLM) as a massive parallel processing job.
The Bottleneck: GPUs rely on HBM (High Bandwidth Memory), which sits outside the processing core. Every time the GPU needs to calculate a word (token), it has to “fetch” the model weights from that external memory. This creates a “memory wall” where the processor is often sitting idle, waiting for data to arrive.
The Logic: It uses a “hub and spoke” model. It is incredibly versatile and can do everything from training to gaming, but it isn’t “perfectly” efficient for the specific sequential nature of generating text.
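To put rough numbers on that »memory wall«, here is a quick back-of-envelope sketch. This is my own illustration, not Nvidia's or Groq's math, and the bandwidth and model-size figures are assumptions picked only to show the order of magnitude:

```python
# Back-of-envelope: single-user (batch size 1) token generation is bounded by how
# fast the GPU can stream the model weights from HBM, not by how fast it can do math.
# All numbers are illustrative assumptions, not vendor specs.

hbm_bandwidth_gb_s = 4800        # assumed HBM bandwidth of a modern data-center GPU (GB/s)
weights_gb         = 70 * 2      # assumed 70B-parameter model in FP16, about 140 GB

max_tokens_per_s = hbm_bandwidth_gb_s / weights_gb   # throughput ceiling at batch size 1

print(f"Weights streamed per token: ~{weights_gb} GB")
print(f"Bandwidth-limited ceiling:  ~{max_tokens_per_s:.0f} tokens/s")
# Roughly 34 tokens/s: the compute cores spend most of their time waiting on memory.
```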
The TPU
You can read my piece on the Google TPU for a detailed understanding, but to summarize the key points: the TPU is an ASIC (Application-Specific Integrated Circuit) designed specifically for tensor math (linear algebra). It uses a Systolic Array. Imagine a “heart” that pumps data through a grid of processors. Once a piece of data enters the grid, it is passed from one processor to the next without needing to go back to main memory.
The Logic: TPUs are much more efficient than GPUs for massive batches of data. This makes them very effective for training and large-batch inference (similar to the GPU), where you are feeding the machine billions of data points at once. However, for a single user asking a question (low-latency inference), they often still face latency issues.
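For the curious, here is a tiny toy simulation of the systolic-array idea. It is a sketch of my own, nothing like Google's actual hardware: operands enter the grid once at the edges, get handed from neighbor to neighbor every cycle, and each cell accumulates its own output locally instead of going back to main memory:

```python
# Toy, output-stationary systolic array "pump"; my own sketch of the concept,
# not Google's TPU design. Each grid cell multiplies the two operands it is
# currently holding and accumulates the result locally.

import numpy as np

def systolic_matmul(A, B):
    n = A.shape[0]                 # assume square n x n matrices for simplicity
    acc = np.zeros((n, n))         # each grid cell holds one output element C[i, j]
    a_in = np.zeros((n, n))        # value each cell currently holds from its west neighbor
    b_in = np.zeros((n, n))        # value each cell currently holds from its north neighbor
    for t in range(3 * n - 2):     # total number of "heartbeat" cycles
        # hand operands one step right (A) and one step down (B)
        a_in = np.pad(a_in, ((0, 0), (1, 0)))[:, :n]
        b_in = np.pad(b_in, ((1, 0), (0, 0)))[:n, :]
        # feed skewed slices of A and B in at the left and top edges of the grid
        for i in range(n):
            k = t - i
            a_in[i, 0] = A[i, k] if 0 <= k < n else 0.0
            b_in[0, i] = B[k, i] if 0 <= k < n else 0.0
        acc += a_in * b_in         # every cell multiplies its operands and accumulates
    return acc

A = np.arange(9, dtype=float).reshape(3, 3)
B = np.eye(3) * 2.0
print(np.allclose(systolic_matmul(A, B), A @ B))   # True
```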
The Groq LPU
Groq’s LPU is a complete departure from the other two. It doesn’t use HBM (External Memory) at all. Instead, it uses SRAM (Static Random Access Memory), which is built directly into the silicon of the chip.
The biggest resulting difference is speed: SRAM is up to 100x faster than the HBM found in GPUs. Because the data is right there on the chip, there is essentially zero “fetch time.”
In a GPU, the hardware decides at runtime when to process data (non-deterministic). In an LPU, the software/compiler decides exactly where every piece of data will be at every billionth of a second (deterministic). It’s like a perfectly timed assembly line where no one ever has to wait for a part. The unique part of the LPU is that Groq first designed an automated compiler, and only then designed the chip. The reason is that Jonathan Ross, who worked on the TPU at Google, knew the software was the biggest pain point and that a startup like Groq couldn’t compete with 10,000 Nvidia software engineers who write low-level assembly routines (kernels) all day. Because of that automated compiler, you don’t write any manual kernel optimizations for LPUs; every token’s path is predetermined.
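To make »the compiler decides everything in advance« a bit more concrete, here is a deliberately tiny toy sketch of my own (in no way Groq's actual compiler): the program is walked once at compile time, every step gets an exact start cycle, and nothing is left to runtime arbitration, so the end-to-end latency is known before a single instruction runs:

```python
# Toy illustration of static (compile-time) scheduling vs. dynamic (runtime) scheduling.
# My own sketch of the concept, not Groq's compiler.

# A chain of dependent steps, each with a fixed cost (in cycles) on a given unit.
program = [
    ("load_weights", "memory", 3),
    ("matmul",       "alu",    5),
    ("activation",   "alu",    1),
    ("emit_token",   "io",     1),
]

# "Compilation": walk the chain once and pin every step to exact cycles.
schedule, cycle = [], 0
for op, unit, cost in program:
    schedule.append((cycle, cycle + cost, unit, op))
    cycle += cost

for start, end, unit, op in schedule:
    print(f"cycles {start:>2}-{end:>2}  {unit:<6} {op}")
print(f"total latency: {cycle} cycles, identical on every run (no queues, no waiting)")
```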
So where does the LPU excel? LLMs generate text one token (word) at a time. The LPU is designed to stream these tokens through its “conveyor belt” architecture, which is why you see Groq generating hundreds of tokens per second while GPUs struggle to hit 50.
But the LPU is not the »GPU killer« some might think.
The LPU’s strength for some use cases, but a weakness for others, is its tiny memory capacity. An Nvidia H200 GPU has 141GB of HBM3e memory; a single Groq LPU chip has only 230MB of SRAM. Because 230MB isn’t enough to hold even a small AI model, you have to link hundreds of LPU chips together just to run one model. For example, to run Llama-3 70B at full speed, you might need hundreds of LPUs (multiple server racks), whereas you can fit that same model onto just two or four Nvidia GPUs in a single small box. Because you need so many LPU chips to handle the memory requirements of modern models, the initial hardware investment can be large and the data center footprint much bigger than with GPUs.
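A quick back-of-envelope calculation shows where the »hundreds of chips« figure comes from. This is my own math, assuming FP16 weights; quantized weights would shrink these numbers:

```python
import math

# Back-of-envelope: how many chips does it take just to HOLD a 70B-parameter model?
# FP16 weights are an assumption for illustration; lower precision needs fewer chips.

weights_gb      = 70 * 2        # 70B parameters x 2 bytes (FP16), about 140 GB
sram_per_lpu_gb = 0.230         # 230 MB of on-chip SRAM per Groq LPU
hbm_per_gpu_gb  = 141           # HBM3e capacity of an Nvidia H200

print(f"LPUs needed for the weights alone: ~{math.ceil(weights_gb / sram_per_lpu_gb)}")  # ~609
print(f"GPUs needed for the weights alone: {math.ceil(weights_gb / hbm_per_gpu_gb)}")    # 1
# In practice you need 2-4 GPUs once KV cache and activations are included,
# versus several racks' worth of LPUs, which is exactly the trade-off described above.
```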
Because the LPU is deterministic and the software must map out every single calculation before it starts, it is harder to handle dynamic workloads or a change in the underlying model architecture (from the Transformer to something else).
But there is upside to the LPU. Even though a single Groq LPU system (a GroqRack) is more expensive to buy than a single Nvidia server, it can be significantly cheaper to run if you have high-volume traffic.
To get ultra-low latency on a GPU, you have to use a “Batch Size of 1” (meaning you process only one user’s request at a time). This makes the GPU incredibly expensive per token because most of its processing power is sitting idle while it waits for memory to move. But the LPU is designed for a Batch Size of 1. It achieves 300–500 tokens per second while keeping its internal “assembly line” nearly 100% full.
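Here is a rough sketch of why batch size 1 is so punishing economically on a GPU. The bandwidth, model size, and hourly price are all assumptions of mine, picked only to show how cost per token scales with batch size while the chip stays memory-bound:

```python
# One pass over the weights serves the whole batch, so GPU throughput (and cost
# per token) scales almost linearly with batch size until the chip finally becomes
# compute-bound instead of memory-bound. All numbers are illustrative assumptions.

hbm_bandwidth_gb_s = 4800          # assumed GPU memory bandwidth (GB/s)
weights_gb         = 140           # assumed 70B model in FP16
gpu_hour_cost_usd  = 4.0           # assumed hourly rental price, for illustration only

for batch in (1, 8, 64):
    tokens_per_s   = batch * hbm_bandwidth_gb_s / weights_gb   # bandwidth-bound regime
    cost_per_m_tok = gpu_hour_cost_usd / (tokens_per_s * 3600) * 1e6
    print(f"batch {batch:>3}: ~{tokens_per_s:>5.0f} tok/s, ~${cost_per_m_tok:.2f} per 1M tokens")
# Batch 1 lands around 34 tok/s at roughly 64x the per-token cost of batch 64;
# the LPU's design aims to deliver batch-1 latency without that idle-hardware penalty.
```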
And then there is the very important energy aspect.
Because the LPU doesn’t have to power external HBM (High Bandwidth Memory), it is fundamentally more energy-efficient for the actual math it performs. Moving data from external HBM to a GPU core costs about 6 picojoules per bit. Retrieving it from Groq’s local SRAM costs only 0.3 picojoules per bit. On an architectural level, Groq is roughly 10x more energy-efficient per token than a GPU for inference.
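Using the per-bit figures above and an assumed 70B FP16 model, the data-movement energy per token works out roughly as follows. Note that this counts only weight movement, not the math itself, which is why the end-to-end advantage lands closer to the ~10x figure:

```python
# Data-movement energy per generated token, using the per-bit costs quoted above.
# The 70B FP16 model is my own assumption for illustration.

weights_bits = 70e9 * 2 * 8          # 70B params x 2 bytes x 8 bits: bits streamed per token

hbm_pj_per_bit  = 6.0                # external HBM to GPU core
sram_pj_per_bit = 0.3                # on-chip SRAM to LPU compute

hbm_j_per_token  = weights_bits * hbm_pj_per_bit  * 1e-12
sram_j_per_token = weights_bits * sram_pj_per_bit * 1e-12

print(f"HBM  movement: ~{hbm_j_per_token:.1f} J per token")    # ~6.7 J
print(f"SRAM movement: ~{sram_j_per_token:.2f} J per token")   # ~0.34 J
print(f"Ratio on data movement alone: {hbm_j_per_token / sram_j_per_token:.0f}x")
# Compute itself costs energy on both chips, which is why the end-to-end
# advantage is closer to the ~10x per-token figure mentioned above.
```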
But as we talked about before, the downside is that while LPUs are cheaper to run, you are paying more for floor space, networking cables, and physical maintenance. So why did Nvidia decide to buy Groq?
Nvidia’s strategic play with Groq
There are five main reasons Nvidia bought Groq: the energy bottleneck, the HBM bottleneck, the CoWoS bottleneck, the liquid-cooled data center bottleneck, and competition.
While we already discussed the energy benefits of the LPU vs. the GPU in the previous section, we are now in an age where energy is the limiting factor for Nvidia’s growth. Having a second option that is more energy-efficient, especially for simpler inference workloads, is important. To add context, Groq’s LPUs don’t require liquid cooling, which is an important aspect of the whole deal. There are far more air-cooled data centers in the world than liquid-cooled ones. Nvidia’s latest Blackwell, as well as other future products, will be mostly liquid-cooled, as they are meant for maximum performance. In the cloud industry, many air-cooled data centers that can’t be repurposed for liquid cooling are being left behind. In fact, in a recent interview, Groq CEO Ross mentioned that Groq had just landed a big European data center project to host its LPUs; the facility had been left vacant by a hyperscaler that didn’t want to extend the lease because the site couldn’t be converted into a liquid-cooled DC.
In a perfect world, Nvidia would surely prefer that all DCs be liquid-cooled, but the reality is different, as securing a reliable water source is often a problem and retrofitting takes time. Nvidia’s reliance on liquid-cooled DCs could also lead to growth problems, as liquid cooling adds complexity that many DC operators struggle with (the latest CoreWeave delay is just one example). Groq therefore gives Nvidia an air-cooled option to sell and a way to capture more short-term revenue. The fact that Groq LPUs take up a larger data center footprint is not a problem, as they can go into air-cooled DCs that are otherwise underutilized. In my view, the air-cooled option also matters competitively, as alternatives such as AWS’s Trainium, a strong one I discussed in the last article, are air-cooled chips.
Moving to another key aspect of this deal: the HBM bottleneck. HBM has been a bottleneck for some time now, and with Google TPUs, AMD MI400s, and AWS Trainium 3 and 4 becoming more competitive and »eating« more and more HBM, its availability has become worse and worse. HBM for 2026 is sold out, and a real question is how long it will take for 2027 to sell out, too. The three players, SK Hynix, Samsung, and Micron, are also not eager to expand capacity too much in the future, as they know their industry is cyclical and has recently seen major overbuilds. Now that more chip design companies are competing strongly for HBM capacity, the negotiating power of Micron, SK Hynix, and Samsung will only increase. For Nvidia, securing a viable option for less complex inference workloads, like the LPU, is a big positive, as it doesn’t use any HBM. Again, the play for Nvidia here is to continue its revenue growth and sales of compute units without being 100% constrained by available HBM.
Another strategic advantage is that Groq’s chips perform well even when fabbed on older nodes. The reason for this is SRAM: since the chips don’t rely on external memory, they don’t need the densest transistors to achieve high speed. Groq’s latest generation of LPUs is, in fact, fabbed on a 14nm node at GlobalFoundries. While Groq is transitioning to newer nodes at Samsung, the fact that you can produce capable chips on an older node outside TSMC is another big advantage for someone like Nvidia, as it bypasses another bottleneck: TSMC and CoWoS. The chances of a state-of-the-art Groq chip being fabbed outside of TSMC are much higher than for a B300 or Vera Rubin. So, again, with this move, Nvidia is opening a new avenue for growth that doesn’t face the same bottlenecks as Blackwell or Vera Rubin.
Now, to the last point: competition. Nvidia knows that if the HBM, energy, liquid-cooling, and CoWoS bottlenecks squeeze the market and cause a significant shortage of compute, customers and competitors will start looking for alternatives to bypass those bottlenecks, and a Groq whose supply chain is not constrained by the same factors is a prime candidate. Going into this »acquisition«, Groq was growing fast, and more importantly, its capacity was growing fast.
Groq CEO 4 months ago:
»18 months ago, we had 1/10,000 of the token capacity. Today we have about 20M tokens per second of capacity; a month and a half ago we had 10M.«
So, rather than Meta or Microsoft buying Groq and opening an alternative path beyond the limited GPU path, Nvidia decided to pull the trigger itself.
What does this mean for Nvidia?
Did Nvidia acknowledge that GPUs are not the best hardware for every AI workload? Yes. At the same time, Nvidia is signaling that they expect their GPUs to be completely sold out for years and that they want to grow outside of their bottlenecks.
More inference revenue will also mean a different margin profile. Inference margins for Nvidia will not be as high, as even the Groq CEO acknowledged recently:
»Inference is going to be a high-volume, low-margin market. Nvidia is going to build every single GPU that they can physically manufacture this year, AMD is going to do the same thing; they are limited by the HBM, and they are going to sell every single GPU that they build. The thing is that it is not enough. On top of that, every time they sell for inference, when you are paying the 70-80% margin on a GPU, you have to charge that to your end users. Inference is a high-volume, low-margin business. Now, when we start deploying a large number of inference chips, Nvidia, AMD, they can sell their chips for training, which they are really good at, and they can keep that margin high as you can amortize that over 10-20x more compute that you are going to need for inference.«
What does this mean for you as an investor? In the next few days, I will publish my 2026 outlook and the most interesting names I am investing in or watching. Nvidia's Groq move definitely added a new subsector to my list, as a new supply chain is opening up. If you have not yet, consider becoming a paid subscriber, as most of that list of names will be for paid subs only.
Until next time, happy holidays!
As always, I hope you found this article valuable. I would appreciate it if you could share it with people you know who might find it interesting.
Thank you!
Disclaimer:
I own Meta (META), Google (GOOGL), Amazon (AMZN), Microsoft (MSFT), TSMC (TSM), Intel (INTC) stock.
Nothing contained in this website and newsletter should be understood as investment or financial advice. All investment strategies and investments involve the risk of loss. Past performance does not guarantee future results. Everything written and expressed in this newsletter is only the writer’s opinion and should not be considered investment advice. Before investing in anything, know your risk profile and if needed, consult a professional. Nothing on this site should ever be considered advice, research, or an invitation to buy or sell any securities.


