At a Glance: On December 24, 2025, Nvidia finalized a $20 billion non-exclusive perpetual license for Groq's full LPU patent portfolio and software stack, the largest transaction in Nvidia's history. The deal transferred 80-90% of Groq's workforce, including founder Jonathan Ross and President Sunny Madra. At GTC 2026 in San Jose, Nvidia will unveil the LPX inference platform with 64-256 LPUs per rack. OpenAI has committed 3GW of dedicated inference capacity using the new system.
What Happened | The $20 Billion Christmas Eve Deal
Nvidia finalized a $20 billion non-exclusive perpetual license with Groq on December 24, 2025, gaining access to Groq's full patent portfolio and software stack for inference optimization. The agreement is structured as a license, not an acquisition, meaning Groq continues to operate independently under new leadership with its GroqCloud service.
The deal included transfer of physical assets and roughly 80 to 90 percent of Groq's workforce, including core engineering teams. Key personnel who moved to Nvidia include founder Jonathan Ross and President Sunny Madra. Groq had reached a $6.9 billion valuation after a $750 million funding round in September 2025, just three months before the deal closed.
| Deal Parameter | Detail |
|---|---|
| Deal value | $20 billion (non-exclusive perpetual license) |
| Closing date | December 24, 2025 |
| Structure | Patent license + workforce transfer, not full acquisition |
| Scope | Full Groq patent portfolio, software stack, physical assets |
| Workforce transferred | 80-90% of Groq employees, including engineering core |
| Key personnel | Jonathan Ross (founder), Sunny Madra (President) |
| Groq post-deal | Independent operations continue under new leadership (GroqCloud) |
| Prior Groq valuation | $6.9 billion (September 2025, $750M round) |
| Nvidia deal rank | Largest transaction in company history |
What Is Groq's LPU | Why Nvidia Paid $20 Billion for It
Groq's Language Processing Unit (LPU) pairs deterministic execution with large on-chip SRAM (hundreds of megabytes per chip) to eliminate the bandwidth bottlenecks common in GPU-based inference. Unlike GPUs, which rely on off-chip high-bandwidth memory (HBM) that adds latency during sequential token generation, the LPU keeps all active data on-die.
In public demonstrations prior to the deal, Groq showed 10,000 thought tokens generated in approximately 2 seconds, a throughput rate that outpaced GPU-based inference by an order of magnitude for sequential decode workloads. The architecture is purpose-built for the kind of token-by-token generation that dominates deployed AI applications: chatbots, code assistants, search, and agentic AI systems requiring real-time responses.
| Feature | LPU | GPU |
|---|---|---|
| Execution model | Deterministic (predictable latency) | Dynamically scheduled (variable latency) |
| Primary memory | Large on-chip SRAM | Off-chip HBM (HBM2e/HBM3) |
| Memory bottleneck | Eliminated (on-die) | Primary constraint (bandwidth-bound) |
| Token generation speed | ~5,000 tokens/sec demonstrated | ~200-800 tokens/sec typical |
| Latency profile | Near-constant per token | Variable, increases with context length |
| Optimal workload | Sequential inference (decode) | Parallel compute (training + prefill) |
| Power per token | Significantly lower | Higher (greater per-chip overhead) |
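To make the memory-bottleneck row concrete, a back-of-the-envelope roofline estimate shows why single-stream decode is capped by memory bandwidth. The sketch below is illustrative only; the bandwidth and model-size figures are assumptions, not published specifications for any Nvidia or Groq part.

```python
# Roofline-style estimate of single-stream decode throughput.
# During token-by-token generation, every model weight is read once per
# token, so tokens/sec is capped by bandwidth / bytes-of-weights-read.
# All numbers below are illustrative assumptions, not vendor specs.

def decode_ceiling_tokens_per_sec(params_billions: float,
                                  bytes_per_param: float,
                                  bandwidth_tb_s: float) -> float:
    """Upper bound on sequential decode speed for one request stream."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

MODEL_B = 70    # hypothetical 70B-parameter model
BYTES = 1.0     # 8-bit weights

# HBM-class GPU: ~3.3 TB/s off-chip bandwidth (assumed).
hbm_gpu = decode_ceiling_tokens_per_sec(MODEL_B, BYTES, 3.3)

# SRAM-based design: assumed ~80 TB/s aggregate on-die bandwidth
# across a multi-chip pipeline.
sram_lpu = decode_ceiling_tokens_per_sec(MODEL_B, BYTES, 80.0)

print(f"HBM GPU ceiling:  ~{hbm_gpu:,.0f} tokens/sec per stream")
print(f"SRAM LPU ceiling: ~{sram_lpu:,.0f} tokens/sec per stream")
```

Batching restores aggregate GPU throughput, but per-stream latency remains bandwidth-bound, which is exactly the gap large on-die SRAM targets.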
LPX Inference Platform | What Nvidia Will Unveil at GTC 2026
The new platform, referred to as LPX in industry analyses, packages Groq's LPU silicon into dedicated inference racks. Nvidia is expected to reveal full specifications at its GTC developer conference in San Jose in March 2026.
| LPX Specification | Detail |
|---|---|
| Base configuration | 64 LPUs per rack |
| Packaging | 32 RealScale ASIC tiles (2 LPUs per tile) |
| Scaled configuration | 256 LPUs per rack (4x base) |
| Target workload | Low-latency decode, agentic AI, real-time inference |
| Integration approach | Groq LPU silicon + Nvidia networking/software stack |
| Customer anchor | OpenAI (3GW dedicated inference capacity committed) |
| Market positioning | Complementary to Blackwell GPUs, not a replacement |
The LPX platform is positioned as complementary to Nvidia's existing Blackwell B300 GPU racks, which dominate training workloads. Jensen Huang has described the strategy as offering “the right silicon for the right workload”: Blackwell for training and prefill, LPX for decode and real-time inference. This mirrors the approach Nvidia took after acquiring Mellanox in 2020 for networking, where the acquired technology became an accelerator within the broader Nvidia ecosystem rather than a standalone product line.
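In scheduling terms, "the right silicon for the right workload" amounts to disaggregated serving: route the compute-heavy prefill phase to GPU racks and the latency-sensitive decode phase to LPU racks. A minimal sketch of that routing logic follows; the pool names are hypothetical, and a real system would also have to hand off the KV cache between pools.

```python
# Toy sketch of disaggregated serving: prefill on a GPU pool, decode on
# an LPU pool. Pool names are hypothetical; a production system would
# also transfer the KV cache from the prefill pool to the decode pool.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # processed in parallel during prefill
    output_tokens: int   # generated one token at a time during decode

def plan(req: Request) -> list[tuple[str, str]]:
    steps = []
    # Prefill is one large parallel matmul over the prompt: GPU-friendly.
    steps.append(("gpu_pool/blackwell",
                  f"prefill {req.prompt_tokens} prompt tokens"))
    # Decode is strictly sequential, one token per step: LPU-friendly.
    steps.append(("lpu_pool/lpx",
                  f"decode {req.output_tokens} output tokens"))
    return steps

for pool, work in plan(Request(prompt_tokens=4096, output_tokens=512)):
    print(f"{pool}: {work}")
```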
OpenAI's 3GW Commitment | Why It Matters
OpenAI has committed to 3 gigawatts of dedicated inference capacity using the LPX platform, positioning itself as the anchor customer. To contextualize that number: 3GW is roughly the output of three large nuclear reactors, or the electricity consumption of about 2 million households.
Prior to the Nvidia-Groq deal, OpenAI had been exploring inference alternatives with both Cerebras and Groq directly. The deal effectively channeled those relationships through Nvidia, giving OpenAI a single vendor for both training (Blackwell) and inference (LPX) hardware.
| Metric | Context |
|---|---|
| OpenAI inference commitment | 3GW dedicated capacity |
| Power equivalence | ~3 nuclear power plants or ~2 million households |
| Prior OpenAI inference partners | Cerebras (explored), Groq (explored), now Nvidia LPX |
| Training hardware | Nvidia Blackwell B200/B300 (unchanged) |
| Combined Nvidia relationship | Training + inference from a single vendor |
| Inference cost driver | Each ChatGPT query costs ~10x a Google search in compute |
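The power equivalences in the table reduce to simple unit conversion. The per-household and per-rack wattages below are rough assumptions used only to sanity-check the 3GW figure.

```python
# Sanity-check the 3 GW commitment with rough unit conversions.
# Household draw and rack power are assumptions, not spec values.

commitment_watts = 3.0e9                # 3 GW

avg_household_watts = 1200              # ~1.2 kW average household draw
households = commitment_watts / avg_household_watts
print(f"~{households/1e6:.1f} million households")          # ~2.5 million

reactor_watts = 1.0e9                   # large nuclear reactor ≈ 1 GW
print(f"~{commitment_watts/reactor_watts:.0f} large reactors")

rack_watts = 120_000                    # hypothetical 120 kW inference rack
print(f"~{commitment_watts/rack_watts:,.0f} racks at 120 kW each")  # ~25,000
```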
Deal Timeline | From Groq Funding to GTC Unveil
- **Groq raises $750M at $6.9B valuation (September 2025).** Series D round values Groq at $6.9 billion, validating the LPU architecture and GroqCloud traction.
- **Nvidia-Groq negotiations accelerate.** Nvidia approaches Groq for a licensing deal as inference demand projections surge beyond GPU capacity.
- **$20 billion licensing deal finalized (December 24, 2025).** Non-exclusive perpetual license signed. Jonathan Ross and Sunny Madra transfer to Nvidia; 80-90% of the workforce follows.
- **Nvidia stock declines ~7% over two sessions.** Despite record Q4 earnings, the market reacts to the $20B deal cost and inference market uncertainty.
- **OpenAI commits 3GW inference capacity.** OpenAI signs as anchor customer for the LPX platform, committing 3 gigawatts of dedicated inference power.
- **GTC 2026 in San Jose (March 2026).** Nvidia expected to unveil full LPX platform specifications, pricing, and deployment timeline at its annual developer conference.
Nvidia's Stock Reaction | Why Markets Sold Off Despite Record Earnings
Nvidia's stock declined approximately 7 percent over two trading sessions following the deal announcement, despite the company reporting record quarterly earnings in the same period. The market reaction reflected several concerns.
| Concern | Market Interpretation |
|---|---|
| $20B deal size | Largest in Nvidia history; questions about capital allocation discipline |
| Non-exclusive license | Groq retains the right to license the LPU to other chipmakers; Nvidia gets no exclusivity |
| Inference market uncertainty | Unproven at scale whether the LPU outperforms next-gen GPUs long-term |
| GroqCloud independence | Groq continues competing in cloud inference under new leadership |
| Integration risk | Merging the LPU architecture with Nvidia's software stack (CUDA) is non-trivial |
| Valuation multiple | $20B for a company valued at $6.9B three months earlier (a 2.9x multiple) |
Competitive Landscape | Who Else Builds Inference Hardware
The Nvidia-Groq deal neutralized one of the most promising inference-focused competitors, but several others remain. The inference hardware market is fragmenting as the AI industry recognizes that training and inference require fundamentally different chip architectures.
| Company | Inference Approach |
|---|---|
| Nvidia (LPX, post-Groq) | LPU-based deterministic inference racks, 64-256 LPUs per rack |
| Nvidia (Hopper/Blackwell GPUs) | GPU-based inference via H100/B200/B300; flexible but less efficient for decode |
| Cerebras | Wafer-Scale Engine (WSE-3), 900,000 cores per chip, targeting both training and inference |
| SambaNova | Reconfigurable dataflow architecture (SN40L), targets enterprise inference |
| d-Matrix | Digital in-memory computing for inference, early-stage |
| Qualcomm (Cloud AI 100) | ARM-based inference accelerator for edge and cloud |
| Intel (Gaudi 3) | AI accelerator targeting price-competitive inference workloads |
| AMD (MI300X) | 192GB HBM3 GPU; large memory capacity suits big-model inference as well as training |
| Google (TPU v5e) | Inference-optimized custom TPU for internal workloads and Google Cloud customers |
Training vs Inference | Why the AI Industry Is Splitting
The AI hardware market is undergoing a structural shift. For most of the modern deep learning boom (roughly 2018-2023), nearly all compute spending went to training: building models. But as models reach production scale, inference (running those models to answer queries) has become the dominant cost center.
| Factor | Training | Inference |
|---|---|---|
| Compute pattern | Massively parallel matrix math | Sequential token generation |
| Memory access | Batch-optimized, high throughput | Latency-sensitive, per-token |
| Cost scaling | Fixed (train once) | Variable (scales with users and queries) |
| Hardware optimized for it | GPUs (A100, H100, B200) | LPUs, TPUs, custom ASICs |
| Cost model | One-time capex | Per-query opex (cost-per-token) |
| Market share 2026 (est.) | ~40% of AI compute spend | ~60% of AI compute spend |
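The capex-versus-opex split in the table is easiest to see as a crossover calculation. Every dollar figure and query volume below is invented purely to illustrate the shape of the curve.

```python
# Why inference becomes the dominant cost: training is paid once,
# inference spend scales with query volume. All figures are invented
# for illustration and are not estimates of any real deployment.

training_cost = 100e6          # one-time: $100M training run (assumed)
cost_per_1k_tokens = 0.002     # assumed serving cost per 1,000 tokens
tokens_per_query = 500         # assumed average response length
queries_per_day = 1e9          # assumed production query volume

daily_tokens = queries_per_day * tokens_per_query
daily_inference_cost = daily_tokens / 1e3 * cost_per_1k_tokens
print(f"Daily inference spend: ${daily_inference_cost:,.0f}")   # $1,000,000

crossover_days = training_cost / daily_inference_cost
print(f"Inference passes training cost after ~{crossover_days:.0f} days")
```

At that assumed volume, serving overtakes the one-time training bill in about three months, which is the dynamic behind the ~60/40 split in the table above.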
This is why Nvidia paid $20 billion for Groq's technology: the company that dominates training (Nvidia holds >80% GPU market share) recognized it needed a fundamentally different architecture to win the inference side. The $4 billion photonics deals with Lumentum and Coherent address the networking bottleneck; the Groq deal addresses the compute bottleneck for inference specifically.