At a Glance: On December 24, 2025, Nvidia finalized a $20 billion non-exclusive perpetual license for Groq's full LPU patent portfolio and software stack, the largest transaction in Nvidia's history. The deal transferred 80-90% of Groq's workforce, including founder Jonathan Ross and President Sunny Madra. At GTC 2026 in San Jose, Nvidia will unveil the LPX inference platform with 64-256 LPUs per rack. OpenAI has committed 3GW of dedicated inference capacity using the new system.
What Happened | The $20 Billion Christmas Eve Deal
Nvidia finalized a $20 billion non-exclusive perpetual license with Groq on December 24, 2025, gaining access to Groq's full patent portfolio and software stack for inference optimization. The agreement is structured as a license, not an acquisition, meaning Groq continues to operate independently under new leadership with its GroqCloud service.
The deal included transfer of physical assets and roughly 80 to 90 percent of Groq's workforce, including core engineering teams. Key personnel who moved to Nvidia include founder Jonathan Ross and President Sunny Madra. Groq had reached a $6.9 billion valuation after a $750 million funding round in September 2025, just three months before the deal closed.
| Deal Parameter | Detail |
|---|---|
| Deal value | $20 billion (non-exclusive perpetual license) |
| Closing date | December 24, 2025 |
| Structure | Patent license + workforce transfer, not full acquisition |
| Scope | Full Groq patent portfolio, software stack, physical assets |
| Workforce transferred | 80-90% of Groq employees, including engineering core |
| Key personnel | Jonathan Ross (founder), Sunny Madra (President) |
| Groq post-deal | Independent operations continue under new leadership (GroqCloud) |
| Prior Groq valuation | $6.9 billion (September 2025, $750M round) |
| Nvidia deal rank | Largest transaction in company history |
What Is Groq's LPU | Why Nvidia Paid $20 Billion for It
Groq's Language Processing Unit (LPU) pairs deterministic execution with large on-chip SRAM (hundreds of megabytes per chip) to eliminate the bandwidth bottlenecks common in GPU-based inference. Unlike GPUs, which rely on off-chip high-bandwidth memory (HBM) that adds latency during sequential token generation, the LPU keeps all active data on-die.
In public demonstrations prior to the deal, Groq showed 10,000 thought tokens generated in approximately 2 seconds, a throughput rate that outpaced GPU-based inference by an order of magnitude for sequential decode workloads. The architecture is purpose-built for the kind of token-by-token generation that dominates deployed AI applications: chatbots, code assistants, search, and agentic AI systems requiring real-time responses.
| Feature | LPU | GPU |
|---|---|---|
| Execution model | Deterministic (predictable latency) | Dynamically scheduled (variable latency) |
| Primary memory | Large on-chip SRAM | Off-chip HBM (HBM2e/HBM3) |
| Memory bottleneck | Eliminated (on-die) | Primary constraint (bandwidth-bound) |
| Token generation speed | ~5,000 tokens/sec demonstrated | ~200-800 tokens/sec typical |
| Latency profile | Near-constant per token | Variable, increases with context length |
| Optimal workload | Sequential inference (decode) | Parallel compute (training + prefill) |
| Power per token | Significantly lower | Higher (greater per-chip overhead) |
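To make the memory-bottleneck row concrete, a back-of-the-envelope roofline estimate shows why single-stream decode is capped by memory bandwidth. The sketch below is illustrative only; the bandwidth and model-size figures are assumptions, not published specifications for any Nvidia or Groq part.

```python
# Roofline-style estimate of single-stream decode throughput.
# During token-by-token generation, every model weight is read once per
# token, so tokens/sec is capped by bandwidth / bytes-of-weights-read.
# All numbers below are illustrative assumptions, not vendor specs.

def decode_ceiling_tokens_per_sec(params_billions: float,
                                  bytes_per_param: float,
                                  bandwidth_tb_s: float) -> float:
    """Upper bound on sequential decode speed for one request stream."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

MODEL_B = 70    # hypothetical 70B-parameter model
BYTES = 1.0     # 8-bit weights

# HBM-class GPU: ~3.3 TB/s off-chip bandwidth (assumed).
hbm_gpu = decode_ceiling_tokens_per_sec(MODEL_B, BYTES, 3.3)

# SRAM-based design: assumed ~80 TB/s aggregate on-die bandwidth
# across a multi-chip pipeline.
sram_lpu = decode_ceiling_tokens_per_sec(MODEL_B, BYTES, 80.0)

print(f"HBM GPU ceiling:  ~{hbm_gpu:,.0f} tokens/sec per stream")
print(f"SRAM LPU ceiling: ~{sram_lpu:,.0f} tokens/sec per stream")
```

Batching restores aggregate GPU throughput, but per-stream latency remains bandwidth-bound, which is exactly the gap large on-die SRAM targets.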
LPX Inference Platform | What Nvidia Will Unveil at GTC 2026
The new platform, referred to as LPX in industry analyses, packages Groq's LPU silicon into dedicated inference racks. Nvidia is expected to reveal full specifications at its GTC developer conference in San Jose in March 2026.
| LPX Specification | Detail |
|---|---|
| Base configuration | 64 LPUs per rack |
| Packaging | 32 RealScale ASIC tiles (2 LPUs per tile) |
| Scaled configuration | 256 LPUs per rack (4x base) |
| Target workload | Low-latency decode, agentic AI, real-time inference |
| Integration approach | Groq LPU silicon + Nvidia networking/software stack |
| Customer anchor | OpenAI (3GW dedicated inference capacity committed) |
| Market positioning | Complementary to Blackwell GPUs, not a replacement |
The LPX platform is positioned as complementary to Nvidia's existing Blackwell B300 GPU racks, which dominate training workloads. Jensen Huang has described the strategy as offering “the right silicon for the right workload”: Blackwell for training and prefill, LPX for decode and real-time inference. This mirrors the approach Nvidia took after acquiring Mellanox in 2020 for networking, where the acquired technology became an accelerator within the broader Nvidia ecosystem rather than a standalone product line.
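In scheduling terms, "the right silicon for the right workload" amounts to disaggregated serving: route the compute-heavy prefill phase to GPU racks and the latency-sensitive decode phase to LPU racks. A minimal sketch of that routing logic follows; the pool names are hypothetical, and a real system would also have to hand off the KV cache between pools.

```python
# Toy sketch of disaggregated serving: prefill on a GPU pool, decode on
# an LPU pool. Pool names are hypothetical; a production system would
# also transfer the KV cache from the prefill pool to the decode pool.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # processed in parallel during prefill
    output_tokens: int   # generated one token at a time during decode

def plan(req: Request) -> list[tuple[str, str]]:
    steps = []
    # Prefill is one large parallel matmul over the prompt: GPU-friendly.
    steps.append(("gpu_pool/blackwell",
                  f"prefill {req.prompt_tokens} prompt tokens"))
    # Decode is strictly sequential, one token per step: LPU-friendly.
    steps.append(("lpu_pool/lpx",
                  f"decode {req.output_tokens} output tokens"))
    return steps

for pool, work in plan(Request(prompt_tokens=4096, output_tokens=512)):
    print(f"{pool}: {work}")
```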
OpenAI's 3GW Commitment | Why It Matters
OpenAI has committed to 3 gigawatts of dedicated inference capacity using the LPX platform, positioning itself as the anchor customer. To contextualize that number: 3GW is roughly the output of three large nuclear reactors, or the electricity consumption of about 2 million households.
Prior to the Nvidia-Groq deal, OpenAI had been exploring inference alternatives with both Cerebras and Groq directly. The deal effectively channeled those relationships through Nvidia, giving OpenAI a single vendor for both training (Blackwell) and inference (LPX) hardware.
| Metric | Context |
|---|---|
| OpenAI inference commitment | 3GW dedicated capacity |
| Power equivalence | ~3 nuclear power plants or ~2 million households |
| Prior OpenAI inference partners | Cerebras (explored), Groq (explored), now Nvidia LPX |
| Training hardware | Nvidia Blackwell B200/B300 (unchanged) |
| Combined Nvidia relationship | Training + inference from a single vendor |
| Inference cost driver | Each ChatGPT query costs ~10x a Google search in compute |
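The power equivalences in the table reduce to simple unit conversion. The per-household and per-rack wattages below are rough assumptions used only to sanity-check the 3GW figure.

```python
# Sanity-check the 3 GW commitment with rough unit conversions.
# Household draw and rack power are assumptions, not spec values.

commitment_watts = 3.0e9                # 3 GW

avg_household_watts = 1200              # ~1.2 kW average household draw
households = commitment_watts / avg_household_watts
print(f"~{households/1e6:.1f} million households")          # ~2.5 million

reactor_watts = 1.0e9                   # large nuclear reactor ≈ 1 GW
print(f"~{commitment_watts/reactor_watts:.0f} large reactors")

rack_watts = 120_000                    # hypothetical 120 kW inference rack
print(f"~{commitment_watts/rack_watts:,.0f} racks at 120 kW each")  # ~25,000
```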
Deal Timeline | From Groq Funding to GTC Unveil
- **Groq raises $750M at $6.9B valuation (September 2025).** Series D round values Groq at $6.9 billion, validating the LPU architecture and GroqCloud traction.
- **Nvidia-Groq negotiations accelerate.** Nvidia approaches Groq for a licensing deal as inference demand projections surge beyond GPU capacity.
- **$20 billion licensing deal finalized (December 24, 2025).** Non-exclusive perpetual license signed. Jonathan Ross and Sunny Madra transfer to Nvidia; 80-90% of the workforce follows.
- **Nvidia stock declines ~7% over two sessions.** Despite record Q4 earnings, the market reacts to the $20B deal cost and inference market uncertainty.
- **OpenAI commits 3GW inference capacity.** OpenAI signs as anchor customer for the LPX platform, committing 3 gigawatts of dedicated inference power.
- **GTC 2026 in San Jose (March 2026).** Nvidia expected to unveil full LPX platform specifications, pricing, and deployment timeline at its annual developer conference.
Nvidia's Stock Reaction | Why Markets Sold Off Despite Record Earnings
Nvidia's stock declined approximately 7 percent over two trading sessions following the deal announcement, despite the company reporting record quarterly earnings in the same period. The market reaction reflected several concerns.
| Concern | Market Interpretation |
|---|---|
| $20B deal size | Largest in Nvidia history; questions about capital allocation discipline |
| Non-exclusive license | Groq retains the right to license the LPU to other chipmakers; Nvidia gets no exclusivity |
| Inference market uncertainty | Unproven at scale whether the LPU outperforms next-gen GPUs long-term |
| GroqCloud independence | Groq continues competing in cloud inference under new leadership |
| Integration risk | Merging the LPU architecture with Nvidia's software stack (CUDA) is non-trivial |
| Valuation multiple | $20B for a company valued at $6.9B three months earlier (a 2.9x multiple) |
Competitive Landscape | Who Else Builds Inference Hardware
The Nvidia-Groq deal neutralized one of the most promising inference-focused competitors, but several others remain. The inference hardware market is fragmenting as the AI industry recognizes that training and inference require fundamentally different chip architectures.
| Company | Inference Approach |
|---|---|
| Nvidia (LPX, post-Groq) | LPU-based deterministic inference racks, 64-256 LPUs per rack |
| Nvidia (Hopper/Blackwell GPUs) | GPU-based inference via H100/B200/B300; flexible but less efficient for decode |
| Cerebras | Wafer-Scale Engine (WSE-3), 900,000 cores per chip, targeting both training and inference |
| SambaNova | Reconfigurable dataflow architecture (SN40L), targets enterprise inference |
| d-Matrix | Digital in-memory computing for inference, early-stage |
| Qualcomm (Cloud AI 100) | ARM-based inference accelerator for edge and cloud |
| Intel (Gaudi 3) | AI accelerator targeting price-competitive inference workloads |
| AMD (MI300X) | 192GB HBM3 GPU; large memory capacity suits big-model inference as well as training |
| Google (TPU v5e) | Inference-optimized custom TPU for internal workloads and Google Cloud customers |
Training vs Inference | Why the AI Industry Is Splitting
The AI hardware market is undergoing a structural shift. For most of the modern deep learning boom (roughly 2018-2023), nearly all compute spending went to training: building models. But as models reach production scale, inference (running those models to answer queries) has become the dominant cost center.
| Factor | Training | Inference |
|---|---|---|
| Compute pattern | Massively parallel matrix math | Sequential token generation |
| Memory access | Batch-optimized, high throughput | Latency-sensitive, per-token |
| Cost scaling | Fixed (train once) | Variable (scales with users and queries) |
| Hardware optimized for it | GPUs (A100, H100, B200) | LPUs, TPUs, custom ASICs |
| Cost model | One-time capex | Per-query opex (cost-per-token) |
| Market share 2026 (est.) | ~40% of AI compute spend | ~60% of AI compute spend |
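The capex-versus-opex split in the table is easiest to see as a crossover calculation. Every dollar figure and query volume below is invented purely to illustrate the shape of the curve.

```python
# Why inference becomes the dominant cost: training is paid once,
# inference spend scales with query volume. All figures are invented
# for illustration and are not estimates of any real deployment.

training_cost = 100e6          # one-time: $100M training run (assumed)
cost_per_1k_tokens = 0.002     # assumed serving cost per 1,000 tokens
tokens_per_query = 500         # assumed average response length
queries_per_day = 1e9          # assumed production query volume

daily_tokens = queries_per_day * tokens_per_query
daily_inference_cost = daily_tokens / 1e3 * cost_per_1k_tokens
print(f"Daily inference spend: ${daily_inference_cost:,.0f}")   # $1,000,000

crossover_days = training_cost / daily_inference_cost
print(f"Inference passes training cost after ~{crossover_days:.0f} days")
```

At that assumed volume, serving overtakes the one-time training bill in about three months, which is the dynamic behind the ~60/40 split in the table above.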
This is why Nvidia paid $20 billion for Groq's technology: the company that dominates training (Nvidia holds >80% GPU market share) recognized it needed a fundamentally different architecture to win the inference side. The $4 billion photonics deals with Lumentum and Coherent address the networking bottleneck; the Groq deal addresses the compute bottleneck for inference specifically.