OBJECTWIRE

Independent · Verified · In-Depth


Nvidia B300 vs AMD MI300X vs Google TPU v6 | 2026 AI Chip Specs, Workloads, Cost Comparison

DGX B300 leads on raw PFLOPS, MI300X leads on single-GPU VRAM, TPU v6 Trillium leads on energy efficiency, and none of them are directly comparable without caveats

April 1, 2026 · 📖 11 min read

Full Specification Comparison | B300 vs MI300X vs TPU v6

The following tables compile the key published specifications from Nvidia, AMD, and Google. Where vendors report different metrics (system-level vs chip-level, FP4 vs FP8 vs BF16), we note the difference. These are the core reference tables.

| Specification | Nvidia DGX B300 (8x Blackwell Ultra SXM) |
| --- | --- |
| Peak compute (FP4) | 144 PFLOPS (system, 8 GPUs) |
| Peak compute per GPU (FP4) | ~18 PFLOPS |
| GPU memory per chip | ~262 GB HBM3e (estimated, 2.1 TB / 8) |
| Total system memory | 2.1 TB HBM3e |
| Memory bandwidth per GPU | Not disclosed (HBM3e class, likely ~8 TB/s) |
| Interconnect | NVLink 5, 1.8 TB/s bisection bandwidth |
| System power | ~14 kW (full DGX B300 node) |
| Process node | TSMC 4NP |
| Availability | H2 2026 (announced GTC March 2026) |
| Software stack | CUDA, cuDNN, TensorRT, NeMo, Triton |

Nvidia DGX B300 specifications from GTC 2026 announcements
| Specification | AMD Instinct MI300X |
| --- | --- |
| Peak compute (FP8) | ~2.6 PFLOPS per GPU |
| Peak compute (BF16) | ~1.3 PFLOPS per GPU |
| GPU memory per chip | 192 GB HBM3 |
| Memory bandwidth | 5.3 TB/s per GPU |
| Interconnect | Infinity Fabric, 896 GB/s GPU-to-GPU |
| TDP | 750 W per GPU |
| Process node | TSMC 5nm (compute) + 6nm (I/O) |
| Availability | Shipping since Q4 2023 |
| Software stack | ROCm, PyTorch (native), JAX, vLLM |

AMD Instinct MI300X specifications from AMD published data
| Specification | Google TPU v6 / Trillium |
| --- | --- |
| Peak compute improvement | 4.7x over TPU v5e per chip |
| HBM capacity | 2x TPU v5e (exact GB not publicly disclosed) |
| Interconnect bandwidth | 2x ICI bandwidth vs TPU v5e |
| Energy efficiency | 67%+ more efficient than TPU v5e |
| Pod scale | Up to 9,216 chips per TPU v6 pod (Trillium) |
| Process node | Not publicly disclosed |
| Availability | Google Cloud (preview H2 2025, GA 2026) |
| Software stack | JAX, TensorFlow, PyTorch/XLA, Pathways |

Google TPU v6 Trillium specifications from Google Cloud announcements

Memory Capacity | Why 192 GB Per GPU Matters for LLM Inference

AMD's MI300X stands out with 192 GB of HBM3 per GPU, the highest single-chip memory in this comparison. For LLM inference, memory capacity determines the largest model you can host on a single accelerator without splitting it across multiple chips via tensor parallelism.

A 70-billion-parameter model in FP16 requires approximately 140 GB of VRAM for the model weights alone. Add the KV cache for a 100K-token context window (as detailed in our TurboQuant KV cache analysis), and a single MI300X can host the entire model plus a substantial context window on one GPU, with no tensor parallelism needed. Neither the B300 nor the TPU v6 publishes a per-chip memory figure that matches 192 GB (though the B300 system total of 2.1 TB across 8 GPUs is higher in aggregate).

| Model Size | VRAM Required (FP16 Weights Only) |
| --- | --- |
| 7B parameters (Llama 3 7B) | ~14 GB |
| 13B parameters | ~26 GB |
| 34B parameters (Code Llama 34B) | ~68 GB |
| 70B parameters (Llama 3 70B) | ~140 GB |
| 180B parameters (Falcon 180B) | ~360 GB |
| 405B parameters (Llama 3.1 405B) | ~810 GB |

Approximate VRAM requirements for FP16 model weights, excluding KV cache and activation memory
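The figures above follow from simple arithmetic, and the same arithmetic extends to the KV cache. A minimal sketch — the Llama 3 70B architecture values used in the example (80 layers, 8 KV heads via grouped-query attention, head dimension 128) are assumptions based on Meta's published configuration, not part of the vendor specs above:

```python
def weight_vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """VRAM for model weights alone; FP16/BF16 stores 2 bytes per parameter."""
    return params_billions * bytes_per_param  # billions of params x bytes = GB

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, per token, per KV head."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9

# Llama 3 70B in FP16 (assumed: 80 layers, 8 KV heads, head_dim 128)
weights = weight_vram_gb(70)           # 140.0 GB of weights
kv = kv_cache_gb(80, 8, 128, 100_000)  # ~32.8 GB for a 100K-token context
print(f"{weights:.0f} GB weights + {kv:.1f} GB KV cache = {weights + kv:.0f} GB")
```

Under these assumptions, weights plus a 100K-token KV cache total roughly 173 GB, which is why the workload fits inside a single MI300X's 192 GB with headroom to spare.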

This is why MI300X has found traction with inference-heavy deployments: a single GPU can host a 70B model with room to spare, while Nvidia's H100 (80 GB) and even H200 (141 GB) require multi-GPU setups for the same workload.

Raw Compute | 144 PFLOPS FP4 and What It Actually Means

Nvidia's headline figure of 144 PFLOPS FP4 for the DGX B300 is the most aggressive throughput claim in this comparison. However, context matters: FP4 (4-bit floating point) is a low-precision format primarily useful for inference workloads where quantized models can tolerate reduced precision. Training typically runs at BF16 or FP8, where the raw PFLOPS number would be significantly lower.

| Precision Format | Typical Use Case |
| --- | --- |
| FP4 (4-bit) | Quantized inference, post-training compression |
| FP8 (8-bit) | Mixed-precision training and inference, emerging standard |
| BF16 (16-bit brain float) | Standard training precision, widely supported |
| FP16 (16-bit) | Training and inference, legacy standard |
| FP32 (32-bit) | Scientific computing, loss calculation, optimizer states |
| TF32 (TensorFloat-32) | Nvidia-specific training acceleration, 19-bit internal |

Floating-point precision formats used in AI workloads

When Nvidia quotes 144 PFLOPS FP4 and AMD quotes MI300X compute in FP8 or BF16, they are measuring different things. A direct comparison requires normalizing to the same precision, which neither vendor does in their marketing materials. This is the single biggest caveat in any chip-vs-chip comparison.
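One rough way to put the vendors' numbers on a common footing is the rule of thumb that peak tensor throughput roughly doubles each time precision width halves. Real silicon frequently deviates (sparsity features and per-format tensor-core support break the linear scaling), so treat this as a first-order sketch of the normalization problem, not a real conversion:

```python
def scale_pflops(pflops: float, from_bits: int, to_bits: int) -> float:
    """First-order normalization: assume peak throughput scales inversely
    with precision width. Real hardware often deviates from this rule."""
    return pflops * from_bits / to_bits

# Nvidia's 144 PFLOPS FP4 system figure, restated at FP8 under this assumption
system_fp8 = scale_pflops(144, 4, 8)  # 72 PFLOPS for the 8-GPU system
per_gpu_fp8 = system_fp8 / 8          # 9 PFLOPS per GPU
```

Even this crude estimate shows why precision must be stated alongside every PFLOPS claim: the same silicon reads as "144" or "72" PFLOPS depending purely on the format quoted.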

📊 The critical difference: Nvidia and AMD sell hardware you deploy in your own datacenter (or a colo). Google's TPU v6 is only available on Google Cloud, where the efficiency gains are baked into Google's per-hour pricing rather than your electricity bill. This makes direct cost comparison between TPU v6 and GPU-based systems fundamentally different: you are comparing capex + opex (GPU) versus pure opex (TPU cloud).
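That capex-versus-opex split can be made concrete with a back-of-the-envelope cost sketch. Every number fed into the example below (hardware price, electricity rate, cloud hourly rate, PUE overhead) is an illustrative placeholder, not a vendor quote:

```python
def owned_gpu_cost_usd(capex_usd: float, power_kw: float, hours: float,
                       usd_per_kwh: float = 0.10, pue: float = 1.5) -> float:
    """Owned hardware: up-front capex plus electricity, with a PUE-style
    datacenter overhead multiplier. Ignores staff, networking, cooling capex."""
    return capex_usd + power_kw * pue * hours * usd_per_kwh

def cloud_cost_usd(usd_per_hour: float, hours: float) -> float:
    """Cloud accelerators: pure opex; efficiency is baked into the rate."""
    return usd_per_hour * hours

# One year (8,760 h) of a ~14 kW node, at purely illustrative rates
year = 8_760
print(owned_gpu_cost_usd(capex_usd=400_000, power_kw=14, hours=year))
print(cloud_cost_usd(usd_per_hour=50, hours=year))
```

The structural point survives any choice of placeholder numbers: the owned-hardware curve starts high and grows slowly, the cloud curve starts at zero and grows linearly, so utilization over the amortization window decides which is cheaper.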

Interconnect and Scaling | NVLink 5 vs Infinity Fabric vs ICI

For multi-chip training at scale, the interconnect between GPUs/TPUs often matters more than per-chip compute. A fast chip connected by a slow link becomes a fast chip that spends most of its time waiting for data.

| Interconnect | Bandwidth and Scale |
| --- | --- |
| Nvidia NVLink 5 (B300) | 1.8 TB/s bisection bandwidth, 8 GPUs per node, scales via NVSwitch + InfiniBand/Spectrum-X across nodes |
| AMD Infinity Fabric (MI300X) | 896 GB/s GPU-to-GPU, 8 GPUs per node via OAM, scales via InfiniBand or RoCE across nodes |
| Google ICI (TPU v6) | 2x ICI bandwidth vs v5e (absolute figure undisclosed), native pod scaling to 9,216 chips without external networking |

Multi-chip interconnect comparison

Google's advantage here is architectural: TPU pods scale to 9,216 chips in a single interconnected fabric without requiring external InfiniBand switches. Nvidia and AMD systems require expensive InfiniBand or Ethernet networking hardware to scale beyond a single 8-GPU node, adding cost, complexity, and latency. For organizations building 10,000+ chip clusters, this difference is substantial.
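The "fast chip waiting on a slow link" point can be quantified with the standard ring all-reduce cost model, in which each device moves roughly 2(n-1)/n of the gradient payload over its slowest link. This is an idealized sketch — it ignores latency, protocol overhead, and compute/communication overlap — and the two bandwidth figures reused from the table above are not strictly like-for-like (GPU-to-GPU vs bisection bandwidth), which is itself an instance of the comparison problem:

```python
def ring_allreduce_seconds(payload_gb: float, n_devices: int,
                           link_gb_per_s: float) -> float:
    """Bandwidth term of ring all-reduce: 2*(n-1)/n * payload / link speed."""
    return 2 * (n_devices - 1) / n_devices * payload_gb / link_gb_per_s

# Syncing 140 GB of FP16 gradients (a 70B model) across one 8-accelerator node
t_slow = ring_allreduce_seconds(140, 8, 896)    # Infinity Fabric class link
t_fast = ring_allreduce_seconds(140, 8, 1800)   # NVLink 5 class bandwidth
```

Under this model, each gradient synchronization step takes a fixed wall-clock slice regardless of how fast the chips compute, so as per-chip FLOPS grow, the link's share of step time grows with it.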

Software Stack | CUDA Dominance vs ROCm Momentum vs JAX Lock-in

Hardware specifications only matter if the software can use them. The software ecosystem is often the deciding factor in chip selection, especially for teams with existing codebases and trained engineers.

| Platform | Software Ecosystem Assessment |
| --- | --- |
| Nvidia CUDA | Dominant ecosystem. PyTorch, TensorFlow, JAX, TensorRT, Triton, NeMo, vLLM all CUDA-first. Largest library of optimized kernels. Deepest talent pool. |
| AMD ROCm | Rapidly improving. PyTorch native support since 2023. vLLM, DeepSpeed, and Hugging Face TGI now support ROCm. Smaller kernel library. Growing but smaller talent pool. |
| Google JAX/XLA | Best-in-class for TPU. JAX + Pathways for large-scale training. PyTorch/XLA bridge exists but adds friction. Locked to Google Cloud. Smallest external talent pool. |

Software ecosystem comparison for AI workloads

The practical reality: most AI teams have CUDA expertise and CUDA-optimized code, and switching to ROCm or JAX/TPU carries real migration costs. This is why Nvidia maintains >80% market share despite AMD and Google offering competitive hardware: CUDA's moat is the ecosystem, not the silicon.

Practical Decision Framework | Which Chip for Which Workload

Based on the published specifications and positioning from all three vendors, here is how to choose.

| Scenario | Recommended Chip and Why |
| --- | --- |
| Building a 1,000+ GPU AI factory for frontier model training | Nvidia B300. Highest aggregate PFLOPS, NVLink 5 scaling, CUDA ecosystem, superpod reference architectures. |
| Hosting a 70B model for production inference on fewest GPUs | AMD MI300X. 192 GB per GPU means single-chip hosting with KV cache headroom. ROCm + vLLM stack is production-ready. |
| Cloud-native training with managed infrastructure | Google TPU v6. Best efficiency, pod-scale interconnect, JAX/Pathways optimization, no hardware procurement. |
| Mixed training + inference on same hardware | Nvidia B300. Most flexible across precision formats (FP4-FP32), largest software compatibility. |
| Budget-constrained inference at scale | AMD MI300X. Lower per-GPU cost than B300, competitive inference throughput, strong ROCm/vLLM support. |
| Research with Google-published models (Gemini, PaLM) | Google TPU v6. TPU-native model checkpoints, zero porting friction, Google Cloud research credits. |

Decision framework based on workload requirements

The Benchmark Problem | Why No Fair Comparison Exists

A critical caveat underpins this entire comparison: there is no single third-party benchmark that tests all three chips on the same workload under the same conditions.

| Challenge | Why It Prevents Fair Comparison |
| --- | --- |
| Different precision metrics | Nvidia leads with FP4, AMD reports FP8/BF16, Google reports relative improvement vs TPU v5e |
| System-level vs chip-level | Nvidia quotes DGX B300 (8 GPUs), AMD quotes per-GPU, Google quotes per-pod or per-chip improvement ratios |
| MLPerf participation varies | Nvidia submits aggressively to MLPerf, AMD submits selectively, Google submits for TPU but not all categories |
| Closed vs open systems | TPU v6 only runs on Google Cloud, making independent benchmarking impossible |
| Workload selection bias | Each vendor benchmarks on workloads that favor their architecture |
| Software optimization differences | CUDA kernels are more mature than ROCm kernels for the same operations, making hardware comparison unfair |

Structural barriers to fair AI chip benchmarking

The best available proxy is MLPerf, the industry's closest thing to a standardized AI benchmark. Nvidia dominates MLPerf submissions. AMD has submitted MI300X results that show competitive inference performance on Llama 2 workloads. Google submits TPU results but often in categories or configurations that do not directly overlap with GPU submissions.


Tags

#Nvidia#AMD#Google#Blackwell B300#MI300X#TPU v6#Trillium#AI Chips#Benchmark#DGX B300#HBM3#CUDA#ROCm#MLPerf




Written by

ObjectWire Technology Desk

AI Infrastructure
