SANTA CLARA, CA — On Tuesday, March 10, 2026, Jim Keller's Tenstorrent delivered the most credible challenge yet to Nvidia's dominance of the AI workstation market. The company unveiled the TT-QuietBox 2 — a liquid-cooled desktop AI powerhouse built on the RISC-V architecture, packaged with a fully open-source software stack and priced at $9,999.
That price undercuts the original QuietBox by nearly $2,000, signaling a deliberate push to move RISC-V AI hardware from niche research labs into the mainstream developer's office. The QuietBox 2 is the first system of its kind to deliver teraflop-class local LLM inference without relying on a traditional GPU — and without locking developers into Nvidia's proprietary CUDA ecosystem.
TT-QuietBox 2 — At a Glance
| Specification | Detail |
|---|---|
| Price (base) | $9,999 — ~$2,000 less than the original QuietBox |
| Architecture | RISC-V — open instruction set, full hardware transparency |
| Performance class | Teraflop-class inference — first RISC-V workstation to reach this tier |
| Max model size (base config) | 120 billion parameters (e.g., Llama 3 variants, or mixture-of-experts models such as Grok-1 counted by active parameters per token) |
| Cooling | High-efficiency liquid cooling loop — designed for silent office deployment |
| Software stack | TT-Buda (compiler) + TT-Metalium (kernel suite) — fully open source |
| GPU dependency | None — no Nvidia GPU, no CUDA required |
| Announced | March 10, 2026 — Santa Clara, CA |
Breaking the CUDA Lock
The QuietBox 2's most strategically significant feature is not its performance — it is its independence. CUDA, Nvidia's proprietary parallel computing platform, has been the de facto standard for AI model training and inference since the first deep learning boom. The result is a decade-long lock-in: most AI frameworks, model checkpoints, and production pipelines are optimized specifically for CUDA, making migration to alternative hardware extremely costly even when alternatives exist.
Tenstorrent's RISC-V approach offers a fundamentally different value proposition. Because RISC-V is an open instruction set architecture, developers have full visibility — and full control — from the compiler layer down to the kernel. Tenstorrent describes this as transparency "from compiler to kernel," enabling engineers to optimize AI models at a granular hardware level that proprietary GPU architectures actively prevent.
The Open-Source Software Edge: TT-Buda & TT-Metalium
Tenstorrent's software strategy is built on two open-source components:
TT-Buda — The Compiler
TT-Buda is Tenstorrent's open-source AI compiler suite. It translates high-level model definitions (PyTorch, JAX, ONNX) into optimized instruction sequences for the RISC-V hardware. Because the compiler is open source, developers can inspect, modify, and contribute optimization passes — something impossible with Nvidia's proprietary TensorRT.
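What "contributing an optimization pass" means can be illustrated with a toy example. The tiny IR and pass below are invented for illustration and are not TT-Buda's real internals; the point is only that in an open compiler, a transformation like constant folding is ordinary, readable code a developer can modify:

```python
# Toy illustration of an open compiler pass. The IR and pass structure
# are invented for this article; they are NOT TT-Buda's actual API.

from dataclasses import dataclass

@dataclass
class Node:
    op: str             # "const", "add", or "mul"
    inputs: tuple = ()  # indices of input nodes in the graph list
    value: float = 0.0  # payload for "const" nodes

def constant_fold(graph):
    """One optimization pass: replace ops whose inputs are all
    constants with a precomputed "const" node."""
    folded = []
    for node in graph:
        if node.op in ("add", "mul") and all(
            folded[i].op == "const" for i in node.inputs
        ):
            vals = [folded[i].value for i in node.inputs]
            result = sum(vals) if node.op == "add" else vals[0] * vals[1]
            folded.append(Node("const", value=result))
        else:
            folded.append(node)
    return folded

# A tiny graph encoding (2.0 + 3.0): the add folds to a single const.
graph = [
    Node("const", value=2.0),
    Node("const", value=3.0),
    Node("add", inputs=(0, 1)),
]
optimized = constant_fold(graph)
print(optimized[2].op, optimized[2].value)  # → const 5.0
```

In a closed compiler, a developer who finds a missed optimization like this can only file a vendor ticket; in an open one, the fix is a pull request.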
TT-Metalium — The Kernel Suite
TT-Metalium provides the low-level kernel primitives that execute directly on the Tenstorrent silicon. The suite's open-source nature means that debugging complex model behavior — tracing a numerical instability, for example, or profiling a memory bandwidth bottleneck — can be done with full hardware visibility rather than relying on opaque vendor profiling tools.
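A sketch of what source-visible kernels buy in practice: when a primitive is ordinary inspectable code, a developer can wrap it with instrumentation and catch a numerical problem at the exact kernel that produced it, rather than inferring it from framework-level symptoms. All names below are invented for illustration; this is not TT-Metalium's actual API:

```python
# Illustrative sketch: tracing a source-visible kernel primitive to
# localize a numerical instability. Invented names, not TT-Metalium.

import math

def square_kernel(xs):
    """A kernel that silently overflows to inf for very large inputs."""
    return [x * x for x in xs]

def traced(name, kernel):
    """Wrap a kernel so non-finite outputs are flagged at their source."""
    def wrapper(xs):
        out = kernel(xs)
        if any(not math.isfinite(v) for v in out):
            print(f"[trace] non-finite value produced by kernel {name!r}")
        return out
    return wrapper

square = traced("square_kernel", square_kernel)
result = square([1e200, 2.0])  # 1e200 squared overflows to inf
print(result[1])  # → 4.0
```

With an opaque vendor kernel, the same inf would surface several layers later as a mysteriously diverging loss, with no way to step into the primitive that emitted it.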
| Advantage | What It Means in Practice |
|---|---|
| Transparency | Full access to hardware instructions — debug model behavior at the silicon level, not just the framework level |
| Portability | Code written for QuietBox 2 avoids CUDA-specific idioms, making it easier to port to other RISC-V server architectures as they emerge |
| Community optimization | Open compiler and kernel repos let the broader developer community contribute performance improvements — something Nvidia's closed stack does not permit |
| No licensing risk | No proprietary SDK terms — model optimization work is owned entirely by the developer |
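The portability row deserves a concrete illustration. Code hard-wired to one vendor's API must be rewritten call site by call site to move; code written against a backend abstraction moves by changing one string. The registry and backend names below are invented to show the pattern and are not any real framework's API:

```python
# Backend-dispatch sketch. Registry and backend names are invented to
# illustrate the portability pattern, not a real framework API.

BACKENDS = {}

def register(name):
    def deco(fn):
        BACKENDS[name] = fn
        return fn
    return deco

@register("cpu")
def matmul_cpu(a, b):
    """Reference matrix multiply; runs anywhere."""
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*b)] for row in a]

@register("riscv-accel")
def matmul_riscv(a, b):
    # Placeholder: a real port would call the accelerator's open
    # kernels here; the calling interface stays identical either way.
    return matmul_cpu(a, b)

def matmul(a, b, backend="cpu"):
    # Application code targets this one entry point; swapping hardware
    # means changing the backend string, not rewriting call sites.
    return BACKENDS[backend](a, b)

out = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], backend="riscv-accel")
print(out)  # → [[19, 22], [43, 50]]
```

A codebase full of vendor-specific calls is the single-backend version of this sketch with the abstraction deleted — which is precisely the migration cost the CUDA lock-in section describes.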
120 Billion Parameters — Locally, Silently
The base configuration of the QuietBox 2 supports running models with up to 120 billion parameters entirely locally — covering specialized variants of Llama 3, comparable open-weight models, and mixture-of-experts models such as Grok-1, whose per-token active parameter count fits within that limit. For enterprises and researchers with data privacy requirements that prohibit sending inference requests to cloud APIs, this is the critical capability: frontier-class reasoning on hardware you physically control, in a room you can work in.
The liquid cooling system is central to that last point. Previous RISC-V AI systems capable of this parameter scale have required server rack hardware — noisy, power-hungry, and incompatible with standard office infrastructure. The QuietBox 2's high-efficiency liquid cooling loop keeps the processors at peak performance levels without the acoustic footprint of air-cooled alternatives, making it deployable in a standard developer workspace.
Market Context: Who Is This For?
The QuietBox 2 is not positioned as a consumer product. At $9,999, it targets three specific buyer profiles:
- AI researchers who need full hardware transparency to debug and understand model behavior — not just benchmark it
- Enterprise security and compliance teams with data residency requirements that prohibit cloud inference for sensitive workloads
- Independent developers and labs exploring RISC-V as an alternative to Nvidia-dependent infrastructure ahead of potential CUDA ecosystem disruption
The competitive framing is explicit: Tenstorrent is not trying to match Nvidia's training throughput on large clusters. It is targeting the inference and local deployment market — specifically the developers building production applications who currently rent cloud GPU time for every inference call and want an alternative.