
Tenstorrent Disrupts AI Workstation Market with $9,999 RISC-V "QuietBox 2"

Jim Keller's Tenstorrent has unveiled the TT-QuietBox 2 — the first developer-ready AI workstation built on RISC-V with a fully open-source software stack. At $9,999, it runs 120-billion-parameter models locally and delivers teraflop-class inference without a single line of CUDA.

March 12, 2026

SANTA CLARA, CA — On Tuesday, March 10, 2026, Jim Keller's Tenstorrent delivered the most credible challenge yet to Nvidia's dominance of the AI workstation market. The company unveiled the TT-QuietBox 2 — a liquid-cooled desktop AI powerhouse built on the RISC-V architecture, packaged with a fully open-source software stack and priced at $9,999.

That price point is nearly $2,000 cheaper than the original QuietBox, signaling a deliberate push to move RISC-V AI hardware from niche research labs into the mainstream developer's office. The QuietBox 2 is the first system of its kind to deliver teraflop-class local LLM inference without relying on a traditional GPU — and without locking developers into Nvidia's proprietary CUDA ecosystem.

The TT-QuietBox 2 is the first developer-ready AI workstation to combine RISC-V architecture, teraflop-class local inference, 120B-parameter model support, liquid cooling, and a fully open-source compiler and kernel stack — at a sub-$10,000 price point.

TT-QuietBox 2 — At a Glance

| Specification | Detail |
| --- | --- |
| Price (base) | $9,999 — ~$2,000 less than the original QuietBox |
| Architecture | RISC-V — open instruction set, full hardware transparency |
| Performance class | Teraflop-class inference — first RISC-V workstation to reach this tier |
| Max model size (base config) | 120 billion parameters (e.g., Llama 3 variants, Grok-1) |
| Cooling | High-efficiency liquid cooling loop — designed for silent office deployment |
| Software stack | TT-Buda (compiler) + TT-Metalium (kernel suite) — fully open source |
| GPU dependency | None — no Nvidia GPU, no CUDA required |
| Announced | March 10, 2026 — Santa Clara, CA |

Breaking the CUDA Lock

The QuietBox 2's most strategically significant feature is not its performance — it is its independence. CUDA, Nvidia's proprietary parallel computing platform, has been the de facto standard for AI model training and inference since the first deep learning boom. The result is a decade-long lock-in: most AI frameworks, model checkpoints, and production pipelines are optimized specifically for CUDA, making migration to alternative hardware extremely costly even when alternatives exist.

Tenstorrent's RISC-V approach offers a fundamentally different value proposition. Because RISC-V is an open instruction set architecture, developers have full visibility — and full control — from the compiler layer down to the kernel. Tenstorrent describes this as transparency "from compiler to kernel," enabling engineers to optimize AI models at a granular hardware level that proprietary GPU architectures actively prevent.

CUDA lock-in is not just a licensing problem — it is an optimization ceiling. On Nvidia hardware, developers can only tune what Nvidia's closed drivers expose. On a RISC-V system with an open-source stack, the ceiling does not exist in the same way. That matters most for researchers and engineers who need to understand exactly what their model is doing at the hardware level.

The Open-Source Software Edge: TT-Buda & TT-Metalium

Tenstorrent's software strategy is built on two open-source components:

TT-Buda — The Compiler

TT-Buda is Tenstorrent's open-source AI compiler suite. It handles the translation of high-level model definitions (PyTorch, JAX, ONNX) into optimized instruction sequences for the RISC-V hardware. Because the compiler is open source, developers can inspect, modify, and contribute optimization passes — something impossible with Nvidia's proprietary TensorRT.
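To make the idea of a contributable "optimization pass" concrete, here is a toy sketch in plain Python. This is not the TT-Buda API; the `Op` class and `fuse_matmul_add` function are invented for illustration. It shows the general shape of what an open compiler lets developers do: walk the model's intermediate representation and rewrite it, in this case fusing a matrix multiply with the addition that follows it.

```python
# Toy illustration only -- not the TT-Buda API. Shows what a contributable
# compiler optimization pass looks like: a function that rewrites the
# model's intermediate representation (IR).
from dataclasses import dataclass


@dataclass
class Op:
    kind: str      # e.g. "matmul", "add", "relu"
    inputs: list   # names of input tensors


def fuse_matmul_add(graph: list) -> list:
    """Fuse each matmul immediately followed by an add into one fused op."""
    fused, i = [], 0
    while i < len(graph):
        if (i + 1 < len(graph)
                and graph[i].kind == "matmul"
                and graph[i + 1].kind == "add"):
            # Replace the two ops with a single fused op that takes
            # all of their inputs.
            fused.append(Op("fused_matmul_add",
                            graph[i].inputs + graph[i + 1].inputs))
            i += 2
        else:
            fused.append(graph[i])
            i += 1
    return fused


graph = [Op("matmul", ["x", "w"]), Op("add", ["b"]), Op("relu", [])]
print([op.kind for op in fuse_matmul_add(graph)])
# → ['fused_matmul_add', 'relu']
```

In a closed toolchain like TensorRT, passes of this kind exist but cannot be inspected or extended; in an open compiler, they are ordinary source files anyone can read and patch.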

TT-Metalium — The Kernel Suite

TT-Metalium provides the low-level kernel primitives that execute directly on the Tenstorrent silicon. The suite's open-source nature means that debugging complex model behavior — tracing a numerical instability, for example, or profiling a memory bandwidth bottleneck — can be done with full hardware visibility rather than relying on opaque vendor profiling tools.
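The debugging style described above can be sketched with a few lines of Python. This is a hypothetical illustration, not TT-Metalium's actual interface: `traced_kernel`, `scale`, and `exp_all` are made-up stand-ins. The point is the technique an open kernel stack enables, wrapping each kernel step so that non-finite values are flagged at the exact step where they first appear, rather than surfacing later as an unexplained bad output.

```python
# Toy illustration only -- not the TT-Metalium API. Demonstrates tracing
# intermediate results to pinpoint where a numerical instability begins.
import math


def traced_kernel(name, fn, *args):
    """Run one kernel step and flag non-finite outputs as they appear."""
    out = fn(*args)
    bad = [v for v in out if not math.isfinite(v)]
    if bad:
        print(f"[trace] {name}: {len(bad)} non-finite value(s)")
    return out


def scale(xs, s):
    return [x * s for x in xs]


def exp_all(xs):
    # Guard: math.exp overflows for large inputs, so saturate to infinity.
    return [math.exp(x) if x < 700 else math.inf for x in xs]


acts = traced_kernel("scale", scale, [1.0, 400.0], 2.0)
acts = traced_kernel("exp", exp_all, acts)  # 800.0 overflows to inf here
```

With a closed driver, a developer sees only the framework-level symptom; with source access to the kernels, the trace hook can live at the layer where the instability actually occurs.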

| Advantage | What It Means in Practice |
| --- | --- |
| Transparency | Full access to hardware instructions — debug model behavior at the silicon level, not just the framework level |
| Portability | Code written for QuietBox 2 avoids CUDA-specific idioms, making it easier to port to other RISC-V server architectures as they emerge |
| Community optimization | Open compiler and kernel repos allow the broader developer community to contribute performance improvements Nvidia cannot |
| No licensing risk | No proprietary SDK terms — model optimization work is owned entirely by the developer |

120 Billion Parameters — Locally, Silently

The base configuration of the QuietBox 2 supports running models with up to 120 billion parameters entirely locally — covering specialized variants of Llama 3, Grok-1, and comparable open-weight models. For enterprises and researchers with data privacy requirements that prohibit sending inference requests to cloud APIs, this is the critical capability: frontier-class reasoning on hardware you physically control, in a room you can work in.
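A back-of-the-envelope calculation (ours, not Tenstorrent's; the article does not state the QuietBox 2's memory configuration) shows why 120 billion parameters is a meaningful threshold for local hardware: weight storage alone scales with parameter count times bytes per parameter, which is why quantized formats matter so much for desktop-class machines.

```python
# Illustrative arithmetic only: approximate weight-storage requirements
# for a 120B-parameter model at common numeric precisions.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB for a model of the given size."""
    return params_billions * 1e9 * bytes_per_param / 2**30


for label, bytes_pp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"120B @ {label}: ~{weight_memory_gb(120, bytes_pp):.0f} GiB")
# → 120B @ FP16: ~224 GiB
# → 120B @ INT8: ~112 GiB
# → 120B @ INT4: ~56 GiB
```

Activations, KV cache, and runtime overhead add to these figures, but the weights dominate, and the gap between ~224 GiB at FP16 and ~56 GiB at 4-bit is the difference between a server rack and a box under a desk.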

The liquid cooling system is central to that last point. Previous RISC-V AI systems capable of this parameter scale have required server rack hardware — noisy, power-hungry, and incompatible with standard office infrastructure. The QuietBox 2's high-efficiency liquid cooling loop keeps the processors at peak performance levels without the acoustic footprint of air-cooled alternatives, making it deployable in a standard developer workspace.

For developers who need to run 100B+ models locally — for privacy, for latency, for offline capability — the QuietBox 2 marks the first time that option has been available in a form factor that fits under a desk rather than in a data center.

Market Context: Who Is This For?

The QuietBox 2 is not positioned as a consumer product. At $9,999, it targets three specific buyer profiles:

  • AI researchers who need full hardware transparency to debug and understand model behavior — not just benchmark it
  • Enterprise security and compliance teams with data residency requirements that prohibit cloud inference for sensitive workloads
  • Independent developers and labs exploring RISC-V as an alternative to Nvidia-dependent infrastructure ahead of potential CUDA ecosystem disruption

The competitive framing is explicit: Tenstorrent is not trying to match Nvidia's training throughput on large clusters. It is targeting the inference and local deployment market — specifically the developers building production applications who currently rent cloud GPU time for every inference call and want an alternative.

The TT-QuietBox 2 is available for order as of March 10, 2026, at $9,999 for the base configuration. Tenstorrent has not announced an upper configuration tier or a shipping timeline for enterprise bulk orders.

Tags

#Tenstorrent · #RISC-V · #AI Hardware · #QuietBox 2 · #Jim Keller · #Nvidia · #Open Source AI · #TT-Buda · #LLM Inference · #AI Workstation

Written by

Jack Wang

Technology Desk

Part of ObjectWire coverage