NVIDIA Vera Rubin opens gates to agentic AI frontier
Source: NVIDIA. Seven different chips are part of the NVIDIA Vera Rubin platform.
AI infrastructure is evolving from discrete chips and standalone servers to fully integrated rack-scale systems, pod-scale deployments, AI factories and sovereign AI, according to NVIDIA. The company says these advances are driving dramatic performance gains and improving cost efficiency for organisations of all sizes and across industries, while helping to democratise access to AI and improving the energy efficiency needed to power the world's most demanding workloads.
AI labs and frontier model developers, including Anthropic, Meta, Mistral AI and OpenAI, are looking to use the NVIDIA Vera Rubin platform to train larger, more capable models and to serve long-context, multimodal systems at lower latency and cost than with prior GPU generations.
The new rack-scale systems in the platform are:
NVIDIA Vera Rubin NVL72 rack
Vera Rubin NVL72 integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs* and BlueField-4 DPUs. According to NVIDIA, it can train large mixture-of-experts models with one-fourth as many GPUs as the NVIDIA Blackwell platform, and it achieves up to 10x higher inference throughput per watt at one-tenth the cost per token.
Designed for hyperscale AI factories worldwide, NVL72 scales seamlessly with NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilisation across massive GPU clusters while reducing time to train and total cost of ownership.
NVIDIA Vera CPU rack
Reinforcement learning and agentic AI workloads rely on large numbers of CPU-based environments to test and validate the results generated by models running on GPU systems. The NVIDIA Vera CPU rack delivers dense, liquid-cooled infrastructure built on NVIDIA MGX, integrating 256 Vera CPUs to provide scalable, energy-efficient capacity with world-class single-threaded performance for agentic AI at scale.
Integrated with Spectrum-X Ethernet networking, Vera CPU racks keep CPU environments tightly synchronised across the AI factory. Together with GPU compute racks, they provide the CPU foundation for large-scale agentic AI and reinforcement learning. NVIDIA says Vera delivers results twice as efficiently and 50% faster than traditional CPUs.
NVIDIA Groq 3 LPX rack
NVIDIA Groq 3 LPX marks a milestone in accelerated computing. Designed for the low-latency, large-context demands of agentic systems, LPX works alongside Vera Rubin to combine the performance of both processor types, delivering up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models.
At scale, a fleet of language processing units (LPUs) functions as a single giant processor for fast, deterministic inference acceleration. The LPX rack, with 256 LPUs, features 128 GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. When LPX is deployed with Vera Rubin NVL72, Rubin GPUs and LPUs accelerate decoding by jointly computing every layer of the AI model for every output token.
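To put the rack-level LPX figures in per-chip terms, the short sketch below divides them across the 256 LPUs. It assumes, for illustration only, that the 128 GB of SRAM and 640 TB/s of scale-up bandwidth are rack-wide totals split evenly across chips; NVIDIA does not state the per-chip split in this announcement.

```python
# Rack-level LPX figures quoted in the article.
LPUS_PER_RACK = 256
RACK_SRAM_GB = 128        # on-chip SRAM, ASSUMED to be a rack-wide total
RACK_SCALEUP_TBPS = 640   # scale-up bandwidth in TB/s, ASSUMED to be a rack-wide total


def per_lpu(total: float, chips: int = LPUS_PER_RACK) -> float:
    """Split a rack-level total evenly across chips (an illustrative assumption)."""
    return total / chips


sram_per_lpu_mb = per_lpu(RACK_SRAM_GB) * 1000  # GB -> MB, decimal units
bw_per_lpu_tbps = per_lpu(RACK_SCALEUP_TBPS)

print(f"SRAM per LPU: {sram_per_lpu_mb:.0f} MB")               # SRAM per LPU: 500 MB
print(f"Scale-up bandwidth per LPU: {bw_per_lpu_tbps} TB/s")   # Scale-up bandwidth per LPU: 2.5 TB/s
```

Under this even-split assumption, each LPU carries roughly 500 MB of SRAM, which is why the article describes a fleet of LPUs acting together rather than any single chip.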
Optimised for trillion-parameter models and million-token contexts, the codesigned LPX architecture pairs with Vera Rubin to maximise efficiency across power, memory and compute. The added throughput per watt and per-token performance open up a new tier of ultra-premium inference for trillion-parameter, million-token-context models, expanding revenue opportunity for all AI providers.
Fully liquid-cooled and built on MGX reference server infrastructure, LPX integrates seamlessly into next-generation Vera Rubin AI factories and is slated to be available in 2H26.
NVIDIA BlueField-4 STX storage rack
The NVIDIA BlueField-4 STX rack-scale system is AI-native storage infrastructure that extends GPU memory across the pod. Powered by BlueField-4, which combines the NVIDIA Vera CPU and the NVIDIA ConnectX-9 SuperNIC, STX delivers a high-bandwidth shared layer optimised for storing and retrieving the massive key-value cache data generated by large language models and agentic AI workflows.
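Why key-value cache data gets so large is easiest to see with the standard back-of-the-envelope estimate for a transformer: for every token, each layer stores one key vector and one value vector per KV head. The model shape below (80 layers, 8 grouped-query KV heads of dimension 128, 16-bit values) is hypothetical, chosen only to show the scale at a million-token context; it does not describe any particular model or NVIDIA product.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Estimate the KV cache size of one sequence in bytes.

    The leading factor of 2 accounts for storing both a key and a value
    vector per KV head, per layer, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, for illustration only.
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=1_000_000, bytes_per_value=2)
print(f"{size / 1e9:.1f} GB per 1M-token sequence")  # 327.7 GB per 1M-token sequence
```

Multiplied across many concurrent agent sessions, totals like this can outgrow the memory attached to the GPUs themselves, which is the kind of gap a shared, pod-wide KV cache tier is meant to address.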
NVIDIA DOCA Memos, a new DOCA framework that supercharges BlueField-4 storage, enables dedicated key-value (KV) cache storage processing, boosting inference throughput by up to 5x while significantly improving power efficiency compared with general-purpose storage architectures. The result is pod-wide context that delivers faster multiturn interactions with AI agents, more scalable AI services and higher overall infrastructure utilisation.
“The NVIDIA BlueField-4 STX rack-scale context memory storage system will enable a critical performance boost needed to exponentially scale our agentic AI efforts,” said Timothée Lacroix, cofounder and CTO of Mistral AI. “By delivering a new storage tier purpose-built for AI agents memory, STX is ideally positioned to ensure that our models can maintain coherence and speed when reasoning across massive datasets.” Mistral AI has models for the Middle East and Asia.
NVIDIA Spectrum-6 SPX Ethernet rack
The Spectrum-6 SPX Ethernet rack is engineered to accelerate east-west traffic across AI factories. Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale.
Spectrum-X Ethernet Photonics with co-packaged optics achieves up to 5x greater optical power efficiency and 10x higher resiliency compared with traditional pluggable transceivers.
NVIDIA, along with more than 200 data centre infrastructure partners, also announced the NVIDIA DSX platform for Vera Rubin, which covers AI factory design and deployment. It includes DSX Max-Q, which enables dynamic power provisioning across the entire AI factory so that 30% more AI infrastructure can be deployed within a fixed-power data centre. New DSX Flex software enables AI factories to act as grid-flexible assets, unlocking 100 gigawatts of stranded grid power.
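The 30% figure is easiest to read as a rack count under a fixed facility power budget. The sketch below uses hypothetical numbers (a 100 MW facility and 200 kW statically provisioned per rack); neither figure comes from NVIDIA, and the sketch does not model how Max-Q actually allocates power.

```python
def racks_for_budget(facility_kw: int, kw_per_rack: int) -> int:
    """Racks that fit when each one is provisioned a fixed power allocation."""
    return facility_kw // kw_per_rack


FACILITY_KW = 100_000  # HYPOTHETICAL 100 MW facility
KW_PER_RACK = 200      # HYPOTHETICAL static per-rack provisioning

baseline = racks_for_budget(FACILITY_KW, KW_PER_RACK)
with_max_q = baseline * 130 // 100        # the article's "30% more AI infrastructure"
effective_kw = FACILITY_KW / with_max_q   # average power per rack that the budget allows

print(baseline, with_max_q)   # 500 650
print(f"{effective_kw:.1f}")  # 153.8
```

Read the other way round, fitting 30% more racks into the same budget means each rack can average only about 1/1.3 (roughly 77%) of its static allocation, which is the headroom dynamic provisioning has to find.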
NVIDIA also released the Vera Rubin DSX AI Factory reference design, a blueprint for codesigned AI infrastructure that maximises tokens per watt and overall goodput, improves system resiliency and accelerates time to first production.
By tightly integrating compute, networking, storage, power and cooling, the architecture increases energy efficiency and ensures AI factories can scale reliably under continuous, high-intensity workloads with maximum uptime.
Details
Vera Rubin-based products will be available from partners starting in 2H26. These partners include cloud providers Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, along with NVIDIA Cloud Partners CoreWeave, Crusoe, Lambda, Nebius, Nscale and Together AI.
Global system manufacturers Cisco, Dell Technologies, HPE, Lenovo and Supermicro, as well as Aivres, ASUS, Foxconn, GIGABYTE, Inventec, Pegatron, Quanta Cloud Technology (QCT), Wistron and Wiwynn, are expected to deliver a wide range of servers based on Vera Rubin products.
Hashtags: #GTC, #GTC2026
*NIC stands for network interface card.
