NVIDIA Vera Rubin opens gates to agentic AI frontier

Source: NVIDIA. Seven different chips are part of the NVIDIA Vera Rubin platform.


AI infrastructure is evolving from discrete chips and standalone servers to fully integrated rack-scale systems, pod-scale deployments, AI factories and sovereign AI, according to NVIDIA. These advances are driving dramatic gains in performance and improving cost efficiency for organisations of all sizes and across industries, while helping democratise access to AI and improve energy efficiency to power the world’s most demanding workloads.

AI labs and frontier model developers including Anthropic, Meta, Mistral AI and OpenAI are looking to use the NVIDIA Vera Rubin platform to train larger, more capable models and to serve long-context, multimodal systems at lower latency and cost than with prior GPU generations.

New in the platform are:

NVIDIA Vera Rubin NVL72 rack

Integrating 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs, Vera Rubin NVL72 delivers breakthrough efficiency — training large mixture-of-experts models with one-fourth the number of GPUs compared with the NVIDIA Blackwell platform, and achieving up to 10x higher inference throughput per watt at one-tenth the cost per token.
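As a rough illustration of the rack arithmetic implied by these figures, the Python sketch below converts a Blackwell-era GPU count into Vera Rubin NVL72 racks. The baseline of 1,152 GPUs is a hypothetical figure chosen for the example, and the one-fourth ratio is NVIDIA's stated claim for large mixture-of-experts training rather than an independently measured result.

```python
import math

GPUS_PER_NVL72_RACK = 72   # from the article
CPUS_PER_NVL72_RACK = 36   # from the article
GPU_REDUCTION_FACTOR = 4   # NVIDIA's "one-fourth the number of GPUs" claim


def rubin_racks_needed(baseline_gpus: int) -> dict:
    """Estimate Rubin GPUs and NVL72 racks for a given Blackwell GPU count."""
    rubin_gpus = math.ceil(baseline_gpus / GPU_REDUCTION_FACTOR)
    racks = math.ceil(rubin_gpus / GPUS_PER_NVL72_RACK)
    return {
        "rubin_gpus": rubin_gpus,
        "racks": racks,
        "vera_cpus": racks * CPUS_PER_NVL72_RACK,
    }


# Hypothetical baseline: 1,152 Blackwell GPUs (an assumed figure).
estimate = rubin_racks_needed(1152)
print(estimate)
```

Under these assumptions, the hypothetical 1,152-GPU training job would fit in four NVL72 racks.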

Designed for hyperscale AI factories worldwide, NVL72 scales seamlessly with NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilisation across massive GPU clusters while reducing time to train and total cost of ownership.

NVIDIA Vera CPU rack

Reinforcement learning and agentic AI workloads rely on large numbers of CPU-based environments to test and validate the results generated by models running on GPU systems. The NVIDIA Vera CPU rack delivers dense, liquid-cooled infrastructure built on NVIDIA MGX, integrating 256 Vera CPUs to provide scalable, energy-efficient capacity with world-class single-threaded performance, unlocking agentic AI at scale.

Integrated with Spectrum-X Ethernet networking, Vera CPU racks keep CPU environments tightly synchronised across the AI factory. Together with GPU compute racks, they provide the CPU foundation for large-scale agentic AI and reinforcement learning, with Vera delivering results twice as efficiently and 50% faster than traditional CPUs.

NVIDIA Groq 3 LPX rack 

NVIDIA Groq 3 LPX marks a milestone in accelerated computing. Designed for the low-latency and large-context demands of agentic systems, LPX and Vera Rubin unite the extreme performance of both processors to deliver up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models.

At scale, a fleet of language processing units (LPUs) functions as a giant single processor for fast, deterministic inference acceleration. The LPX rack with 256 LPU processors features 128 GB of on-chip SRAM and 640 TBps of scale-up bandwidth. Deployed with Vera Rubin NVL72, Rubin GPUs and LPUs boost decode by jointly computing every layer of the AI model for every output token.
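To put the rack-level totals in per-chip terms, the short Python sketch below divides the stated SRAM and scale-up bandwidth across the 256 LPUs. It assumes the resources are spread evenly across processors and that the figures use decimal units; neither assumption is confirmed by NVIDIA.

```python
LPUS_PER_RACK = 256          # from the article
RACK_SRAM_GB = 128           # from the article
RACK_SCALE_UP_TBPS = 640     # from the article


def per_lpu_share(rack_total: float, lpus: int = LPUS_PER_RACK) -> float:
    """Divide a rack-level total evenly across LPUs (an assumption)."""
    return rack_total / lpus


# Assumes decimal units: 1 GB = 1,000 MB.
sram_mb_per_lpu = per_lpu_share(RACK_SRAM_GB) * 1000
bandwidth_tbps_per_lpu = per_lpu_share(RACK_SCALE_UP_TBPS)

print(f"{sram_mb_per_lpu:.0f} MB SRAM per LPU")
print(f"{bandwidth_tbps_per_lpu:.1f} TBps scale-up bandwidth per LPU")
```

On that basis, each LPU would hold about 500 MB of on-chip SRAM and contribute 2.5 TBps of scale-up bandwidth.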

Optimised for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximise efficiency across power, memory and compute. The additional throughput per watt and token performance unlock a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers.

Fully liquid-cooled and built on MGX reference server infrastructure, LPX integrates seamlessly into next-generation Vera Rubin AI factories, to be available in 2H26.

NVIDIA BlueField-4 STX storage rack

The NVIDIA BlueField-4 STX rack-scale system is an AI-native storage infrastructure that extends GPU memory across the pod. Powered by BlueField-4 — combining the NVIDIA Vera CPU and NVIDIA ConnectX-9 SuperNIC — STX delivers a high-bandwidth shared layer optimised for storing and retrieving the massive key-value cache data generated by large language models and agentic AI workflows.

NVIDIA DOCA Memos, a new DOCA framework that supercharges BlueField-4 storage, enables dedicated key-value (KV) cache storage processing to boost inference throughput by up to 5x while significantly improving power efficiency compared with general-purpose storage architectures. The result is pod-wide context that delivers faster multiturn interactions with AI agents, more scalable AI services and higher overall infrastructure utilisation.

“The NVIDIA BlueField-4 STX rack-scale context memory storage system will enable a critical performance boost needed to exponentially scale our agentic AI efforts,” said Timothée Lacroix, cofounder and CTO of Mistral AI. Mistral AI has models for the Middle East and Asia.

“By delivering a new storage tier purpose-built for AI agents memory, STX is ideally positioned to ensure that our models can maintain coherence and speed when reasoning across massive datasets.”

NVIDIA Spectrum-6 SPX Ethernet rack

Spectrum-6 SPX Ethernet is engineered to accelerate east-west traffic across AI factories. Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale.

Spectrum-X Ethernet Photonics with co-packaged optics achieves up to 5x greater optical power efficiency and 10x higher resiliency compared with traditional pluggable transceivers. 

NVIDIA, along with over 200 data centre infrastructure partners, also announced the NVIDIA DSX platform for Vera Rubin. DSX covers AI factory design and deployment. This includes DSX Max-Q to enable dynamic power provisioning across the entire AI factory, resulting in the deployment of 30% more AI infrastructure within a fixed-power data centre. The new DSX Flex software enables AI factories to be grid-flexible assets, unlocking 100 gigawatts of stranded grid power.
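The 30% figure can be read as simple capacity arithmetic. The Python sketch below shows what it would mean for a hypothetical facility; the 100-rack baseline is an assumed figure, and the calculation treats the claim as a flat uplift in rack count within an unchanged power envelope.

```python
MAX_Q_GAIN_PERCENT = 30  # NVIDIA's "30% more AI infrastructure" claim


def racks_with_max_q(baseline_racks: int, gain_percent: int = MAX_Q_GAIN_PERCENT) -> int:
    """Racks deployable in the same power envelope, rounded down."""
    return baseline_racks * (100 + gain_percent) // 100


def provisioned_power_share(gain_percent: int = MAX_Q_GAIN_PERCENT) -> float:
    """Fraction of the original per-rack power budget left for each rack."""
    return 100 / (100 + gain_percent)


# Hypothetical facility originally sized for 100 racks (an assumed figure).
print(racks_with_max_q(100))
print(f"{provisioned_power_share():.1%}")
```

In other words, fitting 30% more racks into a fixed power budget implies provisioning each rack at roughly 77% of its original allocation, which is the kind of headroom dynamic power provisioning is meant to recover.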

NVIDIA further released the Vera Rubin DSX AI Factory reference design, a blueprint for codesigned AI infrastructure that maximises tokens per watt and overall goodput, improving system resiliency and accelerating time to first production. 

By tightly integrating compute, networking, storage, power and cooling, the architecture increases energy efficiency and ensures AI factories can scale reliably under continuous, high-intensity workloads with maximum uptime.

Details

Vera Rubin-based products will be available from partners starting 2H26. This includes cloud providers Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, along with NVIDIA Cloud Partners CoreWeave, Crusoe, Lambda, Nebius, Nscale and Together AI.

Global system manufacturers Cisco, Dell Technologies, HPE, Lenovo and Supermicro, as well as Aivres, ASUS, Foxconn, GIGABYTE, Inventec, Pegatron, Quanta Cloud Technology (QCT), Wistron and Wiwynn, are expected to deliver a wide range of servers based on Vera Rubin products.

Hashtags: #GTC, #GTC2026

*NIC stands for network interface card.
