The NVIDIA DGX A100 could replace what's in your AI data centre now

NVIDIA has unveiled the NVIDIA DGX A100, a third-generation artificial intelligence (AI) system that delivers 5 petaflops of AI performance and consolidates the power and capabilities of an entire data centre into a single flexible platform.

Source: NVIDIA. The NVIDIA DGX A100 AI system.

A single rack of five DGX A100 systems replaces a data centre of AI training and inference infrastructure at 1/20th of the power consumption, 1/25th of the space and 1/10th of the cost. Prices begin at under US$200,000. DGX A100 systems are available immediately and have begun shipping worldwide.

“NVIDIA DGX A100 is the ultimate instrument for advancing AI,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA DGX is the first AI system built for the end-to-end machine learning workflow — from data analytics to training to inference. And with the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data.”

DGX A100 systems integrate eight of the new NVIDIA A100 Tensor Core GPUs, providing 320 GB of memory for training the largest AI datasets, and the latest high-speed NVIDIA Mellanox HDR 200 Gbps interconnects. Multiple smaller workloads can be accelerated by partitioning the DGX A100 into as many as 56 instances per system, using the A100 multi-instance GPU feature.
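The per-GPU figures implied by these system totals can be worked out with simple division. The short Python sketch below does so, assuming memory and instances are spread evenly across the eight GPUs; the function and constant names are illustrative and are not part of any NVIDIA tool.

```python
# Figures quoted in the article for one DGX A100 system.
GPUS_PER_SYSTEM = 8
TOTAL_GPU_MEMORY_GB = 320
MAX_INSTANCES_PER_SYSTEM = 56


def per_gpu_breakdown(gpus, total_memory_gb, max_instances):
    """Return (memory per GPU in GB, instances per GPU) implied by system totals.

    Assumes an even split across GPUs, which is an assumption of this sketch.
    """
    if max_instances % gpus != 0:
        raise ValueError("instances do not divide evenly across GPUs")
    return total_memory_gb / gpus, max_instances // gpus


memory_per_gpu_gb, instances_per_gpu = per_gpu_breakdown(
    GPUS_PER_SYSTEM, TOTAL_GPU_MEMORY_GB, MAX_INSTANCES_PER_SYSTEM
)
print(f"{memory_per_gpu_gb:.0f} GB per GPU, up to {instances_per_gpu} instances per GPU")
```

Under that assumption, each A100 contributes 40 GB of memory and can be split into as many as seven instances.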

Combining these capabilities enables enterprises to optimise computing power and resources on demand to accelerate diverse workloads, including data analytics, training and inference, on a single, fully integrated, software-defined platform.

A number of the world’s largest companies, service providers and government agencies have placed initial orders for the DGX A100. Early adopters in the Asia Pacific and Middle East region include:

● Chulalongkorn University, Thailand’s top research-intensive university, will use DGX A100 to accelerate pioneering research in areas such as Thai natural language processing, automatic speech recognition, computer vision and medical imaging.

● Harrison.ai — a Sydney-based healthcare AI company — will deploy Australia’s first DGX A100 systems to accelerate the development of its AI-as-medical-device.

● The UAE Artificial Intelligence Office — first in the Middle East to deploy the new DGX A100 — is building a national infrastructure to accelerate AI research, development and adoption across the public and private sector.

● VinAI Research, a major AI research lab based in Hanoi and Ho Chi Minh City in Vietnam, will use DGX A100 to conduct high-impact research and accelerate the application of AI.

NVIDIA also revealed its DGX SuperPOD, a cluster of 140 DGX A100 systems capable of achieving 700 petaflops of AI computing power. NVIDIA built the AI supercomputer by linking the systems with Mellanox HDR 200 Gbps InfiniBand interconnects, and uses it for internal research in areas such as conversational AI, genomics and autonomous driving. The cluster is one of the world’s fastest AI supercomputers, achieving a level of performance that previously required thousands of servers.
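The 700-petaflop figure is consistent with multiplying the per-system rating by the number of systems. The Python sketch below reproduces that arithmetic; it assumes peak AI performance adds up linearly across systems, which describes an aggregate rating rather than measured application throughput, and the helper name is hypothetical.

```python
# Per-system figures quoted in the article.
PETAFLOPS_PER_SYSTEM = 5
GPUS_PER_SYSTEM = 8
SUPERPOD_SYSTEMS = 140


def cluster_totals(systems,
                   petaflops_per_system=PETAFLOPS_PER_SYSTEM,
                   gpus_per_system=GPUS_PER_SYSTEM):
    """Aggregate peak petaflops and GPU count, assuming linear scaling."""
    return {
        "petaflops": systems * petaflops_per_system,
        "gpus": systems * gpus_per_system,
    }


superpod = cluster_totals(SUPERPOD_SYSTEMS)
print(f"{superpod['petaflops']} petaflops across {superpod['gpus']} GPUs")
```

The same helper applied to the five-system rack mentioned earlier, `cluster_totals(5)`, gives 25 petaflops across 40 GPUs.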

The enterprise-ready architecture and performance of the DGX A100 enabled NVIDIA to build the system in less than a month, rather than the months or years of planning and specialised component procurement previously required to deliver these supercomputing capabilities.

To help customers build their own A100-powered data centres, NVIDIA has released a new DGX SuperPOD reference architecture. It gives customers a blueprint that follows the same design principles and best practices NVIDIA used to build its DGX A100-based AI supercomputing cluster. 

NVIDIA also launched the NVIDIA DGXpert programme, which connects DGX customers with the company’s AI experts, and the NVIDIA DGX-Ready Software programme, which helps customers take advantage of certified, enterprise-grade software for AI workflows. DGXperts are AI-fluent specialists who can guide clients on AI deployments, from planning to implementation to ongoing optimisation, and help DGX A100 customers build and maintain state-of-the-art AI infrastructure.

The NVIDIA DGX-Ready Software programme helps customers quickly identify and take advantage of NVIDIA-tested third-party MLOps software that can help them increase data science productivity, accelerate AI workflows and improve accessibility and utilisation of AI infrastructure. MLOps refers to practices that facilitate collaboration and communication between data scientists and operations teams working on machine learning (ML).

The first programme partners certified by NVIDIA are Allegro AI, cnvrg.io, Core Scientific, Domino Data Lab, Iguazio, and Paperspace.

Storage technology providers DDN Storage, Dell Technologies, IBM, NetApp, Pure Storage, and Vast plan to integrate DGX A100 into their offerings, including those based on the NVIDIA DGX POD and DGX SuperPOD reference architectures.

NVIDIA DGX-Ready Data Centre partners offer colocation services in more than 122 locations across 26 countries to help customers seeking cost-effective facilities to host their DGX infrastructure. Customers can take advantage of these services to house and access DGX A100 infrastructure inside validated, world-class data centre facilities.

Details:

DGX A100 technical specifications

● Eight NVIDIA A100 Tensor Core GPUs, delivering 5 petaflops of AI power, with 320 GB of total GPU memory and 12.4 TB per second of bandwidth.

● Six NVIDIA NVSwitch interconnect fabrics with third-generation NVIDIA NVLink technology for 4.8 TB per second of bidirectional bandwidth.

● Nine Mellanox ConnectX-6 HDR 200 Gb per second network interfaces, offering a total of 3.6 Tb per second of bidirectional bandwidth.

● Mellanox In-Network Computing and network acceleration engines such as RDMA, GPUDirect and Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) to enable the highest performance and scalability.

● 15 TB of Gen4 NVMe internal storage, which is 2x faster than Gen3 NVMe solid state drives (SSDs).

● NVIDIA DGX software stack, which includes optimised software for AI and data science workloads, delivering maximised performance and enabling enterprises to achieve a faster return on their investment in AI infrastructure.
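The network figure in the list above can be cross-checked with a few lines of Python. This is a minimal sketch that assumes "bidirectional" means twice the per-direction rate and that 1 Tb equals 1,000 Gb; the function name is illustrative only.

```python
import math

# Network figures quoted in the specification list.
INTERFACES = 9
GBPS_PER_INTERFACE = 200
QUOTED_TOTAL_TBPS = 3.6


def aggregate_bidirectional_tbps(interfaces, gbps_per_interface):
    """Total bidirectional bandwidth in Tb/s.

    Assumes each interface carries its rated speed in both directions at once
    and uses decimal units (1 Tb = 1000 Gb).
    """
    per_direction_gbps = interfaces * gbps_per_interface
    return 2 * per_direction_gbps / 1000


total_tbps = aggregate_bidirectional_tbps(INTERFACES, GBPS_PER_INTERFACE)
print(f"{total_tbps:.1f} Tb/s bidirectional")
print("matches quoted figure:", math.isclose(total_tbps, QUOTED_TOTAL_TBPS))
```

Nine interfaces at 200 Gb per second give 1.8 Tb per second in each direction, which doubles to the quoted 3.6 Tb per second.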

NVIDIA DGX A100 systems start at US$199,000 and are shipping now through NVIDIA Partner Network resellers worldwide.
