NVIDIA A100 GPUs and DGX SuperPOD systems are the world’s fastest commercially-available products for AI training

July 30, 2020

NVIDIA delivers the world’s fastest artificial intelligence (AI) training performance among commercially-available products, according to just-released MLPerf Benchmarks. This is NVIDIA’s third consecutive, and strongest, showing in training tests from MLPerf, an industry benchmarking group formed in May 2018.

NVIDIA’s new DGX SuperPOD, built in less than a month and featuring more than 2,000 NVIDIA A100 GPUs, swept every MLPerf Benchmark category for at-scale performance among commercially-available products. The A100 Tensor Core GPU demonstrated the fastest performance per accelerator on all eight MLPerf Benchmarks. For overall fastest time-to-solution at scale, the DGX SuperPOD system, a cluster of DGX A100 systems connected with HDR InfiniBand*, also set eight new performance milestones.

The MLPerf Benchmarks — backed by organisations including Amazon, Baidu, Facebook, Google, Intel, and Microsoft — constantly evolve to remain relevant as AI itself evolves. The latest benchmarks featured two new tests and one substantially-revised test, all of which NVIDIA excelled in.

One of the new tests ranked performance in recommendation systems, an increasingly popular AI task; the other tested conversational AI using BERT, one of the most complex neural network models in use today. Finally, the revised reinforcement learning test used Mini-Go with the full-size 19x19 Go board and was the most complex test in this round, involving diverse operations from game play to training.

NVIDIA was the only company to field commercially-available products for all the tests. Most other submissions used either the preview category, for products that may not be available for several months, or the research category, for products that are not expected to be available for some time.

Of the nine companies submitting results, seven submitted with NVIDIA GPUs, including cloud service providers (Alibaba Cloud, Google Cloud and Tencent Cloud) and server makers (Dell, Fujitsu and Inspur), highlighting the strength of NVIDIA’s ecosystem.

The MLPerf partners represent part of an ecosystem of nearly two dozen cloud-service providers and original equipment manufacturers (OEMs) with products or plans for online instances, servers and PCIe cards using NVIDIA A100 GPUs. Many of these partners used containers on NGC, NVIDIA’s software hub, along with publicly-available frameworks for their submissions.

According to Paresh Kharya, Senior Director of Product Management, Data Center Computing, NVIDIA, the real winners are customers who use these performance levels to transform their businesses more quickly and more cost-effectively with AI. “We are just getting started,” he said. “Enterprises and industries are just starting to adopt AI.”

In addition to breaking performance records, the A100, the first processor based on the NVIDIA Ampere architecture, has hit the market faster than any previous NVIDIA GPU. At launch, it powered NVIDIA’s third-generation DGX systems, and it became publicly available in a Google Cloud service just six weeks later.

Also helping meet the strong demand for A100 are leading cloud providers, such as Amazon Web Services, Baidu Cloud, Microsoft Azure and Tencent Cloud, as well as dozens of major server makers, including Dell Technologies, Hewlett Packard Enterprise, Inspur and Supermicro.

“Users across the globe are applying the A100 to tackle the most complex challenges in AI, data science and scientific computing. Some are enabling a new wave of recommendation systems or conversational AI applications while others power the quest for treatments for COVID-19. All are enjoying the greatest generational performance leap in eight generations of NVIDIA GPUs,” Kharya said.

Source: NVIDIA. The NVIDIA DGX SuperPOD system has set new milestones for AI training at scale.

Meanwhile, the original DGX-1 system based on NVIDIA V100 can now deliver up to 2x higher performance thanks to the latest software optimisations. These gains came in under two years. As added context, NVIDIA set six records in the first MLPerf training benchmarks in December 2018 and eight in July 2019.

Companies are already reaping the benefits of these performance highs. DGX SuperPODs are driving business results for companies like Lockheed Martin in aerospace and Microsoft in cloud-computing services. Alibaba also hit a US$38 billion sales record on Singles Day in November 2019, using NVIDIA GPUs instead of CPUs to serve more than 100 times as many queries per second on its recommendation systems.

Explore:

View the MLPerf Training v0.7 results

*HDR InfiniBand enables extremely low latencies and high data throughput, while offering smart deep learning computing acceleration engines via Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology.
