NVIDIA brings large language AI models to enterprises
NVIDIA is enabling enterprises to build their own domain-specific chatbots, personal assistants and other sophisticated AI applications.
The company has unveiled the NVIDIA NeMo Megatron framework for training language models with trillions of parameters. The technology includes the Megatron 530B customisable large language model (LLM) that can be trained for new domains and languages, and the NVIDIA Triton Inference Server with multi-GPU, multinode distributed inference functionality.
Combined with NVIDIA DGX systems, these tools provide a production-ready, enterprise-grade solution to simplify the development and deployment of large language models. The framework is optimised to scale out across the large-scale accelerated computing offered by the NVIDIA DGX SuperPOD.
“Large language models have proven to be flexible and capable, able to answer deep domain questions, translate languages, comprehend and summarise documents, write stories and compute programs, all without specialised training or supervision,” said Bryan Catanzaro, VP, Applied Deep Learning Research at NVIDIA.
“Building large language models for new languages and domains is likely the largest supercomputing application yet, and now these capabilities are within reach for the world’s enterprises.”
NVIDIA NeMo Megatron builds on advancements from Megatron, an open-source project led by NVIDIA researchers studying efficient training of large transformer language models at scale. Megatron 530B is the world’s largest customisable language model.
NeMo Megatron automates the complexity of LLM training with data processing libraries that ingest, curate, organise and clean data. Using advanced technologies for data, tensor and pipeline parallelisation, it enables the training of large language models to be distributed efficiently across thousands of GPUs. Enterprises can use the NeMo Megatron framework to train LLMs for their specific domains and languages.
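How the three forms of parallelisation combine can be sketched in a few lines of Python. This is a minimal illustration, not NeMo Megatron code: the `ParallelLayout` class and the degrees used (8-way data, 8-way tensor, 35-way pipeline) are assumptions chosen for the example and do not come from the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical description of how one training job is split across GPUs."""

    data: int      # number of model replicas, each fed a different shard of the batch
    tensor: int    # GPUs that share the weight matrices of each layer
    pipeline: int  # sequential stages, each holding a contiguous block of layers

    @property
    def gpus(self) -> int:
        # Every combination of (replica, tensor shard, pipeline stage) needs its own GPU.
        return self.data * self.tensor * self.pipeline

    def params_per_gpu(self, total_params: float) -> float:
        # Data parallelism copies the model, so only the tensor and
        # pipeline splits reduce the number of parameters held per GPU.
        return total_params / (self.tensor * self.pipeline)


# Illustrative degrees only (assumed, not from the article).
layout = ParallelLayout(data=8, tensor=8, pipeline=35)

print(f"GPUs in the job:     {layout.gpus}")
print(f"GPUs per replica:    {layout.tensor * layout.pipeline}")
print(f"Parameters per GPU:  {layout.params_per_gpu(530e9):.3e}")
```

With these assumed degrees, one replica of a 530-billion-parameter model spans 280 GPUs and the whole job uses 2,240, which is the sense in which training is spread across thousands of GPUs.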
New multi-GPU, multinode features in the latest NVIDIA Triton Inference Server will enable LLM inference workloads to scale across multiple GPUs and nodes with real-time performance. The models require more memory than is available in a single GPU or even a large server with multiple GPUs, and inference must run quickly to be useful in applications.
With the Triton Inference Server, Megatron 530B can run on two NVIDIA DGX systems to shorten the processing time from over a minute on a CPU server to half a second, making it possible to deploy LLMs for real-time applications.
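A rough back-of-the-envelope calculation shows why a model of this size cannot fit on one GPU. The sketch below assumes 16-bit weights (2 bytes per parameter), 80 GB of memory per GPU and 8 GPUs per system; none of these figures appear in the announcement, and the estimate ignores activations and other runtime memory, so it is a lower bound only.

```python
def weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


PARAMS = 530 * 10**9          # Megatron 530B
GPU_MEM_BYTES = 80 * 10**9    # assumed: 80 GB per GPU
GPUS_PER_SYSTEM = 8           # assumed: 8 GPUs per system

total = weight_bytes(PARAMS)                      # assumed 2 bytes per parameter
gpus_needed = ceil_div(total, GPU_MEM_BYTES)
systems_needed = ceil_div(gpus_needed, GPUS_PER_SYSTEM)

print(f"Weights alone:  {total // 10**9} GB")
print(f"GPUs needed:    {gpus_needed}")
print(f"Systems needed: {systems_needed}")
```

Under these assumptions the weights alone occupy 1,060 GB, which needs at least 14 GPUs and therefore two 8-GPU systems, consistent with the two-system deployment described above.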
Early adopters building large language models with NVIDIA DGX SuperPOD include JD Explore Academy and VinBrain. JD Explore Academy, the research and development division of JD.com, a leading supply chain-based technology and service provider, is using NVIDIA DGX SuperPOD to develop natural language processing (NLP) for smart customer service, smart retail, smart logistics, IoT, healthcare and other applications.
VinBrain, a Vietnam-based healthcare AI company, has used a DGX SuperPOD to develop and deploy a clinical language model for radiologists and telehealth in 100 hospitals, where it is used by over 600 healthcare practitioners.
Explore
Enterprises can try developing and deploying large language models at no charge in curated labs on the new NVIDIA LaunchPad.
Organisations can apply to join the early access program for the NVIDIA NeMo Megatron accelerated framework for training large language models.
NVIDIA Triton is available from the NVIDIA NGC catalogue, a hub for GPU-optimised AI software that includes frameworks, toolkits, pretrained models and Jupyter Notebooks, and as open source code from the Triton GitHub repository.
Triton is also included in the NVIDIA AI Enterprise software suite, which is optimised, certified and supported by NVIDIA. Enterprises can use the software suite to run language model inference on mainstream accelerated servers in on-prem data centres and private clouds.
NVIDIA DGX SuperPOD and NVIDIA DGX systems are available from NVIDIA’s global resellers, which can provide pricing to qualified customers upon request.
The announcement was made at NVIDIA GTC, taking place online through November 11, 2021. Watch NVIDIA founder and CEO Jensen Huang’s GTC keynote address streaming on November 9 and in replay.