Delivering on the promise of advanced AI for our customers requires supercomputing infrastructure, services, and expertise to address the exponentially increasing size and complexity of the latest models. At Microsoft, we are meeting this challenge by applying a decade of experience in supercomputing and supporting the largest AI training workloads to create AI infrastructure capable of massive performance at scale. The Microsoft Azure cloud, and specifically our graphics processing unit (GPU) accelerated virtual machines (VMs), provide the foundation for many generative AI advancements from both Microsoft and our customers.
“Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible.”—Greg Brockman, President and Co-Founder of OpenAI.
Azure's most powerful and massively scalable AI virtual machine series
Today, Microsoft is introducing the ND H100 v5 VM, which is available on demand in sizes ranging from eight to thousands of NVIDIA H100 GPUs interconnected by NVIDIA Quantum-2 InfiniBand networking. Customers will see significantly faster performance for AI models over our last-generation ND A100 v4 VMs with innovative technologies like:
◉ 8x NVIDIA H100 Tensor Core GPUs interconnected via next gen NVSwitch and NVLink 4.0
◉ 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU with 3.2 Tb/s per VM in a non-blocking fat-tree network
◉ NVSwitch and NVLink 4.0 with 3.6 TB/s bisection bandwidth between 8 local GPUs within each VM
◉ 4th Gen Intel Xeon Scalable processors
◉ PCIe Gen5 host-to-GPU interconnect with 64 GB/s bandwidth per GPU
◉ 16 Channels of 4800MHz DDR5 DIMMs
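The per-GPU and per-VM figures in the list above are related by simple multiplication: eight GPUs at 400 Gb/s each give the quoted 3.2 Tb/s of InfiniBand bandwidth per VM. The short Python sketch below reproduces that arithmetic and also totals the PCIe host-to-GPU bandwidth. The constants come from the list; the function names and the `vms_needed` helper are hypothetical, introduced only for illustration, and the sketch assumes every VM exposes exactly eight GPUs.

```python
import math

# Per-VM figures quoted in the list above.
GPUS_PER_VM = 8
INFINIBAND_GBITS_PER_GPU = 400  # Gb/s (gigabits per second)
PCIE_GBYTES_PER_GPU = 64        # GB/s (gigabytes per second)


def infiniband_tbits_per_vm() -> float:
    """Aggregate InfiniBand bandwidth of one VM, in Tb/s."""
    return GPUS_PER_VM * INFINIBAND_GBITS_PER_GPU / 1000


def pcie_gbytes_per_vm() -> int:
    """Aggregate PCIe host-to-GPU bandwidth of one VM, in GB/s."""
    return GPUS_PER_VM * PCIE_GBYTES_PER_GPU


def vms_needed(total_gpus: int) -> int:
    """Hypothetical helper: fewest 8-GPU VMs that cover total_gpus."""
    if total_gpus <= 0:
        raise ValueError("total_gpus must be positive")
    return math.ceil(total_gpus / GPUS_PER_VM)


if __name__ == "__main__":
    print(f"InfiniBand per VM: {infiniband_tbits_per_vm()} Tb/s")
    print(f"PCIe per VM: {pcie_gbytes_per_vm()} GB/s")
    print(f"VMs for 4096 GPUs: {vms_needed(4096)}")
```

These are peak figures derived from the published per-GPU numbers; achieved throughput depends on the workload and is not something this sketch estimates.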
Delivering exascale AI supercomputers to the cloud
Generative AI applications are rapidly evolving and adding unique value across nearly every industry. From reinventing search with a new AI-powered Microsoft Bing and Edge to AI-powered assistance in Microsoft Dynamics 365, AI is rapidly becoming a pervasive component of software and how we interact with it, and our AI Infrastructure will be there to pave the way. With our experience of delivering multiple-ExaOP supercomputers to Azure customers around the world, customers can trust that they can achieve true supercomputer performance with our infrastructure. For Microsoft and organizations like Inflection, NVIDIA, and OpenAI that have committed to large-scale deployments, this offering will enable a new class of large-scale AI models.
"Our focus on conversational AI requires us to develop and train some of the most complex large language models. Azure's AI infrastructure provides us with the necessary performance to efficiently process these models reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts."—Mustafa Suleyman, CEO, Inflection.
AI at scale is built into Azure’s DNA. Our initial investments in large language model research, like Turing, and engineering milestones such as building the first AI supercomputer in the cloud prepared us for the moment when generative artificial intelligence became possible. Azure services like Azure Machine Learning make our AI supercomputer accessible to customers for model training and Azure OpenAI Service enables customers to tap into the power of large-scale generative AI models. Scale has always been our north star to optimize Azure for AI. We’re now bringing supercomputing capabilities to startups and companies of all sizes, without requiring the capital for massive physical hardware or software investments.
“NVIDIA and Microsoft Azure have collaborated through multiple generations of products to bring leading AI innovations to enterprises around the world. The NDv5 H100 virtual machines will help power a new era of generative AI applications and services.”—Ian Buck, Vice President of hyperscale and high-performance computing at NVIDIA.
Today we are announcing that ND H100 v5 is available for preview and will become a standard offering in the Azure portfolio, allowing anyone to unlock the potential of AI at Scale in the cloud.
Source: azure.microsoft.com