
Microsoft details its ChatGPT hardware investments

News Analysis

Mar 16, 2023 · 3 mins

Cloud Computing

In addition to sinking billions into OpenAI, Microsoft spent hundreds of millions on hardware, much of it from Nvidia.

Microsoft’s investment in ChatGPT doesn’t just involve money sunk into its maker, OpenAI, but also a massive hardware investment in data centers, which shows that, for now, AI solutions are just for the very top-tier companies.

The partnership between Microsoft and OpenAI dates back to 2019, when Microsoft invested $1 billion in the AI developer. It upped the ante in January with the investment of an additional $10 billion.

But ChatGPT has to run on something, and that is Azure hardware in Microsoft data centers. How much hardware has not been disclosed, but according to a report by Bloomberg, Microsoft had already spent “several hundred million dollars” on hardware used to train ChatGPT.

In a pair of blog posts, Microsoft detailed what went into building the AI infrastructure to run ChatGPT as part of the Bing service. It already offered virtual machines for AI processing built on Nvidia’s A100 GPU, called ND A100 v4. Now it is introducing the ND H100 v5 VM, based on newer hardware, and offering VM sizes ranging from eight to thousands of Nvidia H100 GPUs.

In his blog post, Matt Vegas, principal product manager of Azure HPC+AI, wrote that customers will see significantly faster performance for AI models over the ND A100 v4 VMs. The new VMs are powered by Nvidia H100 Tensor Core GPUs (“Hopper” generation) interconnected via next-gen NVSwitch and NVLink 4.0, Nvidia’s 400 Gb/s Quantum-2 CX7 InfiniBand networking, and 4th Gen Intel Xeon Scalable processors (“Sapphire Rapids”) with PCIe Gen5 interconnects and DDR5 memory.

He did not say just how much hardware is involved, but he did say that Microsoft is delivering multiple exaFLOPs of supercomputing power to Azure customers. There is only one exaFLOP supercomputer that we know of, as reported by the latest TOP500 semiannual list of the world’s fastest: Frontier at Oak Ridge National Laboratory. But that’s the thing about the TOP500: not everyone reports their supercomputers, so there may be other systems out there just as powerful as Frontier that we just don’t know about.

In a separate blog post, Microsoft talked about how the company started working with OpenAI to help create the supercomputers that are needed for ChatGPT’s large language model (and for Microsoft’s own Bing Chat). That meant linking up thousands of GPUs together in a new way that even Nvidia hadn’t thought of, according to Nidhi Chappell, Microsoft head of product for Azure high-performance computing and AI.

“This is not something that you just buy a whole bunch of GPUs, hook them together, and they’ll start working together. There is a lot of system-level optimization to get the best performance, and that comes with a lot of experience over many generations,” Chappell said.

To train a large language model, the workload is partitioned across thousands of GPUs in a cluster, and at certain steps in the process, the GPUs exchange information on the work they’ve done. An InfiniBand network pushes the data around at high speed, since the validation step must be completed before the GPUs can start the next step of processing.
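The partition-then-exchange pattern described above can be illustrated with a toy sketch. This is not Microsoft’s or OpenAI’s code, and the article does not name the algorithm involved; the sketch assumes a common data-parallel scheme in which each worker computes a gradient on its own shard of data and the workers then average those gradients (an “all-reduce”) before anyone takes the next step. All function names here are hypothetical, and the “GPUs” are simulated with plain Python lists.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) training pairs held by one worker


def local_gradient(weights: List[float], shard: Shard) -> List[float]:
    """Gradient of mean squared error for the model y = w * x on one shard."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return [grad]


def all_reduce_mean(per_worker_grads: List[List[float]]) -> List[float]:
    """Element-wise average across workers.

    On a real cluster this exchange travels over the interconnect;
    here it is simulated in a single process.
    """
    n_workers = len(per_worker_grads)
    return [sum(vals) / n_workers for vals in zip(*per_worker_grads)]


def train_step(weights: List[float], shards: List[Shard], lr: float) -> List[float]:
    # Each worker processes only its own shard (in parallel on real hardware).
    grads = [local_gradient(weights, shard) for shard in shards]
    # Synchronization point: no worker can proceed until the exchange finishes.
    avg_grad = all_reduce_mean(grads)
    # Every worker applies the same update, so the model copies stay identical.
    return [w - lr * g for w, g in zip(weights, avg_grad)]


def train(shards: List[Shard], steps: int = 200, lr: float = 0.01) -> List[float]:
    weights = [0.0]
    for _ in range(steps):
        weights = train_step(weights, shards, lr)
    return weights


# Two simulated workers, each holding part of a dataset where y = 3 * x.
example_shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
final_weights = train(example_shards)
```

Because every step waits on the exchange, the speed of the network between workers bounds the speed of training, which is why the interconnect matters as much as the GPUs themselves.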

The Azure infrastructure is optimized for large language model training, but it took years of incremental improvements to its AI platform to get there. The combination of GPUs, networking hardware and virtualization software required to deliver Bing AI is immense and is spread out across 60 Azure regions around the world.

ND H100 v5 instances are available for preview and will become a standard offering in the Azure portfolio, but Microsoft has not said when. Interested parties can request access to the new VMs.

by Andy Patrizio

Andy Patrizio is a freelance journalist based in southern California who has covered the computer industry for 20 years and has built every x86 PC he’s ever owned, laptops not included.

The opinions expressed in this blog are those of the author and do not necessarily represent those of ITworld, Network World, its parent, subsidiary or affiliated companies.
