
Wednesday, November 16, 2022

Grace Hopper Superchip Design

Nice to see this effort named after my former Pentagon colleague. See previous mentions of her in this blog under the 'Grace Hopper' label. Technical.

NVIDIA Grace Hopper Superchip Architecture In-Depth

By Jonathon Evans, Michael Andersch, Vikram Sethi, Gonzalo Brito and Vishal Mehta

CUDA, Grace Hopper Superchip, HPC / Supercomputing, Technical Walkthrough

The NVIDIA Grace Hopper Superchip Architecture is the first true heterogeneous accelerated platform for high-performance computing (HPC) and AI workloads. It accelerates applications with the strengths of both GPUs and CPUs while providing the simplest and most productive distributed heterogeneous programming model to date. Scientists and engineers can focus on solving the world’s most important problems.

In this post, you learn all about the Grace Hopper Superchip, with highlights of the performance breakthroughs that NVIDIA Grace Hopper delivers. For more information about the speedups that Grace Hopper achieves over the most powerful PCIe-based accelerated platforms using NVIDIA Hopper H100 GPUs, see the NVIDIA Grace Hopper Superchip Architecture whitepaper.

Performance and productivity for strong-scaling HPC and giant AI workloads

The NVIDIA Grace Hopper Superchip architecture brings together the groundbreaking performance of the NVIDIA Hopper GPU with the versatility of the NVIDIA Grace CPU, connected with a high bandwidth and memory coherent NVIDIA NVLink Chip-2-Chip (C2C) interconnect in a single superchip, and support for the new NVIDIA NVLink Switch System.

Diagram of the NVIDIA Grace Hopper Superchip showing the LPDDR5X, HBM3, NVLink, and I/O bandwidths as well as memory capacities: Hopper has up to 96 GB of HBM3 at up to 3,000 GB/s; Grace has up to 512 GB of LPDDR5X at up to 546 GB/s; Grace and Hopper are connected by NVLink-C2C at up to 900 GB/s; and the superchip provides up to 64 PCIe Gen 5 lanes delivering up to 512 GB/s plus up to 18 NVLink 4 lanes delivering up to 900 GB/s to the NVLink Switch network.

Figure 2. NVIDIA Grace Hopper Superchip logical overview

NVIDIA NVLink-C2C is an NVIDIA memory coherent, high-bandwidth, and low-latency superchip interconnect. It is the heart of the Grace Hopper Superchip and delivers up to 900 GB/s total bandwidth. This is 7x higher bandwidth than the x16 PCIe Gen5 links commonly used in accelerated systems, which provide roughly 128 GB/s of total bidirectional bandwidth.

NVLink-C2C memory coherency increases developer productivity and performance and enables GPUs to access large amounts of memory. CPU and GPU threads can now concurrently and transparently access both CPU- and GPU-resident memory, enabling you to focus on algorithms instead of explicit memory management.
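To make that programming model concrete, here is a minimal CUDA C++ sketch (not code from the article) of what coherent system-memory access looks like on a platform such as Grace Hopper: the kernel dereferences an ordinary malloc pointer directly, with no cudaMalloc or explicit copies. The kernel name, array size, and scale factor are illustrative assumptions, and the direct use of heap memory from the GPU assumes hardware address translation support as described below.

```cuda
// Minimal sketch, assuming a coherent platform (e.g., Grace Hopper with
// NVLink-C2C/ATS) where the GPU can access plain system allocations.
#include <cstdio>
#include <cstdlib>

__global__ void scale(double* data, double factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // GPU reads/writes CPU-resident memory
}

int main() {
    const size_t n = 1 << 20;
    // Ordinary heap allocation: no cudaMalloc or cudaMallocManaged needed
    // when the platform exposes coherent system memory to the GPU.
    double* data = static_cast<double*>(malloc(n * sizeof(double)));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0;

    scale<<<(n + 255) / 256, 256>>>(data, 2.0, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // CPU sees the GPU's updates
    free(data);
    return 0;
}
```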

Memory coherency enables you to transfer only the data you need, rather than migrating entire pages to and from the GPU. It also enables lightweight synchronization primitives across GPU and CPU threads by supporting native atomic operations from both the CPU and the GPU. NVLink-C2C with Address Translation Services (ATS) leverages the NVIDIA Hopper Direct Memory Access (DMA) copy engines to accelerate bulk transfers of pageable memory across host and device. ...
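As a sketch of those lightweight synchronization primitives (again, not code from the article), the libcu++ cuda::atomic type with cuda::thread_scope_system can implement a simple CPU/GPU handshake, assuming coherent system memory is visible to both sides; the flag-and-payload layout below is an illustrative assumption.

```cuda
// Minimal sketch, assuming coherent system memory (e.g., Grace Hopper with
// NVLink-C2C): a GPU kernel publishes a value with a system-scope atomic,
// and a CPU thread waits on the flag instead of synchronizing the device.
#include <cuda/atomic>
#include <cstdio>
#include <cstdlib>

using sys_atomic = cuda::atomic<int, cuda::thread_scope_system>;

__global__ void produce(int* payload, sys_atomic* flag) {
    *payload = 42;                                        // write the data
    flag->store(1, cuda::std::memory_order_release);      // publish to the CPU
}

int main() {
    // Plain system allocations, accessible from both CPU and GPU
    // on a coherent platform.
    int* payload = static_cast<int*>(malloc(sizeof(int)));
    *payload = 0;
    auto* flag = new sys_atomic(0);

    produce<<<1, 1>>>(payload, flag);

    // CPU spins on the system-scope flag while the kernel runs.
    while (flag->load(cuda::std::memory_order_acquire) == 0) { /* spin */ }
    printf("payload = %d\n", *payload);                   // prints 42

    cudaDeviceSynchronize();
    free(payload);
    delete flag;
    return 0;
}
```

On platforms without coherent system memory, a similar handshake would typically require managed or pinned host allocations instead of plain malloc/new.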
