
This online self-paced course provides basic training for Alliance users on using GPUs on our national systems. Modern GPUs (such as NVIDIA A100 and H100) are massively parallel and very expensive devices. Most GPU jobs cannot use these GPUs efficiently, either because the problem size is too small to saturate the GPU or because the GPU utilization pattern is intermittent (bursty). This course will teach you how to measure the GPU utilization of your jobs on our clusters and show you how to use two NVIDIA technologies, MPS (Multi-Process Service) and MIG (Multi-Instance GPU), to improve GPU utilization.
Prerequisites: none
Estimated time: one hour
Teacher: Sergey Mashchenko
Access is restricted to Digital Research Alliance of Canada (formerly Compute Canada) authenticated users only: Yes