In the last decade, deep learning (DL) training has emerged as an HPC-scale workload running on large clusters. The dominant communication pattern in distributed data-parallel DL training is allreduce, which is used to sum the model gradients across processes during the backpropagation phase. Various allreduce algorithms have been developed to optimize communication time in DL training. Given the scale of DL workloads, it is crucial to evaluate the scaling efficiency of these algorithms on a variety of system architectures. We have extended the Structural Simulation Toolkit (SST) to simulate allreduce and barrier algorithms, namely the Rabenseifner, ring, and dissemination algorithms. We performed a design space exploration (DSE) study with three allreduce algorithms and two barrier algorithms running on six system network topologies for various message sizes. We quantified the performance benefits of allreduce algorithms that preserve locality between communicating processes. In addition, we evaluated the scaling efficiency of centralized and decentralized barrier algorithms.
SC24 IXPUG Workshop
HPC, HPC and AI, Modeling, Simulation, Collective Algorithms, HPC Network Topologies, Structural Simulation Toolkit
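To make the role of allreduce in gradient summation concrete, the sketch below serially simulates a ring allreduce over the gradients of p ranks (reduce-scatter followed by allgather). This is an illustration under our own assumptions, not code from the paper or from SST; names such as ring_allreduce and grads are hypothetical.

    # Minimal serial sketch (illustrative only) of the ring allreduce
    # commonly used to sum gradients in data-parallel DL training.
    import numpy as np

    def ring_allreduce(grads):
        """Simulate a ring allreduce over the gradient vectors of p ranks.

        grads: list of p equal-length 1-D numpy arrays (one per rank).
        Returns a list of p arrays, each holding the element-wise global sum.
        """
        p = len(grads)
        # Each rank splits its gradient into p chunks.
        chunks = [np.array_split(g.astype(float), p) for g in grads]

        # Reduce-scatter: in p-1 steps each rank forwards one chunk to its
        # right neighbour and accumulates the chunk received from its left
        # neighbour; afterwards rank r owns the fully summed chunk (r+1) mod p.
        for s in range(p - 1):
            for r in range(p):
                src = (r - 1) % p            # left neighbour in the ring
                c = (r - 1 - s) % p          # chunk accumulated at this step
                chunks[r][c] = chunks[r][c] + chunks[src][c]

        # Allgather: in p-1 more steps the completed chunks circulate around
        # the ring until every rank holds the full summed gradient.
        for s in range(p - 1):
            for r in range(p):
                src = (r - 1) % p
                c = (r - s) % p              # completed chunk received here
                chunks[r][c] = chunks[src][c].copy()

        return [np.concatenate(c) for c in chunks]

    # Example: four ranks each contribute a gradient of ones, so every rank
    # should end up with a vector of fours.
    if __name__ == "__main__":
        result = ring_allreduce([np.ones(8) for _ in range(4)])
        assert all(np.allclose(r, 4.0) for r in result)

Each of the 2(p-1) steps moves only 1/p of the gradient per rank, which is why the ring algorithm is bandwidth-efficient but sensitive to how neighbouring ranks are placed on the network topology.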