MLPerf benchmarks, which measure the training and inference performance of ML hardware and software, have published three sets of ML training results so far. In all three sets, ResNet50v1.5 served as a standard benchmark to showcase the latest developments in image recognition. The latest MLPerf training round (v0.7) included Intel’s submission using TensorFlow. In this paper, we describe the recent optimization work that enabled this submission. In particular, we enabled the BFloat16 data type in the ResNet50v1.5 model as well as in Intel-optimized TensorFlow to exploit the full potential of 3rd Generation Intel Xeon Scalable processors, which have built-in BFloat16 support. We also describe the performance optimizations and the state-of-the-art accuracy and convergence results of the ResNet50v1.5 model, achieved through large-scale distributed training with Horovod (up to 256 MPI workers). These results lay a solid foundation for future MLPerf training submissions on large-scale Intel Xeon clusters.
IXPUG Workshop at HPC Asia 2021
MLPerf, Training & Inference, TensorFlow, BFloat16, ResNet50, oneDNN, Horovod