4:00 Alexander Breuer
Title: Best Practices for the Xeon Phi Coprocessor: Tuning SG++, ls1 mardyn and SeisSol
In this presentation we show recent advances of our Intel Parallel Computing Center at Leibniz Supercomputing Centre and Technische Universität München regarding the Xeon Phi coprocessor. A broad field of applications is covered: high-dimensional problems, molecular dynamics, and earthquake simulation. The first part of the talk covers characteristic challenges of SG++ and ls1 mardyn. We present solutions and best practices on the coprocessor for exploiting its different levels of parallelism at the highest performance. In particular, novel algorithms that utilize the advanced SIMD capabilities and Phi-specific vector instructions turn out to be of major importance. In the second part we focus on the end-to-end performance tuning of the SeisSol software package. SeisSol simulates dynamic rupture and seismic wave propagation at petascale performance in production runs. Sustained machine-size performance of more than 2 DP-PFLOPS on Stampede for a strong-scaling setup of the 1992 Landers earthquake concludes the presentation.
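The abstract points to Phi-specific vector instructions without showing them; as a hedged illustration (not code from SG++, ls1 mardyn, or SeisSol), the following minimal C sketch uses the 512-bit fused multiply-add intrinsics available on the coprocessor. The function name and data layout are assumptions made for the example only.

```c
#include <immintrin.h>  /* 512-bit vector intrinsics (Intel compiler, -mmic for the Phi) */

/* Illustrative only: compute y[i] += a[i] * x[i] eight doubles at a time
   with one 512-bit fused multiply-add per iteration.
   Arrays are assumed 64-byte aligned and n a multiple of 8. */
void fma_madd(const double *a, const double *x, double *y, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m512d va = _mm512_load_pd(&a[i]);
        __m512d vx = _mm512_load_pd(&x[i]);
        __m512d vy = _mm512_load_pd(&y[i]);
        vy = _mm512_fmadd_pd(va, vx, vy);   /* vy = va * vx + vy */
        _mm512_store_pd(&y[i], vy);
    }
}
```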
4:15 Alexander Gaenko
Title: Porting GAMESS to Xeon Phi: Advances and Challenges
GAMESS is a freely available, mature, powerful quantum chemistry software package developed at Ames Laboratory. GAMESS is parallelized on the process level, rather than the thread level, using the Generalized Distributed Data Interface (GDDI) library. The GDDI library supports both two-sided (message passing) and one-sided (distributed memory) communication models, and can use MPI-1, TCP/IP, and/or shared memory as the underlying transport mechanisms. The support for a full-featured OS, a TCP/IP stack, and shared memory on the Intel Xeon Phi makes it a very attractive candidate for running GAMESS in native mode. The presentation will outline our experience with running the quantum many-body methods of GAMESS on the Intel Xeon Phi, the use of offload and native modes, the mixed process/thread-based parallelization with and without MPI, the challenges and successes, and the initial performance analysis results.
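Since the abstract contrasts offload and native modes, here is a hedged C sketch (not GAMESS code; the function and array are invented for the example) of the Intel compiler's offload pragma. In native mode the same routine would simply be compiled with -mmic and run directly on the coprocessor, with no offload pragma at all.

```c
#include <stdio.h>
#define N 1000000

/* Mark the routine for coprocessor compilation so it can run inside an offload region. */
__attribute__((target(mic)))
double sum_squares(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += x[i] * x[i];
    return s;
}

int main(void)
{
    static double x[N];
    for (int i = 0; i < N; ++i) x[i] = 1.0 / (i + 1);

    double s;
    /* Offload mode: the host ships the array to the coprocessor,
       runs the statement there, and copies the result back. */
    #pragma offload target(mic) in(x : length(N)) out(s)
    s = sum_squares(x, N);

    printf("sum of squares = %f\n", s);
    return 0;
}
```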
4:30 Milind Girkar
Title: Explicit Vector Programming
As processor designs have faced increasing power constraints, processor architecture has evolved to offer multiple cores and wider vector execution units on a single die to increase performance. Exploiting these hardware innovations requires software developers to find the parallelism in their programs and to express it in high-level programming languages. While constructs for expressing threaded parallelism across multiple cores have been available for some time, language-level constructs for exploiting the vector execution units have only recently become practical. We show how vector execution can be expressed through the recently published OpenMP 4.0 standard and its implementation in the Intel compilers on Intel processors supporting SIMD instructions (Intel SSE4.2, Intel AVX, Intel AVX2, Intel AVX-512).
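As a hedged illustration of the OpenMP 4.0 constructs the talk refers to (the loop and function names below are invented for the example), an explicitly vectorized loop and a vector-callable function look like this in C:

```c
#include <stdio.h>
#define N 1024

/* declare simd asks the compiler to generate a vector variant of the function
   that can be called from vectorized loops. */
#pragma omp declare simd
static float scale_and_shift(float x, float a, float b)
{
    return a * x + b;
}

int main(void)
{
    float in[N], out[N];
    for (int i = 0; i < N; ++i) in[i] = (float)i;

    /* omp simd tells the compiler to vectorize this loop explicitly,
       rather than relying on auto-vectorization heuristics. */
    #pragma omp simd
    for (int i = 0; i < N; ++i)
        out[i] = scale_and_shift(in[i], 2.0f, 1.0f);

    printf("out[10] = %f\n", out[10]);
    return 0;
}
```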
4:45 John Michalakes
Title: Optimizing Weather Model Physics on Phi
I will discuss objectives, challenges, strategies, and experiences in porting and optimizing physics packages from the Weather Research and Forecast (WRF) model and the NOAA Non-hydrostatic Multiscale Model (NMM-B) on Xeon Phi. The current focus is intra-node, improving performance on individual Phi processors, but the goal is scaling whole codes (not just kernels) to large numbers of Phi-enabled nodes.
5:00 Paul Peltz
Title: Best Practices for Administering a Medium Sized Cluster with Intel Xeon Phi Coprocessors
This work describes best practices for configuring and managing an Intel Xeon Phi cluster. The Xeon Phi presents a unique environment to the user, and preparing this environment requires unique procedures. This work will outline these procedures and provide examples for HPC administrators to use and then customize for their own systems. Considerable effort has been put forth to help researchers determine how to maximize performance on the Xeon Phi, but little has been done for the administrators of these systems. Now that Xeon Phis are being deployed on larger systems, there is a need for information on how to deploy and manage them. The information provided here serves as a supplement to the documentation Intel provides, bridging the gap between workstation and cluster deployments. This work is based on the authors' experiences deploying and maintaining the Beacon cluster at the University of Tennessee's Application Acceleration Center of Excellence (AACE).
5:15 James Rosinski
Title: Porting and Optimizing NOAA/ESRL Weather Models on the Intel Xeon Phi Architecture
NOAA/ESRL is developing a numerical weather forecast model (named NIM) designed to run on a variety of architectures, including traditional CPUs as well as fine-grained hardware such as the Xeon Phi and GPUs. One software constraint on this work is the need for a single-source solution for all supported platforms. In this talk we will describe the software development issues specific to porting and optimizing NIM for the Xeon Phi. In addition to performance results, tradeoffs of the symmetric vs. offload approaches to sharing the workload between the Phi and the host will be described. Issues associated with port validation, communication, and load balancing between host and coprocessor will also be discussed.