We have collected presentations from IXPUG workshops, annual meetings, and BOF sessions, and made them accessible here to view or download. You may search by event, keyword, science domain or author’s name. The database will be updated as new talks are made available.
IXPUG Annual Conference 2025 Welcome Talk & TACC Site Update presented by Lars Koesterke, Research Associate, HPC Performance & Architectures, TACC on April 15, 2025.
Keyword(s): IXPUG Annual Conference 2025,Stampede3,Frontera,National Science Foundation,open science,Icelake,Skylake,Sapphire Rapids,Ponte Vecchio
Author(s): Lars Koesterke
IXPUG Annual Conference 2025 technical talk "Joint Matrix: A Unified SYCL Extension for Matrix Hardware Programming" presented by Dounia Khaldi, Intel Corporation, on April 15, 2025. Abstract: Joint matrix is a new SYCL extension for matrix hardware programming. It unifies targets such as Intel Advanced Matrix Extensions (Intel AMX), Intel Xe Matrix Extensions (Intel XMX), NVIDIA* Tensor Cores, and AMD* Matrix Cores. ML frameworks such as TensorFlow and libraries such as the oneAPI Deep Neural Network Library (oneDNN) can make heavy use of matrix hardware acceleration, and they are the answer for many users and applications that want high performance from such hardware. However, for users who want to build their own neural network applications, these libraries and frameworks are too high-level, because they do not allow custom optimizations, and too heavyweight, because the libraries are large. Moreover, new operations are frequently introduced in machine learning for which frameworks and libraries do not provide timely and performant implementations. In such cases, developers need APIs for writing custom, workload-specific optimizations, and this is where joint matrix can help. Joint matrix offers a lower level of abstraction than these frameworks and libraries, which lets it provide performance, productivity, and fusion capabilities while still offering portability: one code base can target different matrix hardware. In this talk, we present (1) the main APIs introduced by the SYCL joint matrix extension, (2) tuning techniques for fully utilizing Intel hardware with SYCL joint matrix, and (3) the application and validation of this language feature and these tuning techniques on the GEMM benchmark, including the ability to fuse kernels such as GEMM and GELU.
Keyword(s): IXPUG Annual Conference 2025,SYCL,Matrix/Tensor programming,Intel XMX
Author(s): Dounia Khaldi
IXPUG Annual Conference 2025 technical talk "OpenMP in oneAPI: Empowering Scientific Computing on Intel Platforms, From Laptop to Aurora Exascale Supercomputer" presented by Jeongnim Kim, Intel Corporation, and Ye Luo, Argonne Leadership Computing Facility, Argonne National Laboratory, on April 15, 2025. Co-Authors: Patrick Steinbrecher and Xinmin Tian, Intel Corporation.
Keyword(s): IXPUG Annual Conference 2025,OpenMP,Shared-memory programming,oneAPI,Device offloads,Scientific HPC applications,OpenMP interoperability,Portability, performance and productivity,QMCPACK
Author(s): Jeongnim Kim, Ye Luo, Patrick Steinbrecher, Xinmin Tian
IXPUG Annual Conference 2025 technical talk "Cornelis Networking Solution Deep Dive" presented by Matt Williams, Field CTO, Cornelis Networks, on April 15, 2025.
Keyword(s): High Performance Networking,AI Networking,AI/HPC network congestion,Low Latency HPC network,Scalable RDMA,Ultra Ethernet Consortium,RoCE for AI/HPC,InfiniBand for AI/HPC,Lossless Ethernet,GPU Clusters for AI
Author(s): Matt Williams
IXPUG Annual Conference 2025 keynote "The Evolution of Developer Software" presented by Sanjiv M. Shah, Vice President in the Data Center and AI Group and General Manager of Developer Software Engineering, Intel Corporation.
Keyword(s): oneAPI,CPU,GPU,SYCL,UXL,LLVM,OpenMP,Sanitizers,IXPUG Annual Conference 2025
Author(s): Sanjiv Shah
IXPUG Annual Conference 2025 technical talk "Wavenumber Recovery by 2D Frequency-Domain FWI from Slowness- and Source-Frequency-Limited Elastic Seismic Data" presented by Sasmita Mohapatra, University of Texas at Dallas, on April 15, 2025. Co-Author: George McMechan, University of Texas at Dallas. Abstract: Full-waveform inversion (FWI) is a powerful technique for constructing high-resolution subsurface images, provided that data with broad frequency bands and wide-angle (in 2D) or wide-azimuth (in 3D) apertures are available. Advances in inversion techniques have significantly improved the efficiency of elastic full-waveform inversion (EFWI), yielding an order-of-magnitude performance gain. A multistep-length gradient approach optimally weights each parameter gradient, stabilizing nonlinear solutions and reducing convergence to a few tens of iterations rather than the hundreds typically required. Wavefield extrapolations are performed with parallelized, high-precision finite-element (FE) modeling in the time domain, while the inversion is conducted in the frequency domain using discrete Fourier transforms at each time step. Key strategies such as frequency selection, time windowing, and source wavelet estimation mitigate cycle skipping and remove artifacts caused by variations in source spatial patterns. The inversion was validated on a two-dimensional elastic model with a circular anomaly and on an elastic Marmousi-2 model, using synthetic data generated by a finite-difference modeling scheme, and performed robustly even under a constant-density assumption. Mohapatra and McMechan (2021) established a numerical relationship between the wavenumbers in the illuminating wavefield and those that can be reconstructed in a target through migration or inversion.
Synthetic elastic examples for models containing finite bandwidths illustrate how the ability to recover wavenumbers is limited by the wavenumber information that is contained in the illuminating wavefield, and by the sampling of the data in both time and space. The computation for all 60 iterations was parallelized across sources using MPI on 56 Intel Xeon X5650 CPU cores, completing in approximately 90 hours (Mohapatra & McMechan, 2021).
Keyword(s): IXPUG Annual Conference 2025,Full Waveform Inversion (FWI),Elastic Full Waveform Inversion (EFWI),Marmousi-2 Model,FFT,Cycle Skipping,Wavenumber,Inversion,Frequency Domain
Author(s): Sasmita Mohapatra, George McMechan
IXPUG Annual Conference 2025 technical talk "Optimizing Performance in Parallel Discrete Event Simulations through Profile-Guided Partitioning" presented by Sunil Reddy Maram, University of Massachusetts Dartmouth, on April 15, 2025. Abstract: The efficiency of large-scale discrete event simulations (DES) can be significantly improved by parallel processing across multiple compute nodes. However, high communication latencies in distributed environments often prevent the expected performance improvements. This paper introduces a profile-driven approach to partitioning simulation models that aims to minimize inter-process communication while balancing computational load. By profiling the messaging characteristics of various real-world simulation models in a sequential setting, we developed a partitioning strategy that optimizes the distribution of simulation objects across processors. Our experimental results show that this profile-guided partitioning technique can yield substantial performance gains, with runtime speedups of up to sixfold observed in concurrent execution. This study contributes to the understanding of effective simulation model partitioning and highlights the potential of reducing network traffic to improve the overall efficiency of parallel discrete event simulations.
Keyword(s): IXPUG Annual Conference 2025,Parallel Processing,Distributed Computing,Performance Optimization,Profile-driven Partitioning
Author(s): Sunil Reddy Maram
IXPUG Annual Conference 2025 technical talk "Multi-Scale Light-Matter Dynamics in Quantum Materials on Aurora PVC" presented by Nariman Piroozan, Intel Corporation, on April 15, 2025. Co-Authors: Taufeq Mohammed Razakh, University of Southern California; Thomas Linker, Stanford University; Ye Luo, Argonne National Laboratory; Ken-ichi Nomura and Aiichiro Nakano, University of Southern California. Abstract: Light-matter dynamics in topological quantum materials enable ultralow-power, ultrafast devices. A key challenge is simulating multiple field and particle equations for light, electrons, and atoms over vast spatiotemporal scales on exaflop/s computers, which are increasingly heterogeneous and focused on low-precision arithmetic. We present a paradigm shift that addresses this multiscale/multiphysics/heterogeneity challenge by harnessing hardware heterogeneity and low-precision arithmetic. Divide-conquer-recombine algorithms divide the problem not only into spatial but also into physical subproblems with small dynamic ranges and minimal mutual information, which are mapped onto the hardware units whose characteristics best match them, while metamodel-space algebra minimizes communication and precision requirements.
Keyword(s): IXPUG Annual Conference 2025,Quantum Molecular Dynamics,High Performance Computing,Multiscale light-matter dynamics
Author(s): Nariman Piroozan
IXPUG Annual Conference 2025 technical talk "Leveraging HIP on Intel PVC" presented by Brice Videau, Argonne Leadership Computing Facility, Argonne National Laboratory, on April 15, 2025.
Keyword(s): IXPUG Annual Conference 2025,HIP,PVC,Aurora,OpenCL,Level-Zero
Author(s): Brice Videau
IXPUG Annual Conference 2025 technical talk "Scaling Molecular Dynamics Simulations on Aurora with NAMD" presented by David Hardy, University of Illinois at Urbana-Champaign, on April 15, 2025. Co-Authors: Eric Bohm, University of Illinois at Urbana-Champaign; Ke Yue, Intel Corporation; Wei Jiang, Argonne National Laboratory. Abstract: Molecular dynamics simulations serve as a bridge between structural data and biological function, offering an atomic-scale view of the mechanistic foundations of life. The parallel molecular dynamics application NAMD can scale simulations to tens of thousands of CPU cores and thousands of GPUs, providing biomedical researchers with a high-performance "computational microscope" that can leverage the capabilities of the Aurora supercomputer. The NAMD team, in collaboration with Intel, has been developing a oneAPI/SYCL code path to fully exploit the Intel Ponte Vecchio GPUs that power Aurora. In this presentation, we will review past NAMD SYCL development efforts, show NAMD scaling benchmarks on Aurora, and discuss our ongoing work to improve single- and multi-node performance.
Keyword(s): IXPUG Annual Conference 2025,molecular dynamics,high performance computing,GPU computing,Intel oneAPI/SYCL,Intel Ponte Vecchio GPU,Aurora supercomputer
Author(s): David Hardy, Eric Bohm, Ke Yue, Wei Jiang