PNNL Talks
Toward a Seamless Transition to Virtual Teams
Presentation: Tuesday, November 17, 12:30 p.m. PT/ 3:30 p.m. ET, Track 7
The current pandemic has thrust the globe into a rapid transition to remote work. On top of acclimating the workforce to pandemic-driven changes, companies and organizations have the greater challenge of building effective teams.
At SC20, PNNL computer scientist Mahantesh Halappanavar and advisor Katherine E. Wolf will present with Sandia National Laboratories’ Elaine Raybourn on strategies to support the transition to virtual teams.
“Since teams form a basic unit of an organization to accomplish a common goal, it is necessary to build effective teams to have a lasting impact on science and the future of our world,” Halappanavar said.
Their presentation is part of the State of the Practice talks, which augment the standard technical program with practical, up-to-the-minute material on a wide variety of topics of interest to attendees.
Scalable yet Rigorous Floating-Point Error Analysis
Paper: Wednesday, November 18, 8 a.m. PT/11 a.m. ET, Track 4
Automated techniques for rigorous floating-point round-off error analysis are a prerequisite to placing important high performance computing (HPC) activities, including precision allocation, verification, and code optimization, on a formal footing. Yet existing techniques cannot provide tight bounds for expressions beyond a few dozen operators, barely enough for HPC.
PNNL's Sriram Krishnamoorthy and the University of Utah's Arnab Das, Ganesh Gopalakrishnan, Ian Briggs, and Pavel Panchekha offer an approach embodied in a new tool called SATIRE that scales error analysis by four orders of magnitude compared with today's best-in-class tools. Their research, which won best student paper at SC20, explains how three key ideas underlying SATIRE help it attain such scale: path strength reduction, bound optimization, and abstraction. SATIRE provides tight bounds and rigorous guarantees on significantly larger expressions, with well over a hundred thousand operators, covering important examples including FFT, matrix multiplication, and PDE stencils. SATIRE enables rigorous analysis and optimization of larger floating-point-intensive applications than was feasible in the past.
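Round-off analysis of this kind builds on the standard model of floating-point arithmetic. As a minimal illustration of what a rigorous worst-case bound looks like, the sketch below computes the classic textbook bound for summation (this is a standard result, not SATIRE's algorithm, and the function name is hypothetical):

```python
# Minimal illustration of rigorous round-off bounding (a classic textbook
# bound, not SATIRE's algorithm). Under the standard model
#   fl(x + y) = (x + y) * (1 + d),  |d| <= u,
# summing n nonnegative doubles with n-1 additions has worst-case relative
# error gamma_{n-1} = (n-1)*u / (1 - (n-1)*u).

def sum_error_bound(n: int, u: float = 2.0 ** -53) -> float:
    """Worst-case relative round-off error for summing n nonnegative doubles."""
    k = n - 1  # number of additions performed
    if k * u >= 1:
        raise ValueError("bound not valid: too many operations for this precision")
    return k * u / (1 - k * u)

# Even a million-term sum is provably accurate to roughly 1e-10 relative error.
bound = sum_error_bound(10 ** 6)
print(f"relative error bound for a 1e6-term sum: {bound:.3e}")
```

Bounds like this grow with the number of operators in an expression, which is why tightly bounding expressions with over a hundred thousand operators, as SATIRE does, is a scaling challenge.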
miniVite + Metall: A Case Study of Accelerating Graph Analytics Using Persistent Memory Allocator
Poster: Thursday, November 19, 5:30 a.m.-2 p.m. PT/8:30 a.m.-5 p.m. ET
Performance improvements and cost reductions are driving advances in memory technology, including persistent memory.
PNNL's Mahantesh Halappanavar and Sayan Ghosh are presenting a poster with Roger Pearce, Maya Gokhale, and Keita Iwabuchi on miniVite, a distributed-memory graph community detection and clustering mini-application that supports in-memory geometric graph generation, paired with the Metall persistent memory allocator. Their experiments include improvements of up to 85× and 65× on the NERSC Cori and OLCF Summit supercomputers.
Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples
Paper: Thursday, November 19, 7 a.m. PT/10 a.m. ET, Track 2
The CCSD(T) coupled-cluster model with perturbative triples is considered a gold standard for computational modeling of the correlated behavior of electrons in molecular systems. A fundamental constraint is the relatively small global-memory capacity of GPUs compared to the main-memory capacity of host nodes, which forces relatively small tile sizes for high-dimensional tensor contractions in NWChem's GPU-accelerated implementation of the CCSD(T) method. A research team made up of PNNL's Sriram Krishnamoorthy, Bo Peng, Karol Kowalski, and Ajay Panyala and the University of Utah's Ponnuswamy Sadayappan and Jinsung Kim presents a coordinated redesign that addresses this limitation and the associated data movement overheads, including a novel fused GPU kernel for a set of tensor contractions along with inter-node communication optimization and data caching. The new implementation of GPU-accelerated CCSD(T) improves overall performance by 3.4×. Finally, the team discusses the trade-offs of using this fused algorithm on current and future supercomputing platforms. The approach allows more effective utilization of current and upcoming accelerator-based high performance computing systems, including U.S. Department of Energy supercomputers, than was previously possible.
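The memory-traffic idea behind fusing tensor contractions can be seen in a toy example. The sketch below (hypothetical small sizes and plain Python loops, not NWChem's actual CCSD(T) kernels) contrasts an unfused computation that materializes an intermediate tensor with a fused one that consumes each partial product immediately:

```python
# Toy sketch of contraction fusion (hypothetical sizes, not NWChem's actual
# CCSD(T) kernels). Unfused: materialize T[i][k] = sum_j A[i][j]*B[j][k],
# then compute R[i] = sum_k T[i][k]*C[k]. Fused: accumulate R directly,
# never storing T -- at kernel scale, skipping the intermediate is what cuts
# GPU global-memory traffic when device memory is scarce.
import random

I, J, K = 4, 5, 6
rnd = random.Random(0)
A = [[rnd.random() for _ in range(J)] for _ in range(I)]
B = [[rnd.random() for _ in range(K)] for _ in range(J)]
C = [rnd.random() for _ in range(K)]

# Unfused: the I*K intermediate T is written to (and read back from) memory.
T = [[sum(A[i][j] * B[j][k] for j in range(J)) for k in range(K)] for i in range(I)]
R_unfused = [sum(T[i][k] * C[k] for k in range(K)) for i in range(I)]

# Fused: each partial product A[i][j]*B[j][k]*C[k] is consumed immediately.
R_fused = [
    sum(A[i][j] * B[j][k] * C[k] for j in range(J) for k in range(K))
    for i in range(I)
]

assert all(abs(x - y) < 1e-12 for x, y in zip(R_unfused, R_fused))
```

Both orderings produce the same result; the fused form simply trades the intermediate tensor's storage and memory traffic for recomputation inside the loop, a trade-off that the paper examines for real GPU kernels.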