October 27, 2023
Conference Paper

Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs

Abstract

As GPU adoption grows, their lack of hardware support for floating-point exceptions is a significant concern. In this work, we develop a technique, implemented in a novel tool called GPU-FPX, that efficiently detects these exceptions in GPU programs. In a study of 115 programs, we discovered 22 unhandled exceptions, including NaN, infinity, and division by zero. We also investigate how these exceptions flow through the application logic, how compiler optimizations affect them, and their implications for result reproducibility. GPU-FPX uses binary instrumentation, which allows it to collect and process exceptions on the GPU, even in libraries whose source code is unavailable. Our work paves the way toward efficient floating-point exception checking, which is essential for trustworthy accelerated applications.
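The core idea of checking every floating-point operation, rather than only a program's final output, can be illustrated with a small sketch. This is plain Python, not GPU code, and not GPU-FPX's implementation or API: the names `FPExc`, `classify`, `ieee_div`, `fmax_like`, `Checker`, and `toy_kernel` are hypothetical, and the subnormal (`SUB`) category is an assumed extra. `ieee_div` models the default non-trapping IEEE 754 division behavior (Python's `/` raises `ZeroDivisionError` instead), and `fmax_like` mimics the C/CUDA `fmax` rule that a NaN argument is ignored in favor of the other. The example shows why per-operation checks matter when tracing how exceptions flow through application logic: a NaN produced by 0/0 is silently dropped by an `fmax`-style operation, so inspecting only the final result would miss it.

```python
import math
import sys
from enum import Enum


class FPExc(Enum):
    """Hypothetical exception categories used in this sketch."""
    NAN = "NaN"
    INF = "INF"
    SUB = "SUB"    # subnormal result (assumed extra category)
    DIV0 = "DIV0"


def classify(result, divisor=None):
    """Classify one operation's result; pass `divisor` only for divisions."""
    if divisor is not None and divisor == 0.0:
        return FPExc.DIV0
    if math.isnan(result):
        return FPExc.NAN
    if math.isinf(result):
        return FPExc.INF
    # Subnormal (double precision): nonzero but below the smallest normal value.
    if result != 0.0 and abs(result) < sys.float_info.min:
        return FPExc.SUB
    return None


def ieee_div(a, b):
    """Division with IEEE 754 default (non-trapping) results for a zero divisor."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        # Sign of the infinity is sign(a) XOR sign(b), including signed zeros.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def fmax_like(a, b):
    """Mimics C/CUDA fmax: if exactly one argument is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


class Checker:
    """Stands in for per-instruction instrumentation: inspects every result."""

    def __init__(self):
        self.log = []

    def check(self, op, result, divisor=None):
        exc = classify(result, divisor)
        if exc is not None:
            self.log.append((op, exc))
        return result


def toy_kernel(a, b, chk):
    """A toy computation: 0/0 yields NaN, which fmax then discards."""
    q = chk.check("div", ieee_div(a, b), divisor=b)
    return chk.check("fmax", fmax_like(q, 0.0))


chk = Checker()
out = toy_kernel(0.0, 0.0, chk)
print(out)                                       # 0.0: output looks clean
print([(op, e.value) for op, e in chk.log])      # [('div', 'DIV0')]
```

Checking after each operation records the division by zero at its source, even though the NaN it produced never reaches the final output.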

Published: October 27, 2023

Citation

Li X., I. Laguna, B. Fang, K. Swirydowicz, A. Li, and G. Gopalakrishnan. 2023. Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs. In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2023), June 16–23, 2023, Orlando, FL, 59–71. New York, New York: Association for Computing Machinery. PNNL-SA-171816. doi:10.1145/3588195.3592991