February 15, 2024
Conference Paper

A Task Based Approach for Co-Scheduling Ensemble Workloads on Heterogeneous Nodes

Abstract

Scientific workflows consist of multiple, connected applications, with data and results flowing from one to another in a pipeline. Traditionally, such workflows are executed in sequential order, with intermediate data stored on disk. Co-scheduling the applications of a workflow concurrently on the same compute nodes would greatly reduce the cost of moving data to and from storage and allow real-time analysis of intermediate results. Nevertheless, most parallel programming runtimes do not allow seamless integration of various applications in a scientific workflow, in part due to the complexity of managing data and resources. The situation is even more complicated for heterogeneous systems. In this work we extend the Minos Computing Library (MCL) runtime to accelerate pipelined and parallel workloads where multiple applications are running in the same system. MCL’s asynchronous task library and runtime dynamically manage resources to allow co-scheduling of multiple processes sharing heterogeneous resources. In addition, we design a custom extension of the Open Computing Language (OpenCL) to enable multiple processes to share device memory. We enable MCL to coordinate these shared buffers to allow for easy, fast data sharing between applications. Using malleable micro-benchmarks and two application workflows that combine scientific simulation and AI-based analysis, we show that our method outperforms traditional approaches.

Published: February 15, 2024

Citation

Kamatar A.V., R.D. Friese, and R. Gioiosa. 2023. A Task Based Approach for Co-Scheduling Ensemble Workloads on Heterogeneous Nodes. In IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2023), May 15-19, 2023, St. Petersburg, FL, 6-15. Piscataway, New Jersey: IEEE. PNNL-SA-182931. doi:10.1109/IPDPSW59300.2023.00015