Optimizing computation-communication overlap in asynchronous task-based programs
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
Asynchronous task-based programming models are gaining popularity to address the programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, we find that inefficient interactions between these programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We address this challenge by exposing and exploiting information about MPI internals in a task-based runtime system to make better task-creation and scheduling decisions. In particular, we present two mechanisms for exchanging information between MPI and a task-based runtime, and analyze their trade-offs. Further, we present a detailed evaluation of the proposed mechanisms implemented in MPI and a task-based runtime. We show performance improvements of up to 16.3% and 34.5% for proxy applications with point-to-point and collective communication, respectively.
Citation: Castillo, E. [et al.]. Optimizing computation-communication overlap in asynchronous task-based programs. A: International Conference on Supercomputing. "ICS 2019: International Conference on Supercomputing: June 26-28, 2019, Phoenix, AZ". New York: Association for Computing Machinery (ACM), 2019, p. 380-391.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.