Automatic aggregation of subtask accesses for nested OpenMP-style tasks
Cite as: hdl:2117/384603
Document type: Conference report
Defense date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Project: EuroEXA - Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon (EC-H2020-754337)
BSC - COMPUTACION DE ALTAS PRESTACIONES VIII (AEI-PID2019-107255GB-C21)
Abstract
Task-based programming is a high-performance, productive model for expressing parallelism. Tasks encapsulate work to be executed across multiple cores or offloaded to GPUs, FPGAs, other accelerators or other nodes. In order to maintain parallelism and afford maximum freedom to the scheduler, the task dependency graph should be created in parallel and well in advance of task execution. A key limitation of OpenMP and OmpSs-2 tasking is that a task cannot be created until all its accesses and its descendants' accesses are known. Current approaches to work around this limitation either stop task creation and execution using a taskwait or substitute “fake” accesses known as sentinels. This paper proposes the auto clause, which indicates that the task may create subtasks that access unspecified memory regions, or that it may allocate and return memory at addresses that are not yet known when the task is created. Unlike approaches using taskwaits, there is no interruption to the concurrent creation and execution of tasks, which maintains parallelism and the scheduler's ability to optimize load balance and data locality. Unlike existing approaches using sentinels, all tasks can be given a precise specification of their own data accesses, so that a single mechanism is used to control task ordering, program data transfers on distributed memory and optimize data locality, e.g. on NUMA systems. The auto clause also provides an incremental path to develop programs with nested tasks, by removing the need for every parent task to have a complete specification of the accesses of its descendant tasks. This is redundant information that can be time-consuming and error-prone to describe. We present a straightforward runtime implementation that achieves a 1.4 times speedup for n-body with OmpSs-2@Cluster task offloading to 32 nodes and <4% slowdown for three benchmarks with task offloading to 8 nodes. All code is open source.
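The sentinel workaround mentioned in the abstract can be illustrated with a minimal sketch in standard OpenMP. This is not code from the paper: the names `fill`, `sum_with_sentinel`, `sentinel` and `N` are hypothetical, and the syntax of the proposed auto clause is deliberately not shown because the abstract does not give it. The producer task declares only a "fake" access on `sentinel`, so the consumer is ordered correctly, but the runtime learns nothing about which elements of `a` are actually touched.

```c
#include <assert.h>

#define N 8  /* hypothetical problem size */

/* Hypothetical helper run inside the producer task: it creates one
 * subtask per element, so its accesses are only known once it runs. */
static void fill(int *a, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        #pragma omp task depend(out: a[i]) firstprivate(a, i)
        a[i] = i + 1;
    }
    /* Subtasks must finish before the producer task completes, because
     * OpenMP dependences only order sibling tasks. */
    #pragma omp taskwait
}

int sum_with_sentinel(void) {
    int a[N];
    int total = 0;
    char sentinel = 0;  /* "fake" access: orders tasks, names no real data */
    (void)sentinel;     /* silences unused-variable warnings without OpenMP */

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: declares the sentinel instead of the elements of a. */
        #pragma omp task depend(out: sentinel) shared(a)
        fill(a, N);

        /* Consumer: ordered after the producer through the sentinel only. */
        #pragma omp task depend(in: sentinel) shared(a, total)
        for (int i = 0; i < N; i++) total += a[i];
    }
    return total;
}
```

Calling `sum_with_sentinel()` returns 36 (the sum 1 + 2 + ... + 8), whether or not the compiler has OpenMP enabled. The ordering is correct, but the declared dependence carries no information about `a`, which is the loss of precision that the abstract says the auto clause is meant to avoid.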
Citation: Ali, O. [et al.]. Automatic aggregation of subtask accesses for nested OpenMP-style tasks. In: International Symposium on Computer Architecture and High Performance Computing. "2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2022: 2nd-4th November 2022, Bordeaux, France: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 315-325. ISBN 978-1-6654-5155-0. DOI 10.1109/SBAC-PAD55451.2022.00042.
ISBN: 978-1-6654-5155-0
Publisher version: https://ieeexplore.ieee.org/document/9980957
Collections
- Doctorat en Arquitectura de Computadors - Conference papers/presentations [310]
- Computer Sciences - Conference papers/presentations [597]
- CAP - Grup de Computació d'Altes Prestacions - Conference papers/presentations [784]
- Departament d'Arquitectura de Computadors - Conference papers/presentations [1,976]
- PM - Programming Models - Conference papers/presentations [16]
Files | Description | Size | Format |
---|---|---|---|
shaaban2022sbacpad.pdf | | 1.273 MB | PDF |