Asynchronous runtime with distributed manager for task-based programming models
Fitxers
Títol de la revista
ISSN de la revista
Títol del volum
Col·laborador
Editor
Tribunal avaluador
Realitzat a/amb
Tipus de document
Data publicació
Editor
Condicions d'accés
item.page.rightslicense
Publicacions relacionades
Datasets relacionats
Projecte CCD
Projecte
info:eu-repo/grantAgreement/AGAUR/2017 SGR 1414
info:eu-repo/grantAgreement/AGAUR/2017-SGR-1328
EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EPEEC) (EC-H2020-801051)
EuroEXA - Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon (EC-H2020-754337)
LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681)
BARCELONA SUPERCOMPUTING CENTER - CENTRO. NACIONAL DE SUPERCOMPUTACION (MINECO-SEV-2015-0493)
Abstract
Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of data dependences per task that the runtime uses to order the tasks execution. This order is calculated using shared graphs, which are updated by all threads in exclusive access using synchronization mechanisms (locks) to ensure the dependence management correctness. The contention in the access to these structures becomes critical in many-core systems because several threads may be wasting computation resources waiting their turn. This paper proposes an asynchronous management of the runtime structures, like task dependence graphs, suitable for task-based programming model runtimes. In such organization, the threads request actions to the runtime instead of doing them directly. The requests are then handled by a distributed runtime manager (DDAST) which does not require dedicated resources. Instead, the manager uses the idle threads to modify the runtime structures. The paper also presents an implementation, analysis and performance evaluation of such runtime organization. The performance results show that the proposed asynchronous organization outperforms the speedup obtained by the original runtime for different benchmarks and different many-core architectures.


