General purpose task-dependence management hardware for task-based dataflow programming models
Cite as:
hdl:2117/107374
Document type: Conference report
Defense date: 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Restricted access - publisher's policy
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
ROMOL - Riding on Moore's Law (EC-FP7-321253)
Abstract
Task-based programming models such as OpenMP, Intel TBB and OmpSs let programmers express dependences among tasks, which then drive task execution at runtime. Managing these dependences introduces noticeable overhead for fine-grained tasks, reducing the potential speedup or even causing performance losses. To overcome this drawback, we present Picos++, a general purpose hardware accelerator that manages inter-task dependences efficiently in both time and energy. Our design also includes novel support for nested tasks. Nested tasks with dependences can deadlock the system because hardware task dependence managers have a limited amount of resources, so we present a new hardware/software co-design that avoids this problem. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ in an embedded Linux system with two ARM Cortex-A9 cores and an FPGA. We study the scalability and energy consumption of the implemented system and compare them against a software runtime. Even in a system limited to 2 threads, Picos++ delivers more than 1.8x speedup and 40% energy savings in the most demanding parallelizations of real benchmarks. With more threads, a hardware task dependence manager should be able to achieve much higher speedups and greater energy savings.
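To make the notion of inter-task dependence management concrete, the following minimal Python sketch shows the kind of bookkeeping a software runtime performs when tasks declare the data they read and write. It is an illustration under stated assumptions, not the Picos++ design or any real runtime's API: the class `DependenceTracker`, its methods `submit` and `run_order`, and the task and data names are all hypothetical.

```python
from collections import defaultdict, deque


class DependenceTracker:
    """Toy software task-dependence manager (hypothetical, not Picos++)."""

    def __init__(self):
        self.last_writer = {}               # data name -> id of last writing task
        self.readers = defaultdict(list)    # data name -> readers since last write
        self.successors = defaultdict(set)  # task id -> tasks that must wait for it
        self.pending = {}                   # task id -> number of predecessors
        self.names = []                     # task id -> task name

    def submit(self, name, ins=(), outs=()):
        """Register a task and derive its predecessors from its data accesses."""
        tid = len(self.names)
        self.names.append(name)
        preds = set()
        for addr in ins:
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])   # read-after-write
        for addr in outs:
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])   # write-after-write
            preds.update(self.readers[addr])        # write-after-read
        # Record this task's own accesses only after computing predecessors.
        for addr in ins:
            self.readers[addr].append(tid)
        for addr in outs:
            self.last_writer[addr] = tid
            self.readers[addr] = []
        self.pending[tid] = len(preds)
        for p in preds:
            self.successors[p].add(tid)
        return tid

    def run_order(self):
        """Return one valid execution order that respects all dependences."""
        pending = dict(self.pending)
        ready = deque(t for t, n in pending.items() if n == 0)
        order = []
        while ready:
            tid = ready.popleft()
            order.append(self.names[tid])
            for s in sorted(self.successors[tid]):
                pending[s] -= 1
                if pending[s] == 0:
                    ready.append(s)
        return order


tracker = DependenceTracker()
tracker.submit("init_a", outs=["a"])
tracker.submit("init_b", outs=["b"])
tracker.submit("sum", ins=["a", "b"], outs=["c"])
tracker.submit("reset_a", outs=["a"])   # must wait for init_a and for sum
order = tracker.run_order()
print(order)
```

Every `submit` call performs several dictionary lookups and updates per declared access. When tasks are fine-grained, this per-task bookkeeping can rival the work done by the task itself, which is the overhead the abstract refers to when motivating a hardware dependence manager.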
Citation: Tan, X., Bosch, J., Vidal, M., Alvarez, C., Jimenez-Gonzalez, D., Ayguade, E., Valero, M. General purpose task-dependence management hardware for task-based dataflow programming models. A: IEEE International Parallel and Distributed Processing Symposium. "2017 IEEE 31st International Parallel and Distributed Processing Symposium: 29 May–2 June 2017, Orlando, Florida: proceedings". Orlando, Florida: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 244-253.
ISBN: 978-1-5386-3914-6
Publisher version: http://ieeexplore.ieee.org/abstract/document/7967114/
Files | Description | Size | Format | View |
---|---|---|---|---|
General Purpose ... or Task-based Dataflow.pdf | | 509,3Kb | PDF | Restricted access |