Picos: A hardware runtime architecture support for OmpSs

Yazdanpanah Ahmadabadi, Fahimeh; Álvarez Martínez, Carlos; Jiménez González, Daniel; Badia Sala, Rosa Maria; Valero Cortés, Mateo

doi:10.1016/j.future.2014.12.010

dc.contributor.author	Yazdanpanah Ahmadabadi, Fahimeh
dc.contributor.author	Álvarez Martínez, Carlos
dc.contributor.author	Jiménez González, Daniel
dc.contributor.author	Badia Sala, Rosa Maria
dc.contributor.author	Valero Cortés, Mateo
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2016-09-13T08:26:01Z
dc.date.available	2018-01-02T01:31:02Z
dc.date.issued	2015-12
dc.identifier.citation	Yazdanpanah, F., Álvarez, C., Jiménez, D., Badia, Rosa M., Valero, M. Picos: A hardware runtime architecture support for OmpSs. "Future generation computer systems", December 2015, vol. 53, p. 130-139.
dc.identifier.issn	0167-739X
dc.identifier.uri	http://hdl.handle.net/2117/89843
dc.description.abstract	OmpSs is a programming model that provides a simple and powerful way of annotating sequential programs to exploit heterogeneity and task parallelism based on runtime data dependency analysis, dataflow scheduling and out-of-order task execution; it has greatly influenced Version 4.0 of the OpenMP standard. The current implementation of OmpSs achieves those capabilities with a pure-software runtime library: Nanos++. Therefore, although powerful and easy to use, the performance benefits of exploiting fine-grained (pico) task parallelism are limited by the software runtime overheads. To overcome this handicap we propose Picos, an implementation of the Task Superscalar (TSS) architecture that provides hardware support to the OmpSs programming model. Picos is a novel hardware dataflow-based task scheduler that dynamically analyzes inter-task dependencies and identifies task-level parallelism at run-time. In this paper, we describe the Picos Hardware Design and the latencies of the main functionality of its components, based on the synthesis of their VHDL design. We have implemented a full cycle-accurate simulator based on those latencies to perform a design exploration of the characteristics and number of its components in a reasonable amount of time. Finally, we present a comparison of the Picos and Nanos++ runtime performance scalability with a set of real benchmarks. With Picos, a programmer can achieve ideal scalability using aggressive parallel strategies with a large number of fine granularity tasks.
dc.description.sponsorship	This work is supported by the Spanish Government through Programa Severo Ochoa (SEV-2011-0067), by the Spanish Ministry of Science and Technology through TIN2012-34557 project, by the Generalitat de Catalunya (contract 2009-SGR-980), by the European FP7 project TERAFLUX id. 249013 and by the European Research Council under the European Union’s 7th FP, ERC Grant Agreement number 321253. We also thank the Xilinx University Program for its hardware and software donations.
dc.format.extent	10 p.
dc.language.iso	eng
dc.publisher	Elsevier
dc.rights	Attribution-NonCommercial-NoDerivs 4.0 International License
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	UPC subject areas::Computer science::Programming
dc.subject.lcsh	Parallel programming (Computer science)
dc.subject.other	Hardware implementation
dc.subject.other	Task scheduling
dc.subject.other	Dataflow execution
dc.subject.other	Parallel programming model
dc.subject.other	OmpSs
dc.subject.other	OpenMP
dc.title	Picos: A hardware runtime architecture support for OmpSs
dc.type	Article
dc.subject.lemac	Parallel programming (Computer science)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1016/j.future.2014.12.010
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://www.sciencedirect.com/science/article/pii/S0167739X14002702
dc.rights.access	Open Access
local.identifier.drac	16673627
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/249013/EU/Exploiting dataflow parallelism in Teradevice Computing/TERAFLUX
local.citation.author	Yazdanpanah, F.; Álvarez, C.; Jiménez, D.; Badia, Rosa M.; Valero, M.
local.citation.publicationName	Future generation computer systems
local.citation.volume	53
local.citation.startingPage	130
local.citation.endingPage	139

Files in this item

Name: Picos_FGCS_Rev.pdf
Size: 445.5 KB
Format: PDF

UPCommons. Open knowledge portal of the UPC
