Programmable and scalable reductions on clusters

Ciesko, Jan; Bueno Hedo, Javier; Puzovic, Nikola; Ramírez Bellido, Alejandro; Badia Sala, Rosa Maria; Labarta Mancho, Jesús José

doi:10.1109/IPDPS.2013.63

dc.contributor.author	Ciesko, Jan
dc.contributor.author	Bueno Hedo, Javier
dc.contributor.author	Puzovic, Nikola
dc.contributor.author	Ramírez Bellido, Alejandro
dc.contributor.author	Badia Sala, Rosa Maria
dc.contributor.author	Labarta Mancho, Jesús José
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2014-06-17T09:13:00Z
dc.date.created	2013
dc.date.issued	2013
dc.identifier.citation	Ciesko, J. [et al.]. Programmable and scalable reductions on clusters. In: IEEE International Parallel and Distributed Processing Symposium. "IEEE 27th International Parallel and Distributed Processing Symposium: 20–24 May 2013, Boston, Massachusetts: proceedings". Boston: Institute of Electrical and Electronics Engineers (IEEE), 2013, p. 560-568.
dc.identifier.isbn	978-0-7695-4971-2
dc.identifier.uri	http://hdl.handle.net/2117/23241
dc.description.abstract	Reductions matter and they are here to stay. Wide adoption of parallel processing hardware in a broad range of computer applications has encouraged recent research efforts on their efficient parallelization. Furthermore, trends towards high-productivity languages in mainstream computing increase the demand for efficient programming support. In this paper we present a new approach to parallel reductions for distributed memory systems that provides both scalability and programmability. Using OmpSs, a task-based parallel programming model, the developer can express scalable reductions through a single pragma annotation. This pragma annotation is applicable to tasks as well as to work-sharing constructs (with implicit tasking) and instructs the compiler to generate the required runtime calls. The supporting runtime handles data and task distribution, parallel execution and data reduction. Scalability is achieved through a software cache that maximizes local and temporal data reuse and allows overlapped computation and communication. Results confirm scalability for up to 32 12-core cluster nodes.
dc.description.sponsorship	We thankfully acknowledge the support of the European Commission through the ENCORE project (FP7-248647), the TERAFLUX project (FP7-249013), the TEXT project (FP7-261580), and the HiPEAC-3 Network of Excellence (FP7/ICT217068), as well as the support of the Intel-BSC Exascale Center, the Spanish Ministry of Education (TIN2007-60625, CSD2007-00050 and FPU program) and the Generalitat de Catalunya (2009-SGR-980).
dc.format.extent	9 p.
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.subject	UPC subject areas::Computer science::Computer architecture::Parallel architectures
dc.subject.lcsh	Parallel programming (Computer science)
dc.subject.other	Distributed systems
dc.subject.other	Parallel programming
dc.subject.other	Reductions
dc.subject.other	Runtime systems
dc.subject.other	Software cache
dc.title	Programmable and scalable reductions on clusters
dc.type	Conference report
dc.subject.lemac	Parallel programming (Computer science)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1109/IPDPS.2013.63
dc.rights.access	Restricted access - publisher's policy
local.identifier.drac	12857456
dc.description.version	Postprint (published version)
dc.relation.projectid	info:eu-repo/grantAgreement/MEC//TIN2007-60625/ES/COMPUTACION DE ALTAS PRESTACIONES V/
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/249013/EU/Exploiting dataflow parallelism in Teradevice Computing/TERAFLUX
dc.date.lift	10000-01-01
local.citation.author	Ciesko, J.; Bueno, J.; Puzovic, N.; Ramirez, A.; Badia, R.M.; Labarta, J.
local.citation.contributor	IEEE International Parallel and Distributed Processing Symposium
local.citation.pubplace	Boston
local.citation.publicationName	IEEE 27th International Parallel and Distributed Processing Symposium: 20–24 May 2013, Boston, Massachusetts: proceedings
local.citation.startingPage	560
local.citation.endingPage	568
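The abstract states that the runtime handles data distribution, parallel execution and the final data reduction. As a rough illustration of that privatize-then-combine pattern, here is a minimal plain-Python model under stated assumptions. It is not the OmpSs pragma syntax, API or runtime, none of which this record shows. The model assumes data is block-distributed across a hypothetical number of nodes, each core accumulates into its own private copy, each node combines its cores' copies, and node results are then combined pairwise in a tree. The function names `node_partial`, `tree_combine` and `distributed_reduce` are hypothetical. The software cache and the computation/communication overlap described in the abstract are not modelled.

```python
import operator
from functools import reduce


def node_partial(chunk, cores, op, identity):
    """Hypothetical per-node step: every core updates only its own private
    copy, then the node combines the private copies into one partial."""
    privates = [identity] * cores
    for i, x in enumerate(chunk):
        # Round-robin assignment of elements to cores; no shared accumulator.
        privates[i % cores] = op(privates[i % cores], x)
    return reduce(op, privates, identity)


def tree_combine(partials, op):
    """Combine node partials pairwise, level by level (a tree reduction)."""
    level = list(partials)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd node out is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def distributed_reduce(data, nodes, cores, op=operator.add, identity=0):
    """Assumed model: block-distribute data over nodes, reduce each node
    locally with per-core private copies, then tree-combine across nodes."""
    size = -(-len(data) // nodes)  # ceiling division: chunk length per node
    chunks = [data[k * size:(k + 1) * size] for k in range(nodes)]
    partials = [node_partial(c, cores, op, identity) for c in chunks]
    return tree_combine(partials, op)
```

For example, `distributed_reduce(list(range(1, 1001)), nodes=32, cores=12)` uses the node and core counts of the largest configuration mentioned in the abstract and returns 500500, the same as a sequential sum. Because the round-robin core assignment reorders operands, the operator is assumed to be associative and commutative (such as `operator.add` or `max`), with `identity` as its neutral element.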

Files in this item

Name: Ciesko.pdf
Size: 249.2 KB
Format: PDF


This item appears in the following collections

Conference papers/communications [574]
Conference papers/communications [784]
Conference papers/communications [1,954]


UPCommons. Open Knowledge Portal of the UPC
