Show simple item record

dc.contributor.author: Etsion, Yoav
dc.contributor.author: Cabarcas, Felipe
dc.contributor.author: Rico Carro, Alejandro
dc.contributor.author: Ramírez Bellido, Alejandro
dc.contributor.author: Badia Sala, Rosa Maria
dc.contributor.author: Ayguadé Parra, Eduard
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2011-02-21T11:21:48Z
dc.date.available: 2011-02-21T11:21:48Z
dc.date.created: 2010
dc.date.issued: 2010
dc.identifier.citation: Etsion, Y. [et al.]. Task superscalar: an out-of-order task pipeline. A: IEEE/ACM International Symposium on Microarchitecture. "43rd Annual ACM/IEEE International Symposium on Microarchitecture". Atlanta: IEEE Computer Society Publications, 2010, p. 89-100.
dc.identifier.isbn: 978-0-7695-4299-7
dc.identifier.uri: http://hdl.handle.net/2117/11445
dc.description.abstract: We present Task Superscalar, an abstraction of the instruction-level out-of-order pipeline that operates at the task level. Like ILP pipelines, which uncover parallelism in a sequential instruction stream, task superscalar uncovers task-level parallelism among tasks generated by a sequential thread. Utilizing intuitive programmer annotations of task inputs and outputs, the task superscalar pipeline dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks out-of-order. Furthermore, we propose a design for a distributed task superscalar pipeline frontend that can be embedded into any manycore fabric and that manages cores as functional units. We show that our proposed mechanism is capable of driving hundreds of cores simultaneously with non-speculative tasks, which allows our pipeline to sustain work windows consisting of tens of thousands of tasks. We further show that our pipeline can maintain a decode rate faster than 60 ns per task and dynamically uncover data dependencies among as many as ~50,000 in-flight tasks, using 7 MB of on-chip eDRAM storage. This configuration achieves speedups of 95–255x (average 183x) over sequential execution for nine scientific benchmarks, running on a simulated CMP with 256 cores. Task superscalar thus enables programmers to exploit manycore systems effectively, while simultaneously simplifying their programming model.
dc.format.extent: 12 p.
dc.language.iso: eng
dc.publisher: IEEE Computer Society Publications
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh: Data structures
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.lcsh: Task analysis
dc.title: Task superscalar: an out-of-order task pipeline
dc.type: Conference report
dc.subject.lemac: Estructures de dades (Informàtica)
dc.subject.lemac: Programació paral·lela (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/MICRO.2010.13
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://portal.acm.org/ft_gateway.cfm?id=1935014&type=pdf&CFID=8469401&CFTOKEN=60724531
dc.rights.access: Open Access
local.identifier.drac: 4983284
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/249013/EU/Exploiting dataflow parallelism in Teradevice Computing/TERAFLUX
local.citation.author: Etsion, Y.; Cabarcas, F.; Rico, A.; Alex Ramirez; Badia, R.; Ayguade, E.; Labarta, J.; Valero, M.
local.citation.contributor: IEEE/ACM International Symposium on Microarchitecture
local.citation.pubplace: Atlanta
local.citation.publicationName: 43rd Annual ACM/IEEE International Symposium on Microarchitecture
local.citation.startingPage: 89
local.citation.endingPage: 100
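
The abstract in this record describes detecting inter-task data dependencies from annotated task inputs and outputs, then running independent tasks out of order. The following is a minimal software sketch of that general idea, not the hardware pipeline the paper proposes. All names here (Task, build_dependencies, schedule_waves, and the object labels "A" to "D") are hypothetical and chosen only for illustration. The sketch conservatively orders read-after-write, write-after-read, and write-after-write pairs; it makes no claim about how the paper's design treats each kind.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A task as emitted by a sequential thread, with annotated operands."""
    name: str
    inputs: frozenset = frozenset()   # objects the task reads
    outputs: frozenset = frozenset()  # objects the task writes


def build_dependencies(tasks):
    """For each task (in program order), return the indices of earlier
    tasks it must wait for. Assumption: every conflict is enforced."""
    last_writer = {}  # object -> index of the most recent writer
    readers = {}      # object -> indices of readers since that write
    deps = []
    for i, task in enumerate(tasks):
        wait_for = set()
        for obj in task.inputs:
            if obj in last_writer:
                wait_for.add(last_writer[obj])      # read-after-write
        for obj in task.outputs:
            if obj in last_writer:
                wait_for.add(last_writer[obj])      # write-after-write
            wait_for.update(readers.get(obj, ()))   # write-after-read
        deps.append(wait_for)
        # Record this task's accesses only after computing its own deps.
        for obj in task.inputs:
            readers.setdefault(obj, set()).add(i)
        for obj in task.outputs:
            last_writer[obj] = i
            readers[obj] = set()
    return deps


def schedule_waves(tasks):
    """Group task names into waves; tasks in one wave are independent."""
    deps = build_dependencies(tasks)
    level = []
    for wait_for in deps:
        # A task runs one wave after the latest task it depends on.
        level.append(1 + max((level[j] for j in wait_for), default=-1))
    waves = {}
    for i, lvl in enumerate(level):
        waves.setdefault(lvl, []).append(tasks[i].name)
    return [waves[lvl] for lvl in sorted(waves)]


demo = [
    Task("t0", outputs=frozenset({"A"})),
    Task("t1", outputs=frozenset({"B"})),
    Task("t2", inputs=frozenset({"A", "B"}), outputs=frozenset({"C"})),
    Task("t3", inputs=frozenset({"A"}), outputs=frozenset({"D"})),
    Task("t4", inputs=frozenset({"C", "D"}), outputs=frozenset({"A"})),
]
print(schedule_waves(demo))  # [['t0', 't1'], ['t2', 't3'], ['t4']]
```

In this example t0 and t1 share no objects and can run together, t2 and t3 both wait only on earlier writers, and t4 must wait for the tasks that produce C and D as well as for every earlier reader of A.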

