
dc.contributor.author: Subasi, Omer
dc.contributor.author: Yalcin, Gulay
dc.contributor.author: Zyulkyarov, Ferad
dc.contributor.author: Unsal, Osman Sabri
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2017-03-09T15:20:59Z
dc.date.available: 2017-03-09T15:20:59Z
dc.date.issued: 2016
dc.identifier.citation: Subasi, O., Yalcin, G., Zyulkyarov, F., Unsal, O., Labarta, J. A runtime heuristic to selectively replicate tasks for application-specific reliability targets. In: IEEE International Conference on Cluster Computing. "2016 IEEE International Conference on Cluster Computing: 13-15 September 2016, Taipei, Taiwan: proceedings". Taipei: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 498-505.
dc.identifier.isbn: 978-1-5090-3653-0
dc.identifier.uri: http://hdl.handle.net/2117/102228
dc.description.abstract: In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
dc.description.sponsorship: This work was supported by FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and in part by the European Union (FEDER funds) under contract TIN2015-65316-P.
dc.format.extent: 8 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Dataflow programming
dc.subject.other: Selective replication
dc.subject.other: HPC and exascale computing
dc.subject.other: Task parallelism
dc.title: A runtime heuristic to selectively replicate tasks for application-specific reliability targets
dc.type: Conference report
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/CLUSTER.2016.54
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/7776550/
dc.rights.access: Open Access
local.identifier.drac: 19770197
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.author: Subasi, O.; Yalcin, G.; Zyulkyarov, F.; Unsal, O.; Labarta, J.
local.citation.contributor: IEEE International Conference on Cluster Computing
local.citation.pubplace: Taipei
local.citation.publicationName: 2016 IEEE International Conference on Cluster Computing: 13-15 September 2016, Taipei, Taiwan: proceedings
local.citation.startingPage: 498
local.citation.endingPage: 505

