dc.contributor.author: Subasi, Omer
dc.contributor.author: Yalcin, Gulay
dc.contributor.author: Zyulkyarov, Ferad
dc.contributor.author: Unsal, Osman Sabri
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.identifier.citation: Subasi, O., Yalcin, G., Zyulkyarov, F., Unsal, O., Labarta, J. A runtime heuristic to selectively replicate tasks for application-specific reliability targets. In: IEEE International Conference on Cluster Computing. "2016 IEEE International Conference on Cluster Computing: 13-15 September 2016, Taipei, Taiwan: proceedings". Taipei: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 498-505.
dc.description.abstract: In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our technique is automatic and does not require modifying or recompiling the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
dc.description.sponsorship: This work was supported by the FI-DGR 2013 scholarship and the European Community's Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (grant agreement no. 610402), and in part by the European Union (FEDER funds) under contract TIN2015-65316-P.
dc.format.extent: 8 p.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Dataflow programming
dc.subject.other: Selective replication
dc.subject.other: HPC and exascale computing
dc.subject.other: Task parallelism
dc.title: A runtime heuristic to selectively replicate tasks for application-specific reliability targets
dc.type: Conference report
dc.subject.lemac: Parallel processing (Computers)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Open Access
dc.description.version: Postprint (author's final draft)
local.citation.author: Subasi, O.; Yalcin, G.; Zyulkyarov, F.; Unsal, O.; Labarta, J.
local.citation.contributor: IEEE International Conference on Cluster Computing
local.citation.publicationName: 2016 IEEE International Conference on Cluster Computing: 13-15 September 2016, Taipei, Taiwan: proceedings

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.