dc.contributor.author: Arslan, Sanem
dc.contributor.author: Unsal, Osman Sabri
dc.contributor.other: Barcelona Supercomputing Center
dc.identifier.citation: Arslan, S.; Unsal, O.S. Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading. "Journal of Supercomputing", 2021,
dc.description.abstract: Redundant multithreading (RMT) is an effective reliability solution that provides thread-level replication; however, it imposes additional overheads in the form of performance loss or increased energy consumption. Partial-RMT is an alternative that replicates only part of an executing thread, reducing these overheads at the cost of full fault coverage. In this study, we propose a software-level RMT approach that offers lightweight replication of partial code regions within the same application process. Our approach is particularly suitable for applications whose code regions vary in criticality; we identify the critical code regions by performing a fault injection campaign together with an execution time profile analysis. Using these results, the application programmer annotates the source code to indicate which code regions should be executed redundantly, without re-implementing the application from scratch. Our lightweight software-level RMT tool reduces the average silent data corruption (SDC) rate of 30 applications of the PolyBench benchmark suite by around 7.6×, with average performance and energy consumption overheads of 22% and 37%, respectively, compared to the original version of each program.
dc.description.sponsorship: This work was completed while the first author, Sanem Arslan, was a visiting researcher at Barcelona Supercomputing Center, Barcelona, Spain. Sanem Arslan received financial support from the Scientific and Technological Research Council of Turkey (TUBITAK) under the BIDEB 2219 program during this work.
dc.format.extent: 31 p.
dc.subject: UPC subject areas::Computer science::Software engineering
dc.subject.lcsh: Fault tolerance (Engineering)
dc.subject.lcsh: Multitasking (Computer science)
dc.subject.lcsh: Soft errors (Computer science)
dc.subject.other: Redundant multithreading
dc.subject.other: Fault tolerance
dc.subject.other: Soft error reliability
dc.subject.other: Software reliability
dc.title: Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Restricted access - publisher's policy
dc.description.version: Postprint (author's final draft)
local.citation.publicationName: Journal of Supercomputing
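The abstract describes replicating only programmer-annotated critical code regions and comparing the redundant copies to catch silent data corruption, while non-critical code runs once. The Python sketch below illustrates that general idea under stated assumptions. It is not the authors' tool or API: `run_redundantly`, `SDCDetected`, `critical_matvec`, and `flip_low_bit` are hypothetical names. The sketch also assumes that the critical region is a pure function of its inputs and that detection is a plain output comparison between a leading and a trailing replica. It does not model how the paper selects critical regions (fault injection plus execution time profiling). Because of CPython's global interpreter lock, the two threads do not run this Python code in parallel, so the sketch shows only the control flow, not the performance or energy overheads.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


class SDCDetected(RuntimeError):
    """Raised when the leading and trailing replicas of a region disagree."""


def run_redundantly(
    region: Callable[..., Any],
    *args: Any,
    inject_fault: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Run an annotated critical region twice in two threads and compare results.

    Hypothetical helper illustrating selective software-level RMT; it is not
    the authors' tool. Assumes `region` is a pure function of its arguments.
    """

    def replica(slot: int) -> Any:
        # Each replica works on a private deep copy of the inputs, so one copy
        # cannot corrupt the data the other copy reads.
        out = region(*copy.deepcopy(args))
        if inject_fault is not None and slot == 1:
            # Optional hook that simulates a soft error in the trailing replica.
            out = inject_fault(out)
        return out

    with ThreadPoolExecutor(max_workers=2) as pool:
        # map() yields results in submission order and re-raises any exception
        # thrown inside a replica when its result is retrieved.
        leading, trailing = pool.map(replica, (0, 1))

    if leading != trailing:
        raise SDCDetected(f"replica mismatch: {leading!r} != {trailing!r}")
    return leading


def critical_matvec(a: list[list[int]], x: list[int]) -> list[int]:
    """Example critical region: integer matrix-vector product y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def flip_low_bit(vec: list[int]) -> list[int]:
    """Simulated single-bit upset in the first output element (illustrative)."""
    return [vec[0] ^ 1] + vec[1:]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    x = [5, 6]
    print(run_redundantly(critical_matvec, A, x))  # [17, 39]
    try:
        run_redundantly(critical_matvec, A, x, inject_fault=flip_low_bit)
    except SDCDetected as err:
        print("SDC detected:", err)
```

Only the call wrapped in `run_redundantly` pays the cost of a second execution and a comparison. Code outside it runs once, which is the trade-off that partial RMT makes between overhead and fault coverage.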

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.