• Programmer-directed partial redundancy for resilient HPC 

      Subasi, Omer; Arias Moreno, Francisco Javier; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cristal Kestelman, Adrián (Association for Computing Machinery (ACM), 2015)
      Conference report
      Restricted access due to publisher's policy
      In this work, we propose partial task replication and checkpointing for task-parallel HPC applications to mitigate silent data corruption (SDC) errors. As complete replication of all application tasks can be prohibitive ...