Now showing items 1-4 of 4

    • Exploring the capabilities of support vector machines in detecting silent data corruptions 

      Subasi, Omer; Di, Sheng; Bautista-Gomez, Leonardo; Balaprakash, Prasanna; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cristal Kestelman, Adrián; Krishnamoorthy, Sriram; Cappello, Franck (Elsevier, 2018-09)
      Article
      Open Access
      As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions ...
    • Spatial support vector regression to detect silent errors in the exascale era 

      Subasi, Omer; Di, Sheng; Bautista Gómez, Leonardo; Balaprakash, Prasanna; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cristal Kestelman, Adrián; Cappello, Franck (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Open Access
      As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions ...
    • Unified fault-tolerance framework for hybrid task-parallel message-passing applications 

      Subasi, Omer; Martsinkevich, Tatiana; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cappello, Franck (SAGE Publications, 2016-09-26)
      Article
      Open Access
      We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the ...
    • Use cases of lossy compression for floating-point data 

      Cappello, Franck (Barcelona Supercomputing Center, 2019)
      Other
      Open Access