• Approximating a Multi-Grid Solver 

      Le Fèvre, Valentin; Bautista-Gomez, Leonardo; Unsal, Osman; Casas, Marc (IEEE, 2019-02-14)
      Conference paper
      Open access
      Multi-grid methods are numerical algorithms used in parallel and distributed processing. The main idea of multigrid solvers is to speed up the convergence of an iterative method by reducing the problem to a coarser grid a ...
    • Exploring the capabilities of support vector machines in detecting silent data corruptions 

      Subasi, Omer; Di, Sheng; Bautista-Gomez, Leonardo; Balaprakash, Prasanna; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cristal Kestelman, Adrián; Krishnamoorthy, Sriram; Cappello, Franck (Elsevier, 2018-09)
      Article
      Open access
      As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions ...
    • Monitoring strategies for scalable dynamic checkpointing 

      Perarnau, Swann; Bautista-Gomez, Leonardo (Institute of Electrical and Electronics Engineers (IEEE), 2017-04-06)
      Conference paper
      Open access
      Resilience is an important challenge for extreme-scale supercomputers. Failures in current supercomputers are assumed to be uniformly distributed in time. However, recent studies show that failures in high-performance ...
    • On the applicability of PEBS based online memory access tracking for heterogeneous memory management at scale 

      Roca Nonell, Aleix; Gerofi, Balazs; Bautista-Gomez, Leonardo; Martinet, Dominique; Beltran Querol, Vicenç; Ishikawa, Yutaka (Association for Computing Machinery (ACM), 2018-11)
      Conference paper
      Open access
      Operating systems have historically had to manage only a single type of memory device. The imminent availability of heterogeneous memory devices based on emerging memory technologies confronts the classic single memory ...
    • Performance Study of Non-volatile Memories on a High-End Supercomputer 

      Bautista-Gomez, Leonardo; Keller, Kai; Unsal, Osman (2019-01-25)
      Book chapter
      Open access
      The first exa-scale supercomputers are expected to be operational in China, USA, Japan and Europe within the early 2020s. This allows scientists to execute applications at extreme scale with more than 10¹⁸ floating point ...
    • Towards Ad Hoc Recovery for Soft Errors 

      Losada, Nuria; Bautista-Gomez, Leonardo; Keller, Kai; Unsal, Osman (IEEE, 2018-12-06)
      Conference paper
      Open access
      The coming exascale era is a great opportunity for high performance computing (HPC) applications. However, high failure rates on these systems will hazard the successful completion of their execution. Bit-flip errors in ...
    • Tutorials 

      Gavin, Lucas; Bautista-Gomez, Leonardo; Peña, Toni (Barcelona Supercomputing Center, 2018-04-24)
      Other
      Open access
    • Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer 

      Bautista-Gomez, Leonardo; Zyulkyarov, Ferad; Unsal, Osman; McIntosh-Smith, Simon (ACM, 2016-11-13)
      Conference paper
      Open access
      Supercomputers offer new opportunities for scientific computing as they grow in size. However, their growth also poses new challenges. Resilience has been recognized as one of the most pressing issues to solve for extreme ...