  • Approximating a Multi-Grid Solver 

    Le Fèvre, Valentin; Bautista-Gomez, Leonardo; Unsal, Osman; Casas, Marc (IEEE, 2019-02-14)
    Conference lecture
    Open Access
    Multi-grid methods are numerical algorithms used in parallel and distributed processing. The main idea of multigrid solvers is to speed up the convergence of an iterative method by reducing the problem to a coarser grid a ...
  • Exploring the capabilities of support vector machines in detecting silent data corruptions 

    Subasi, Omer; Di, Sheng; Bautista-Gomez, Leonardo; Balaprakash, Prasanna; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cristal Kestelman, Adrián; Krishnamoorthy, Sriram; Cappello, Franck (Elsevier, 2018-09)
    Article
    Restricted access - publisher's policy
    As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions ...
  • Monitoring strategies for scalable dynamic checkpointing 

    Perarnau, Swann; Bautista-Gomez, Leonardo (Institute of Electrical and Electronics Engineers (IEEE), 2017-04-06)
    Conference lecture
    Open Access
    Resilience is an important challenge for extreme-scale supercomputers. Failures in current supercomputers are assumed to be uniformly distributed in time. However, recent studies show that failures in high-performance ...
  • On the Applicability of PEBS based Online Memory Access Tracking for Heterogeneous Memory Management at Scale 

    Roca Nonell, Aleix; Gerofi, Balazs; Bautista-Gomez, Leonardo; Martinet, Dominique; Beltran, Vicenç; Ishikawa, Yutaka (Association for Computing Machinery (ACM), 2018-11)
    Conference lecture
    Open Access
    Operating systems have historically had to manage only a single type of memory device. The imminent availability of heterogeneous memory devices based on emerging memory technologies confronts the classic single memory ...
  • Performance Study of Non-volatile Memories on a High-End Supercomputer 

    Bautista-Gomez, Leonardo; Keller, Kai; Unsal, Osman (2019-01-25)
    Part of book or chapter of book
    Open Access
    The first exa-scale supercomputers are expected to be operational in China, USA, Japan and Europe within the early 2020s. This allows scientists to execute applications at extreme scale with more than 10^18 floating point ...
  • Towards Ad Hoc Recovery for Soft Errors 

    Losada, Nuria; Bautista-Gomez, Leonardo; Keller, Kai; Unsal, Osman (IEEE, 2018-12-06)
    Conference lecture
    Open Access
    The coming exascale era is a great opportunity for high performance computing (HPC) applications. However, high failure rates on these systems will hazard the successful completion of their execution. Bit-flip errors in ...
  • Tutorials 

    Gavin, Lucas; Bautista-Gomez, Leonardo; Peña, Toni (Barcelona Supercomputing Center, 2018-04-24)
    Other
    Open Access
  • Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer 

    Bautista-Gomez, Leonardo; Zyulkyarov, Ferad; Unsal, Osman; McIntosh-Smith, Simon (ACM, 2016-11-13)
    Conference lecture
    Open Access
    Supercomputers offer new opportunities for scientific computing as they grow in size. However, their growth also poses new challenges. Resilience has been recognized as one of the most pressing issues to solve for extreme ...