    • Another trip to the wall: how much will stacked DRAM benefit HPC? 

      Radulovic, Milan; Živanovič, Darko; Ruiz, Daniel; De Supinski, Bronis; McKee, Sally; Radojkovic, Petar; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2015)
      Conference report
      Restricted access - publisher's policy
      First defined two decades ago, the memory wall remains a fundamental limitation to system performance. Recent innovations in 3D-stacking technology enable DRAM devices with much higher bandwidths than traditional DIMMs. ...
    • Cost-aware prediction of uncorrected DRAM errors in the field 

      Boixaderas Coderch, Isaac; Živanovič, Darko; Moré Codina, Sergi; Bartolomé Rodríguez, Javier; Vicente Dorca, David; Casas Guix, Marc; Carpenter, Paul Matthew; Radojkovic, Petar; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      This paper presents and evaluates a method to predict DRAM uncorrected errors, a leading cause of hardware failures in large-scale HPC clusters. The method uses a random forest classifier, which was trained and evaluated ...
    • DRAM errors in the field: a statistical approach 

      Živanovič, Darko; Esmaili Dokht, Pouya; Moré, Sergi; Bartolomé, Javier; Carpenter, Paul Matthew; Radojkovic, Petar; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2019)
      Conference report
      Open Access
      This paper summarizes our two-year study of corrected and uncorrected errors on the MareNostrum 3 supercomputer, covering 2000 billion MB-hours of DRAM in the field. The study analyzes 4.5 million corrected and 71 uncorrected ...
    • Large-memory nodes for energy efficient high-performance computing 

      Živanovič, Darko; Radulovic, Milan; Llort, German; Zaragoza, David; Strassburg, Janko; Carpenter, Paul M.; Radojkovic, Petar; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2016)
      Conference report
      Open Access
      Energy consumption is by far the most important contributor to HPC cluster operational costs, and it accounts for a significant share of the total cost of ownership. Advanced energy-saving techniques in HPC components have ...
    • Main memory in HPC: do we need more or could we live with less? 

      Živanovič, Darko; Radojković, Petar; Ayguadé Parra, Eduard (Barcelona Supercomputing Center, 2017-05-04)
      Conference report
      Open Access
      This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High Performance Conjugate Gradients benchmark could be an important success story for 3D-stacked memories ...
    • Main memory in HPC: do we need more, or could we live with less? 

      Živanovič, Darko; Pavlovic, Milan; Radulovic, Milan; Shin, Hyunsung; Son, Jongpil; McKee, Sally A.; Carpenter, Paul M.; Radojkovic, Petar; Ayguadé Parra, Eduard (2017-03)
      Article
      Open Access
      An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with ...
    • Mainstream vs. emerging HPC: metrics, trade-offs and lessons learned 

      Radulović, Milan; Asifuzzaman, Kazi; Živanovič, Darko; Rajovic, Nikola; Colin de Verdiére, Guillaume; Pleiter, Dirk; Marazakis, Manolis; Kallimanis, Nikolaos; Carpenter, Paul Matthew; Radojkovic, Petar; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2018)
      Conference report
      Open Access
      Various servers with different characteristics and architectures are hitting the market, and their evaluation and comparison in terms of HPC features is complex and multidimensional. In this paper, we share our experience ...
    • Memory systems for high-performance computing: the capacity and reliability implications 

      Živanovič, Darko (Universitat Politècnica de Catalunya, 2018-07-02)
      Doctoral thesis
      Open Access
      Memory systems are significant contributors to the overall power requirements, energy consumption, and the operational cost of large high-performance computing systems (HPC). Limitations of main memory systems in terms of ...