    • Designing and modelling selective replication for fault-tolerant HPC applications

      Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Conference report
      Open access
      Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However ...
    • Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis 

      Vega, Carlos; Zazo, Jose F.; Meyer, Hugo; Zyulkyarov, Ferad; Lopez-Buedo, S.; Aracil, Javier (IEEE, 2018-02-15)
      Conference lecture
      Open access
      Traditional data centers are designed with a rigid architecture of fit-for-purpose servers that provision resources beyond the average workload in order to deal with occasional peaks of data. Heterogeneous data centers are ...
    • Disaggregated Computing. An Evaluation of Current Trends for Datacentres 

      Meyer, Hugo; Sancho, Jose C.; Quiroga, Josue V.; Zyulkyarov, Ferad; Roca, Damian; Nemirovsky, Mario (Elsevier, 2017)
      Article
      Open access
      Next generation data centers will likely be based on the emerging paradigm of disaggregated function-blocks-as-a-unit departing from the current state of mainboard-as-a-unit. Multiple functional blocks or bricks such as ...
    • dReDBox: A Disaggregated Architectural Perspective for Data Centers 

      Alachiotis, Nikolaos; Andronikakis, Andreas; Papadakis, Orion; Theodoropoulos, Dimitris; Pnevmatikatos, Dionisios; Syrivelis, Dimitris; Reale, Andrea; Katrinis, Kostas; Zervas, George; Mishra, Vaibhawa; Yuan, Hui; Syrigos, Ilias; Igoumenos, Ioannis; Korakis, Thanasis; Torrents, Marti; Zyulkyarov, Ferad (Springer, 2018-08-22)
      Book chapter
      Open access
      Data centers are currently constructed with fixed blocks (blades); the hard boundaries of this approach lead to suboptimal utilization of resources and increased energy requirements. The dReDBox (disaggregated Recursive ...
    • QuakeTM: Parallelizing a complex serial application using transactional memory 

      Gajinov, Vladimir; Zyulkyarov, Ferad; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Harris, Tim; Valero Cortés, Mateo (2008-11)
      Research report
      Open access
      'Is transactional memory useful?' is the question that cannot be answered until we provide substantial applications that can evaluate its capabilities. While existing TM applications can partially answer the above question, ...
    • Unified fault-tolerance framework for hybrid task-parallel message-passing applications 

      Subasi, Omer; Martsinkevich, Tatiana; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Cappello, Franck (SAGE Publications, 2016-09-26)
      Article
      Open access
      We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the ...
    • Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer 

      Bautista-Gomez, Leonardo; Zyulkyarov, Ferad; Unsal, Osman; McIntosh-Smith, Simon (ACM, 2016-11-13)
      Conference lecture
      Open access
      Supercomputers offer new opportunities for scientific computing as they grow in size. However, their growth also poses new challenges. Resilience has been recognized as one of the most pressing issues to solve for extreme ...