• A runtime heuristic to selectively replicate tasks for application-specific reliability targets 

    Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Texto en actas de congreso
    Acceso abierto
    In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require ...
  • Circuit design of a novel adaptable and reliable L1 data cache 

    Seyedi, Azam; Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (2013)
    Texto en actas de congreso
    Acceso abierto
    This paper proposes a novel adaptable and reliable L1 data cache design (Adapcache) with the unique capability of automatically adapting itself for different supply voltage levels and providing the highest reliability. ...
  • CRC-based memory reliability for task-parallel HPC applications 

    Subasi, Omer; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Yalcin, Gulay; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Texto en actas de congreso
    Acceso restringido por política de la editorial
    Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale ...
  • Designing and modelling selective replication for fault-tolerant HPC applications 

    Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Texto en actas de congreso
    Acceso abierto
    Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However ...
  • Designs for increasing reliability while reducing energy and increasing lifetime 

    Yalcin, Gulay (Universitat Politècnica de Catalunya, 2014-12-12)
    Tesis
    Acceso abierto
    In the last decades, the computing technology experienced tremendous developments. For instance, transistors' feature size shrank to half at every two years as consistently from the first time Moore stated his law. ...
  • FaulTM: Error detection and recovery using hardware transactional memory 

    Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (2013)
    Texto en actas de congreso
    Acceso restringido por política de la editorial
    Reliability is an essential concern for processor designers due to increasing transient and permanent fault rates. Executing instruction streams redundantly in chip multi processors (CMP) provides high reliability since ...
  • FaulTM: Fault-tolerance using hardware transactional memory 

    Yalcin, Gulay; Unsal, Osman Sabri; Hur, Ibrahim; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2010)
    Texto en actas de congreso
    Acceso abierto
    Fault-tolerance has become an essential concern for processor designers due to increasing soft-error rates. In this study, we are motivated by the fact that Transactional Memory (TM) hardware provides an ideal base upon ...
  • FIMSIM: A fault injection infrastructure for microarchitectural simulators 

    Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2011)
    Report de recerca
    Acceso abierto
    Fault injection is a widely used approach for experiment-based dependability evaluation in which faults can be injected to the hardware, to the simulator or to the software. Simulation based fault injection is more appealing ...
  • ParaDIME: Parallel distributed infrastructure for minimization of energy for data centers 

    Rethinagiri, Santhosh Kumar; Palomar Pérez, Óscar; Sobe, Anita; Yalcin, Gulay; Knauth, Thomas; Titos Gil, Rubén; Prieto, Pablo; Schneegaß, Malte; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Felber, Pascal; Fetzer, Christof; Milojevic, Dragomir (2015-11-01)
    Artículo
    Acceso abierto
    Dramatic environmental and economic impact of the ever increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as ...
  • Reliability of GPU-based heterogeneous systems 

    Yalcin, Gulay (Barcelona Supercomputing Center, 2018)
    Texto en actas de congreso
    Acceso abierto
  • System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms 

    Rethinagiri, Santhosh Kumar; Palomar Pérez, Óscar; Arias Moreno, Juan; Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2015)
    Texto en actas de congreso
    Acceso restringido por política de la editorial
    Due to the growing computational requirements of mobile applications, using a heterogeneous Multiprocessor System-on-Chip becomes an incontrovertible solution to meet the service requirements. Today, Electronic System-Level ...