Now showing items 1-11 of 11

  • A runtime heuristic to selectively replicate tasks for application-specific reliability targets 

    Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Open Access
    In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require ...
  • Circuit design of a novel adaptable and reliable L1 data cache 

    Seyedi, Azam; Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (2013)
    Conference report
    Open Access
    This paper proposes a novel adaptable and reliable L1 data cache design (Adapcache) with the unique capability of automatically adapting itself for different supply voltage levels and providing the highest reliability. ...
  • CRC-based memory reliability for task-parallel HPC applications 

    Subasi, Omer; Unsal, Osman Sabri; Labarta Mancho, Jesús José; Yalcin, Gulay; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Restricted access - publisher's policy
    Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale ...
  • Designing and modelling selective replication for fault-tolerant HPC applications 

    Subasi, Omer; Yalcin, Gulay; Zyulkyarov, Ferad; Unsal, Osman Sabri; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference report
    Open Access
    Fail-stop errors and Silent Data Corruptions (SDCs) are the most common failure modes for High Performance Computing (HPC) applications. There are studies that address fail-stop errors and studies that address SDCs. However ...
  • Designs for increasing reliability while reducing energy and increasing lifetime 

    Yalcin, Gulay (Universitat Politècnica de Catalunya, 2014-12-12)
    Doctoral thesis
    Open Access
    In the last decades, the computing technology experienced tremendous developments. For instance, transistors' feature size shrank to half at every two years as consistently from the first time Moore stated his law. ...
  • FaulTM: Error detection and recovery using hardware transactional memory 

    Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (2013)
    Conference report
    Restricted access - publisher's policy
    Reliability is an essential concern for processor designers due to increasing transient and permanent fault rates. Executing instruction streams redundantly in chip multi processors (CMP) provides high reliability since ...
  • FaulTM: Fault-tolerance using hardware transactional memory 

    Yalcin, Gulay; Unsal, Osman Sabri; Hur, Ibrahim; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2010)
    Conference report
    Open Access
    Fault-tolerance has become an essential concern for processor designers due to increasing soft-error rates. In this study, we are motivated by the fact that Transactional Memory (TM) hardware provides an ideal base upon ...
  • FIMSIM: A fault injection infrastructure for microarchitectural simulators 

    Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2011)
    External research report
    Open Access
    Fault injection is a widely used approach for experiment-based dependability evaluation in which faults can be injected to the hardware, to the simulator or to the software. Simulation based fault injection is more appealing ...
  • ParaDIME: Parallel distributed infrastructure for minimization of energy for data centers 

    Rethinagiri, Santhosh Kumar; Palomar Pérez, Óscar; Sobe, Anita; Yalcin, Gulay; Knauth, Thomas; Titos Gil, Rubén; Prieto, Pablo; Schneegaß, Malte; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Felber, Pascal; Fetzer, Christof; Milojevic, Dragomir (2015-11-01)
    Article
    Open Access
    Dramatic environmental and economic impact of the ever increasing power and energy consumption of modern computing devices in data centers is now a critical challenge. On the one hand, designers use technology scaling as ...
  • Reliability of GPU-based heterogeneous systems 

    Yalcin, Gulay (Barcelona Supercomputing Center, 2018)
    Conference report
    Open Access
  • System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms 

    Rethinagiri, Santhosh Kumar; Palomar Pérez, Óscar; Arias Moreno, Juan; Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2015)
    Conference report
    Restricted access - publisher's policy
    Due to the growing computational requirements of mobile applications, using a heterogeneous Multiprocessor System-on-Chip becomes an incontrovertible solution to meet the service requirements. Today, Electronic System-Level ...