Recent Submissions

  • DRAM errors in the field: a statistical approach 

    Živanovič, Darko; Esmaili Dokht, Pouya; Moré, Sergi; Bartolomé, Javier; Carpenter, Paul Matthew; Radojkovic, Petar; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2019)
    Conference report
    Open Access
    This paper summarizes our two-year study of corrected and uncorrected errors on the MareNostrum 3 supercomputer, covering 2000 billion MB-hours of DRAM in the field. The study analyzes 4.5 million corrected and 71 uncorrected ...
  • Neuron-level fuzzy memoization in RNNs 

    Silfa Feliz, Franyell Antonio; Dot Artigas, Gem; Arnau Montañés, José María; González Colás, Antonio María (Association for Computing Machinery (ACM), 2019)
    Conference report
    Open Access
    Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the ...
  • Leveraging run-time feedback for efficient ASR acceleration 

    Yazdani, Reza; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference report
    Open Access
    In this work, we propose Locality-AWare-Scheme (LAWS) for an Automatic Speech Recognition (ASR) accelerator in order to significantly reduce its energy consumption and memory requirements, by leveraging the locality among ...
  • SCU: a GPU stream compaction unit for graph processing 

    Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (Association for Computing Machinery (ACM), 2019)
    Conference report
    Restricted access - publisher's policy
    Graph processing algorithms are key in many emerging applications in areas such as machine learning and data analytics. Although the processing of large scale graphs exhibits a high degree of parallelism, the memory access ...
  • Rendering elimination: early discard of redundant tiles in the graphics pipeline 

    Anglada Sánchez, Martí; de Lucas Casamayor, Enrique; Parcerisa Bundó, Joan Manuel; Aragón, Juan Luis; Marcuello Pascual, Pedro; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference report
    Restricted access - publisher's policy
    GPUs are one of the most energy-consuming components for real-time rendering applications, since a large number of fragment shading computations and memory accesses are involved. Main memory bandwidth is especially taxing ...
  • Early visibility resolution for removing ineffectual computations in the graphics pipeline 

    Anglada Sánchez, Martí; de Lucas Casamayor, Enrique; Parcerisa Bundó, Joan Manuel; Aragón Alcaraz, Juan Luis; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference report
    Restricted access - publisher's policy
    GPUs' main workload is real-time image rendering. These applications take a description of a (animated) scene and produce the corresponding image(s). An image is rendered by computing the colors of all its pixels. It is ...
  • E-PUR: an energy-efficient processing unit for recurrent neural networks 

    Silfa Feliz, Franyell Antonio; Dot, Gem; Arnau Montañés, José María; González Colás, Antonio María (2018)
    Conference report
    Restricted access - publisher's policy
    Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful ...
  • Computation reuse in DNNs by exploiting input similarity 

    Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Restricted access - publisher's policy
    In recent years, Deep Neural Networks (DNNs) have achieved tremendous success for diverse problems such as classification and decision making. Efficient support for DNNs on CPUs, GPUs and accelerators has become a prolific ...
  • The dark side of DNN pruning 

    Yazdani Aminabadi, Reza; Arnau Montañés, José María; González Colás, Antonio María; Riera Villanueva, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Restricted access - publisher's policy
    DNN pruning has been recently proposed as an effective technique to improve the energy-efficiency of DNN-based solutions. It is claimed that by removing unimportant or redundant connections, the pruned DNN delivers higher ...
  • A novel register renaming technique for out-of-order processors 

    Tabani, Hamid; Arnau Montañés, José María; Tubella Murgadas, Jordi; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Restricted access - publisher's policy
    Modern superscalar processors support a large number of in-flight instructions, which requires sizeable register files. Conventional register renaming techniques allocate a new storage location, i.e. physical register, for ...
  • MeRLiN: Exploiting dynamic instruction behavior for fast and accurate microarchitecture level reliability assessment 

    Kaliorakis, Manolis; Gizopoulos, Dimitris; Canal Corretger, Ramon; González Colás, Antonio María (Association for Computing Machinery (ACM), 2017)
    Conference report
    Open Access
    Early reliability assessment of hardware structures using microarchitecture level simulators can effectively guide major error protection decisions in microprocessor design. Statistical fault injection on microarchitectural ...
  • HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation 

    Kumar, Rakesh; Cano, José; Brankovic, Aleksandar; Pavlou, Demos; Stavrou, Kyriakos; Gibert Codina, Enric; Martínez, Alejandro; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference report
    Open Access
    Improving single thread performance is a key challenge in modern microprocessors especially because the traditional approach of increasing clock frequency and deep pipelining cannot be pushed further due to power constraints. ...
