    • A low-power hardware accelerator for ORB feature extraction in self-driving cars 

      Taranco Serna, Raúl; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference report
      Open Access
      Simultaneous Localization And Mapping (SLAM) is a key component for autonomous navigation. SLAM consists of building and creating a map of an unknown environment while keeping track of the exploring agent's location within ...
    • A low-power, high-performance speech recognition accelerator 

      Yazdani, Reza; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2019-12-01)
      Article
      Open Access
      Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at high energy cost, not being affordable for the tiny power-budgeted mobile devices. ...
    • A novel register renaming technique for out-of-order processors 

      Tabani, Hamid; Arnau Montañés, José María; Tubella Murgadas, Jordi; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2018)
      Conference report
      Restricted access - publisher's policy
      Modern superscalar processors support a large number of in-flight instructions, which requires sizeable register files. Conventional register renaming techniques allocate a new storage location, i.e. physical register, for ...
    • A programmable accelerator for streaming automatic speech recognition on edge devices 

      Pinto Rivero, Daniel; Arnau Montañés, José María; González Colás, Antonio María (2022)
      Conference report
      Open Access
      Automatic Speech Recognition (ASR) is quickly becoming a mainstream technology, mainly driven by the outstanding accuracy achieved by modern systems based on machine learning. However, these systems often require billions ...
    • An ultra low-power hardware accelerator for acoustic scoring in speech recognition 

      Tabani, Hamid; Arnau Montañés, José María; Tubella Murgadas, Jordi; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Conference report
      Restricted access - publisher's policy
      Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy has often to be sacrificed in order to fit the strict power constraints of mobile systems. However, accuracy is extremely ...
    • An ultra low-power hardware accelerator for automatic speech recognition 

      Yazdani Aminabadi, Reza; Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (IEEE Press, 2016)
      Conference report
      Open Access
      Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. ...
    • Boosting LSTM performance through dynamic precision selection 

      Silfa Feliz, Franyell Antonio; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is ...
    • CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference 

      Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María (2019-09-01)
      Article
      Open Access
      Recurrent neural networks (RNNs) perform element-wise multiplications across the activations of gates. We show that a significant percentage of activations are saturated and propose coarse-grained pruning of activations ...
    • Characterizing self-driving tasks in general-purpose architectures 

      Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María (European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC), 2021-09-15)
      Part of book or chapter of book
      Open Access
      Autonomous Vehicles (AVs) have the potential to radically change the automotive industry. However, computing solutions for AVs have to meet severe performance constraints to guarantee a safe driving experience. Current ...
    • Computation reuse in DNNs by exploiting input similarity 

      Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2018)
      Conference report
      Restricted access - publisher's policy
      In recent years, Deep Neural Networks (DNNs) have achieved tremendous success for diverse problems such as classification and decision making. Efficient support for DNNs on CPUs, GPUs and accelerators has become a prolific ...
    • CREW: Computation reuse and efficient weight storage for hardware-accelerated MLPs and RNNs 

      Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María (2022-08-01)
      Article
      Open Access
      Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition ...
    • Demystifying power and performance bottlenecks in autonomous driving systems 

      Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      Autonomous Vehicles (AVs) have the potential to radically change the automotive industry. However, computing solutions for AVs have to meet severe performance and power constraints to guarantee a safe driving experience. ...
    • Design and evaluation of an ultra low-power human-quality speech recognition system 

      Pinto Rivero, Daniel; Arnau Montañés, José María; González Colás, Antonio María (2020-11)
      Article
      Open Access
      Automatic Speech Recognition (ASR) has experienced a dramatic evolution since pioneer development of Bell Lab’s single-digit recognizer more than 50 years ago. Current ASR systems have taken advantage of the tremendous ...
    • DNN pruning with principal component analysis and connection importance estimation 

      Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María (2022-01)
      Article
      Open Access
      DNN pruning reduces memory footprint and computational work of DNN-based solutions to improve performance and energy-efficiency. An effective pruning scheme should be able to systematically remove connections and/or neurons ...
    • E-BATCH: Energy-efficient and high-throughput RNN batching 

      Silfa Feliz, Franyell Antonio; Arnau Montañés, José María; González Colás, Antonio María (2022-03)
      Article
      Open Access
      Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large ...
    • E-PUR: an energy-efficient processing unit for recurrent neural networks 

      Silfa Feliz, Franyell Antonio; Dot, Gem; Arnau Montañés, José María; González Colás, Antonio María (2018)
      Conference report
      Restricted access - publisher's policy
      Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful ...
    • Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization 

      Arnau Montañés, José María; Parcerisa Bundó, Joan Manuel; Xekalakis, Polychronis (2014)
      Conference report
      Restricted access - publisher's policy
      Redundancy is at the heart of graphical applications. In fact, generating an animation typically involves the succession of extremely similar images. In terms of rendering these images, this behavior translates into the ...
    • Energy-efficient mobile GPU systems 

      Arnau Montañés, José María (Universitat Politècnica de Catalunya, 2015-04-24)
      Doctoral thesis
      Open Access
      The design of mobile GPUs is all about saving energy. Smartphones and tablets are battery-operated and thus any type of rendering needs to use as little energy as possible. Furthermore, smartphones do not include sophisticated ...
    • Energy-efficient stream compaction through filtering and coalescing accesses in GPGPU memory partitions 

      Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
      Article
      Open Access
      Graph-based applications are essential in emerging domains such as data analytics or machine learning. Data gathering in a knowledge-based society requires great data processing efficiency. High-throughput GPGPU architectures ...
    • Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads 

      Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (2022-07-18)
      Article
      Open Access
      GPGPU architectures have become the dominant platform for massively parallel workloads, delivering high performance and energy efficiency for popular applications such as machine learning, computer vision or self-driving ...