Ara es mostren els items 21-40 de 237

    • A unified modulo scheduling and register allocation technique for clustered processors 

      Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
      Text en actes de congrés
      Accés obert
      This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more ...
    • Accurate off-line phase classification for HW/SW co-designed processors 

      Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María (Association for Computing Machinery (ACM), 2014)
      Text en actes de congrés
      Accés obert
      Evaluation techniques in microprocessor design are mostly based on simulating selected application's samples using a cycle-accurate simulator. These samples usually correspond to different phases of the application stream. ...
    • AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures 

      Aleta Ortega, Alexandre; Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María; Kaeli, D (2009-06)
      Article
      Accés obert
      This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the ...
    • An efficient solver for Cache Miss Equations 

      Bermudo, Nerina; Vera Rivera, Francisco Javier; González Colás, Antonio María; Llosa Espuny, José Francisco (Institute of Electrical and Electronics Engineers (IEEE), 2000)
      Text en actes de congrés
      Accés obert
      Cache Miss Equations (CME) (S. Ghosh et al., 1997) is a method that accurately describes the cache behavior by means of polyhedra. Even though the computation cost of generating CME is a linear function of the number of ...
    • An energy-efficient memory unit for clustered microarchitectures 

      Bieschewski, Stefan; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (2016-08-01)
      Article
      Accés obert
      Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent ...
    • An ultra low-power hardware accelerator for acoustic scoring in speech recognition 

      Tabani, Hamid; Arnau Montañés, José María; Tubella Murgadas, Jordi; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Text en actes de congrés
      Accés restringit per política de l'editorial
      Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy has often to be sacrificed in order to fit the strict power constraints of mobile systems. However, accuracy is extremely ...
    • An ultra low-power hardware accelerator for automatic speech recognition 

      Yazdani Aminabadi, Reza; Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (IEEE Press, 2016)
      Text en actes de congrés
      Accés obert
      Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. ...
    • Analysis and optimization of engines for dynamically typed languages 

      Dot Artigas, Gem; Martínez, Alejandro; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2015)
      Text en actes de congrés
      Accés restringit per política de l'editorial
      Dynamically typed programming languages have become very popular in the recent years. These languages ease the task of the programmer but introduce significant overheads since assumptions about the types of variables have ...
    • Analysis of CPI variance for dynamic binary translators/optimizers modules 

      Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María (IEEE, 2012)
      Text en actes de congrés
      Accés restringit per política de l'editorial
      Dynamic Binary Translators and Optimizers (DBTOs) have been established as a hot research topic. They are used in many different systems, such as emulation, instrumentation tools and innovative HW/SW co-designed ...
    • Analysis of non-uniform cache architecture policies for chip-multiprocessors using the Parsec Benchmark Suite 

      Lira Rueda, Javier; Molina Clemente, Carlos; González Colás, Antonio María (Association for Computing Machinery (ACM), 2009)
      Text en actes de congrés
      Accés restringit per política de l'editorial
      Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides ...
    • Analyzing and improving hardware modeling of Accel-Sim 

      Huerta Gañán, Rodrigo; Abaie Shoushtary, Mojtaba; González Colás, Antonio María (2023-10)
      Report de recerca
      Accés obert
      GPU architectures have become popular for executing generalpurpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern ...
    • Analyzing data locality in numeric applications 

      Sánchez Navarro, F. Jesús; González Colás, Antonio María (2000-07)
      Article
      Accés obert
      In this article, we introduce SPLAT (Static and Profiled Data Locality Analysis Tool). The tool's purpose is to provide a fast study of memory behavior without the necessity of a costly memory simulator. SPLAT consists of ...
    • Anaphase: a fine-grain thread decomposition scheme for speculative multithreading 

      Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez, Raul; González Colás, Antonio María (IEEE Computer Society, 2009)
      Text en actes de congrés
      Accés obert
      Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective to exploit thread-level parallelism (TLP) but do not provide benefits when executing serial ...
    • Assisting static compiler vectorization with a speculative dynamic vectorizer in an HW/SW codesigned environment 

      Kumar, Rakesh; Martínez, Alejandro; González Colás, Antonio María (2016-01-01)
      Article
      Accés obert
      Compiler-based static vectorization is used widely to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective in vectorizing traditional array-based applications. ...
    • Author retrospective for the dual data cache 

      González Colás, Antonio María; Aliagas Castell, Carles (Association for Computing Machinery (ACM), 2014)
      Capítol de llibre
      Accés obert
      In this paper we present a retrospective on our paper published in ICS 1995, which to best of our knowledge was the first paper that introduced the concept of a cache memory with multiple subcaches, each tuned for a different ...
    • Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery 

      Upasani, Gaurang; Vera Rivera, Francisco Javier; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2014)
      Text en actes de congrés
      Accés obert
      The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive against radiation phenomena making soft errors an important challenge. New reliability techniques for handling ...
    • Boosting LSTM performance through dynamic precision selection 

      Silfa Feliz, Franyell Antonio; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Text en actes de congrés
      Accés obert
      The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is ...
    • Boosting point cloud search with a vector unit 

      Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María (2023)
      Report de recerca
      Accés obert
      Modern robots collect and process point clouds to perform accurate registration and segmentation. The most time-consuming kernel within point cloud processing -namely neighbor search- relies on appropriate data structures, ...
    • Boosting single-thread performance in multi-core systems through fine-grain multi-threading 

      Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez Morais, Raul; González Colás, Antonio María (ACM Press. Association for Computing Machinery, 2009-06)
      Text en actes de congrés
      Accés restringit per política de l'editorial
      Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism ...
    • Boustrophedonic frames: Quasi-optimal L2 caching for textures in GPUs 

      Joseph, Diya; Aragón Alcaraz, Juan Luis; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2023)
      Text en actes de congrés
      Accés obert
      Literature is plentiful in works exploiting cache locality for GPUs. A majority of them explore replacement or bypassing policies. In this paper, however, we surpass this exploration by fabricating a formal proof for a ...