Browsing by author "González Colás, Antonio María"
Showing items 21-40 of 237
-
A unified modulo scheduling and register allocation technique for clustered processors
Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
Conference paper
Open access
This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more ...
-
Accurate off-line phase classification for HW/SW co-designed processors
Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María (Association for Computing Machinery (ACM), 2014)
Conference paper
Open access
Evaluation techniques in microprocessor design are mostly based on simulating selected application's samples using a cycle-accurate simulator. These samples usually correspond to different phases of the application stream. ...
-
AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures
Aleta Ortega, Alexandre; Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María; Kaeli, D (2009-06)
Article
Open access
This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the ...
-
An efficient solver for Cache Miss Equations
Bermudo, Nerina; Vera Rivera, Francisco Javier; González Colás, Antonio María; Llosa Espuny, José Francisco (Institute of Electrical and Electronics Engineers (IEEE), 2000)
Conference paper
Open access
Cache Miss Equations (CME) (S. Ghosh et al., 1997) is a method that accurately describes the cache behavior by means of polyhedra. Even though the computation cost of generating CME is a linear function of the number of ...
-
An energy-efficient memory unit for clustered microarchitectures
Bieschewski, Stefan; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (2016-08-01)
Article
Open access
Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent ...
-
An ultra low-power hardware accelerator for acoustic scoring in speech recognition
Tabani, Hamid; Arnau Montañés, José María; Tubella Murgadas, Jordi; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2017)
Conference paper
Access restricted by publisher policy
Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy often has to be sacrificed in order to fit the strict power constraints of mobile systems. However, accuracy is extremely ...
-
An ultra low-power hardware accelerator for automatic speech recognition
Yazdani Aminabadi, Reza; Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (IEEE Press, 2016)
Conference paper
Open access
Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. ...
-
Analysis and optimization of engines for dynamically typed languages
Dot Artigas, Gem; Martínez, Alejandro; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2015)
Conference paper
Access restricted by publisher policy
Dynamically typed programming languages have become very popular in recent years. These languages ease the task of the programmer but introduce significant overheads since assumptions about the types of variables have ...
-
Analysis of CPI variance for dynamic binary translators/optimizers modules
Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María (IEEE, 2012)
Conference paper
Access restricted by publisher policy
Dynamic Binary Translators and Optimizers (DBTOs) have been established as a hot research topic. They are used in many different systems, such as emulation, instrumentation tools and innovative HW/SW co-designed ...
-
Analysis of non-uniform cache architecture policies for chip-multiprocessors using the Parsec Benchmark Suite
Lira Rueda, Javier; Molina Clemente, Carlos; González Colás, Antonio María (Association for Computing Machinery (ACM), 2009)
Conference paper
Access restricted by publisher policy
Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides ...
-
Analyzing and improving hardware modeling of Accel-Sim
Huerta Gañán, Rodrigo; Abaie Shoushtary, Mojtaba; González Colás, Antonio María (2023-10)
Research report
Open access
GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern ...
-
Analyzing data locality in numeric applications
Sánchez Navarro, F. Jesús; González Colás, Antonio María (2000-07)
Article
Open access
In this article, we introduce SPLAT (Static and Profiled Data Locality Analysis Tool). The tool's purpose is to provide a fast study of memory behavior without the necessity of a costly memory simulator. SPLAT consists of ...
-
Anaphase: a fine-grain thread decomposition scheme for speculative multithreading
Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez, Raul; González Colás, Antonio María (IEEE Computer Society, 2009)
Conference paper
Open access
Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective to exploit thread-level parallelism (TLP) but do not provide benefits when executing serial ...
-
Assisting static compiler vectorization with a speculative dynamic vectorizer in an HW/SW codesigned environment
Kumar, Rakesh; Martínez, Alejandro; González Colás, Antonio María (2016-01-01)
Article
Open access
Compiler-based static vectorization is used widely to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective in vectorizing traditional array-based applications. ...
-
Author retrospective for the dual data cache
González Colás, Antonio María; Aliagas Castell, Carles (Association for Computing Machinery (ACM), 2014)
Book chapter
Open access
In this paper we present a retrospective on our paper published in ICS 1995, which to the best of our knowledge was the first paper that introduced the concept of a cache memory with multiple subcaches, each tuned for a different ...
-
Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery
Upasani, Gaurang; Vera Rivera, Francisco Javier; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2014)
Conference paper
Open access
The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive to radiation phenomena, making soft errors an important challenge. New reliability techniques for handling ...
-
Boosting LSTM performance through dynamic precision selection
Silfa Feliz, Franyell Antonio; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference paper
Open access
The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is ...
-
Boosting point cloud search with a vector unit
Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María (2023)
Research report
Open access
Modern robots collect and process point clouds to perform accurate registration and segmentation. The most time-consuming kernel within point cloud processing, namely neighbor search, relies on appropriate data structures, ...
-
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez Morais, Raul; González Colás, Antonio María (ACM Press. Association for Computing Machinery, 2009-06)
Conference paper
Access restricted by publisher policy
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism ...
-
Boustrophedonic frames: Quasi-optimal L2 caching for textures in GPUs
Joseph, Diya; Aragón Alcaraz, Juan Luis; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference paper
Open access
Literature is plentiful in works exploiting cache locality for GPUs. A majority of them explore replacement or bypassing policies. In this paper, however, we surpass this exploration by fabricating a formal proof for a ...