Exploració per autor "González Colás, Antonio María"

A unified modulo scheduling and register allocation technique for clustered processors

Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
Text en actes de congrés
Accés obert

This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more ...

Accurate off-line phase classification for HW/SW co-designed processors

Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María (Association for Computing Machinery (ACM), 2014)
Text en actes de congrés
Accés obert

Evaluation techniques in microprocessor design are mostly based on simulating selected application's samples using a cycle-accurate simulator. These samples usually correspond to different phases of the application stream. ...

AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures

Aleta Ortega, Alexandre; Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María; Kaeli, D (2009-06)
Article
Accés obert

This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the ...

An efficient solver for Cache Miss Equations

Bermudo, Nerina; Vera Rivera, Francisco Javier; González Colás, Antonio María; Llosa Espuny, José Francisco (Institute of Electrical and Electronics Engineers (IEEE), 2000)
Text en actes de congrés
Accés obert

Cache Miss Equations (CME) (S. Ghosh et al., 1997) is a method that accurately describes the cache behavior by means of polyhedra. Even though the computation cost of generating CME is a linear function of the number of ...

An energy-efficient memory unit for clustered microarchitectures

Bieschewski, Stefan; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (2016-08-01)
Article
Accés obert

Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent ...

An ultra low-power hardware accelerator for acoustic scoring in speech recognition

Tabani, Hamid; Arnau Montañés, José María; Tubella Murgadas, Jordi; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2017)
Text en actes de congrés
Accés restringit per política de l'editorial

Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy has often to be sacrificed in order to fit the strict power constraints of mobile systems. However, accuracy is extremely ...

An ultra low-power hardware accelerator for automatic speech recognition

Yazdani Aminabadi, Reza; Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María (IEEE Press, 2016)
Text en actes de congrés
Accés obert

Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. ...

Analysis and optimization of engines for dynamically typed languages

Dot Artigas, Gem; Martínez, Alejandro; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2015)
Text en actes de congrés
Accés restringit per política de l'editorial

Dynamically typed programming languages have become very popular in the recent years. These languages ease the task of the programmer but introduce significant overheads since assumptions about the types of variables have ...

Analysis of CPI variance for dynamic binary translators/optimizers modules

Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María (IEEE, 2012)
Text en actes de congrés
Accés restringit per política de l'editorial

Dynamic Binary Translators and Optimizers (DBTOs) have been established as a hot research topic. They are used in many different systems, such as emulation, instrumentation tools and innovative HW/SW co-designed ...

Analysis of non-uniform cache architecture policies for chip-multiprocessors using the Parsec Benchmark Suite

Lira Rueda, Javier; Molina Clemente, Carlos; González Colás, Antonio María (Association for Computing Machinery (ACM), 2009)
Text en actes de congrés
Accés restringit per política de l'editorial

Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides ...

Analyzing and improving hardware modeling of Accel-Sim

Huerta Gañán, Rodrigo; Abaie Shoushtary, Mojtaba; González Colás, Antonio María (2023-10)
Report de recerca
Accés obert

GPU architectures have become popular for executing generalpurpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern ...

Analyzing data locality in numeric applications

Sánchez Navarro, F. Jesús; González Colás, Antonio María (2000-07)
Article
Accés obert

In this article, we introduce SPLAT (Static and Profiled Data Locality Analysis Tool). The tool's purpose is to provide a fast study of memory behavior without the necessity of a costly memory simulator. SPLAT consists of ...

Anaphase: a fine-grain thread decomposition scheme for speculative multithreading

Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez, Raul; González Colás, Antonio María (IEEE Computer Society, 2009)
Text en actes de congrés
Accés obert

Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective to exploit thread-level parallelism (TLP) but do not provide benefits when executing serial ...

Assisting static compiler vectorization with a speculative dynamic vectorizer in an HW/SW codesigned environment

Kumar, Rakesh; Martínez, Alejandro; González Colás, Antonio María (2016-01-01)
Article
Accés obert

Compiler-based static vectorization is used widely to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective in vectorizing traditional array-based applications. ...

Author retrospective for the dual data cache

González Colás, Antonio María; Aliagas Castell, Carles (Association for Computing Machinery (ACM), 2014)
Capítol de llibre
Accés obert

In this paper we present a retrospective on our paper published in ICS 1995, which to best of our knowledge was the first paper that introduced the concept of a cache memory with multiple subcaches, each tuned for a different ...

Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery

Upasani, Gaurang; Vera Rivera, Francisco Javier; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2014)
Text en actes de congrés
Accés obert

The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive against radiation phenomena making soft errors an important challenge. New reliability techniques for handling ...

Boosting LSTM performance through dynamic precision selection

Silfa Feliz, Franyell Antonio; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Text en actes de congrés
Accés obert

The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is ...

Boosting point cloud search with a vector unit

Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María (2023)
Report de recerca
Accés obert

Modern robots collect and process point clouds to perform accurate registration and segmentation. The most time-consuming kernel within point cloud processing -namely neighbor search- relies on appropriate data structures, ...

Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez Morais, Raul; González Colás, Antonio María (ACM Press. Association for Computing Machinery, 2009-06)
Text en actes de congrés
Accés restringit per política de l'editorial

Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism ...

Boustrophedonic frames: Quasi-optimal L2 caching for textures in GPUs

Joseph, Diya; Aragón Alcaraz, Juan Luis; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2023)
Text en actes de congrés
Accés obert

Literature is plentiful in works exploiting cache locality for GPUs. A majority of them explore replacement or bypassing policies. In this paper, however, we surpass this exploration by fabricating a formal proof for a ...