Research reports
http://hdl.handle.net/2117/3113
2024-03-28T12:38:33Z
http://hdl.handle.net/2117/404506
Boosting point cloud search with a vector unit
Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María
Modern robots collect and process point clouds to perform accurate registration and segmentation. The most time-consuming kernel within point cloud processing, namely neighbor search, relies on appropriate data structures, such as k-d trees, to prune the search space. In this work, we exploit similarity in subsequent k-d tree traversals to improve search on CPUs by leveraging vector units. We show the potential for vectorizing k-d tree search using modern vector hardware, and the performance benefits of a software-only implementation. Further, we discuss limitations in current architectures and present in-progress ideas to specialize the CPU vector hardware for an efficient k-d tree vector search.
Work presented at: RoboARCH, Workshop on Robotics Acceleration with Computing Hardware, a MICRO 2023 workshop.
2024-03-14T09:34:43Z
http://hdl.handle.net/2117/404505
Analyzing and improving hardware modeling of Accel-Sim
Huerta Gañán, Rodrigo; Abaie Shoushtary, Mojtaba; González Colás, Antonio María
GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, where each sub-core has its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. They must be performance-accurate and model the target hardware faithfully so that its bottlenecks can be explored properly. This paper presents a broad analysis of different parts of Accel-Sim, a popular GPGPU simulator, and some improvements to its model. First, we focus on the front-end and develop a more realistic model. Then, we analyze the way the result bus works and develop a more realistic one. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas of improvement for the simulator.
Work presented at: 1st Workshop on Computer Architecture Modeling and Simulation (CAMS 2023)
2024-03-14T09:10:32Z
http://hdl.handle.net/2117/110641
Keeping control transfer instructions out of the pipeline in architectures without condition codes
Cortadella, Jordi; Llaberia Griñó, José M.; González Colás, Antonio María
The execution of branch instructions involves a loss of performance in pipelined processors. In this paper we present a mechanism for executing this kind of instruction with zero delay. The mechanism is proposed for architectures without condition codes.
This work was funded by the Ministry of Education CAICYT under contract number 314-85.
2017-11-15T08:59:52Z
http://hdl.handle.net/2117/22816
Una herramienta automática de feedback para ensamblador
Álvarez Martínez, Carlos; Jiménez González, Daniel; López Álvarez, David; Alonso López, Javier; Tous Liesa, Rubén; Parcerisa Bundó, Joan Manuel; Barlet Ros, Pere; Fernández Barta, Montserrat; Tubella Murgadas, Jordi; Pérez, Christian
A first-year Computer Engineering student must acquire the ability to analyze and debug code, both at a high level and in assembly language. This process requires active participation from students, especially in the laboratory. However, an unfamiliar laboratory environment and a lack of prior knowledge lead students to base their learning on the feedback offered by the instructor. This dependence means that students do not use the laboratory software to solve theory problems and to work autonomously. Moreover, most student questions are the same ones over and over, and they can be answered quickly and easily. However, the instructor is not available at all times, so a question that could be resolved in seconds keeps the student stuck for minutes until the instructor is free. To address this situation, the SISA-EMU environment has been developed, which allows students to check quickly whether a solution is correct. It also incorporates an automatic feedback system that guides students on the errors they have made. This tool encourages students to work semi-autonomously and to get more out of the problems they solve.
2014-05-05T13:51:06Z
http://hdl.handle.net/2117/15667
Process variability in sub-16nm bulk CMOS technology
Rubio Sola, Jose Antonio; Figueras Pàmies, Joan; Vatajelu, Elena Ioana; Canal Corretger, Ramon
This document is part of the public deliverable D3.6 of the TRAMS Project (EU FP7 248789). It presents and justifies the levels of variability used in the research project for sub-18nm bulk CMOS technologies.
2012-03-26T18:45:53Z
http://hdl.handle.net/2117/15019
Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors
Ganapathy, Shrikanth; Canal Corretger, Ramon; González Colás, Antonio María; Rubio Sola, Jose Antonio
In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce active and standby leakage power in caches under process variations.
2012-02-08T12:50:44Z
http://hdl.handle.net/2117/15009
A selective logging mechanism for hardware transactional memory systems
Lupon Navazo, Marc; Magklis, Grigorios; González Colás, Antonio María
Log-based Hardware Transactional Memory (HTM) systems offer an elegant solution to handle speculative data that overflow transactional L1 caches. By keeping the pre-transactional values on a software-resident log, speculative values can be safely moved across the memory hierarchy, without requiring expensive searches on L1 misses or commits.
2012-02-08T11:38:25Z
http://hdl.handle.net/2117/15007
On the effectiveness of hybrid mechanisms on reduction of parametric failures in caches
Ganapathy, Shrikanth; Canal Corretger, Ramon; González Colás, Antonio María; Rubio Sola, Jose Antonio
In this paper, we provide insight into the different proactive read/write assist methods (wordline boosting and adaptive body biasing) that help in preventing (and reducing) parametric failures when coupled with reactive techniques, such as ECC and redundancy, which cope with existing failures. While proactive and reactive techniques have previously been viewed as complementary, we show that this is not necessarily the case when considering the benefits of such hybrid schemes.
2012-02-08T11:07:29Z
http://hdl.handle.net/2117/13932
Implementing a hybrid SRAM / eDRAM NUCA architecture
Lira Rueda, Javier; Molina Clemente, Carlos; Brooks, David; González Colás, Antonio María
In this paper, we propose a hybrid cache architecture that exploits the main features of both memory technologies: the speed of SRAM and the high density of eDRAM. We demonstrate that, due to the high locality found in emerging applications, a high percentage of the data that enter the on-chip last-level cache are not accessed again before they are replaced.
2011-11-16T11:21:21Z
http://hdl.handle.net/2117/13911
vPROBE: Variation aware post-silicon power/performance binning using embedded 3T1D cells
Ganapathy, Shrikanth; Canal Corretger, Ramon; González Colás, Antonio María; Rubio Sola, Jose Antonio
In this paper, we present an on-die post-silicon binning methodology that takes into account the effect of static and dynamic variations and categorizes every processor based on power/performance. The proposed scheme is composed of discretization hardware that exploits the characteristic dependence of delay/leakage on variability sources for categorization.
2011-11-15T14:28:57Z