Reports de recerca

http://hdl.handle.net/2117/3113
Thu, 25 Apr 2024 05:50:54 GMT

Boosting point cloud search with a vector unit
http://hdl.handle.net/2117/404506
Exenberger Becker, Pedro Henrique; Arnau Montañés, José María; González Colás, Antonio María
Modern robots collect and process point clouds to perform accurate registration and segmentation. The most time-consuming kernel in point cloud processing, neighbor search, relies on appropriate data structures, such as k-d trees, to prune the search space. In this work, we exploit similarity in subsequent k-d tree traversals to improve search on CPUs by leveraging vector units. We show the potential for vectorizing k-d tree search using modern vector hardware, and the performance benefits of a software-only implementation. Further, we discuss limitations in current architectures and present in-progress ideas to specialize the CPU vector hardware for an efficient k-d tree vector search.
Presented at: RoboARCH, Workshop on Robotics Acceleration with Computing Hardware, a MICRO 2023 workshop.
Thu, 14 Mar 2024 09:34:43 GMT

Analyzing and improving hardware modeling of Accel-Sim
http://hdl.handle.net/2117/404505
Huerta Gañán, Rodrigo; Abaie Shoushtary, Mojtaba; González Colás, Antonio María
GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency between dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, where each sub-core has its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. They must be performance-accurate and model the target hardware faithfully in order to explore the different bottlenecks properly. This paper presents a broad analysis of different parts of Accel-Sim, a popular GPGPU simulator, and some improvements to its model. First, we focus on the front-end and develop a more realistic model. Then, we analyze the way the result bus works and develop a more realistic one. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas of improvement for the simulator.
Presented at: 1st Workshop on Computer Architecture Modeling and Simulation (CAMS 2023)
Thu, 14 Mar 2024 09:10:32 GMT

Keeping control transfer instructions out of the pipeline in architectures without condition codes
http://hdl.handle.net/2117/110641
Cortadella, Jordi; Llaberia Griñó, José M.; González Colás, Antonio María
The execution of branch instructions involves a loss of performance in pipelined processors. In this paper we present a mechanism for executing this kind of instruction with zero delay. This mechanism has been proposed for architectures without condition codes.
This work was funded by the Ministry of Education CAICYT under contract number 314-85.
Wed, 15 Nov 2017 08:59:52 GMT

An automatic feedback tool for assembly language (original title: Una herramienta automática de feedback para ensamblador)
http://hdl.handle.net/2117/22816
Álvarez Martínez, Carlos; Jiménez González, Daniel; López Álvarez, David; Alonso López, Javier; Tous Liesa, Rubén; Parcerisa Bundó, Joan Manuel; Barlet Ros, Pere; Fernández Barta, Montserrat; Tubella Murgadas, Jordi; Pérez, Christian
A first-year Computer Engineering student must acquire the ability to analyze and debug code, both in high-level languages and in assembly. This process requires active participation from the students, especially in the laboratory. However, an unfamiliar laboratory environment and a lack of prior knowledge lead students to base their learning on the feedback offered by the instructor. This dependence means that students do not use the laboratory software to solve theory problems and work autonomously. Moreover, most student questions are always the same and can be answered quickly and easily. Yet the instructor is not available at all times, so a question that could be resolved in seconds keeps the student stuck for minutes until the instructor is free. To address this situation, the SISA-EMU environment has been developed, which makes it possible to check quickly whether a solution is correct. It also incorporates an automatic feedback system that guides students on the errors they have made. This tool encourages students to work semi-autonomously and to get more out of the problems they solve.
Mon, 05 May 2014 13:51:06 GMT

Process variability in sub-16nm bulk CMOS technology
http://hdl.handle.net/2117/15667
Rubio Sola, Jose Antonio; Figueras Pàmies, Joan; Vatajelu, Elena Ioana; Canal Corretger, Ramon
The document is part of deliverable D3.6 of the TRAMS Project (EU FP7 248789), of public nature, and shows and justifies the levels of variability used in the research project for sub-18nm bulk CMOS technologies.
Mon, 26 Mar 2012 18:45:53 GMT

Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors
http://hdl.handle.net/2117/15019
Ganapathy, Shrikanth; Canal Corretger, Ramon; González Colás, Antonio María; Rubio Sola, Jose Antonio
In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce active & standby leakage power in caches under process variations.
Wed, 08 Feb 2012 12:50:44 GMT

A selective logging mechanism for hardware transactional memory systems
http://hdl.handle.net/2117/15009
Lupon Navazo, Marc; Magklis, Grigorios; González Colás, Antonio María
Log-based Hardware Transactional Memory (HTM) systems offer an elegant solution to handle speculative data that overflow transactional L1 caches. By keeping the pre-transactional values on a software-resident log, speculative values can be safely moved across the memory hierarchy, without requiring expensive searches on L1 misses or commits.
Wed, 08 Feb 2012 11:38:25 GMT

On the effectiveness of hybrid mechanisms on reduction of parametric failures in caches
http://hdl.handle.net/2117/15007
Ganapathy, Shrikanth; Canal Corretger, Ramon; González Colás, Antonio María; Rubio Sola, Jose Antonio
In this paper, we provide insight into the different proactive read/write assist methods (wordline boosting & adaptive body biasing) that help in preventing (and reducing) parametric failures when coupled with reactive techniques, like ECC and redundancy, which cope with already existing failures. While proactive and reactive techniques have previously been viewed as complementary, we show that this is not necessarily the case when considering the benefits of such hybrid schemes.
Wed, 08 Feb 2012 11:07:29 GMT

Implementing a hybrid SRAM / eDRAM NUCA architecture
http://hdl.handle.net/2117/13932
Lira Rueda, Javier; Molina Clemente, Carlos; Brooks, David; González Colás, Antonio María
In this paper, we propose a hybrid cache architecture that exploits the main features of both memory technologies: the speed of SRAM and the high density of eDRAM. We demonstrate that, due to the high locality found in emerging applications, a high percentage of the data that enter the on-chip last-level cache are not accessed again before they are replaced.
Wed, 16 Nov 2011 11:21:21 GMT

vPROBE: Variation aware post-silicon power/performance binning using embedded 3T1D cells
http://hdl.handle.net/2117/13911
Ganapathy, Shrikanth; Canal Corretger, Ramon; González Colás, Antonio María; Rubio Sola, Jose Antonio
In this paper, we present an on-die post-silicon binning methodology that takes into account the effect of static and dynamic variations and categorizes every processor based on power/performance. The proposed scheme is composed of discretization hardware that exploits the characteristic dependence of delay/leakage on variability sources for categorization.
Tue, 15 Nov 2011 14:28:57 GMT