ARCO - Microarchitecture and Compilers: Recent submissions
Now showing items 1-12 of 273
DNA-TEQ: an adaptive exponential quantization of tensors for DNN inference
(Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference proceedings text
Open access. Quantization is commonly used in Deep Neural Networks (DNNs) to reduce storage and computational complexity by decreasing the arithmetic precision of activations and weights, a.k.a. tensors. Efficient hardware ...
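The abstract above describes exponential quantization of tensors. As a minimal illustration of the general idea — representing tensor values as signed powers of a base — here is a hedged sketch; the names `exp_quantize`, the fixed base of 2, and the clamping scheme are assumptions for illustration, and DNA-TEQ's adaptive per-tensor tuning is not modeled:

```python
import numpy as np

def exp_quantize(x, bits=4, base=2.0):
    """Quantize tensor values to signed powers of `base` (illustrative
    sketch of exponential quantization; DNA-TEQ additionally adapts the
    encoding per tensor, which is omitted here)."""
    sign = np.sign(x)
    mag = np.abs(x)
    # Map each magnitude to the nearest integer exponent of the base.
    exp = np.round(np.log(np.maximum(mag, 1e-12)) / np.log(base))
    # Clamp exponents to the bit budget, keeping the largest values exact.
    lo = exp.max() - (2 ** (bits - 1) - 1)
    exp = np.clip(exp, lo, exp.max())
    deq = sign * base ** exp
    deq[mag == 0] = 0.0
    return deq

x = np.array([0.9, 0.26, -0.12, 0.031])
print(exp_quantize(x))  # → [ 1.       0.25    -0.125    0.03125]
```

Each value collapses to the nearest power of two, so a multiply against a quantized weight can become a shift in hardware.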
Boosting point cloud search with a vector unit
(2023)
Research report
Open access. Modern robots collect and process point clouds to perform accurate registration and segmentation. The most time-consuming kernel within point cloud processing, namely neighbor search, relies on appropriate data structures, ...
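The neighbor-search kernel mentioned above can be made concrete with a brute-force baseline; this is only an illustrative O(N) scan (the function name and radius parameter are assumptions), whereas real pipelines — and the vector-unit approach the report studies — operate on tree or grid structures:

```python
import numpy as np

def radius_neighbors(cloud, query, r):
    """Brute-force radius search: return indices of all points within
    distance r of the query. Illustrative baseline only; production
    systems use k-d trees or octrees to prune the search."""
    d2 = np.sum((cloud - query) ** 2, axis=1)   # squared distances
    return np.nonzero(d2 <= r * r)[0]

cloud = np.array([[0.0, 0, 0], [1, 0, 0], [0, 2, 0], [3, 3, 3]])
print(radius_neighbors(cloud, np.array([0.0, 0, 0]), 1.5))  # → [0 1]
```

The squared-distance comparison avoids a square root per point, which is also the kind of arithmetic a vector unit can batch across many candidates at once.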
Analyzing and improving hardware modeling of Accel-Sim
(2023-10)
Research report
Open access. GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern ...
δLTA: Decoupling camera sampling from processing to avoid redundant computations in the vision pipeline
(Association for Computing Machinery (ACM), 2023)
Conference proceedings text
Open access. Continuous Vision (CV) systems are essential for emerging applications like Autonomous Driving (AD) and Augmented/Virtual Reality (AR/VR). A standard CV System-on-a-Chip (SoC) pipeline includes a frontend for image capture ...
SLIDEX: Sliding window extension for image processing
(Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference proceedings text
Open access. With the rising need for efficient image processing in emerging applications such as Autonomous Driving (AD) and Augmented/Virtual Reality (AR/VR), many existing solutions do not meet the required performance and energy efficiency ...
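SLIDEX targets sliding-window image kernels in hardware. As a purely software analogue of the access pattern such an extension accelerates — not the paper's ISA design — here is a hedged sketch using a 3x3 mean filter; the function name and use of NumPy's `sliding_window_view` are assumptions for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_filter(img, k=3):
    """k x k mean filter expressed as a sliding-window kernel, the
    access pattern SLIDEX-style extensions target (software analogue
    only; no ISA details from the paper are modeled)."""
    win = sliding_window_view(img, (k, k))  # shape: (H-k+1, W-k+1, k, k)
    return win.mean(axis=(-2, -1))

img = np.arange(16, dtype=float).reshape(4, 4)
print(box_filter(img))  # → [[ 5.  6.]
                        #    [ 9. 10.]]
```

Every output pixel reads a k x k patch that overlaps its neighbors' patches, which is exactly the data reuse a dedicated sliding-window unit exploits.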
QeiHaN: An energy-efficient DNN accelerator that leverages log quantization in NDP architectures
(Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference communication
Open access. The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some works have attempted to enhance accelerators by adding more compute units and on-chip ...
Boustrophedonic frames: Quasi-optimal L2 caching for textures in GPUs
(Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference proceedings text
Open access. The literature is rich in works exploiting cache locality for GPUs, the majority of which explore replacement or bypassing policies. In this paper, however, we go beyond this exploration by constructing a formal proof for a ...
Exploiting kernel compression on BNNs
(Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference proceedings text
Open access. Binary Neural Networks (BNNs) are showing tremendous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge ...
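The core BNN kernel whose weight patterns compression schemes exploit is the binary dot product, classically computed with XNOR and popcount. This is a hedged sketch of that standard kernel only; the paper's kernel-compression method itself is not modeled, and the function name and bit encoding are assumptions:

```python
def bnn_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors, each packed as a bit
    mask (bit 1 encodes +1, bit 0 encodes -1), via XNOR + popcount."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # 1 where the bits agree
    matches = bin(xnor).count("1")               # popcount
    return 2 * matches - n                       # agreements minus disagreements

# Two length-4 vectors packed as bit masks:
print(bnn_dot(0b1011, 0b1001, 4))  # → 2
```

Packing 32 or 64 elements per word turns a whole dot-product slice into one XNOR and one popcount instruction, which is why BNN storage layout (and hence kernel compression) matters so much.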
K-D Bonsai: ISA-extensions to compress K-D trees for autonomous driving tasks
(Association for Computing Machinery (ACM), 2023)
Conference proceedings text
Open access. Autonomous Driving (AD) systems extensively manipulate 3D point clouds for object detection and vehicle localization. Therefore, efficient processing of 3D point clouds is crucial in these systems. In this work we propose ...
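One way to compress the points stored in a K-D tree leaf — the general direction K-D Bonsai pursues — is to keep them as small fixed-point offsets from the leaf's bounding box. The sketch below illustrates that idea only; the function names, the 8-bit width, and the encoding are assumptions, and the paper's actual ISA extension is not modeled:

```python
import numpy as np

def compress_leaf(points, bits=8):
    """Store leaf points as `bits`-wide offsets from the leaf's bounding
    box (illustrative sketch of leaf compression, not K-D Bonsai's
    actual encoding)."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    scale = (hi - lo) / (2 ** bits - 1)
    scale[scale == 0] = 1.0                      # guard flat dimensions
    codes = np.round((points - lo) / scale).astype(np.uint8)
    return lo, scale, codes

def decompress_leaf(lo, scale, codes):
    return lo + codes * scale

pts = np.random.rand(32, 3)
lo, scale, codes = compress_leaf(pts)
rec = decompress_leaf(lo, scale, codes)
print(np.abs(rec - pts).max())  # reconstruction error bounded by scale/2
```

Each coordinate shrinks from a 32-bit float to one byte, at the cost of a bounded quantization error per leaf, which is acceptable for neighbor queries whose radius exceeds that error.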
Lightweight register file caching in collector units for GPUs
(Association for Computing Machinery (ACM), 2023)
Conference proceedings text
Open access. Modern GPUs benefit from a sizable Register File (RF) to provide fine-grained thread switching. As the RF is huge and accessed frequently, it consumes a considerable share of the dynamic energy of the GPU. Designing a ...
Simple out of order core for GPGPUs
(Association for Computing Machinery (ACM), 2023)
Conference proceedings text
Open access. GPU architectures have become popular for executing general-purpose programs, which rely on a large number of threads that run concurrently to hide the latency among dependent instructions. This approach has an ...
SHARP: An adaptable, energy-efficient accelerator for recurrent neural networks
(Association for Computing Machinery (ACM), 2023-01-24)
Article
Open access. The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Due to the recurrent nature and data dependencies of RNN computations, ...