Browse by author "Armejach Sanosa, Adrià"

A BF16 FMA is all you need for DNN training

Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
Article
Open access

Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component to train Deep Neural Networks (DNNs). Their silicon area grows quadratically with the mantissa bit count of the computer number format, ...

A FM-index transformation to enable large k-steps

Langarita, Rubén; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
Conference paper
Open access

A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors

Siracusa, Marco; Soria Pardos, Víctor; Sgherzi, Francesco; Randall, Joshua; Joseph, Douglas J.; Moretó Planas, Miquel; Armejach Sanosa, Adrià (Association for Computing Machinery (ACM), 2023)
Conference paper
Open access

This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor ...

Characterization of a coherent hardware accelerator framework for SoCs

López Paradís, Guillem; Venu, Balaji; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Springer, 2023)
Conference paper
Restricted access due to publisher policy

Accelerator-rich architectures have become the standard in today’s SoCs. As Moore’s law slows down, it is common to dedicate only a fraction of the SoC area to traditional cores and leave the rest of the space for ...

Compressed sparse FM-index: Fast sequence alignment using large K-steps

Langarita Benítez, Rubén; Armejach Sanosa, Adrià; Setoain, Javier; Ibáñez Marín, Pablo Enrique; Alastruey Benedé, Jesús; Moretó Planas, Miquel (2022-01-01)
Article
Open access

The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound ...

Design space exploration of next-generation HPC machines

Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moretó Planas, Miquel; Mantovani, Filippo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
Conference paper
Restricted access due to confidentiality agreement

The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...

Design trade-offs for emerging HPC processors based on mobile market technology

Armejach Sanosa, Adrià; Casas, Marc; Moretó Planas, Miquel (2019-09-01)
Article
Open access

High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to ...

Dynamically adapting floating-point precision to accelerate deep neural network training

Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference paper
Open access

Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successfully applied to train deep neural networks. Despite its advantages in terms of reducing the need for key resources like ...

DynAMO: Improving parallelism through dynamic placement of atomic memory operations

Soria Pardos, Víctor; Armejach Sanosa, Adrià; Mück, Tiago; Suárez Gracía, Dario; Joao, Jose A.; Rico, Alejandro; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2023)
Conference paper
Open access

With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols ...

Efficient direct convolution using long SIMD instructions

Limas Santana, Alexandre de; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2023)
Conference paper
Open access

This paper demonstrates that state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. ...

Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors

Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference paper
Open access

Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...

Exploration of architectural parameters for future HPC systems

Gómez, Constantino; Martínez, Francesc; Armejach Sanosa, Adrià; Casas, Marc; Mantovani, Filippo; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
Conference paper
Open access

FASE: A fast, accurate and seamless emulator for custom numerical formats

Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Conference paper
Open access

Deep Neural Networks (DNNs) have become ubiquitous in a wide range of application domains. Despite their success, training DNNs is an expensive task that has motivated the use of reduced numerical precision formats to ...

Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI

López Paradís, Guillem; Li, Brian; Armejach Sanosa, Adrià; Wallentowitz, Stefan; Moretó Planas, Miquel; Balkind, Jonathan (Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference paper
Open access

Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many ...

gem5 + rtl: A framework to enable RTL models inside a full-system simulator

López Paradís, Guillem; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2021)
Conference paper
Open access

In recent years there has been a surge of interest in designing custom accelerators for power-efficient high-performance computing. However, available tools to simulate low-level RTL designs often neglect the target system ...

Hardware acceleration for query processing: Leveraging FPGAs, CPUs, and memory

Arcas Abella, Oriol; Armejach Sanosa, Adrià; Hayes, Timothy; Malazgirt, Görker Alp; Palomar Pérez, Óscar; Salami, Behzad; Sonmez, Nehir (2016-01)
Article
Open access

Database management systems have become an indispensable tool for industry, government, and academia, and form a significant component of modern datacenters. They can be used in a multitude of scenarios, including online ...

HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory

Armejach Sanosa, Adrià; Negi, Anurag; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Stenström, Per; Harris, Tim (Institute of Electrical and Electronics Engineers (IEEE), 2013)
Conference paper
Open access

Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent ...

Implications of non-volatile memory as primary storage for database management systems

Ul Mustafa, Naveed; Armejach Sanosa, Adrià; Ozturk, Ozcan; Cristal Kestelman, Adrián; Unsal, Osman Sabri (IEEE, 2017-01-19)
Conference paper
Open access

Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks ...

Mont-Blanc 2020: Towards scalable and power efficient European HPC processors

Armejach Sanosa, Adrià; Brank, Bine; Cortina Guardia, Jordi; Dolique, François; Hayes, Timothy; Ho, Nam; Lagadec, Pierre-Axel; Lemaire, Romain; López Paradís, Guillem; Marliac, Laurent; Moretó Planas, Miquel; Marcuello Pascual, Pedro; Pleiter, Dirk; Tan, Xubin; Derradji, Said (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference paper
Open access

The Mont-Blanc 2020 (MB2020) project has triggered the development of the next generation industrial processor for Big Data and High Performance Computing (HPC). MB2020 is paving the way to the future low-power European ...

Multilevel simulation-based co-design of next generation HPC microprocessors

Zaourar, Lilia; Benazouz, Mohamed; Mouhagir, Ayoub; Jebali, Fatma; Sassolas, Tanguy; Weill, Jean Christophe; Radulović, Milan; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference paper
Open access

This paper demonstrates the combined use of three simulation tools in support of a co-design methodology for an HPC-focused System-on-a-Chip (SoC) design. The simulation tools make different trade-offs between simulation ...