Browsing by Author "Armejach Sanosa, Adrià"
-
A BF16 FMA is all you need for DNN training
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
Article
Open Access
Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component to train Deep Neural Networks (DNNs). Their silicon area grows quadratically with the mantissa bit count of the computer number format, ...
-
A FM-index transformation to enable large k-steps
Langarita, Rubén; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
Conference report
Open Access
-
A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors
Siracusa, Marco; Soria Pardos, Víctor; Sgherzi, Francesco; Randall, Joshua; Joseph, Douglas J.; Moretó Planas, Miquel; Armejach Sanosa, Adrià (Association for Computing Machinery (ACM), 2023)
Conference report
Open Access
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor ...
-
Characterization of a coherent hardware accelerator framework for SoCs
López Paradís, Guillem; Venu, Balaji; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Springer, 2023)
Conference report
Restricted access - publisher's policy
Accelerator-rich architectures have become the standard in today’s SoCs. As Moore’s law diminishes, it is common to dedicate only a fraction of the SoC area to traditional cores and leave the rest of the space for ...
-
Compressed sparse FM-index: Fast sequence alignment using large K-steps
Langarita Benítez, Rubén; Armejach Sanosa, Adrià; Setoain, Javier; Ibáñez Marín, Pablo Enrique; Alastruey Benedé, Jesús; Moretó Planas, Miquel (2022-01-01)
Article
Open Access
The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound ...
-
Design space exploration of next-generation HPC machines
Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moretó Planas, Miquel; Mantovani, Filippo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
Conference report
Restricted access - confidentiality agreement
The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...
-
Design trade-offs for emerging HPC processors based on mobile market technology
Armejach Sanosa, Adrià; Casas, Marc; Moretó Planas, Miquel (2019-09-01)
Article
Open Access
High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to ...
-
Dynamically adapting floating-point precision to accelerate deep neural network training
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference report
Open Access
Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successfully applied to train deep neural networks. Despite its advantages in terms of reducing the need for key resources like ...
-
DynAMO: Improving parallelism through dynamic placement of atomic memory operations
Soria Pardos, Víctor; Armejach Sanosa, Adrià; Mück, Tiago; Suárez Gracia, Darío; Joao, Jose A.; Rico, Alejandro; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2023)
Conference report
Open Access
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols ...
-
Efficient direct convolution using long SIMD instructions
Limas Santana, Alexandre de; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2023)
Conference report
Open Access
This paper demonstrates that state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. ...
-
Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference report
Open Access
Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...
-
Exploiting vector code semantics for efficient data cache prefetching
Martínez Palau, Francesc; Torrents Lapuerta, Martí; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2024)
Conference report
Open Access
Emerging workloads from domains like high performance computing, data analytics or deep learning consume large amounts of memory bandwidth. To mitigate this problem, computing systems include large and deep memory cache ...
-
Exploration of architectural parameters for future HPC systems
Gómez, Constantino; Martínez, Francesc; Armejach Sanosa, Adrià; Casas, Marc; Mantovani, Filippo; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
Conference report
Open Access
-
FASE: A fast, accurate and seamless emulator for custom numerical formats
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Conference report
Open Access
Deep Neural Networks (DNNs) have become ubiquitous in a wide range of application domains. Despite their success, training DNNs is an expensive task that has motivated the use of reduced numerical precision formats to ...
-
Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI
López Paradís, Guillem; Li, Brian; Armejach Sanosa, Adrià; Wallentowitz, Stefan; Moretó Planas, Miquel; Balkind, Jonathan (Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference report
Open Access
Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many ...
-
gem5 + rtl: A framework to enable RTL models inside a full-system simulator
López Paradís, Guillem; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2021)
Conference report
Open Access
In recent years there has been a surge of interest in designing custom accelerators for power-efficient high-performance computing. However, available tools to simulate low-level RTL designs often neglect the target system ...
-
GenArchBench: A genomics benchmark suite for Arm HPC processors
López Villellas, Lorien; Langarita Benítez, Rubén; Badouh, Asaf; Soria Pardos, Víctor; Aguado Puig, Quim; López Paradís, Guillem; Doblas Font, Max; Setoain, Javier; Kim, Chulho; Ono, Makoto; Armejach Sanosa, Adrià; Marco Sola, Santiago; Alastruey Benedé, Jesús; Ibáñez Marín, Pablo; Moretó Planas, Miquel (Elsevier, 2024-08)
Article
Open Access
Arm usage has substantially grown in the High-Performance Computing (HPC) community. Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and June ...
-
Hardware acceleration for query processing: Leveraging FPGAs, CPUs, and memory
Arcas Abella, Oriol; Armejach Sanosa, Adrià; Hayes, Timothy; Malazgirt, Görker Alp; Palomar Pérez, Óscar; Salami, Behzad; Sonmez, Nehir (2016-01)
Article
Open Access
Database management systems have become an indispensable tool for industry, government, and academia, and form a significant component of modern datacenters. They can be used in a multitude of scenarios, including online ...
-
HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory
Armejach Sanosa, Adrià; Negi, Anurag; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Stenström, Per; Harris, Tim (Institute of Electrical and Electronics Engineers (IEEE), 2013)
Conference report
Open Access
Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent ...
-
Implications of non-volatile memory as primary storage for database management systems
Ul Mustafa, Naveed; Armejach Sanosa, Adrià; Ozturk, Ozcan; Cristal Kestelman, Adrián; Unsal, Osman Sabri (IEEE, 2017-01-19)
Conference report
Open Access
Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks ...