Now showing items 1-20 of 30

    • A BF16 FMA is all you need for DNN training 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
      Article
      Open Access
      Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component to train Deep Neural Networks (DNNs). Its silicon area grows quadratically with the mantissa bit count of the computer number format, ...
    • A FM-index transformation to enable large k-steps 

      Langarita, Rubén; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
      Conference report
      Open Access
    • A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors 

      Siracusa, Marco; Soria Pardos, Víctor; Sgherzi, Francesco; Randall, Joshua; Joseph, Douglas J.; Moretó Planas, Miquel; Armejach Sanosa, Adrià (Association for Computing Machinery (ACM), 2023)
      Conference report
      Open Access
      This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical op-erations of sparse tensor ...
    • Characterization of a coherent hardware accelerator framework for SoCs 

      López Paradís, Guillem; Venu, Balaji; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Springer, 2023)
      Conference report
      Restricted access - publisher's policy
      Accelerators rich architectures have become the standard in today’s SoCs. After Moore’s law diminish, it is common to only dedicate a fraction of the area of the SoC to traditional cores and leave the rest of space for ...
    • Compressed sparse FM-index: Fast sequence alignment using large K-steps 

      Langarita Benítez, Rubén; Armejach Sanosa, Adrià; Setoain, Javier; Ibáñez Marín, Pablo Enrique; Alastruey Benedé, Jesús; Moretó Planas, Miquel (2022-01-01)
      Article
      Open Access
      The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound ...
    • Design space exploration of next-generation HPC machines 

      Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moretó Planas, Miquel; Mantovani, Filippo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference report
      Restricted access - confidentiality agreement
      The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...
    • Design trade-offs for emerging HPC processors based on mobile market technology 

      Armejach Sanosa, Adrià; Casas, Marc; Moretó Planas, Miquel (2019-09-01)
      Article
      Open Access
      High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to ...
    • Dynamically adapting floating-point precision to accelerate deep neural network training 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference report
      Open Access
      Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successfully applied to train deep neural networks. Despite its advantages in terms of reducing the need for key resources like ...
    • DynAMO: Improving parallelism through dynamic placement of atomic memory operations 

      Soria Pardos, Víctor; Armejach Sanosa, Adrià; Mück, Tiago; Suárez Gracía, Dario; Joao, Jose A.; Rico, Alejandro; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2023)
      Conference report
      Open Access
      With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols ...
    • Efficient direct convolution using long SIMD instructions 

      Limas Santana, Alexandre de; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2023)
      Conference report
      Open Access
      This paper demonstrates that state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. ...
    • Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...
    • Exploiting vector code semantics for efficient data cache prefetching 

      Martínez Palau, Francesc; Torrents Lapuerta, Martí; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2024)
      Conference report
      Open Access
      Emerging workloads from domains like high performance computing, data analytics or deep learning consume large amounts of memory bandwidth. To mitigate this problem, computing systems include large and deep memory cache ...
    • Exploration of architectural parameters for future HPC systems 

      Gómez, Constantino; Martínez, Francesc; Armejach Sanosa, Adrià; Casas, Marc; Mantovani, Filippo; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
      Conference report
      Open Access
    • FASE: A fast, accurate and seamless emulator for custom numerical formats 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference report
      Open Access
      Deep Neural Networks (DNNs) have become ubiquitous in a wide range of application domains. Despite their success, training DNNs is an expensive task that has motivated the use of reduced numerical precision formats to ...
    • Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-Mpi 

      López Paradís, Guillem; Li, Brian; Armejach Sanosa, Adrià; Wallentowitz, Stefan; Moretó Planas, Miquel; Balkind, Jonathan (Institute of Electrical and Electronics Engineers (IEEE), 2023)
      Conference report
      Open Access
      Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many ...
    • gem5 + rtl: A framework to enable RTL models inside a full-system simulator 

      López Paradís, Guillem; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2021)
      Conference report
      Open Access
      In recent years there has been a surge of interest in designing custom accelerators for power-efficient high-performance computing. However, available tools to simulate low-level RTL designs often neglect the target system ...
    • GenArchBench: A genomics benchmark suite for arm HPC processors 

      López Villellas, Lorien; Langarita Benítez, Rubén; Badouh, Asaf; Soria Pardos, Víctor; Aguado Puig, Quim; López Paradís, Guillem; Doblas Font, Max; Setoain, Javier; Kim, Chulho; Ono, Makoto; Armejach Sanosa, Adrià; Marco Sola, Santiago; Alastruey Benedé, Jesús; Ibáñez Marín, Pablo; Moretó Planas, Miquel (Elsevier, 2024-08)
      Article
      Open Access
      Arm usage has substantially grown in the High-Performance Computing (HPC) community. Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and June ...
    • Hardware acceleration for query processing: Leveraging FPGAs, CPUs, and memory 

      Arcas Abella, Oriol; Armejach Sanosa, Adrià; Hayes, Timothy; Malazgirt, Görker Alp; Palomar Pérez, Óscar; Salami, Behzad; Sonmez, Nehir (2016-01)
      Article
      Open Access
      Database management systems have become an indispensable tool for industry, government, and academia, and form a significant component of modern datacenters. They can be used in a multitude of scenarios, including online ...
    • HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory 

      Armejach Sanosa, Adrià; Negi, Anurag; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Stenström, Per; Harris, Tim (Institute of Electrical and Electronics Engineers (IEEE), 2013)
      Conference report
      Open Access
      Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent ...
    • Implications of non-volatile memory as primary storage for database management systems 

      Ul Mustafa, Naveed; Armejach Sanosa, Adrià; Ozturk, Ozcan; Cristal Kestelman, Adrián; Unsal, Osman Sabri (IEEE, 2017-01-19)
      Conference report
      Open Access
      Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks ...