Now showing items 1-20 of 28

    • A BF16 FMA is all you need for DNN training 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
      Article
      Open access
      Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component to train Deep Neural Networks (DNNs). Its silicon area grows quadratically with the mantissa bit count of the computer number format, ...
    • A FM-index transformation to enable large k-steps 

      Langarita, Rubén; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
      Conference paper
      Open access
    • A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors 

      Siracusa, Marco; Soria Pardos, Víctor; Sgherzi, Francesco; Randall, Joshua; Joseph, Douglas J.; Moretó Planas, Miquel; Armejach Sanosa, Adrià (Association for Computing Machinery (ACM), 2023)
      Conference paper
      Open access
      This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor ...
    • Characterization of a coherent hardware accelerator framework for SoCs 

      López Paradís, Guillem; Venu, Balaji; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Springer, 2023)
      Conference paper
      Restricted access due to publisher policy
      Accelerators rich architectures have become the standard in today’s SoCs. After Moore’s law diminish, it is common to only dedicate a fraction of the area of the SoC to traditional cores and leave the rest of space for ...
    • Compressed sparse FM-index: Fast sequence alignment using large K-steps 

      Langarita Benítez, Rubén; Armejach Sanosa, Adrià; Setoain, Javier; Ibáñez Marín, Pablo Enrique; Alastruey Benedé, Jesús; Moretó Planas, Miquel (2022-01-01)
      Article
      Open access
      The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound ...
    • Design space exploration of next-generation HPC machines 

      Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moretó Planas, Miquel; Mantovani, Filippo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference paper
      Restricted access due to confidentiality agreement
      The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...
    • Design trade-offs for emerging HPC processors based on mobile market technology 

      Armejach Sanosa, Adrià; Casas, Marc; Moretó Planas, Miquel (2019-09-01)
      Article
      Open access
      High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to ...
    • Dynamically adapting floating-point precision to accelerate deep neural network training 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference paper
      Open access
      Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successfully applied to train deep neural networks. Despite its advantages in terms of reducing the need for key resources like ...
    • DynAMO: Improving parallelism through dynamic placement of atomic memory operations 

      Soria Pardos, Víctor; Armejach Sanosa, Adrià; Mück, Tiago; Suárez Gracía, Dario; Joao, Jose A.; Rico, Alejandro; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2023)
      Conference paper
      Open access
      With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols ...
    • Efficient direct convolution using long SIMD instructions 

      Limas Santana, Alexandre de; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2023)
      Conference paper
      Open access
      This paper demonstrates that state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. ...
    • Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference paper
      Open access
      Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...
    • Exploration of architectural parameters for future HPC systems 

      Gómez, Constantino; Martínez, Francesc; Armejach Sanosa, Adrià; Casas, Marc; Mantovani, Filippo; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
      Conference paper
      Open access
    • FASE: A fast, accurate and seamless emulator for custom numerical formats 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference paper
      Open access
      Deep Neural Networks (DNNs) have become ubiquitous in a wide range of application domains. Despite their success, training DNNs is an expensive task that has motivated the use of reduced numerical precision formats to ...
    • Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI 

      López Paradís, Guillem; Li, Brian; Armejach Sanosa, Adrià; Wallentowitz, Stefan; Moretó Planas, Miquel; Balkind, Jonathan (Institute of Electrical and Electronics Engineers (IEEE), 2023)
      Conference paper
      Open access
      Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many ...
    • gem5 + rtl: A framework to enable RTL models inside a full-system simulator 

      López Paradís, Guillem; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2021)
      Conference paper
      Open access
      In recent years there has been a surge of interest in designing custom accelerators for power-efficient high-performance computing. However, available tools to simulate low-level RTL designs often neglect the target system ...
    • Hardware acceleration for query processing: Leveraging FPGAs, CPUs, and memory 

      Arcas Abella, Oriol; Armejach Sanosa, Adrià; Hayes, Timothy; Malazgirt, Görker Alp; Palomar Pérez, Óscar; Salami, Behzad; Sonmez, Nehir (2016-01)
      Article
      Open access
      Database management systems have become an indispensable tool for industry, government, and academia, and form a significant component of modern datacenters. They can be used in a multitude of scenarios, including online ...
    • HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory 

      Armejach Sanosa, Adrià; Negi, Anurag; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Stenström, Per; Harris, Tim (Institute of Electrical and Electronics Engineers (IEEE), 2013)
      Conference paper
      Open access
      Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent ...
    • Implications of non-volatile memory as primary storage for database management systems 

      Ul Mustafa, Naveed; Armejach Sanosa, Adrià; Ozturk, Ozcan; Cristal Kestelman, Adrián; Unsal, Osman Sabri (IEEE, 2017-01-19)
      Conference paper
      Open access
      Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks ...
    • Mont-Blanc 2020: Towards scalable and power efficient European HPC processors 

      Armejach Sanosa, Adrià; Brank, Bine; Cortina Guardia, Jordi; Dolique, François; Hayes, Timothy; Ho, Nam; Lagadec, Pierre-Axel; Lemaire, Romain; López Paradís, Guillem; Marliac, Laurent; Moretó Planas, Miquel; Marcuello Pascual, Pedro; Pleiter, Dirk; Tan, Xubin; Derradji, Said (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference paper
      Open access
      The Mont-Blanc 2020 (MB2020) project has triggered the development of the next generation industrial processor for Big Data and High Performance Computing (HPC). MB2020 is paving the way to the future low-power European ...
    • Multilevel simulation-based co-design of next generation HPC microprocessors 

      Zaourar, Lilia; Benazouz, Mohamed; Mouhagir, Ayoub; Jebali, Fatma; Sassolas, Tanguy; Weill, Jean Christophe; Radulović, Milan; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference paper
      Open access
      This paper demonstrates the combined use of three simulation tools in support of a co-design methodology for an HPC-focused System-on-a-Chip (SoC) design. The simulation tools make different trade-offs between simulation ...