Browse by author "Armejach Sanosa, Adrià"
Now showing items 1-20 of 28
-
A BF16 FMA is all you need for DNN training
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
Article
Open access
Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component to train Deep Neural Networks (DNNs). Its silicon area grows quadratically with the mantissa bit count of the computer number format, ...
-
A FM-index transformation to enable large k-steps
Langarita, Rubén; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
Conference paper
Open access
-
A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors
Siracusa, Marco; Soria Pardos, Víctor; Sgherzi, Francesco; Randall, Joshua; Joseph, Douglas J.; Moretó Planas, Miquel; Armejach Sanosa, Adrià (Association for Computing Machinery (ACM), 2023)
Conference paper
Open access
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor ...
-
Characterization of a coherent hardware accelerator framework for SoCs
López Paradís, Guillem; Venu, Balaji; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Springer, 2023)
Conference paper
Restricted access (publisher's policy)
Accelerators rich architectures have become the standard in today’s SoCs. After Moore’s law diminish, it is common to only dedicate a fraction of the area of the SoC to traditional cores and leave the rest of space for ...
-
Compressed sparse FM-index: Fast sequence alignment using large K-steps
Langarita Benítez, Rubén; Armejach Sanosa, Adrià; Setoain, Javier; Ibáñez Marín, Pablo Enrique; Alastruey Benedé, Jesús; Moretó Planas, Miquel (2022-01-01)
Article
Open access
The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound ...
-
Design space exploration of next-generation HPC machines
Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moretó Planas, Miquel; Mantovani, Filippo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
Conference paper
Restricted access (confidentiality agreement)
The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...
-
Design trade-offs for emerging HPC processors based on mobile market technology
Armejach Sanosa, Adrià; Casas, Marc; Moretó Planas, Miquel (2019-09-01)
Article
Open access
High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to ...
-
Dynamically adapting floating-point precision to accelerate deep neural network training
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference paper
Open access
Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successfully applied to train deep neural networks. Despite its advantages in terms of reducing the need for key resources like ...
-
DynAMO: Improving parallelism through dynamic placement of atomic memory operations
Soria Pardos, Víctor; Armejach Sanosa, Adrià; Mück, Tiago; Suárez Gracía, Dario; Joao, Jose A.; Rico, Alejandro; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2023)
Conference paper
Open access
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols ...
-
Efficient direct convolution using long SIMD instructions
Limas Santana, Alexandre de; Armejach Sanosa, Adrià; Casas, Marc (Association for Computing Machinery (ACM), 2023)
Conference paper
Open access
This paper demonstrates that state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. ...
-
Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference paper
Open access
Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...
-
Exploration of architectural parameters for future HPC systems
Gómez, Constantino; Martínez, Francesc; Armejach Sanosa, Adrià; Casas, Marc; Mantovani, Filippo; Moretó Planas, Miquel (Barcelona Supercomputing Center, 2019-05-07)
Conference paper
Open access
-
FASE: A fast, accurate and seamless emulator for custom numerical formats
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Conference paper
Open access
Deep Neural Networks (DNNs) have become ubiquitous in a wide range of application domains. Despite their success, training DNNs is an expensive task that has motivated the use of reduced numerical precision formats to ...
-
Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI
López Paradís, Guillem; Li, Brian; Armejach Sanosa, Adrià; Wallentowitz, Stefan; Moretó Planas, Miquel; Balkind, Jonathan (Institute of Electrical and Electronics Engineers (IEEE), 2023)
Conference paper
Open access
Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many ...
-
gem5 + rtl: A framework to enable RTL models inside a full-system simulator
López Paradís, Guillem; Armejach Sanosa, Adrià; Moretó Planas, Miquel (Association for Computing Machinery (ACM), 2021)
Conference paper
Open access
In recent years there has been a surge of interest in designing custom accelerators for power-efficient high-performance computing. However, available tools to simulate low-level RTL designs often neglect the target system ...
-
Hardware acceleration for query processing: Leveraging FPGAs, CPUs, and memory
Arcas Abella, Oriol; Armejach Sanosa, Adrià; Hayes, Timothy; Malazgirt, Görker Alp; Palomar Pérez, Óscar; Salami, Behzad; Sonmez, Nehir (2016-01)
Article
Open access
Database management systems have become an indispensable tool for industry, government, and academia, and form a significant component of modern datacenters. They can be used in a multitude of scenarios, including online ...
-
HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory
Armejach Sanosa, Adrià; Negi, Anurag; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Stenström, Per; Harris, Tim (Institute of Electrical and Electronics Engineers (IEEE), 2013)
Conference paper
Open access
Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent ...
-
Implications of non-volatile memory as primary storage for database management systems
Ul Mustafa, Naveed; Armejach Sanosa, Adrià; Ozturk, Ozcan; Cristal Kestelman, Adrián; Unsal, Osman Sabri (IEEE, 2017-01-19)
Conference paper
Open access
Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks ...
-
Mont-Blanc 2020: Towards scalable and power efficient European HPC processors
Armejach Sanosa, Adrià; Brank, Bine; Cortina Guardia, Jordi; Dolique, François; Hayes, Timothy; Ho, Nam; Lagadec, Pierre-Axel; Lemaire, Romain; López Paradís, Guillem; Marliac, Laurent; Moretó Planas, Miquel; Marcuello Pascual, Pedro; Pleiter, Dirk; Tan, Xubin; Derradji, Said (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference paper
Open access
The Mont-Blanc 2020 (MB2020) project has triggered the development of the next generation industrial processor for Big Data and High Performance Computing (HPC). MB2020 is paving the way to the future low-power European ...
-
Multilevel simulation-based co-design of next generation HPC microprocessors
Zaourar, Lilia; Benazouz, Mohamed; Mouhagir, Ayoub; Jebali, Fatma; Sassolas, Tanguy; Weill, Jean Christophe; Radulović, Milan; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference paper
Open access
This paper demonstrates the combined use of three simulation tools in support of a co-design methodology for an HPC-focused System-on-a-Chip (SoC) design. The simulation tools make different trade-offs between simulation ...