Browsing by Author "Casas Guix, Marc"
Now showing items 1-20 of 51
-
A BF16 FMA is all you need for DNN training
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
Article
Open Access
Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component to train Deep Neural Networks (DNNs). Its silicon area grows quadratically with the mantissa bit count of the computer number format, ...
-
A generator of numerically-tailored and high-throughput accelerators for batched GEMMs
Ledoux Pardo, Luis Eduardo; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
Conference report
Open Access
We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy efficiency goals. The generated arrays have three ...
-
Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7
Prat Robles, David; Ortega Carrasco, Cristobal; Casas Guix, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo (2015)
Conference report
Open Access
-
An optimized predication execution for SIMD extensions
Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Moreto Planas, Miquel; Casas Guix, Marc; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2019)
Conference lecture
Open Access
Vector processing is a widely used technique to improve performance and energy efficiency in modern processors. Most of them rely on predication to support divergence control. However, performance and energy consumption ...
-
Autoencoders for semi-supervised water level modeling in sewer pipes with sparse labeled data
Plana Rius, Ferran; Philipsen, Mark P.; Mirats Tur, Josep Maria; Moeslund, Thomas; Angulo Bahón, Cecilio; Casas Guix, Marc (2022-01-24)
Article
Open Access
More frequent and thorough inspection of sewer pipes has the potential to save billions in utilities. However, the amount and quality of inspection are impeded by an imprecise and highly subjective manual process. It ...
-
Automatic structure extraction from MPI applications tracefiles
Casas Guix, Marc; Badia Sala, Rosa Maria; Labarta Mancho, Jesús José (Springer, 2007)
Conference report
Open Access
The process of obtaining useful message passing applications tracefiles for performance analysis in supercomputers is a large and tedious task. When using hundreds or thousands of processors, the tracefile size can grow ...
-
Cache-aware sparse patterns for the factorized sparse approximate inverse preconditioner
Laut Turón, Sergi; Borrell Pol, Ricard; Casas Guix, Marc (Association for Computing Machinery (ACM), 2021)
Conference report
Open Access
Conjugate Gradient is a widely used iterative method to solve linear systems Ax=b with matrix A being symmetric and positive definite. Part of its effectiveness relies on finding a suitable preconditioner that accelerates ...
-
Characterizing the impact of last-level cache replacement policies on big-data workloads
Jamet, Alexandre Valentin; Álvarez Martí, Lluc; Jiménez, Daniel A.; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference report
Open Access
The vast disparity between Last Level Cache (LLC) and memory latencies has motivated the need for efficient cache management policies. The computer architecture literature abounds with work on LLC replacement policy. ...
-
Communication-aware sparse patterns for the factorized approximate inverse preconditioner
Laut Turón, Sergi; Casas Guix, Marc; Borrell Pol, Ricard (Association for Computing Machinery (ACM), 2022)
Conference report
Open Access
The Conjugate Gradient (CG) method is an iterative solver targeting linear systems of equations Ax=b where A is a symmetric and positive definite matrix. CG convergence properties improve when preconditioning is applied ...
-
Compiler-assisted compaction/restoration of SIMD instructions
Cebrián González, Juan Manuel; Balem, Thibaud; Barredo Ferreira, Adrián; Casas Guix, Marc; Moreto Planas, Miquel; Ros Bardisa, Alberto; Jimborean, Alexandra (2022-04-01)
Article
Open Access
All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. ...
-
Convolutional neural network training with dynamic epoch ordering
Plana Rius, Ferran; Angulo Bahón, Cecilio; Casas Guix, Marc; Mirats Tur, Josep Maria (IOS Press, 2019)
Conference lecture
Restricted access - publisher's policy
The paper presented exposes a novel approach to feed data to a Convolutional Neural Network (CNN) while training. Normally, neural networks are fed with shuffled data without any control of what type of examples contains ...
-
Cost-aware prediction of uncorrected DRAM errors in the field
Boixaderas Coderch, Isaac; Živanovič, Darko; Moré Codina, Sergi; Bartolomé Rodríguez, Javier; Vicente Dorca, David; Casas Guix, Marc; Carpenter, Paul Matthew; Radojković, Petar; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference report
Open Access
This paper presents and evaluates a method to predict DRAM uncorrected errors, a leading cause of hardware failures in large-scale HPC clusters. The method uses a random forest classifier, which was trained and evaluated ...
-
Design space exploration of next-generation HPC machines
Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moreto Planas, Miquel; Mantovani, Filippo; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
Conference report
Restricted access - confidentiality agreement
The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...
-
Dynamically adapting floating-point precision to accelerate deep neural network training
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
Conference report
Open Access
Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successfully applied to train deep neural networks. Despite its advantages in terms of reducing the need for key resources like ...
-
Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies
Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Valero Cortés, Mateo; Casas Guix, Marc; Moreto Planas, Miquel (2020-03)
Article
Open Access
Moore’s Law predicted that the number of transistors on a chip would double approximately every 2 years. However, this trend is arriving at an impasse. Optimizing the usage of the available transistors within the thermal ...
-
Efficiently running SpMV on long vector architectures
Gómez Crespo, Constantino; Mantovani, Filippo; Focht, Erich; Casas Guix, Marc (Association for Computing Machinery (ACM), 2021)
Conference report
Restricted access - publisher's policy
Sparse Matrix-Vector multiplication (SpMV) is an essential kernel for parallel numerical applications. SpMV displays sparse and irregular data accesses, which complicate its vectorization. Such difficulties make SpMV to ...
-
ETP4HPC’s SRA 5 strategic research agenda for High-Performance Computing in Europe 2022: European HPC research priorities 2023-2027
Carpenter, Paul Matthew; Casas Guix, Marc; Unsal, Osman Sabri; Radojkovic, Petar; Martorell Bofill, Xavier; Miranda, Alberto; Guitart Fernández, Jordi; Corbalán González, Julita; Peña Monferrer, Antonio José; Bautista Gomez, Leonardo Arturo; Vázquez García, Miguel; Beltran Querol, Vicenç; Queralt Calafat, Anna; Nou Castell, Ramon; Borrell Pol, Ricard; Houzeaux, Guillaume; Serradell Maronda, Kim; Carrera Pérez, David; García Sáez, Artur; Puchol García, Carlos (2022-09)
Research report
Open Access
This document feeds research and development priorities developed by the European HPC ecosystem into EuroHPC’s Research and Innovation Advisory Group with an aim to define the HPC Technology research Work Programme and ...
-
Evaluating execution time predictability of task-based programs on multi-core processors
Grass, Thomas Dieter; Rico Carro, Alejandro; Casas Guix, Marc; Moreto Planas, Miquel; Ramírez Bellido, Alejandro (Springer, 2015)
Conference report
Restricted access - publisher's policy
Task-based programming models are becoming increasingly important, as they can reduce the synchronization costs of parallel programs on multi-cores. Instances of the same task type in task-based programs consist of the ...
-
Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors
Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
Conference report
Open Access
Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...
-
Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads
Vidal Ortiz, Raul; Casas Guix, Marc; Moreto Planas, Miquel; Chasapis, Dimitrios; Ferrer Ibáñez, Roger; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2015)
Conference report
Open Access
OpenMP has been for many years the most widely used programming model for shared memory architectures. Periodically, new features are proposed and some of them are finally selected for inclusion in the OpenMP standard. The ...