Now showing items 1-20 of 95

    • A BF16 FMA is all you need for DNN training 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Petit, Eric; Henry, Greg; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022-07-01)
      Article
      Open Access
      Fused Multiply-Add (FMA) functional units constitute a fundamental hardware component for training Deep Neural Networks (DNNs). Their silicon area grows quadratically with the mantissa bit count of the computer number format, ...
    • A generator of numerically-tailored and high-throughput accelerators for batched GEMMs 

      Ledoux Pardo, Luis Eduardo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference report
      Open Access
      We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy efficiency goals. The generated arrays have three ...
    • A vulnerability factor for ECC-protected memory 

      Jaulmes, Luc; Moreto Planas, Miquel; Valero Cortés, Mateo; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference report
      Open Access
      Fault injection studies and vulnerability analyses have been used to estimate the reliability of data structures in memory. We survey these metrics and look at their adequacy to describe the data stored in ECC-protected ...
    • Active measurement of memory resource consumption 

      Casas, Marc; Bronevetsky, Greg (IEEE, 2014)
      Conference report
      Open Access
      Hierarchical memory is a cornerstone of modern hardware design because it provides high memory performance and capacity at a low cost. However, the use of multiple levels of memory and complex cache management policies ...
    • Active Measurement of the Impact of Network Switch Utilization on Application Performance 

      Casas, Marc; Bronevetsky, Greg (IEEE, 2014)
      Conference report
      Open Access
      Inter-node networks are a key capability of High-Performance Computing (HPC) systems that differentiates them from less capable classes of machines. However, in spite of their very high performance, the increasing ...
    • Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7 

      Prat Robles, David; Ortega Carrasco, Cristobal; Casas, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo (2015)
      Conference report
      Open Access
    • An optimized predication execution for SIMD extensions 

      Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Moreto Planas, Miquel; Casas, Marc; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference lecture
      Open Access
      Vector processing is a widely used technique to improve performance and energy efficiency in modern processors. Most of them rely on predication to support divergence control. However, performance and energy consumption ...
    • Approximating a Multi-Grid Solver 

      Le Fèvre, Valentin; Bautista-Gomez, Leonardo; Unsal, Osman; Casas, Marc (IEEE, 2019-02-14)
      Conference lecture
      Open Access
      Multi-grid methods are numerical algorithms used in parallel and distributed processing. The main idea of multigrid solvers is to speed up the convergence of an iterative method by reducing the problem to a coarser grid a ...
    • Architectural support for task dependence management with flexible software scheduling 

      Castillo, Emilio; Álvarez Martí, Lluc; Moreto Planas, Miquel; Casas, Marc; Vallejo, Enrique; Bosque, Jose L.; Beivide Palacio, Ramon; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2018)
      Conference report
      Open Access
      The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its ...
    • Asynchronous and exact forward recovery for detected errors in iterative solvers 

      Jaulmes, Luc; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2018-03-19)
      Article
      Open Access
      Current trends and projections show that faults in computer systems are becoming increasingly common. Such errors may be detected, and possibly corrected transparently, e.g. by Error Correcting Codes (ECC). For a program to be ...
    • ATM: approximate task memoization in the runtime system 

      Brumar, Iulian; Casas, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo; Sohi, Gurindar S. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Conference report
      Open Access
      Redundant computations appear during the execution of real programs. Multiple factors contribute to these unnecessary computations, such as repetitive inputs and patterns, calling functions with the same parameters or bad ...
    • Autoencoders for semi-supervised water level modeling in sewer pipes with sparse labeled data 

      Plana Rius, Ferran; Philipsen, Mark P.; Mirats Tur, Josep Maria; Moeslund, Thomas; Angulo Bahón, Cecilio; Casas, Marc (2022-01-24)
      Article
      Open Access
      More frequent and thorough inspection of sewer pipes has the potential to save billions in utilities. However, the amount and quality of inspection are impeded by an imprecise and highly subjective manual process. It ...
    • Automatic structure extraction from MPI applications tracefiles 

      Casas, Marc; Badia Sala, Rosa Maria; Labarta Mancho, Jesús José (Springer, 2007)
      Conference report
      Open Access
      The process of obtaining useful message passing applications tracefiles for performance analysis in supercomputers is a large and tedious task. When using hundreds or thousands of processors, the tracefile size can grow ...
    • Cache-aware sparse patterns for the factorized sparse approximate inverse preconditioner 

      Laut Turón, Sergi; Borrell Pol, Ricard; Casas, Marc (Association for Computing Machinery (ACM), 2021)
      Conference report
      Open Access
      Conjugate Gradient is a widely used iterative method to solve linear systems Ax=b with matrix A being symmetric and positive definite. Part of its effectiveness relies on finding a suitable preconditioner that accelerates ...
    • CATA: Criticality aware task acceleration for multicore processors 

      Castillo, Emilio; Moreto Planas, Miquel; Casas, Marc; Álvarez Martí, Lluc; Vallejo, Enrique; Chronaki, Kallia; Badia Sala, Rosa Maria; Bosque Orero, José Luis; Beivide Palacio, Julio Ramón; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Open Access
      Managing criticality in task-based programming models opens a wide range of performance and power optimization opportunities in future manycore systems. Criticality aware task schedulers can benefit from these opportunities ...
    • Characterizing the impact of last-level cache replacement policies on big-data workloads 

      Jamet, Alexandre Valentin; Álvarez Martí, Lluc; Jiménez, Daniel A.; Casas, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      The vast disparity between Last Level Cache (LLC) and memory latencies has motivated the need for efficient cache management policies. The computer architecture literature abounds with work on LLC replacement policy. ...
    • Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures 

      Álvarez Martí, Lluc; Vilanova, Lluís; Moreto Planas, Miquel; Casas, Marc; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
      Conference report
      Open Access
      The increasing number of cores in manycore architectures causes significant power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a ...
    • Communication-aware sparse patterns for the factorized approximate inverse preconditioner 

      Laut Turón, Sergi; Casas, Marc; Borrell Pol, Ricard (Association for Computing Machinery (ACM), 2022)
      Conference report
      Open Access
      The Conjugate Gradient (CG) method is an iterative solver targeting linear systems of equations Ax=b where A is a symmetric and positive definite matrix. CG convergence properties improve when preconditioning is applied ...
    • Compiler-assisted compaction/restoration of SIMD instructions 

      Cebrián González, Juan Manuel; Balem, Thibaud; Barredo Ferreira, Adrián; Casas, Marc; Moreto Planas, Miquel; Ros Bardisa, Alberto; Jimborean, Alexandra (2022-04-01)
      Article
      Open Access
      All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. ...
    • Convolutional neural network training with dynamic epoch ordering 

      Plana Rius, Ferran; Angulo Bahón, Cecilio; Casas, Marc; Mirats Tur, Josep Maria (IOS Press, 2019)
      Conference lecture
      Restricted access - publisher's policy
      This paper presents a novel approach to feeding data to a Convolutional Neural Network (CNN) during training. Normally, neural networks are fed shuffled data without any control over what type of examples contains ...