    • Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7 

      Prat Robles, David; Ortega Carrasco, Cristobal; Casas Guix, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo (2015)
      Conference report
      Open Access
    • An optimized predication execution for SIMD extensions 

      Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Moreto Planas, Miquel; Casas Guix, Marc; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference lecture
      Open Access
      Vector processing is a widely used technique to improve performance and energy efficiency in modern processors. Most of them rely on predication to support divergence control. However, performance and energy consumption ...
    • Automatic structure extraction from MPI applications tracefiles 

      Casas Guix, Marc; Badia Sala, Rosa Maria; Labarta Mancho, Jesús José (Springer, 2007)
      Conference report
      Open Access
      The process of obtaining useful message passing applications tracefiles for performance analysis in supercomputers is a large and tedious task. When using hundreds or thousands of processors, the tracefile size can grow ...
    • Characterizing the impact of last-level cache replacement policies on big-data workloads 

      Jamet, Alexandre Valentin; Álvarez Martí, Lluc; Jiménez, Daniel A.; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      The vast disparity between Last Level Cache (LLC) and memory latencies has motivated the need for efficient cache management policies. The computer architecture literature abounds with work on LLC replacement policy. ...
    • Compiler-assisted compaction/restoration of SIMD instructions 

      Cebrián González, Juan Manuel; Balem, Thibaud; Barredo Ferreira, Adrián; Casas Guix, Marc; Moreto Planas, Miquel; Ros Bardisa, Alberto; Jimborean, Alexandra (2022-04-01)
      Article
      Open Access
      All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. ...
    • Convolutional neural network training with dynamic epoch ordering 

      Plana Rius, Ferran; Angulo Bahón, Cecilio; Casas Guix, Marc; Mirats Tur, Josep Maria (IOS Press, 2019)
      Conference lecture
      Restricted access - publisher's policy
      The paper presented exposes a novel approach to feed data to a Convolutional Neural Network (CNN) while training. Normally, neural networks are fed with shuffled data without any control of what type of examples contains ...
    • Cost-aware prediction of uncorrected DRAM errors in the field 

      Boixaderas Coderch, Isaac; Živanovič, Darko; Moré Codina, Sergi; Bartolomé Rodríguez, Javier; Vicente Dorca, David; Casas Guix, Marc; Carpenter, Paul Matthew; Radojkovic, Petar; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      This paper presents and evaluates a method to predict DRAM uncorrected errors, a leading cause of hardware failures in large-scale HPC clusters. The method uses a random forest classifier, which was trained and evaluated ...
    • Design space exploration of next-generation HPC machines 

      Gómez Crespo, Constantino; Martínez Palau, Francesc; Armejach Sanosa, Adrià; Moreto Planas, Miquel; Mantovani, Filippo; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference report
      Restricted access - confidentiality agreement
      The landscape of High Performance Computing (HPC) system architectures keeps expanding with new technologies and increased complexity. With the goal of improving the efficiency of next-generation large HPC systems, designers ...
    • Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies 

      Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Valero Cortés, Mateo; Casas Guix, Marc; Moreto Planas, Miquel (2020-03)
      Article
      Open Access
      Moore’s Law predicted that the number of transistors on a chip would double approximately every 2 years. However, this trend is arriving at an impasse. Optimizing the usage of the available transistors within the thermal ...
    • Efficiently running SpMV on long vector architectures 

      Gómez Crespo, Constantino; Mantovani, Filippo; Focht, Erich; Casas Guix, Marc (Association for Computing Machinery (ACM), 2021)
      Conference report
      Restricted access - publisher's policy
      Sparse Matrix-Vector multiplication (SpMV) is an essential kernel for parallel numerical applications. SpMV displays sparse and irregular data accesses, which complicate its vectorization. Such difficulties make SpMV to ...
    • Evaluating execution time predictability of task-based programs on multi-core processors 

      Grass, Thomas Dieter; Rico Carro, Alejandro; Casas Guix, Marc; Moreto Planas, Miquel; Ramírez Bellido, Alejandro (Springer, 2015)
      Conference report
      Restricted access - publisher's policy
      Task-based programming models are becoming increasingly important, as they can reduce the synchronization costs of parallel programs on multi-cores. Instances of the same task type in task-based programs consist of the ...
    • Evaluating mixed-precision arithmetic for 3D generative adversarial networks to simulate high energy physics detectors 

      Osorio Ríos, John Haiber; Armejach Sanosa, Adrià; Khattak, Gulrukh; Petit, Eric; Vallecorsa, Sofia; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      Several hardware companies are proposing native Brain Float 16-bit (BF16) support for neural network training. The usage of Mixed Precision (MP) arithmetic with floating-point 32-bit (FP32) and 16-bit half-precision aims ...
    • Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads 

      Vidal Ortiz, Raul; Casas Guix, Marc; Moreto Planas, Miquel; Chasapis, Dimitrios; Ferrer Ibáñez, Roger; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2015)
      Conference report
      Open Access
      OpenMP has been for many years the most widely used programming model for shared memory architectures. Periodically, new features are proposed and some of them are finally selected for inclusion in the OpenMP standard. The ...
    • Exploiting asynchrony from exact forward recovery for DUE in iterative solvers 

      Jaulmes, Luc; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
      Conference report
      Open Access
      This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page ...
    • Exploiting asynchrony from exact forward recovery for DUE in iterative solvers 

      Jaulmes, Luc; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2015)
      External research report
      Open Access
      This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page ...
    • Exploiting page table locality for Agile TLB Prefetching 

      Vavouliotis, Georgios; Álvarez Martí, Lluc; Karakostas, Vasileios; Nikas, Konstantinos; Koziris, Nectarios; Jiménez, Daniel A.; Casas Guix, Marc (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference report
      Open Access
      Frequent Translation Lookaside Buffer (TLB) misses incur high performance and energy costs due to page walks required for fetching the corresponding address translations. Prefetching page table entries (PTEs) ahead of ...
    • Improving predication efficiency through compaction/restoration of SIMD instructions 

      Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Moreto Planas, Miquel; Casas Guix, Marc; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Open Access
      Vector processors offer a wide range of unexplored opportunities to improve performance and energy efficiency. However, despite its potential, vector code generation and execution have significant challenges, the most ...
    • libPRISM: an intelligent adaptation of prefetch and SMT levels 

      Ortega, Cristobal; Moreto Planas, Miquel; Casas Guix, Marc; Bertran, Ramon; Buyuktosunoglu, Alper; Eichenberger, Alexandre; Bose, Pradip (Association for Computing Machinery (ACM), 2017)
      Conference report
      Open Access
      Current microprocessors include several knobs to modify the hardware behavior in order to improve performance under different workload demands. An impractical and time consuming offline profiling is needed to evaluate the ...
    • Modeling and optimizing NUMA effects and prefetching with machine learning 

      Sánchez Barrera, Isaac; Black-Schaffer, David; Casas Guix, Marc; Moreto Planas, Miquel; Stupnikova, Anastasiia; Popov, Mihail (Association for Computing Machinery (ACM), 2020)
      Conference report
      Open Access
      Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HPC performance. Optimizing both together leads to a large and complex design space that has previously been impractical to ...
    • Optimizing sparse matrix-vector multiplication in NEC SX-Aurora vector engine 

      Gómez Crespo, Constantino; Casas Guix, Marc; Mantovani, Filippo; Focht, Erich (2020-06-26)
      External research report
      Open Access
      Sparse Matrix-Vector multiplication (SpMV) is an essential piece of code used in many High Performance Computing (HPC) applications. As previous literature shows, achieving efficient vectorization and performance in modern ...