    • Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7 

      Prat Robles, David; Ortega, Cristobal; Casas Guix, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo (2015)
      Conference report
      Open Access
    • Convolutional neural network training with dynamic epoch ordering 

      Plana Rius, Ferran; Angulo Bahón, Cecilio; Casas Guix, Marc; Mirats Tur, Josep Maria (IOS Press, 2019)
      Conference lecture
      Restricted access - publisher's policy
      The paper presented exposes a novel approach to feed data to a Convolutional Neural Network (CNN) while training. Normally, neural networks are fed with shuffled data without any control of what type of examples contains ...
    • Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies 

      Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Valero Cortés, Mateo; Casas Guix, Marc; Moreto Planas, Miquel (2020-03)
      Article
      Open Access
      Moore’s Law predicted that the number of transistors on a chip would double approximately every 2 years. However, this trend is arriving at an impasse. Optimizing the usage of the available transistors within the thermal ...
    • Evaluating execution time predictability of task-based programs on multi-core processors 

      Grass, Thomas Dieter; Rico Carro, Alejandro; Casas Guix, Marc; Moreto Planas, Miquel; Ramírez Bellido, Alejandro (Springer, 2015)
      Conference report
      Restricted access - publisher's policy
      Task-based programming models are becoming increasingly important, as they can reduce the synchronization costs of parallel programs on multi-cores. Instances of the same task type in task-based programs consist of the ...
    • Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads 

      Vidal Ortiz, Raul; Casas Guix, Marc; Moreto Planas, Miquel; Chasapis, Dimitrios; Ferrer Ibáñez, Roger; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2015)
      Conference report
      Open Access
      OpenMP has been for many years the most widely used programming model for shared memory architectures. Periodically, new features are proposed and some of them are finally selected for inclusion in the OpenMP standard. The ...
    • Exploiting asynchrony from exact forward recovery for DUE in iterative solvers 

      Jaulmes, Luc Etienne; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2015)
      External research report
      Open Access
      This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page ...
    • Exploiting asynchrony from exact forward recovery for DUE in iterative solvers 

      Jaulmes, Luc Etienne; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
      Conference report
      Open Access
      This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page ...
    • libPRISM: an intelligent adaptation of prefetch and SMT levels 

      Ortega, Cristobal; Moreto Planas, Miquel; Casas Guix, Marc; Bertran, Ramon; Buyuktosunoglu, Alper; Eichenberger, Alexandre; Bose, Pradip (Association for Computing Machinery (ACM), 2017)
      Conference report
      Open Access
      Current microprocessors include several knobs to modify the hardware behavior in order to improve performance under different workload demands. An impractical and time consuming offline profiling is needed to evaluate the ...
    • Modeling and optimizing NUMA effects and prefetching with machine learning 

      Sánchez Barrera, Isaac; Black-Schaffer, David; Casas Guix, Marc; Moreto Planas, Miquel; Stupnikova, Anastasiia; Popov, Mihail (Association for Computing Machinery (ACM), 2020)
      Conference report
      Open Access
      Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HPC performance. Optimizing both together leads to a large and complex design space that has previously been impractical to ...
    • Optimizing sparse matrix-vector multiplication in NEC SX-Aurora vector engine 

      Gómez Crespo, Constantino; Casas Guix, Marc; Mantovani, Filippo; Focht, Erich (2020-06-26)
      External research report
      Open Access
      Sparse Matrix-Vector multiplication (SpMV) is an essential piece of code used in many High Performance Computing (HPC) applications. As previous literature shows, achieving efficient vectorization and performance in modern ...
    • PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite 

      Chasapis, Dimitrios; Casas Guix, Marc; Moreto Planas, Miquel; Vidal Ortiz, Raul; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2015-12-01)
      Article
      Open Access
      In this work, we show how parallel applications can be implemented efficiently using task parallelism. We also evaluate the benefits of such parallel paradigm with respect to other approaches. We use the PARSEC benchmark ...
    • POSTER: Exploiting asymmetric multi-core processors with flexible system software 

      Chronaki, Kallia; Moreto Planas, Miquel; Casas Guix, Marc; Rico, Alejandro; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
      Conference lecture
      Open Access
      Energy efficiency has become the main challenge for high performance computing (HPC). The use of mobile asymmetric multi-core architectures to build future multi-core systems is an approach towards energy savings while ...
    • Reducing cache coherence traffic with a NUMA-aware runtime approach 

      Caheny, Paul; Álvarez Martí, Lluc; Derradji, Said; Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc (2018-05)
      Article
      Open Access
      Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...
    • Reducing data movement on large shared memory systems by exploiting computation dependencies 

      Barrera, I.S.; Ayguadé Parra, Eduard; Valero Cortés, Mateo; Moreto Planas, Miquel; Labarta Mancho, Jesús José; Casas Guix, Marc (Association for Computing Machinery (ACM), 2018)
      Conference report
      Open Access
      Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where ...
    • RICH: implementing reductions in the cache hierarchy 

      Dimic, Vladimir; Moreto Planas, Miquel; Casas Guix, Marc; Ciesko, Jan; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2020)
      Conference report
      Open Access
      Reductions constitute a frequent algorithmic pattern in high-performance and scientific computing. Sophisticated techniques are needed to ensure their correct and scalable concurrent execution on modern processors. Reductions ...
    • Runtime-assisted shared cache insertion policies based on re-reference intervals 

      Dimic, Vladimir; Moreto Planas, Miquel; Casas Guix, Marc; Valero Cortés, Mateo (Springer, 2017)
      Conference report
      Open Access
      Processor speed is improving at a faster rate than the speed of main memory, which makes memory accesses increasingly expensive. One way to solve this problem is to reduce miss ratio of the processor’s last level cache by ...
    • Runtime-aware architectures 

      Casas Guix, Marc; Moreto Planas, Miquel; Álvarez Martí, Lluc; Castillo Villar, Emilio; Chasapis, Dimitrios; Hayes, Timothy; Jaulmes, Luc Etienne; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2015)
      Conference report
      Open Access
      In the last few years, the traditional ways to keep the increase of hardware performance at the rate predicted by Moore’s Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software ...
    • Runtime-aware architectures: a first approach 

      Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2014)
      Article
      Open Access
      In the last few years, the traditional ways to keep the increase of hardware performance at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software ...
    • Runtime-guided management of scratchpad memories in multicore architectures 

      Álvarez Martí, Lluc; Moreto Planas, Miquel; Casas Guix, Marc; Castillo Villar, Emilio; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2015)
      Conference report
      Open Access
      The increasing number of cores and the anticipated level of heterogeneity in upcoming multicore architectures cause important problems in traditional cache hierarchies. A good way to alleviate these problems is to add ...
    • Sampled simulation of task-based programs 

      Grass, Thomas; Carlson, Trevor E.; Rico Carro, Alejandro; Ceballos, Germán; Ayguadé Parra, Eduard; Casas Guix, Marc; Moreto Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2019-02-01)
      Article
      Open Access
      Sampled simulation is a mature technique for reducing simulation time of single-threaded programs. Nevertheless, current sampling techniques do not take advantage of other execution models, like task-based execution, to ...