  • Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7 

    Prat Robles, David; Ortega, Cristobal; Casas Guix, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo (2015)
    Conference report
    Open Access
  • Evaluating execution time predictability of task-based programs on multi-core processors 

    Grass, Thomas Dieter; Rico Carro, Alejandro; Casas Guix, Marc; Moreto Planas, Miquel; Ramírez Bellido, Alejandro (Springer, 2015)
    Conference report
    Restricted access - publisher's policy
    Task-based programming models are becoming increasingly important, as they can reduce the synchronization costs of parallel programs on multi-cores. Instances of the same task type in task-based programs consist of the ...
  • Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads 

    Vidal Ortiz, Raul; Casas Guix, Marc; Moreto Planas, Miquel; Chasapis, Dimitrios; Ferrer Ibáñez, Roger; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2015)
    Conference report
    Open Access
    OpenMP has been for many years the most widely used programming model for shared memory architectures. Periodically, new features are proposed and some of them are finally selected for inclusion in the OpenMP standard. The ...
  • Exploiting asynchrony from exact forward recovery for DUE in iterative solvers 

    Jaulmes, Luc Etienne; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2015)
    External research report
    Open Access
    This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page ...
  • Exploiting asynchrony from exact forward recovery for DUE in iterative solvers 

    Jaulmes, Luc Etienne; Casas Guix, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
    Conference report
    Open Access
    This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page ...
  • libPRISM: an intelligent adaptation of prefetch and SMT levels 

    Ortega, Cristobal; Moreto Planas, Miquel; Casas Guix, Marc; Bertran, Ramon; Buyuktosunoglu, Alper; Eichenberger, Alexandre; Bose, Pradip (Association for Computing Machinery (ACM), 2017)
    Conference report
    Open Access
    Current microprocessors include several knobs to modify the hardware behavior in order to improve performance under different workload demands. An impractical and time consuming offline profiling is needed to evaluate the ...
  • PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite 

    Chasapis, Dimitrios; Casas Guix, Marc; Moreto Planas, Miquel; Vidal Ortiz, Raul; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2015-12-01)
    Article
    Open Access
    In this work, we show how parallel applications can be implemented efficiently using task parallelism. We also evaluate the benefits of such a parallel paradigm with respect to other approaches. We use the PARSEC benchmark ...
  • POSTER: Exploiting asymmetric multi-core processors with flexible system software 

    Chronaki, Kallia; Moreto Planas, Miquel; Casas Guix, Marc; Rico, Alejandro; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
    Conference lecture
    Open Access
    Energy efficiency has become the main challenge for high performance computing (HPC). The use of mobile asymmetric multi-core architectures to build future multi-core systems is an approach towards energy savings while ...
  • Reducing cache coherence traffic with a NUMA-aware runtime approach 

    Caheny, Paul; Álvarez Martí, Lluc; Derradji, Said; Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc (2018-05)
    Article
    Open Access
    Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...
  • Reducing data movement on large shared memory systems by exploiting computation dependencies 

    Barrera, I.S.; Ayguadé Parra, Eduard; Valero Cortés, Mateo; Moreto Planas, Miquel; Labarta Mancho, Jesús José; Casas Guix, Marc (Association for Computing Machinery (ACM), 2018)
    Conference report
    Restricted access - publisher's policy
    Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where ...
  • Runtime-assisted shared cache insertion policies based on re-reference intervals 

    Dimic, Vladimir; Moreto Planas, Miquel; Casas Guix, Marc; Valero Cortés, Mateo (Springer, 2017)
    Conference report
    Open Access
    Processor speed is improving at a faster rate than the speed of main memory, which makes memory accesses increasingly expensive. One way to solve this problem is to reduce the miss ratio of the processor’s last-level cache by ...
  • Runtime-aware architectures 

    Casas Guix, Marc; Moreto Planas, Miquel; Álvarez Martí, Lluc; Castillo Villar, Emilio; Chasapis, Dimitrios; Hayes, Timothy; Jaulmes, Luc Etienne; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Springer, 2015)
    Conference report
    Open Access
    In the last few years, the traditional ways to keep the increase of hardware performance at the rate predicted by Moore’s Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software ...
  • Runtime-aware architectures: a first approach 

    Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2014)
    Article
    Open Access
    In the last few years, the traditional ways to keep the increase of hardware performance at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software ...
  • Runtime-guided management of scratchpad memories in multicore architectures 

    Álvarez Martí, Lluc; Moreto Planas, Miquel; Casas Guix, Marc; Castillo Villar, Emilio; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2015)
    Conference report
    Open Access
    The increasing number of cores and the anticipated level of heterogeneity in upcoming multicore architectures cause important problems in traditional cache hierarchies. A good way to alleviate these problems is to add ...
  • Sampled simulation of task-based programs 

    Grass, Thomas; Carlson, Trevor E.; Rico Carro, Alejandro; Ceballos, Germán; Ayguadé Parra, Eduard; Casas Guix, Marc; Moreto Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2019-02-01)
    Article
    Open Access
    Sampled simulation is a mature technique for reducing simulation time of single-threaded programs. Nevertheless, current sampling techniques do not take advantage of other execution models, like task-based execution, to ...
  • Simulating whole supercomputer applications 

    Gonzalez, Juan; Casas Guix, Marc; Moreto Planas, Miquel; Giménez, Judit; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2009)
    External research report
    Open Access
    Architecture simulation tools are extremely useful not only to predict the performance of future system designs, but also to analyze and improve the performance of software running on well-known architectures. However, since ...
  • Spectral analysis of executions of computer programs and its applications on performance analysis 

    Casas Guix, Marc (Universitat Politècnica de Catalunya, 2010-03-09)
    Doctoral thesis
    Open Access
    This work is motivated by the growing intricacy of high performance computing infrastructures. For example, supercomputer MareNostrum (installed in 2005 at BSC) has 10240 processors and currently there are machines with ...
  • Trace spectral analysis toward dynamic levels of detail 

    Llort Sánchez, Germán; Casas Guix, Marc; Servat, Harald; Huck, Kevin A.; Giménez Lucas, Judit; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2011)
    Conference report
    Restricted access - publisher's policy
    The emergence of Petascale systems has raised new challenges to performance analysis tools. Understanding every single detail of an execution is important to bridge the gap between the theoretical peak and the actual ...
  • Using Arm’s scalable vector extension on stencil codes 

    Armejach Sanosa, Adrià; Caminal Pallarés, Helena; Cebrián González, Juan Manuel; Langarita, Rubén; González-Alberquilla, Rekai; Adeniyi-Jones, Chris; Valero Cortés, Mateo; Casas Guix, Marc; Moreto Planas, Miquel (2019-04-08)
    Article
    Restricted access - publisher's policy
    Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. ...