Ara es mostren els items 1-17 de 17

  • Active measurement of memory resource consumption 

    Casas, Marc; Bronevetsky, Greg (IEEE, 2014)
    Text en actes de congrés
    Accés obert
    Hierarchical memory is a cornerstone of modern hardware design because it provides high memory performance and capacity at a low cost. However, the use of multiple levels of memory and complex cache management policies ...
  • Active Measurement of the Impact of Network Switch Utilization on Application Performance 

    Casas, Marc; Bronevetsky, Greg (IEEE, 2014)
    Text en actes de congrés
    Accés obert
    Inter-node networks are a key capability of High-Performance Computing (HPC) systems that differentiates them from less capable classes of machines. However, in spite of their very high performance, the increasing ...
  • ATM: approximate task memoization in the runtime system 

    Brumar, Iulian; Casas, Marc; Moreto Planas, Miquel; Valero Cortés, Mateo; Sohi, Gurindar S. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Text en actes de congrés
    Accés obert
    Redundant computations appear during the execution of real programs. Multiple factors contribute to these unnecessary computations, such as repetitive inputs and patterns, calling functions with the same parameters or bad ...
  • CATA: Criticality aware task acceleration for multicore processors 

    Castillo, Emilio; Moreto Planas, Miquel; Casas, Marc; Álvarez Martí, Lluc; Vallejo, Enrique; Chronaki, Kallia; Badia Sala, Rosa Maria; Bosque Orero, José Luis; Beivide Palacio, Julio Ramón; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Text en actes de congrés
    Accés obert
    Managing criticality in task-based programming models opens a wide range of performance and power optimization opportunities in future manycore systems. Criticality aware task schedulers can benefit from these opportunities ...
  • Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures 

    Álvarez, Lluc; Vilanova, Lluís; Moreto Planas, Miquel; Casas, Marc; González Tallada, Marc; Martorell Bofill, Xavier; Navarro, Nacho; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2015)
    Text en actes de congrés
    Accés obert
    The increasing number of cores in manycore architectures causes important power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a ...
  • Evaluation of HPC applications’ Memory Resource Consumption via Active Measurement 

    Casas, Marc; Bronevetsky, Greg (IEE, 2016)
    Article
    Accés obert
    As the number of compute cores per chip continues to rise faster than the total amount of available memory, applications will become increasingly starved for memory storage capacity and bandwidth, making the problem of ...
  • How can we improve energy efficiency through user-directed vectorization and task-based parallelization? 

    Caminal, Helena; Caballero, Diego; Cebrián, Juan M.; Ferrer, Roger; Casas, Marc; Moreto Planas, Miquel; Martorell Bofill, Xavier; Valero Cortés, Mateo (Barcelona Supercomputing Center, 2015-05-05)
    Accés obert
    Heterogeneity, parallelization and vectorization are key techniques to improve the performance and energy efficiency of modern computing systems. However, programming and maintaining code for these architectures poses a ...
  • Improving scalability of task-based programs 

    Brumar, Iulian; Casas, Marc; Moreto Planas, Miquel (Barcelona Supercomputing Center, 2015-05-05)
    Text en actes de congrés
    Accés obert
    In a multi-core era, parallel programming allows further performance improvements, but with an important programmability cost. We envision that the best approach to parallel programming that can exceed the programability, ...
  • Iteration-fusing conjugate gradient 

    Zhuang, Sicong; Casas, Marc (Association for Computing Machinery (ACM), 2017-06)
    Comunicació de congrés
    Accés obert
    This paper presents the Iteration-Fusing Conjugate Gradient (IFCG) approach which is an evolution of the Conjugate Gradient method that consists in i) letting computations from different iterations to overlap between them ...
  • MUSA: a multi-level simulation approach for next-generation HPC machines 

    Grass, Thomas; Allande, César; Armejach, Adrià; Rico, Alejandro; Ayguadé Parra, Eduard; Labarta, Jesús; Valero Cortés, Mateo; Casas, Marc; Moreto Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Text en actes de congrés
    Accés restringit per política de l'editorial
    The complexity of High Performance Computing (HPC) systems is increasing in the number of components and their heterogeneity. Interactions between software and hardware involve many different aspects which are typically ...
  • Prediction of the impact of network switch utilization on application performance via active measurement 

    Casas, Marc; Bronevetsky, Greg (Elsevier, 2017-09)
    Article
    Accés restringit per política de l'editorial
    Although one of the key characteristics of High Performance Computing (HPC) infrastructures are their fast interconnecting networks, the increasingly large computational capacity of HPC nodes and the subsequent growth of ...
  • Reducing cache coherence traffic with hierarchical directory cache and NUMA-aware runtime scheduling 

    Caheny, Paul; Casas, Marc; Moreto Planas, Miquel; Gloaguen, Hervé; Saintes, Maxime; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Text en actes de congrés
    Accés restringit per política de l'editorial
    Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...
  • Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes 

    Chasapis, Dimitrios; Casas, Marc; Moreto Planas, Miquel; Schulz, Martin; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
    Comunicació de congrés
    Accés restringit per política de l'editorial
  • Simulating whole supercomputer applications 

    González García, Juan; Casas, Marc; Giménez Lucas, Judit; Moreto Planas, Miquel; Ramírez Bellido, Alejandro; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2011-06)
    Article
    Accés restringit per política de l'editorial
    Detailed simulations of large scale message-passing interface parallel applications are extremely time consuming and resource intensive. A new methodology that combines signal processing and data mining techniques plus a ...
  • Task scheduling techniques for asymmetric multi-core systems 

    Chronaki, Kallia; Rico, Alejandro; Casas, Marc; Moreto Planas, Miquel; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2016-11-29)
    Article
    Accés obert
    As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide solutions to tackle these issues. Parallel programming ...
  • TaskPoint: sampled simulation of task-based programs 

    Grass, Thomas Dieter; Rico, Alejandro; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Text en actes de congrés
    Accés obert
    Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques ...
  • Using graph partitioning to accelerate task-based parallel applications 

    Sánchez Barrera, Isaac; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Barcelona Supercomputing Center, 2015-05-05)
    Text en actes de congrés
    Accés obert
    Current high performance computing architectures are composed of large shared memory NUMA nodes, among other components. Such nodes are becoming increasingly complex as they have several NUMA domains with different access ...