    • Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs 

      Barreda, María; Aliaga, José I; Beltran, Vicenç; Casas, Marc (Springer Link, 2020)
      Article
      Open Access
      In this paper, we target the parallel solution of sparse linear systems via an iterative Krylov subspace-based method enhanced with a block-Jacobi preconditioner on a cluster of multicore processors. In order to tackle ...
    • MUSA: a multi-level simulation approach for next-generation HPC machines 

      Grass, Thomas; Allande, César; Armejach, Adrià; Rico, Alejandro; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo; Casas, Marc; Moreto Planas, Miquel (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Restricted access - publisher's policy
      The complexity of High Performance Computing (HPC) systems is increasing in the number of components and their heterogeneity. Interactions between software and hardware involve many different aspects which are typically ...
    • On the benefits of tasking with OpenMP 

      Rico, Alejandro; Sánchez Barrera, Isaac; Joao, Jose A.; Randall, Joshua; Casas, Marc; Moreto Planas, Miquel (Springer, 2019)
      Conference report
      Open Access
      Tasking promises a model to program parallel applications that provides intuitive semantics. In the case of tasks with dependences, it also promises better load balancing by removing global synchronizations (barriers), and ...
    • On the maturity of parallel applications for asymmetric multi-core processors 

      Chronaki, Kallia; Moreto Planas, Miquel; Casas, Marc; Rico, Alejandro; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Elsevier, 2019-05-01)
      Article
      Open Access
      Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow) AMCs are able to provide high performance under the facility ...
    • Optimizing computation-communication overlap in asynchronous task-based programs 

      Castillo, Emilio; Jain, Nikhil; Casas, Marc; Moreto Planas, Miquel; Schulz, Martin; Beivide Palacio, Julio Ramon; Valero Cortés, Mateo; Bhatele, Abhinav (Association for Computing Machinery (ACM), 2019)
      Conference report
      Open Access
      Asynchronous task-based programming models are gaining popularity to address the programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their ...
    • Performance and energy effects on task-based parallelized applications: User-directed versus manual vectorization 

      Caminal Pallarés, Helena; Caballero de Gea, Diego; Cebrián González, Juan Manuel; Ferrer, Roger; Casas, Marc; Moreto Planas, Miquel; Martorell Bofill, Xavier; Valero Cortés, Mateo (2018-06)
      Article
      Open Access
      Heterogeneity, parallelization and vectorization are key techniques to improve the performance and energy efficiency of modern computing systems. However, programming and maintaining code for these architectures poses a ...
    • Power efficient job scheduling by predicting the impact of processor manufacturing variability 

      Chasapis, Dimitrios; Moreto Planas, Miquel; Schulz, Martin; Rountree, Barry; Valero Cortés, Mateo; Casas, Marc (Association for Computing Machinery (ACM), 2019)
      Conference report
      Open Access
      Modern CPUs suffer from performance and power consumption variability due to the manufacturing process. As a result, systems that do not consider such variability caused by manufacturing issues lead to performance degradations ...
    • Prediction of the impact of network switch utilization on application performance via active measurement 

      Casas, Marc; Bronevetsky, Greg (Elsevier, 2017-09)
      Article
      Open Access
      Although one of the key characteristics of High Performance Computing (HPC) infrastructures is their fast interconnecting networks, the increasingly large computational capacity of HPC nodes and the subsequent growth of ...
    • Reducing cache coherence traffic with hierarchical directory cache and NUMA-aware runtime scheduling 

      Caheny, Paul; Casas, Marc; Moreto Planas, Miquel; Gloaguen, Hervé; Saintes, Maxime; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Restricted access - publisher's policy
      Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...
    • Resilient gossip-inspired all-reduce algorithms for high-performance computing - Potential, limitations, and open questions 

      Casas, Marc; Gansterer, Wilfried N.; Wimmer, Elias (SAGE Publications, 2018-04-09)
      Article
      Open Access
      We investigate the usefulness of gossip-based reduction algorithms in a high-performance computing (HPC) context. We compare them to state-of-the-art deterministic parallel reduction algorithms in terms of fault tolerance ...
    • Runtime-assisted cache coherence deactivation in task parallel programs 

      Caheny, Paul; Álvarez Martí, Lluc; Valero Cortés, Mateo; Moreto Planas, Miquel; Casas, Marc (Association for Computing Machinery (ACM), 2018)
      Conference report
      Open Access
      With increasing core counts, the scalability of directory-based cache coherence has become a challenging problem. To reduce the area and power needs of the directory, recent proposals reduce its size by classifying data ...
    • Runtime-guided management of stacked DRAM memories in task parallel programs 

      Álvarez Martí, Lluc; Casas, Marc; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard; Valero Cortés, Mateo; Moreto Planas, Miquel (Association for Computing Machinery (ACM), 2018)
      Conference report
      Open Access
      Stacked DRAM memories have become a reality in High-Performance Computing (HPC) architectures. These memories provide much higher bandwidth while consuming less power than traditional off-chip memories, but their limited ...
    • Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes 

      Chasapis, Dimitrios; Casas, Marc; Moreto Planas, Miquel; Schulz, Martin; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
      Conference lecture
      Open Access
    • Simulating whole supercomputer applications 

      González García, Juan; Casas, Marc; Giménez Lucas, Judit; Moreto Planas, Miquel; Ramírez Bellido, Alejandro; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2011-06)
      Article
      Restricted access - publisher's policy
      Detailed simulations of large scale message-passing interface parallel applications are extremely time consuming and resource intensive. A new methodology that combines signal processing and data mining techniques plus a ...
    • Stencil codes on a vector length agnostic architecture 

      Armejach Sanosa, Adrià; Caminal Pallarés, Helena; Cebrián González, Juan Manuel; González-Alberquilla, Rekai; Adeniyi-Jones, Chris; Valero Cortés, Mateo; Casas, Marc; Moreto Planas, Miquel (Association for Computing Machinery (ACM), 2018)
      Conference report
      Open Access
      Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. ...
    • Task scheduling techniques for asymmetric multi-core systems 

      Chronaki, Kallia; Rico, Alejandro; Casas, Marc; Moreto Planas, Miquel; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2017-07-01)
      Article
      Open Access
      As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide solutions to tackle these issues. Parallel programming ...
    • TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism 

      Chronaki, Kallia; Casas, Marc; Moreto Planas, Miquel; Bosch Pons, Jaume; Badia Sala, Rosa Maria (Springer, 2018-05-29)
      Conference lecture
      Open Access
      As chip multi-processors (CMPs) are becoming more and more complex, software solutions such as parallel programming models are attracting a lot of attention. Task-based parallel programming models offer an appealing approach ...
    • TaskPoint: sampled simulation of task-based programs 

      Grass, Thomas Dieter; Rico, Alejandro; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2016)
      Conference report
      Open Access
      Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques ...
    • The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform 

      Ruiz, Daniel; Mantovani, Filippo; Casas, Marc; Labarta Mancho, Jesús José; Spiga, Filippo (2018)
      External research report
      Open Access
      The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance evaluation coverage of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and ...
    • Using graph partitioning to accelerate task-based parallel applications 

      Sánchez Barrera, Isaac; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Barcelona Supercomputing Center, 2015-05-05)
      Conference report
      Open Access
      Current high performance computing architectures are composed of large shared memory NUMA nodes, among other components. Such nodes are becoming increasingly complex as they have several NUMA domains with different access ...