    • A Review of Lightweight Thread Approaches for High Performance Computing 

      Castelló, Adrián; Peña, Antonio J.; Seo, Sangmin; Mayo, Rafael; Balaji, Pavan; Quintana-Ortí, Enrique S. (IEEE, 2016-12-08)
      Conference report
      Open Access
      High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly with a moderate number of ...
    • Automating the application data placement in hybrid memory systems 

      Servat, Harald; Peña, Antonio J.; Llort, German; Mercadal, Estanislao; Hoppe, Hans-Christian; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Conference report
      Open Access
      Multi-tiered memory systems, such as those based on Intel® Xeon Phi™ processors, are equipped with several memory tiers with different characteristics including, among others, capacity, access latency, bandwidth, energy ...
    • cuHinesBatch: solving multiple Hines systems on GPUs Human Brain Project 

      Valero-Lara, Pedro; Martinez-Perez, Ivan; Peña, Antonio J.; Martorell Bofill, Xavier; Sirvent, Raul; Labarta Mancho, Jesús José (Elsevier, 2017)
      Article
      Open Access
      The simulation of the behavior of the Human Brain is one of the most important challenges today in computing. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this ...
    • cuThomasBatch and cuThomasVBatch, CUDA Routines to compute batch of tridiagonal systems on NVIDIA GPUs 

      Valero Lara, Pedro; Martinez Pérez, Ivan; Sirvent, Raül; Martorell Bofill, Xavier; Peña, Antonio J. (Wiley, 2018-01-01)
      Article
      Open Access
      The solving of tridiagonal systems is one of the most computationally expensive parts in many applications, so that multiple studies have explored the use of NVIDIA GPUs to accelerate such computation. However, these studies ...
    • Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications 

      Si, Min; Peña, Antonio J.; Hammond, Jeff; Balaji, Pavan; Takagi, Masamichi; Ishikawa, Yutaka (IEEE, 2018-09-01)
      Article
      Open Access
      Casper is a process-based asynchronous progress model for MPI one-sided communication on multi- and many-core architectures. The one-sided communication is not truly one-sided in most MPI implementations: the target process ...
    • Efficient data sharing on heterogeneous systems 

      García-Flores, Víctor; Ayguadé Parra, Eduard; Peña, Antonio J. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Conference report
      Restricted access - publisher's policy
      General-purpose computing on GPUs has become more accessible due to features such as shared virtual memory and demand paging. Unfortunately it comes at a price, and that is performance. Automatic memory management is ...
    • Efficient Scalable Computing through Flexible Applications and Adaptive Workloads 

      Iserte, Sergio; Mayo, Rafael; Quintana-Ortí, Enrique S.; Beltran Querol, Vicenç; Peña, Antonio J. (IEEE, 2017-09-07)
      Conference lecture
      Open Access
      In this paper we introduce a methodology for dynamic job reconfiguration driven by the programming model runtime in collaboration with the global resource manager. We improve the system throughput by exploiting malleability ...
    • Enabling CUDA acceleration within virtual machines using rCUDA 

      Duato, José; Peña, Antonio J.; Silla, Federico; Fernández, Juan C.; Mayo, Rafael; Quintana-Ortí, Enrique S. (IEEE, 2012-02-16)
      Conference lecture
      Open Access
      The hardware and software advances of Graphics Processing Units (GPUs) have favored the development of GPGPU (General-Purpose Computation on GPUs) and its adoption in many scientific, engineering, and industrial areas. ...
    • Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models 

      Castelló, Adrián; Peña, Antonio J.; Mayo, Rafael; Planas, Judit; Quintana-Ortí, Enrique S.; Balaji, Pavan (Springer US, 2016-06-21)
      Article
      Open Access
      Directive-based programming models, such as OpenMP, OpenACC, and OmpSs, enable users to accelerate applications by using coprocessors with little effort. These devices offer significant computing power, but their use can ...
    • Exploring the Vision Processing Unit as Co-Processor for Inference 

      Rivas-Gomez, Sergio; Peña, Antonio J.; Moloney, David; Laure, Erwin; Markidis, Stefano (IEEE, 2018-08-06)
      Conference lecture
      Open Access
      The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we ...
    • GLT: A Unified API for Lightweight Thread Libraries 

      Castelló, Adrián; Seo, Sangmin; Mayo, Rafael; Balaji, Pavan; Quintana-Ortí, Enrique S.; Peña, Antonio J. (Springer, 2017-08)
      Conference report
      Open Access
      In recent years, several lightweight thread (LWT) libraries have emerged to tackle exascale challenges. These offer programming models (PMs) based on user-level threads and incorporate their own lightweight mechanisms. ...
    • GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations 

      Castelló, Adrián; Mayo, Rafael; Quintana-Ortí, Enrique S.; Seo, Sangmin; Balaji, Pavan; Peña, Antonio J. (IEEE, 2017-09-07)
      Conference lecture
      Open Access
      OpenMP is the de facto standard application programming interface (API) for on-node parallelism. The most popular OpenMP runtimes rely on POSIX threads (pthreads) implementations that offer an excellent performance for ...
    • Improving the interoperability between MPI and task-based programming models 

      Sala Penadés, Kevin; Bellón, Jorge; Farré, Pau; Teruel, Xavier; Pérez, Josep M.; Peña, Antonio J.; Holmes, Daniel; Beltran Querol, Vicenç; Labarta Mancho, Jesús José (Association for Computing Machinery (ACM), 2018)
      Conference report
      Open Access
      In this paper we propose an API to pause and resume task execution depending on external events. We leverage this generic API to improve the interoperability between MPI synchronous communication primitives and tasks. When ...
    • Integrating blocking and non-blocking MPI primitives with task-based programming models 

      Sala Penadés, Kevin; Teruel García, Xavier; Pérez Cáncer, Josep Maria; Peña, Antonio J.; Beltran, Vicenç; Labarta Mancho, Jesús José (2019-07)
      Article
      Restricted access - publisher's policy
      In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-blocking MPI primitives with task-based programming models. The TAMPI library leverages two new runtime APIs to improve both ...
    • Integrating memory perspective into the BSC performance tools 

      Servat, Harald; Labarta Mancho, Jesús José; Hoppe, Hans-Christian; Giménez, Judit; Peña, Antonio J. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
      Conference report
      Open Access
      The growing gap between processor and memory speeds results in complex memory hierarchies as processors evolve to mitigate such differences by taking advantage of locality of reference. In this direction, the BSC performance ...
    • MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain 

      Valero-Lara, Pedro; Sirvent, Raül; Peña, Antonio J.; Labarta Mancho, Jesús José (2019-05)
      Article
      Restricted access - publisher's policy
      The simulation of the behavior of the human brain is one of the most ambitious challenges today, with an endless range of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt ...
    • On the adequacy of lightweight thread approaches for high-level parallel programming models 

      Castelló, Adrián; Mayo Gual, Rafael; Sala Penadés, Kevin; Beltran Querol, Vicenç; Balaji, Pavan; Peña, Antonio J. (Elsevier, 2018-07)
      Article
      Open Access
      High-level parallel programming models (PMs) are becoming crucial in order to extract the computational power of current on-node multi-threaded parallelism. The most popular PMs, such as OpenMP or OmpSs, are directive-based: ...
    • Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques 

      Peña, Antonio J.; Beltran Querol, Vicenç; Clauss, Carsten; Moschny, Thomas (ACM Digital Library, 2017-06-15)
      Conference lecture
      Open Access
      In this paper we describe the design of fault tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the ...
    • Tasking in accelerators: performance evaluation 

      Toledo, Leonel; Peña, Antonio J.; Catalán, Sandra; Valero-Lara, Pedro (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference report
      Open Access
      In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems. As a benchmark we propose three different methods for solving ...
    • Understanding memory access patterns using the BSC performance tools 

      Servat, Harald; Labarta Mancho, Jesús José; Hoppe, Hans-Christian; Giménez, Judit; Peña, Antonio J. (Elsevier, 2018-10)
      Article
      Open Access
      The growing gap between processor and memory speeds has led to complex memory hierarchies as processors evolve to mitigate such divergence by exploiting the locality of reference. In this direction, the BSC performance ...