Browsing by Author "Peña, Antonio J."
-
A Review of Lightweight Thread Approaches for High Performance Computing
Castelló, Adrián; Peña, Antonio J.; Seo, Sangmin; Mayo, Rafael; Balaji, Pavan; Quintana Ortí, Enrique Salvador (IEEE, 2016-12-08)
Conference report
Open Access
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of ...
-
Automating the application data placement in hybrid memory systems
Servat, Harald; Peña, Antonio J.; Llort, German; Mercadal, Estanislao; Hoppe, Hans-Christian; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2017)
Conference report
Open Access
Multi-tiered memory systems, such as those based on Intel® Xeon Phi™ processors, are equipped with several memory tiers with different characteristics including, among others, capacity, access latency, bandwidth, energy ...
-
cuHinesBatch: solving multiple Hines systems on GPUs Human Brain Project
Valero-Lara, Pedro; Martinez-Perez, Ivan; Peña, Antonio J.; Martorell Bofill, Xavier; Sirvent, Raul; Labarta Mancho, Jesús José (Elsevier, 2017)
Article
Open Access
The simulation of the behavior of the Human Brain is one of the most important challenges today in computing. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this ...
-
cuThomasBatch and cuThomasVBatch, CUDA Routines to compute batch of tridiagonal systems on NVIDIA GPUs
Valero Lara, Pedro; Martinez Pérez, Ivan; Sirvent, Raül; Martorell Bofill, Xavier; Peña, Antonio J. (Wiley, 2018-01-01)
Article
Open Access
The solving of tridiagonal systems is one of the most computationally expensive parts in many applications, so that multiple studies have explored the use of NVIDIA GPUs to accelerate such computation. However, these studies ...
-
Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications
Si, Min; Peña, Antonio J.; Hammond, Jeff; Balaji, Pavan; Takagi, Masamichi; Ishikawa, Yutaka (IEEE, 2018-09-01)
Article
Open Access
Casper is a process-based asynchronous progress model for MPI one-sided communication on multi- and many-core architectures. The one-sided communication is not truly one-sided in most MPI implementations: the target process ...
-
Efficient data sharing on heterogeneous systems
García-Flores, Víctor; Ayguadé Parra, Eduard; Peña, Antonio J. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
Conference report
Restricted access - publisher's policy
General-purpose computing on GPUs has become more accessible due to features such as shared virtual memory and demand paging. Unfortunately it comes at a price, and that is performance. Automatic memory management is ...
-
Efficient Scalable Computing through Flexible Applications and Adaptive Workloads
Iserte, Sergio; Mayo, Rafael; Quintana Ortí, Enrique Salvador; Beltran Querol, Vicenç; Peña, Antonio J. (IEEE, 2017-09-07)
Conference lecture
Open Access
In this paper we introduce a methodology for dynamic job reconfiguration driven by the programming model runtime in collaboration with the global resource manager. We improve the system throughput by exploiting malleability ...
-
Enabling CUDA acceleration within virtual machines using rCUDA
Duato, José; Peña, Antonio J.; Silla, Federico; Fernández, Juan C.; Mayo, Rafael; Quintana Ortí, Enrique Salvador (IEEE, 2012-02-16)
Conference lecture
Open Access
The hardware and software advances of Graphics Processing Units (GPUs) have favored the development of GPGPU (General-Purpose Computation on GPUs) and its adoption in many scientific, engineering, and industrial areas. ...
-
Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models
Castelló, Adrian; Peña, Antonio J.; Mayo, Rafael; Planas, Judit; Quintana Ortí, Enrique Salvador; Balaji, Pavan (Springer US, 2016-06-21)
Article
Open Access
Directive-based programming models, such as OpenMP, OpenACC, and OmpSs, enable users to accelerate applications by using coprocessors with little effort. These devices offer significant computing power, but their use can ...
-
Exploring the Vision Processing Unit as Co-Processor for Inference
Rivas-Gomez, Sergio; Peña, Antonio J.; Moloney, David; Laure, Erwin; Markidis, Stefano (IEEE, 2018-08-06)
Conference lecture
Open Access
The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we ...
-
GLT: A Unified API for Lightweight Thread Libraries
Castelló, Adrián; Seo, Sangmin; Mayo, Rafael; Balaji, Pavan; Quintana Ortí, Enrique Salvador; Peña, Antonio J. (Springer, 2017-08)
Conference report
Open Access
In recent years, several lightweight thread (LWT) libraries have emerged to tackle exascale challenges. These offer programming models (PMs) based on user-level threads and incorporate their own lightweight mechanisms. ...
-
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations
Castelló, Adrián; Mayo, Rafael; Quintana Ortí, Enrique Salvador; Seo, Sangmin; Balaji, Pavan; Peña, Antonio J. (IEEE, 2017-09-07)
Conference lecture
Open Access
OpenMP is the de facto standard application programming interface (API) for on-node parallelism. The most popular OpenMP runtimes rely on POSIX threads (pthreads) implementations that offer an excellent performance for ...
-
Improving the interoperability between MPI and task-based programming models
Sala Penadés, Kevin; Bellón, Jorge; Farré, Pau; Teruel, Xavier; Pérez, Josep M.; Peña, Antonio J.; Holmes, Daniel; Beltran Querol, Vicenç; Labarta Mancho, Jesús José (Association for Computing Machinery (ACM), 2018)
Conference report
Open Access
In this paper we propose an API to pause and resume task execution depending on external events. We leverage this generic API to improve the interoperability between MPI synchronous communication primitives and tasks. When ...
-
Integrating blocking and non-blocking MPI primitives with task-based programming models
Sala Penadés, Kevin; Teruel García, Xavier; Pérez Cáncer, Josep Maria; Peña, Antonio J.; Beltran, Vicenç; Labarta Mancho, Jesús José (2019-07)
Article
Open Access
In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-blocking MPI primitives with task-based programming models. The TAMPI library leverages two new runtime APIs to improve both ...
-
Integrating memory perspective into the BSC performance tools
Servat, Harald; Labarta Mancho, Jesús José; Hoppe, Hans-Christian; Gimenez, Judit; Peña, Antonio J. (Institute of Electrical and Electronics Engineers (IEEE), 2017)
Conference report
Open Access
The growing gap between processor and memory speeds results in complex memory hierarchies as processors evolve to mitigate such differences by taking advantage of locality of reference. In this direction, the BSC performance ...
-
MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain
Valero-Lara, Pedro; Sirvent, Raül; Peña, Antonio J.; Labarta Mancho, Jesús José (2019-05)
Article
Open Access
The simulation of the behavior of the human brain is one of the most ambitious challenges today with a non-end of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt ...
-
On the adequacy of lightweight thread approaches for high-level parallel programming models
Castelló, Adrián; Mayo Gual, Rafael; Sala Penadés, Kevin; Beltran Querol, Vicenç; Balaji, Pavan; Peña, Antonio J. (Elsevier, 2018-07)
Article
Open Access
High-level parallel programming models (PMs) are becoming crucial in order to extract the computational power of current on-node multi-threaded parallelism. The most popular PMs, such as OpenMP or OmpSs, are directive-based: ...
-
Simulating the behavior of the human brain on GPUs
Valero-Lara, Pedro; Martinez-Perez, Ivan; Sirvent, Raul; Peña, Antonio J.; Martorell Bofill, Xavier; Labarta Mancho, Jesús José (2018-01-01)
Article
Open Access
The simulation of the behavior of the Human Brain is one of the most important challenges in computing today. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this ...
-
Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques
Peña, Antonio J.; Beltran Querol, Vicenç; Clauss, Carsten; Moschny, Thomas (ACM Digital Library, 2017-06-15)
Conference lecture
Open Access
In this paper we describe the design of fault tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the ...
-
Tasking in accelerators: performance evaluation
Toledo, Leonel; Peña, Antonio J.; Catalán, Sandra; Valero-Lara, Pedro (Institute of Electrical and Electronics Engineers (IEEE), 2019)
Conference report
Open Access
In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems. As a benchmark we propose three different methods for solving ...