
    • A library implementation of the nano-threads programming model 

      Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Navarro, Nacho; Ayguadé Parra, Eduard (Springer, 1996)
      Conference report
      Open Access
      In this paper we describe the design and implementation of a user-level thread package based on the nano-threads programming model, whose goal is to efficiently manage the application parallelism at user-level. Nano-thread ...
    • A methodology approach to compare performance of parallel programming models for shared-memory architectures 

      Utrera Iglesias, Gladys Miriam; Gil, Marisa; Martorell Bofill, Xavier (Springer, 2020)
      Part of book or chapter of book
      Open Access
      The majority of current HPC applications are composed of complex and irregular data structures that involve techniques such as linear algebra, graph algorithms, and resource management, for which new platforms with varying ...
    • A module-based cell processor simulator 

      Cabarcas Jaramillo, Felipe; Rico Carro, Alejandro; Rodenas, David; Martorell Bofill, Xavier; Ramírez Bellido, Alejandro; Ayguadé Parra, Eduard (European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC), 2006)
      Conference lecture
      Open Access
      An interesting design alternative to replication-based chip multiprocessors is to create heterogeneous chip multiprocessors composed of several different cores, with one or more of them running the operating system and ...
    • A novel asynchronous software cache implementation for the Cell-BE processor 

      Balart, J; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Sura, Z; Chen, T; Zhang, T; O'Brien, Kevin; O'Brien, Kathryn (2008-10)
      Article
      Restricted access - publisher's policy
      This paper describes the implementation of a runtime library for asynchronous communication in the Cell BE processor. The runtime library implementation provides several services that allow the compiler to generate ...
    • A proposal for error handling in OpenMP 

      Duran González, Alejandro; Ferrer, Roger; Costa Prats, Juan José; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (2008)
      Article
      Restricted access - publisher's policy
      OpenMP has focused on performance for numerical applications, but when we try to move this focus to other kinds of applications, like Web servers, we detect one important gap. In these applications, performance ...
    • A proposal for task-generating loops in OpenMP 

      Teruel, Xavier; Klemm, Michael; Li, Kelvin; Martorell Bofill, Xavier; Olivier, Stephen; Terboven, Christian (Springer, 2013)
      Conference report
      Restricted access - publisher's policy
      With the addition of the OpenMP* tasking model, programmers are able to improve and extend the parallelization opportunities of their codes. Programmers can also distribute the creation of tasks using a worksharing construct, ...
    • A streaming machine description and programming model 

      Carpenter, Paul Matthew; Ródenas Picó, David; Martorell Bofill, Xavier; Ramírez Bellido, Alejandro; Ayguadé Parra, Eduard (2007-07)
      Article
      Restricted access - publisher's policy
      In this paper we present the initial development of a streaming environment based on a programming model and machine description. The stream programming model consists of an extension to the C language and its translation ...
    • Accelerating boosting-based face detection on GPUs 

      Oro, David; Fernández, Carles; Segura, Carlos; Martorell Bofill, Xavier; Hernando Pericás, Francisco Javier (2012)
      Conference report
      Restricted access - publisher's policy
      The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As happens with any graphics workload, these algorithms benefit from data-level ...
    • Accelerating software memory compression on the Cell/B.E. 

      Beltran Querol, Vicenç; Martorell Bofill, Xavier; Torres Viñals, Jordi; Ayguadé Parra, Eduard (2008)
      Conference report
      Restricted access - publisher's policy
      The idea of transparently compressing and decompressing the content of main memory to virtually enlarge its capacity has been previously proposed and studied in the literature. The rationale behind this idea lies in the ...
    • Accelerating SpMV on FPGAs through block-row compress: a task-based approach 

      Oliver Segura, José; Álvarez Martínez, Carlos; Cervero García, Teresa; Martorell Bofill, Xavier; Davis, John D.; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2023)
      Conference lecture
      Open Access
      Sparse Matrix-Vector multiplication (SpMV), computing y=α⋅A×x+β⋅y where y,x are dense vectors, α,β two scalar constants, and A is a sparse matrix, is a key kernel in many HPC applications. It exhibits a kind of memory ...
    • Achieving high memory performance from heterogeneous architectures with the SARC programming model 

      Ferrer, Roger; Beltran Querol, Vicenç; González Tallada, Marc; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (ACM, 2009)
      Conference lecture
      Restricted access - publisher's policy
      Current heterogeneous multicore architectures, including the Cell/B.E., GPUs, and future developments, like Larrabee, require enormous programming efforts to efficiently run current parallel applications, achieving high ...
    • ACOTES project: Advanced compiler technologies for embedded streaming 

      Duranton, M.; Munk, H.; Ayguadé Parra, Eduard; Bastoul, C.; Carpenter, Paul Matthew; Chamski, Z.; Cohen, A.; Cornero, M.; Dumont, P.; Pop, S.; Pop, A.; Ornstein, A.; Nuzman, D.; Miranda, C.; Martorell Bofill, Xavier; Lindwer, M.; Ladelsky, R.; Ferrer, Roger; Fellahi, M.; Pouchet, L. N; Zaks, A.; Shvadron, U.; Trifunovic, K.; Rohou, E.; Rosen, I.; Ramírez Bellido, Alejandro; Ródenas, D. (2011-04)
      Article
      Open Access
      Streaming applications are built of data-driven, computational components, consuming and producing unbounded data streams. Streaming oriented systems have become dominant in a wide range of domains, including embedded ...
    • An OpenMP* barrier using SIMD instructions for Intel® Xeon Phi™ coprocessor 

      Caballero, Diego; Duran González, Alejandro; Martorell Bofill, Xavier (Springer, 2013)
      Conference report
      Restricted access - publisher's policy
      Barrier synchronisation is a widely-studied topic since the supercomputer era due to its significant impact on the overall performance of parallel applications. With the current shift to many-core architectures, such as ...
    • Analyzing the impact of communication imbalance in high-speed networks 

      Utrera Iglesias, Gladys Miriam; Gil, Marisa; Martorell Bofill, Xavier (2017-12-21)
      Article
      Open Access
      In this work we analyze the communication load imbalance generated by irregular-data applications running in a multi-node cluster. Experimental approaches to diminish communication load imbalance are evaluated using a ...
    • Analyzing the performance of hierarchical collective algorithms on ARM-based multicore clusters 

      Utrera Iglesias, Gladys Miriam; Gil, Marisa; Martorell Bofill, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2022)
      Conference lecture
      Open Access
      MPI is the de facto communication standard library for parallel applications in distributed memory architectures. Collective operations performance is critical in HPC applications as they can become the bottleneck of their ...
    • Application acceleration on FPGAs with OmpSs@FPGA 

      Bosch, Jaume; Tan, Xubin; Filgueras Izquierdo, Antonio; Vidal, Miquel; Mateu, Marc; Jiménez-González, Daniel; Álvarez, Carlos; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference report
      Open Access
      OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Similarly to OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous ...
    • Applying interposition techniques for performance analysis of OPENMP parallel applications 

      González Tallada, Marc; Serra, Albert; Martorell Bofill, Xavier; Oliver Segura, José; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Navarro, Nacho (Institute of Electrical and Electronics Engineers (IEEE), 2000)
      Conference report
      Open Access
      Tuning parallel applications requires the use of effective tools for detecting performance bottlenecks. Along a parallel program execution, many individual situations of performance degradation may arise. We believe that ...
    • Asynchronous runtime with distributed manager for task-based programming models 

      Bosch Pons, Jaume; Álvarez Martínez, Carlos; Jiménez González, Daniel; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (2020-09)
      Article
      Open Access
      Parallel task-based programming models, like OpenMP, allow application developers to easily create a parallel version of their sequential codes. The standard OpenMP 4.0 introduced the possibility of describing a set of ...
    • Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems 

      Pérez, Borja; Stafford, Esteban; Bosque Orero, José Luis; Beivide Palacio, Ramon; Mateo Bellido, Sergi; Teruel García, Xavier; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (2019-03-01)
      Article
      Open Access
      The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial ...
    • Automatic communication coalescing for irregular computations in UPC language 

      Alvanos, Michail; Tiotto, Ettore; Farreras Esclusa, Montserrat; Martorell Bofill, Xavier (IBM, 2012)
      Conference report
      Restricted access - publisher's policy
      Partitioned Global Address Space (PGAS) languages appeared to address programmer productivity in large scale parallel machines. However, fine grain accesses on shared structures have been identified as one of the main ...