Recent Submissions

  • PLANAR: a programmable accelerator for near-memory data rearrangement 

    Barredo Ferreira, Adrián; Armejach Sanosa, Adrià; Beard, Jonathan C.; Moreto Planas, Miquel (Association for Computing Machinery (ACM), 2021)
    Conference report
    Open Access
    Many applications employ irregular and sparse memory accesses that cannot take advantage of existing cache hierarchies in high performance processors. To solve this problem, Data Layout Transformation (DLT) techniques ...
  • User-level dynamic page migration for multiprogrammed shared-memory multiprocessors 

    Nikolopoulos, Dimitrios S.; Papatheodorou, Theodore S.; Polychronopoulos, Constantine D.; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2000)
    Conference report
    Open Access
    This paper presents algorithms for improving the performance of parallel programs on multiprogrammed shared-memory NUMA multiprocessors, via the use of user-level dynamic page migration. The idea that drives the algorithms ...
  • A case for user-level dynamic page migration 

    Nikolopoulos, Dimitrios; Papatheodorou, Theodore; Polychronopoulos, Constantine D.; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2000)
    Conference report
    Open Access
    This paper presents user-level dynamic page migration, a runtime technique which transparently enables parallel programs to tune their memory performance on distributed shared memory multiprocessors, with feedback obtained ...
  • Applying interposition techniques for performance analysis of OPENMP parallel applications 

    González Tallada, Marc; Serra, Albert; Martorell Bofill, Xavier; Oliver Segura, José; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Navarro, Nacho (Institute of Electrical and Electronics Engineers (IEEE), 2000)
    Conference report
    Open Access
    Tuning parallel applications requires the use of effective tools for detecting performance bottlenecks. Along a parallel program execution, many individual situations of performance degradation may arise. We believe that ...
  • VIA: A smart scratchpad for vector units with application to sparse matrix computations 

    Pavón Rivera, Julián; Vargas Valdivieso, Iván; Barredo Ferreira, Adrián; Marimon Illana, Joan; Moreto Planas, Miquel; Moll Echeto, Francisco de Borja; Unsal, Osman Sabri; Valero Cortés, Mateo; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2021)
    Conference report
    Open Access
    Sparse matrix operations are critical kernels in multiple application domains such as High Performance Computing, artificial intelligence and big data. Vector processing is widely used to improve performance on mathematical ...
  • Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors 

    Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Navarro, Nacho; Corbalán González, Julita; González Tallada, Marc; Labarta Mancho, Jesús José (Association for Computing Machinery (ACM), 1999)
    Conference report
    Open Access
    This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization ...
  • Kernel-level scheduling for the nano-threads programming model 

    Polychronopoulos, Eleftherios D.; Martorell Bofill, Xavier; Nikolopoulos, Dimitrios S.; Labarta Mancho, Jesús José; Papatheodorou, Theodore S.; Navarro, Nacho (Association for Computing Machinery (ACM), 1998)
    Conference report
    Open Access
    Multiprocessor systems are increasingly becoming the systems of choice for low and high-end servers, running such diverse tasks as number crunching, large-scale simulations, data base engines and world wide web server ...
  • Data distribution and loop parallelization for shared-memory multiprocessors 

    Ayguadé Parra, Eduard; García Almiñana, Jordi; Grande Ayan, Ma. Luz; Labarta Mancho, Jesús José (Springer, 1996)
    Conference report
    Open Access
    Shared-memory multiprocessor systems can achieve high performance levels when appropriate work parallelization and data distribution are performed. These two actions are not independent and decisions have to be taken in a ...
  • A library implementation of the nano-threads programming model 

    Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Navarro, Nacho; Ayguadé Parra, Eduard (Springer, 1996)
    Conference report
    Open Access
    In this paper we describe the design and implementation of a user-level thread package based on the nano-threads programming model, whose goal is to efficiently manage the application parallelism at user-level. Nano-thread ...
  • 5GCroCo Barcelona trial site for cross-border anticipated cooperative collision avoidance 

    Vazquez-Gallego, Francisco; Casellas Gordillo, Rosa Maria; Vilalta Cañellas, Ricard; Sedar, Mohottige Roshan Madhusanka; Alemany Prats, Pol; Martínez Sevillano, Rubén; Alonso Zárate, Jesús; Moscatelli, Francesca; Guilhot, Denis; Echarri, José Miguel; Dizambourg, Laurent (Institute of Electrical and Electronics Engineers (IEEE), 2020)
    Conference report
    Restricted access - publisher's policy
    Cooperative, connected and automated mobility (CCAM) services along different countries require cross-border solutions to support seamless delivery of services in a multioperator, multi-telco-vendor, and multi-car-manufacturer ...
  • Generating a periodic pattern for VLIW 

    Barrado Muxí, Cristina; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Universidad de Málaga, 1995)
    Conference report
    Open Access
    Fine-grain parallelism available in VLIW and superscalar processors can be mainly exploited in computational intensive loops. Aggressive scheduling techniques are required to fully exploit this parallelism. In this paper ...
  • On automatic loop data-mapping for distributed-memory multiprocessors 

    Torres Viñals, Jordi; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Llaberia Griñó, José M.; Valero Cortés, Mateo (Springer, 1991)
    Conference report
    Open Access
    In this paper we present a unified approach for compiling programs for Distributed-Memory Multiprocessors (DMM). Parallelization of sequential programs for DMM is much more difficult to achieve than for shared memory systems ...
