Recent Submissions

  • Characterizing and improving the performance of many-core task-based parallel programming runtimes 

    Bosch, Jaume; Tan, Xubin; Álvarez Martínez, Carlos; Jiménez González, Daniel; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (2017)
    Conference report
    Restricted access - publisher's policy
    Parallel task-based programming models like OpenMP support the declaration of task data dependences. This information is used to delay the task execution until the task data is available. The dependences between tasks are ...
  • Skip RNN: learning to skip state updates in recurrent neural networks 

    Campos Camunez, Victor; Jou, Brendan; Giró Nieto, Xavier; Torres Viñals, Jordi; Chang, Shih-Fu (2018)
    Conference lecture
    Open Access
    Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty ...
  • A template system for the efficient compilation of domain abstractions onto reconfigurable computers 

    Shafiq, Muhammad; Pericàs Gleim, Miquel; Ayguadé Parra, Eduard (2011)
    Conference report
    Restricted access - publisher's policy
    Past research has addressed the issue of using FPGAs as accelerators for HPC systems. However, writing low level code for an efficient, portable and scalable architecture altogether has always been a ...
  • Asynchronous PGAS runtime for Myrinet networks 

    Farreras Esclusa, Montserrat; Almasi, George (Association for Computing Machinery (ACM), 2010)
    Conference report
    Restricted access - publisher's policy
    PGAS languages aim to enhance productivity for large scale systems. The IBM Asynchronous PGAS runtime (APGAS) supports various high productivity programming languages including UPC, X10 and CAF. The runtime has been designed ...
  • Productive cluster programming with OmpSs 

    Bueno Hedo, Javier; Martinell, Lluis; Duran Gonzalez, Alejandro; Farreras Esclusa, Montserrat; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Springer, 2011)
    Conference report
    Restricted access - publisher's policy
    Clusters of SMPs are ubiquitous. They have been traditionally programmed by using MPI. But, the productivity of MPI programmers is low because of the complexity of expressing parallelism and communication, and the difficulty ...
  • Graph partitioning applied to DAG scheduling to reduce NUMA effects 

    Sánchez Barrera, Isaac; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2018)
    Conference lecture
    Restricted access - publisher's policy
    The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with different access latencies and bandwidth rates depending on the proximity between the cores and the devices ...
  • Saiph: towards a DSL for high-performance computational fluid dynamics 

    Macià, Sandra; Mateo, Sergi; Martínez-Ferrer, Pedro J.; Beltran, Vicenç; Mira, Daniel; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2018)
    Conference report
    Restricted access - publisher's policy
    Nowadays high-performance computing is taking an increasingly central role in scientific research while computer architectures are becoming more heterogeneous and complex with different parallel programming models and ...
  • Architectural support for task dependence management with flexible software scheduling 

    Castillo, Emilio; Álvarez, Lluc; Moreto Planas, Miquel; Casas, Marc; Vallejo, Enrique; Bosque, Jose L.; Beivide Palacio, Ramon; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Open Access
    The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its ...
  • Improving OpenStack Swift interaction with the I/O stack to enable software defined storage 

    Nou, Ramon; Miranda, Alberto; Siquier, Marc; Cortés, Toni (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Open Access
    This paper analyses how OpenStack Swift, a distributed object storage service for a globally used middleware, interacts with the I/O subsystem through the Operating System. This interaction, which seems organised and clean ...
  • Efficient exception handling support for GPUs 

    Tanasic, Ivan; Gelado Fernandez, Isaac; Jorda, Marc; Ayguadé Parra, Eduard; Navarro, Nacho (Association for Computing Machinery (ACM), 2017)
    Conference report
    Restricted access - publisher's policy
    Operating systems have long relied on the exception handling mechanism to implement numerous virtual memory features and optimizations. However, today's GPUs have limited support for exceptions, which prevents implementation ...
  • Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems 

    Pérez, Borja; Stafford, Esteban; Bosque, Jose L.; Beivide Palacio, Ramon; Mateo, Sergi; Teruel, Xavier; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference report
    Open Access
    Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well known framework for task based parallel applications, which is an interesting tool to simplify the ...
  • Gestión de contenidos en caches operando a bajo voltaje 

    Ferrerón, Alexandra; Alastruey, Jesús; Suárez Gracía, Dario; Monreal Arnal, Teresa; Ibáñez Marín, Pablo Enrique; Viñals Yúfera, Víctor (2016)
    Conference report
    Open Access
    The energy efficiency of on-chip caches can be improved by reducing their supply voltage (Vdd). However, this Vdd scaling is limited to a voltage Vddmin below which some SRAM cells ...
