En aquest grup s´investiga en tècniques que permeten millorar l´eficiència dels sistemes de computació d?altes prestacions. Aquest objectiu es tracta des de perspectives diverses que requereixen un cert grau de cooperació: arquitectura del sistema uniprocessador i multiprocessador, compilador, sistema operatiu, eines d´anàlisi, visualització i predicció, algorismes i aplicacions. Per mesurar l´eficiència es consideren mètriques que van més enllà del temps d´execució dels programes. En particular es consideren aspectes relacionats amb el disseny del sistema (cicle d´operació, àrea i consum de potència del processador i la jerarquia de memòria, escalabilitat de l´organització uniprocessador i multiprocessador), amb la verificació funcional dels sistemes, amb la facilitat i la portabilitat del model de programació i amb el rendiment en entorns multiprogramats i distribuïts, entre altres.

http://futur.upc.edu/CAP

The group aims to improve the efficiency of high-performance computing systems. To that end, it employs a variety of approaches that require a certain level of cooperation and integration: microarchitecture and multiprocessor architecture, compilers, operating systems, analysis, visualisation and prediction tools, algorithms and applications. When measuring efficiency, in addition to the traditional approach that takes the execution time into account, we use metrics that consider design factors such as cycle time, area and power dissipation of the processor and memory hierarchy, scalability of the microarchitecture and multiprocessor organisation, system correctness, portability and ease of use of programming models, and performance when running on multiuser, multiprogrammed and distributed environments, among others.

http://futur.upc.edu/CAP

The group aims to improve the efficiency of high-performance computing systems. To that end, it employs a variety of approaches that require a certain level of cooperation and integration: microarchitecture and multiprocessor architecture, compilers, operating systems, analysis, visualisation and prediction tools, algorithms and applications. When measuring efficiency, in addition to the traditional approach that takes the execution time into account, we use metrics that consider design factors such as cycle time, area and power dissipation of the processor and memory hierarchy, scalability of the microarchitecture and multiprocessor organisation, system correctness, portability and ease of use of programming models, and performance when running on multiuser, multiprogrammed and distributed environments, among others.

http://futur.upc.edu/CAP

Recent Submissions

  • Skip RNN: learning to skip state updates in recurrent neural networks 

    Campos Camunez, Victor; Jou, Brendan; Giró Nieto, Xavier; Torres Viñals, Jordi; Chang, Shih-Fu (2018)
    Conference lecture
    Open Access
    Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often face challenges like slow inference, vanishing gradients and difficulty ...
  • Asynchronous and exact forward recovery for detected errors in iterative solvers 

    Jaulmes, Luc Etienne; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (2018-03-19)
    Article
    Open Access
    Current trends and projections show that faults in computer systems become increasingly common. Such errors may be detected, and possibly corrected transparently, e.g. by Error Correcting Codes (ECC). For a program to be ...
  • A template system for the efficient compilation of domain abstractions onto reconfigurable computers 

    Shafiq, Muhammad; Pericàs Gleim, Miquel; Ayguadé Parra, Eduard (2011)
    Conference report
    Restricted access - publisher's policy
    Past research has addressed the issue of using FPGAs as accelerators for HPC systems. However, writing low level code for an efficient, portable and scalable architecture altogether has been always a ...
  • Asynchronous PGAS runtime for Myrinet networks 

    Farreras Esclusa, Montserrat; Almasi, George (Association for Computing Machinery (ACM), 2010)
    Conference report
    Restricted access - publisher's policy
    PGAS languages aim to enhance productivity for large scale systems. The IBM Asynchronous PGAS runtime (APGAS) supports various high productivity programming languages including UPC, X10 and CAF. The runtime has been designed ...
  • Productive cluster programming with OmpSs 

    Bueno Hedo, Javier; Martinell, Lluis; Duran Gonzalez, Alejandro; Farreras Esclusa, Montserrat; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Springer, 2011)
    Conference report
    Restricted access - publisher's policy
    Clusters of SMPs are ubiquitous. They have been traditionally programmed by using MPI. But, the productivity of MPI programmers is low because of the complexity of expressing parallelism and communication, and the difficulty ...
  • Runtime vs. manual data distribution for architecture-agnostic shared-memory programming models 

    Nikolopoulos, Dimitrios; Ayguadé Parra, Eduard; Polychronopoulos, C D (2002-08)
    Article
    Restricted access - publisher's policy
    This paper compares data distribution methodologies for scaling the performance of OpenMP on NUMA architectures. We investigate the performance of automatic page placement algorithms implemented in the operating system, ...
  • Static and dynamic locality optimizations using integer linear programming 

    Kandemir, M; Banerjee, P; Choudhary, A; Ramanujam, J; Ayguadé Parra, Eduard (2001-09)
    Article
    Open Access
    The delivered performance on modern processors that employ deep memory hierarchies is closely related to the performance of the memory subsystem. Compiler optimizations aimed at improving cache locality are critical in ...
  • Graph partitioning applied to DAG scheduling to reduce NUMA effects 

    Sánchez Barrera, Isaac; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2018)
    Conference lecture
    Restricted access - publisher's policy
    The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with different access latencies and bandwidth rates depending on the proximity between the cores and the devices ...
  • Tools and techniques for automatic data layout: a case study 

    Ayguadé Parra, Eduard; García Almiñana, Jordi; Kremer, Ulrich (1998-05)
    Article
    Restricted access - publisher's policy
    Parallel architectures with physically distributed memory providing computing cycles and large amounts of memory are becoming more and more common. To make such architectures truly usable, programming models and support ...
  • Saiph: towards a DSL for high-performance computational fluid dynamics 

    Macià, Sandra; Mateo, Sergi; Martínez-Ferrer, Pedro J.; Beltran, Vicenç; Mira, Daniel; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2018)
    Conference report
    Restricted access - publisher's policy
    Nowadays high-performance computing is taking an increasingly central role in scientific research while computer architectures are becoming more heterogeneous and complex with different parallel programming models and ...
  • A general guide to applying machine learning to computer architecture 

    Nemirovsky, Daniel; Arkose, Tugberk; Markovic, Nikola; Nemirovsky, Mario; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2018)
    Article
    Open Access
    The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data which ...
  • Reducing cache coherence traffic with a NUMA-aware runtime approach 

    Caheny, Paul; Alvarez, Lluc; Derradji, Said; Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc (2018-05)
    Article
    Open Access
    Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...

View more