The group investigates techniques for improving the efficiency of high-performance computing systems. It approaches this goal from several perspectives that require a certain degree of cooperation and integration: uniprocessor and multiprocessor system architecture, compilers, operating systems, analysis, visualisation and prediction tools, algorithms and applications. Efficiency is measured with metrics that go beyond program execution time. In particular, the group considers aspects related to system design (cycle time, area and power consumption of the processor and the memory hierarchy, and scalability of the uniprocessor and multiprocessor organisation), functional verification of systems, ease of use and portability of the programming model, and performance in multiuser, multiprogrammed and distributed environments, among others.

Recent Submissions

  • Distributed training of deep neural networks with spark: The MareNostrum experience 

    Cruz, Leonel; Tous Liesa, Rubén; Otero Calviño, Beatriz (Elsevier, 2019-07-01)
    Article
    Restricted access - publisher's policy
    Deployment of a distributed deep learning technology stack on a large parallel system is a very complex process, involving the integration and configuration of several layers of both general-purpose and custom software. ...
  • Using Arm’s scalable vector extension on stencil codes 

    Armejach Sanosa, Adrià; Caminal Pallarés, Helena; Cebrián González, Juan Manuel; Langarita, Rubén; González-Alberquilla, Rekai; Adeniyi-Jones, Chris; Valero Cortés, Mateo; Casas Guix, Marc; Moreto Planas, Miquel (2019-04-08)
    Article
    Restricted access - publisher's policy
    Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. ...
  • Accelerating hyperparameter optimisation with PyCOMPSs 

    Kahira, Albert Njoroge; Bautista Gomez, Leonardo Arturo; Conejero, Javier; Badia Sala, Rosa Maria (Association for Computing Machinery (ACM), 2019)
    Conference report
    Open Access
    Machine Learning applications now span across multiple domains due to the increase in computational power of modern systems. There has been a recent surge in Machine Learning applications in High Performance Computing (HPC) ...
  • Task Packing: Efficient task scheduling in unbalanced parallel programs to maximize CPU utilization 

    Utrera Iglesias, Gladys Miriam; Farreras Esclusa, Montse; Fornés de Juan, Jordi (Elsevier, 2019-12)
    Article
    Restricted access - publisher's policy
    Load imbalance in parallel systems can be generated by external factors to the currently running applications like operating system noise or the underlying hardware like a heterogeneous cluster. HPC applications working ...
  • Holistic slowdown driven scheduling and resource management for malleable jobs 

    D'Amico, Marco; Jokanovic, Ana; Corbalán González, Julita (Association for Computing Machinery (ACM), 2019)
    Conference report
    Open Access
    In job scheduling, the concept of malleability has been explored since many years ago. Research shows that malleability improves system performance, but its utilization in HPC never became widespread. The causes are the ...
  • The cooperative parallel: A discussion about run-time schedulers for nested parallelism 

    Royuela, Sara; Serrano, Maria A.; García Gasulla, Marta; Mateo Bellido, Sergi; Labarta Mancho, Jesús José; Quiñones Moreno, Eduardo (Springer, 2019)
    Conference report
    Open Access
    Nested parallelism is a well-known parallelization strategy to exploit irregular parallelism in HPC applications. This strategy also fits in critical real-time embedded systems, composed of a set of concurrent functionalities. ...
  • Artificial neural networks as emerging tools for earthquake detection 

    Rojas, Otilio; Otero Calviño, Beatriz; Alvarado, Leonardo; Mus, Sergi; Tous Liesa, Rubén (2019)
    Article
    Open Access
    As seismic networks continue to spread and monitoring sensors become more efficient, the abundance of data highly surpasses the processing capabilities of earthquake interpretation analysts. Earthquake catalogs are fundamental ...
  • Assembling a high-productivity DSL for computational fluid dynamics 

    Macià, Sandra; Martínez-Ferrer, Pedro J.; Mateo, Sergi; Beltran Querol, Vicenç; Ayguadé Parra, Eduard (Association for Computing Machinery (ACM), 2019)
    Conference report
    Open Access
    As we move towards exascale computing, an abstraction for effective parallel computation is increasingly needed to overcome the maintainability and portability of scientific applications while ensuring the efficient and ...
  • Wav2Pix: speech-conditioned face generation using generative adversarial networks 

    Cardoso Duarte, Amanda; Roldan, Francisco; Tubau, Miquel; Escur, Janna; Pascual de la Puente, Santiago; Salvador Aguilera, Amaia; Mohedano, Eva; McGuinness, Kevin; Torres Viñals, Jordi; Giró Nieto, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference lecture
    Restricted access - publisher's policy
    Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a ...
  • PROFET: modeling system performance and energy without simulating the CPU 

    Radulovic, Milan; Sánchez-Verdejo, Rommel; Carpenter, Paul Matthew; Radojkovic, Petar; Jacob, Bruce; Ayguadé Parra, Eduard (2019-06)
    Article
    Open Access
    The approaching end of DRAM scaling and expansion of emerging memory technologies is motivating a lot of research in future memory systems. Novel memory systems are typically explored by hardware simulators that are slow ...
  • Application Acceleration on FPGAs with OmpSs@FPGA 

    Bosch, Jaume; Tan, Xubin; Filgueras Izquierdo, Antonio; Vidal, Miquel; Mateu, Marc; Jiménez-González, Daniel; Álvarez, Carlos; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference report
    Open Access
    OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Similarly to OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous ...
  • Increasing the number of strides for conflict-free vector access 

    Valero Cortés, Mateo; Lang, Tomas; Llaberia Griñó, José M.; Peiron Guàrdia, Montse; Ayguadé Parra, Eduard; Navarro Guerrero, Juan José (1992-05)
    Article
    Open Access
    Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free vector access for some strides in vector processors with multi-module memories. In this paper, we ...
