The group aims to improve the efficiency of high-performance computing systems. To that end, it combines approaches that require a certain level of cooperation and integration: microarchitecture and multiprocessor architecture, compilers, operating systems, analysis, visualisation and prediction tools, algorithms and applications. When measuring efficiency, the group goes beyond execution time alone and uses metrics that consider design factors such as cycle time, area and power dissipation of the processor and memory hierarchy, scalability of the microarchitecture and multiprocessor organisation, system correctness, portability and ease of use of programming models, and performance in multiuser, multiprogrammed and distributed environments, among others.

http://futur.upc.edu/CAP

Recent submissions

  • Reducing cache coherence traffic with a NUMA-aware runtime approach 

    Caheny, Paul; Alvarez, Lluc; Derradji, Said; Valero Cortés, Mateo; Moreto Planas, Miquel; Casas Guix, Marc (2018-05)
    Article
    Open access
    Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves ...
  • Architectural support for task dependence management with flexible software scheduling 

    Castillo, Emilio; Álvarez, Lluc; Moreto Planas, Miquel; Casas, Marc; Vallejo, Enrique; Bosque, Jose L.; Beivide Palacio, Ramon; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference paper
    Open access
    The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its ...
  • Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add 

    Ratkovic, Ivan; Palomar Pérez, Óscar; Stanic, Milan; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2018-04-04)
    Article
    Open access
    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market ...
  • Improving OpenStack Swift interaction with the I/O stack to enable software defined storage 

    Nou, Ramon; Miranda, Alberto; Siquier, Marc; Cortés, Toni (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference paper
    Open access
    This paper analyses how OpenStack Swift, a distributed object storage service for a globally used middleware, interacts with the I/O subsystem through the Operating System. This interaction, which seems organised and clean ...
  • Efficient exception handling support for GPUs 

    Tanasic, Ivan; Gelado Fernandez, Isaac; Jorda, Marc; Ayguadé Parra, Eduard; Navarro, Nacho (Association for Computing Machinery (ACM), 2017)
    Conference paper
    Access restricted by publisher policy
    Operating systems have long relied on the exception handling mechanism to implement numerous virtual memory features and optimizations. However, today's GPUs have a limited support for exceptions, which prevents implementation ...
  • A proposal to develop and assess professional skills in Engineering Final Year Projects 

    Sánchez Carracedo, Fermín; Climent Vilaró, Joan; Corbalán González, Julita; Fonseca Casas, Pau; García Almiñana, Jordi; Herrero Zaragoza, José Ramón; Rodríguez Hontoria, Horacio; Sancho Samsó, María Ribera (Tempus Publications, 2018-03)
    Article
    Access restricted by publisher policy
    In this paper we discuss the result of piloting a methodology for Engineering Final Year Projects (FYP) assessment that takes into consideration professional skills acquisition. The FYP is structured around three milestones; ...
  • On the behavior of convolutional nets for feature extraction 

    Garcia-Gasulla, Dario; Pares, Ferran; Vilalta, Armand; Moreno, Jonatan; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Cortés García, Claudio Ulises; Suzumura, Toyotaro (2018-03)
    Article
    Open access
    Deep neural networks are representation learning techniques. During training, a deep net is capable of generating a descriptive language of unprecedented size and detail in machine learning. Extracting the descriptive ...
  • Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems 

    Pérez, Borja; Stafford, Esteban; Bosque, Jose L.; Beivide Palacio, Ramon; Mateo, Sergi; Teruel, Xavier; Martorell Bofill, Xavier; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference paper
    Open access
    Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well known framework for task based parallel applications, which is an interesting tool to simplify the ...
  • Las mentiras del EEES 

    Sánchez Carracedo, Fermín (Asociación de Enseñantes Universitarios de la Informática (AENUI), 2018-01)
    Article
    Open access
    This article presents the author's point of view on how the curricula of the European Higher Education Area (EEES) have been implemented in Spain and on some of the things that, in his opinion, have not been done well. The EEES raised many ...
  • A path-level exact parallelization strategy for sequential simulation 

    Peredo, Oscar; Baeza, Daniel; Ortiz, Julian; Herrero Zaragoza, José Ramón (2018-01-01)
    Article
    Access restricted by publisher policy
    Sequential Simulation is a well known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, Sequential Indicator Simulation (SIS) method draws simulated ...
  • Gestión de contenidos en caches operando a bajo voltaje 

    Ferrerón, Alexandra; Alastruey, Jesús; Suárez Gracía, Dario; Monreal Arnal, Teresa; Ibáñez Marín, Pablo Enrique; Viñals Yúfera, Víctor (2016)
    Conference paper
    Open access
    The energy efficiency of on-chip caches can be improved by reducing their supply voltage (Vdd). However, this Vdd scaling is limited to a voltage Vddmin below which some SRAM cells ...
  • Selección de contenidos basada en reuso para caches compartidas en exclusión 

    Díaz Maag, Javier; Monreal Arnal, Teresa; Viñals Yúfera, Víctor; Ibáñez Marín, Pablo Enrique; Llaberia Griño, José María (2015)
    Conference paper
    Open access
    Previous publications reveal that the reference stream reaching the shared last-level cache (SLLC) of a chip multiprocessor shows little temporal locality. However, it does show reuse locality, that is, the blocks ...