Now showing items 1-20 of 262

  • A block algorithm for the algebraic path problem and its execution on a systolic array 

    Núñez, Fernando J.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1989)
    Conference report
    Open Access
    The solution of the algebraic path problem (APP) for arbitrarily sized graphs by a fixed-size systolic array processor (SAP) is addressed. The APP is decomposed into two subproblems, and SAP is designed for each one. Both ...
  • A case for merging the ILP and DLP paradigms 

    Quintana Rodríguez, Francisca; Espasa Sans, Roger; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1998)
    Conference report
    Open Access
    The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that can not be achieved ...
  • A case for resource-conscious out-of-order processors 

    Cristal Kestelman, Adrián; Martínez, José F; Llosa Espuny, José Francisco; Valero Cortés, Mateo (2003-12)
    Article
    Open Access
    Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files ...
  • A discrete optimization problem in local networks and data alignment 

    Fiol Mora, Miquel Àngel; Andrés Yebra, José Luis; Alegre de Miguel, Ignacio; Valero Cortés, Mateo (1987-06)
    Article
    Restricted access - publisher's policy
    This paper presents the solution of the following optimization problem that appears in the design of double-loop structures for local networks and also in data memory, allocation and data alignment in SIMD processors. Consider ...
  • A distributed processor state management architecture for large-window processors 

    González, Isidro; Galluzzi, Marco; Veidenbaum, Alexander V.; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference report
    Open Access
    Processor architectures with large instruction windows have been proposed to expose more instruction-level parallelism (ILP) and increase performance. Some of the proposed architectures replace a re-order buffer (ROB) with ...
  • A DRAM/SRAM memory scheme for fast packet buffers 

    García Vidal, Jorge; March, Maribel; Cerdà Alabern, Llorenç; Corbal San Adrián, Jesús; Valero Cortés, Mateo (2006-05)
    Article
    Open Access
    We address the design of high-speed packet buffers for Internet routers. We use a general DRAM/SRAM architecture for which previous proposals can be seen as particular cases. For this architecture, large SRAMs are needed ...
  • A dynamic scheduler for balancing HPC applications 

    Boneti, Carlos; Gioiosa, Roberto; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference report
    Open Access
    Load imbalance cause significant performance degradation in High Performance Computing applications. In our previous work we showed that load imbalance can be alleviated by modern MT processors that provide mechanisms for ...
  • A fully parameterizable low power design of vector fused multiply-add using active clock-gating techniques 

    Ratkovic, Ivan; Palomar, Oscar; Stanic, Milan; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
    Conference report
    Restricted access - publisher's policy
    The need for power-efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a re-tailoring for the mobile market ...
  • A highly scalable parallel implementation of H.264 

    Azevedo, Arnaldo; Juurlink, Ben; Meenderinck, Cor; Terechko, Andrei; Hoogerbrugge, Jan; Álvarez Mesa, Mauricio; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (2011)
    Article
    Open Access
    Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation ...
  • A new pointer-based instruction queue design and its power-performance evaluation 

    Ramírez, Marco A; Cristal Kestelman, Adrián; Veidenbaum, Alexander V; Villa, Luis; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
    Conference report
    Open Access
    Instruction queues consume a significant amount of power in a high-performance processor. The wakeup logic delay is also a critical timing parameter. This paper compares a commonly used CAM-based instruction queue organization ...
  • A novel architecture for large windows processors 

    González, Isidro; Galluzzi, Marco; Veidenbaum, Alex; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2007-11)
    External research report
    Open Access
    Several processor architectures with large instruction windows have been proposed. They improve performance by maintaining hundreds of instructions in flight to increase the level of instruction parallelism (ILP). Such ...
  • A simulation framework to automatically analyze the communication-computation overlap in scientific applications 

    Subotic, Vladimir; Sancho, Jose Carlos; Labarta Mancho, Jesús José; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2010)
    Conference report
    Open Access
    Overlapping communication and computation has been devised as an attractive technique to alleviate the huge application's network requirements at large scale. Overlapping will allow to fully or partially hide the long ...
  • A systolic algorithm for the fast computation of the connected components of a graph 

    Núñez, Fernando J.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1988)
    Conference report
    Open Access
    The authors consider the description of a systolic algorithm to solve the connected-component problem. It is executed in a ring topology with N processors, requiring O(Nlog N) time without regard to the graph's sparsity. ...
  • A two level load/store queue based on execution locality 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Veidenbaum, Alexander V; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference report
    Open Access
    Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential ...
  • Access to streams in multiprocessor systems 

    Valero Cortés, Mateo; Peirón Guardia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1993)
    Conference report
    Open Access
    When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present ...
  • Access to vectors in multi-module memories 

    Valero Cortés, Mateo; Peiron Guàrdia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1994)
    Conference report
    Open Access
    The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear ...
  • ADAM : an efficient data management mechanism for hybrid high and ultra-low voltage operation caches 

    Maric, Bojan; Abella Ferrer, Jaume; Valero Cortés, Mateo (2012)
    Conference report
    Restricted access - publisher's policy
    Semiconductor technology evolution enables the design of ultra-low-cost chips (e.g., below 1 USD) required for new market segments such as environment, urban life and body monitoring, etc. Recently, hybrid-operation (high ...
  • Adapting cache partitioning algorithms to pseudo-LRU replacement policies 

    Kedzierski, Kamil; Moreto Planas, Miquel; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2010)
    Conference report
    Open Access
    Recent studies have shown that cache partitioning is an efficient technique to improve throughput, fairness and Quality of Service (QoS) in CMP processors. The cache partitioning algorithms proposed so far assume Least ...
  • Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7 

    Prat Robles, David; Ortega, Cristobal; Casas Guix, Marc; Moretó Planas, Miquel; Valero Cortés, Mateo (2015)
    Conference report
    Open Access
  • Advanced pattern based memory controller for FPGA based HPC applications 

    Hussain, Tassadaq; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Restricted access - publisher's policy
    The ever-increasing complexity of high-performance computing applications limits performance due to memory constraints in FPGAs. To address this issue, we propose the Advanced Pattern based Memory Controller (APMC), which ...