Now showing items 1-20 of 344

    • A block algorithm for the algebraic path problem and its execution on a systolic array 

      Núñez, Fernando J.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1989)
      Conference report
      Open Access
      The solution of the algebraic path problem (APP) for arbitrarily sized graphs by a fixed-size systolic array processor (SAP) is addressed. The APP is decomposed into two subproblems, and SAP is designed for each one. Both ...
    • A case for merging the ILP and DLP paradigms 

      Quintana Rodríguez, Francisca; Espasa Sans, Roger; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1998)
      Conference report
      Open Access
      The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that can not be achieved ...
    • A case for resource-conscious out-of-order processors 

      Cristal Kestelman, Adrián; Martínez, José F; Llosa Espuny, José Francisco; Valero Cortés, Mateo (2003-12)
      Article
      Open Access
      Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files ...
    • A complexity-effective simultaneous multithreading architecture 

      Acosta Ojeda, Carmelo Alexis; Falcón Samper, Ayose Jesús; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
      Conference report
      Open Access
      Different applications may exhibit radically different behaviors and thus have very different requirements in terms of hardware support. In simultaneous multithreading (SMT) architectures, the hardware is shared among ...
    • A conflict-free memory banking architecture for fast VOQ packet buffers 

      García Vidal, Jorge; Cerdà Alabern, Llorenç; Corbal San Adrián, Jesús; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2003)
      Conference report
      Open Access
      In order to support the enormous growth of the Internet, innovative research in every router subsystem is needed. We focus our attention on packet buffer design for routers supporting high-speed line rates. More specifically, ...
    • A content aware integer register file organization 

      González García, Rubén; Cristal Kestelman, Adrián; Ortega Fernández, Daniel; Veidenbaum, Alex; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2004)
      Conference report
      Open Access
      A register file is a critical component of a modern superscalar processor. It has a large number of entries and read/write ports in order to enable high levels of instruction parallelism. As a result, the register file's ...
    • A decoupled KILO-instruction processor 

      Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2006)
      Conference report
      Open Access
      Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are ...
    • A discrete optimization problem in local networks and data alignment 

      Fiol Mora, Miquel Àngel; Andrés Yebra, José Luis; Alegre de Miguel, Ignacio; Valero Cortés, Mateo (1987-06)
      Article
      Restricted access - publisher's policy
      This paper presents the solution of the following optimization problem that appears in the design of double-loop structures for local networks and also in data memory, allocation and data alignment in SIMD processors. Consider ...
    • A distributed processor state management architecture for large-window processors 

      González, Isidro; Galluzzi, Marco; Veidenbaum, Alexander V.; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
      Conference report
      Open Access
      Processor architectures with large instruction windows have been proposed to expose more instruction-level parallelism (ILP) and increase performance. Some of the proposed architectures replace a re-order buffer (ROB) with ...
    • A DRAM/SRAM memory scheme for fast packet buffers 

      García Vidal, Jorge; March, Maribel; Cerdà Alabern, Llorenç; Corbal San Adrián, Jesús; Valero Cortés, Mateo (2006-05)
      Article
      Open Access
      We address the design of high-speed packet buffers for Internet routers. We use a general DRAM/SRAM architecture for which previous proposals can be seen as particular cases. For this architecture, large SRAMs are needed ...
    • A dynamic scheduler for balancing HPC applications 

      Boneti, Carlos; Gioiosa, Roberto; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
      Conference report
      Open Access
      Load imbalance cause significant performance degradation in High Performance Computing applications. In our previous work we showed that load imbalance can be alleviated by modern MT processors that provide mechanisms for ...
    • A flexible heterogeneous multi-core architecture 

      Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
      Conference report
      Open Access
      Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this ...
    • A fully parameterizable low power design of vector fused multiply-add using active clock-gating techniques 

      Ratkovic, Ivan; Palomar, Oscar; Stanic, Milan; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
      Conference report
      Restricted access - publisher's policy
      The need for power-efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a re-tailoring for the mobile market ...
    • A general guide to applying machine learning to computer architecture 

      Nemirovsky, Daniel; Arkose, Tugberk; Markovic, Nikola; Nemirovsky, Mario; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2018)
      Article
      Open Access
      The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data which ...
    • A hardware runtime for task-based programming models 

      Tan, Xubin; Bosch, Jaume; Álvarez, Carlos; Jiménez González, Daniel; Ayguadé Parra, Eduard; Valero Cortés, Mateo (2019-09-01)
      Article
      Open Access
      Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only ...
    • A highly scalable parallel implementation of H.264 

      Azevedo, Arnaldo; Juurlink, Ben; Meenderinck, Cor; Terechko, Andrei; Hoogerbrugge, Jan; Álvarez Mesa, Mauricio; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (2011)
      Article
      Open Access
      Developing parallel applications that can harness and efficiently use future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation ...
    • A low-complexity, high-performance fetch unit for simultaneous multithreading processors 

      Falcón Samper, Ayose Jesús; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2004)
      Conference report
      Open Access
      Simultaneous multithreading (SMT) is an architectural technique that allows for the parallel execution of several threads simultaneously. Fetch performance has been identified as the most important bottleneck for SMT ...
    • A new pointer-based instruction queue design and its power-performance evaluation 

      Ramírez, Marco A; Cristal Kestelman, Adrián; Veidenbaum, Alexander V; Villa, Luis; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
      Conference report
      Open Access
      Instruction queues consume a significant amount of power in a high-performance processor. The wakeup logic delay is also a critical timing parameter. This paper compares a commonly used CAM-based instruction queue organization ...
    • A novel architecture for large windows processors 

      González, Isidro; Galluzzi, Marco; Veidenbaum, Alex; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2007-11)
      External research report
      Open Access
      Several processor architectures with large instruction windows have been proposed. They improve performance by maintaining hundreds of instructions in flight to increase the level of instruction parallelism (ILP). Such ...
    • A performance characterization of high definition digital video decoding using H.264/AVC 

      Álvarez Mesa, Mauricio; Salamí San Juan, Esther; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
      Conference report
      Open Access
      H.264/AVC is a new international video coding standard that provides higher coding efficiency with respect to previous standards at the expense of a higher computational complexity. The complexity is even higher when ...