  • A block algorithm for the algebraic path problem and its execution on a systolic array

    Núñez, Fernando J.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1989)
    Conference paper
    Open access
    The solution of the algebraic path problem (APP) for arbitrarily sized graphs by a fixed-size systolic array processor (SAP) is addressed. The APP is decomposed into two subproblems, and SAP is designed for each one. Both ...
  • A case for merging the ILP and DLP paradigms 

    Quintana Rodríguez, Francisca; Espasa Sans, Roger; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 1998)
    Conference paper
    Open access
    The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that cannot be achieved ...
  • A case for resource-conscious out-of-order processors 

    Cristal Kestelman, Adrián; Martínez, José F; Llosa Espuny, José Francisco; Valero Cortés, Mateo (2003-12)
    Article
    Open access
    Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files ...
  • Access to streams in multiprocessor systems 

    Valero Cortés, Mateo; Peirón Guardia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1993)
    Conference paper
    Open access
    When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present ...
  • Access to vectors in multi-module memories 

    Valero Cortés, Mateo; Peiron Guàrdia, Montse; Ayguadé Parra, Eduard (Institute of Electrical and Electronics Engineers (IEEE), 1994)
    Conference paper
    Open access
    The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear ...
  • A complexity-effective simultaneous multithreading architecture 

    Acosta Ojeda, Carmelo Alexis; Falcón Samper, Ayose Jesús; Ramírez Bellido, Alejandro; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
    Conference paper
    Open access
    Different applications may exhibit radically different behaviors and thus have very different requirements in terms of hardware support. In simultaneous multithreading (SMT) architectures, the hardware is shared among ...
  • A conflict-free memory banking architecture for fast VOQ packet buffers 

    García Vidal, Jorge; Cerdà Alabern, Llorenç; Corbal San Adrián, Jesús; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2003)
    Conference paper
    Open access
    In order to support the enormous growth of the Internet, innovative research in every router subsystem is needed. We focus our attention on packet buffer design for routers supporting high-speed line rates. More specifically, ...
  • A content aware integer register file organization 

    González García, Rubén; Cristal Kestelman, Adrián; Ortega Fernández, Daniel; Veidenbaum, Alex; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2004)
    Conference paper
    Open access
    A register file is a critical component of a modern superscalar processor. It has a large number of entries and read/write ports in order to enable high levels of instruction parallelism. As a result, the register file's ...
  • ADAM : an efficient data management mechanism for hybrid high and ultra-low voltage operation caches 

    Maric, Bojan; Abella Ferrer, Jaume; Valero Cortés, Mateo (2012)
    Conference paper
    Restricted access due to publisher policy
    Semiconductor technology evolution enables the design of ultra-low-cost chips (e.g., below 1 USD) required for new market segments such as environment, urban life and body monitoring, etc. Recently, hybrid-operation (high ...
  • Adapting cache partitioning algorithms to pseudo-LRU replacement policies 

    Kedzierski, Kamil; Moreto Planas, Miquel; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2010)
    Conference paper
    Open access
    Recent studies have shown that cache partitioning is an efficient technique to improve throughput, fairness and Quality of Service (QoS) in CMP processors. The cache partitioning algorithms proposed so far assume Least ...
  • Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM Power7 

    Prat Robles, David; Ortega, Cristobal; Casas Guix, Marc; Moretó Planas, Miquel; Valero Cortés, Mateo (2015)
    Conference paper
    Open access
  • A decoupled KILO-instruction processor 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2006)
    Conference paper
    Open access
    Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are ...
  • A discrete optimization problem in local networks and data alignment 

    Fiol Mora, Miquel Àngel; Andrés Yebra, José Luis; Alegre de Miguel, Ignacio; Valero Cortés, Mateo (1987-06)
    Article
    Restricted access due to publisher policy
    This paper presents the solution of the following optimization problem that appears in the design of double-loop structures for local networks and also in data memory allocation and data alignment in SIMD processors. Consider ...
  • A distributed processor state management architecture for large-window processors 

    González, Isidro; Galluzzi, Marco; Veidenbaum, Alexander V.; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference paper
    Open access
    Processor architectures with large instruction windows have been proposed to expose more instruction-level parallelism (ILP) and increase performance. Some of the proposed architectures replace a re-order buffer (ROB) with ...
  • A DRAM/SRAM memory scheme for fast packet buffers 

    García Vidal, Jorge; March, Maribel; Cerdà Alabern, Llorenç; Corbal San Adrián, Jesús; Valero Cortés, Mateo (2006-05)
    Article
    Open access
    We address the design of high-speed packet buffers for Internet routers. We use a general DRAM/SRAM architecture for which previous proposals can be seen as particular cases. For this architecture, large SRAMs are needed ...
  • Advanced pattern based memory controller for FPGA based HPC applications 

    Hussain, Tassadaq; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference paper
    Restricted access due to publisher policy
    The ever-increasing complexity of high-performance computing applications limits performance due to memory constraints in FPGAs. To address this issue, we propose the Advanced Pattern based Memory Controller (APMC), which ...
  • A dynamic scheduler for balancing HPC applications 

    Boneti, Carlos; Gioiosa, Roberto; Cazorla, Francisco; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference paper
    Open access
    Load imbalance causes significant performance degradation in High Performance Computing applications. In our previous work we showed that load imbalance can be alleviated by modern MT processors that provide mechanisms for ...
  • A flexible heterogeneous multi-core architecture 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
    Conference paper
    Open access
    Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this ...
  • A fully parameterizable low power design of vector fused multiply-add using active clock-gating techniques 

    Ratkovic, Ivan; Palomar, Oscar; Stanic, Milan; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
    Conference paper
    Restricted access due to publisher policy
    The need for power-efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a re-tailoring for the mobile market ...
  • A general guide to applying machine learning to computer architecture 

    Nemirovsky, Daniel; Arkose, Tugberk; Markovic, Nikola; Nemirovsky, Mario; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2018)
    Article
    Open access
    The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data which ...