Now showing items 1-20 of 93

  • A case for resource-conscious out-of-order processors 

    Cristal Kestelman, Adrián; Martínez, José F; Llosa Espuny, José Francisco; Valero Cortés, Mateo (2003-12)
    Article
    Open Access
    Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files ...
  • A case study of hybrid dataflow and shared-memory programming models: Dependency-based parallel game engine 

    Gajinov, Vladimir; Eric, Igor; Stojanovic, Saa; Milutinovic, Veljko; Unsal, Osman Sabri; Ayguadé Parra, Eduard; Cristal Kestelman, Adrián (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Restricted access - publisher's policy
    Recently proposed hybrid dataflow and shared memory programming models combine these two underlying models in order to support a wider range of problems naturally. The effectiveness of such hybrid models for parallel ...
  • A content aware integer register file organization 

    González García, Rubén; Cristal Kestelman, Adrián; Ortega Fernández, Daniel; Veidenbaum, Alex; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2004)
    Conference report
    Open Access
    A register file is a critical component of a modern superscalar processor. It has a large number of entries and read/write ports in order to enable high levels of instruction parallelism. As a result, the register file's ...
  • A decoupled KILO-instruction processor 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2006)
    Conference report
    Open Access
    Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are ...
  • A distributed processor state management architecture for large-window processors 

    González, Isidro; Galluzzi, Marco; Veidenbaum, Alexander V.; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference report
    Open Access
    Processor architectures with large instruction windows have been proposed to expose more instruction-level parallelism (ILP) and increase performance. Some of the proposed architectures replace a re-order buffer (ROB) with ...
  • Advanced pattern based memory controller for FPGA based HPC applications 

    Hussain, Tassadaq; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Restricted access - publisher's policy
    The ever-increasing complexity of high-performance computing applications limits performance due to memory constraints in FPGAs. To address this issue, we propose the Advanced Pattern based Memory Controller (APMC), which ...
  • A flexible heterogeneous multi-core architecture 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2007)
    Conference report
    Open Access
    Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this ...
  • A fully parameterizable low power design of vector fused multiply-add using active clock-gating techniques 

    Ratkovic, Ivan; Palomar, Oscar; Stanic, Milan; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2016)
    Conference report
    Restricted access - publisher's policy
    The need for power-efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a re-tailoring for the mobile market ...
  • A general guide to applying machine learning to computer architecture 

    Nemirovsky, Daniel; Arkose, Tugberk; Markovic, Nikola; Nemirovsky, Mario; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2018)
    Article
    Open Access
    The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data which ...
  • AMMC: advance multi-core memory controller 

    Hussain, Tassadaq; Palomar Pérez, Óscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference lecture
    Open Access
    In this work, we propose an efficient scheduler and intelligent memory manager known as AMMC (Advanced Multi-Core Memory Controller), which proficiently handles data movement and computational tasks. The proposed AMMC ...
  • An empirical evaluation of High-Level Synthesis languages and tools for database acceleration 

    Arcas Abella, Oriol; Ndu, Geoffrey; Sönmez, Nehir; Ghasempour, Mohsen; Armejach, Adrià; Navaridas, Javier; Song, Wei; Mawer, John; Cristal Kestelman, Adrián; Lujan, Mikel (Institute of Electrical and Electronics Engineers (IEEE), 2014)
    Conference report
    Open Access
    High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms ...
  • A new pointer-based instruction queue design and its power-performance evaluation 

    Ramírez, Marco A; Cristal Kestelman, Adrián; Veidenbaum, Alexander V; Villa, Luis; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
    Conference report
    Open Access
    Instruction queues consume a significant amount of power in a high-performance processor. The wakeup logic delay is also a critical timing parameter. This paper compares a commonly used CAM-based instruction queue organization ...
  • An integrated vector-scalar design on an in-order ARM core 

    Stanic, Milan; Palomar Pérez, Óscar; Hayes, Timothy; Ratkovic, Ivan; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Valero Cortés, Mateo (2017-07)
    Article
    Open Access
    In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient ...
  • A novel architecture for large windows processors 

    González, Isidro; Galluzzi, Marco; Veidenbaum, Alex; Ramírez, Marco Antonio; Cristal Kestelman, Adrián; Valero Cortés, Mateo (2007-11)
    External research report
    Open Access
    Several processor architectures with large instruction windows have been proposed. They improve performance by maintaining hundreds of instructions in flight to increase the level of instruction parallelism (ILP). Such ...
  • Architecture for object-oriented programming model 

    Markovic, Nikola; González, Rubén; Unsal, Osman Sabri; Valero Cortés, Mateo; Cristal Kestelman, Adrián (2009)
    External research report
    Open Access
    Current mainstream architectures have ISAs that are not able to maintain all the information provided by the application programmer using a high level programming language. Typically, the information that is lost in compiling ...
  • Atomic quake: using transactional memory in an interactive mulitplayer game Server 

    Zyulkyarov, Ferad; Gajinov, Vladimir; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Ayguadé Parra, Eduard; Harris, Tim; Valero Cortés, Mateo (2009)
    Conference report
    Open Access
    Transactional Memory (TM) is being studied widely as a new technique for synchronizing concurrent accesses to shared memory data structures for use in multi-core systems. Much of the initial work on TM has been evaluated ...
  • A two level load/store queue based on execution locality 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Veidenbaum, Alexander V; Jiménez, Daniel A.; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2008)
    Conference report
    Open Access
    Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential ...
  • Chained in-order/out-of-order doublecore architecture 

    Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; González García, Rubén; Jiménez González, Daniel; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2005)
    Conference report
    Open Access
    Complexity is one of the most important problems facing microarchitects. It is exacerbated by the application of optimizations, by scaling to higher issue widths and, in general, by increasing the size of microprocessor ...
  • Circuit design of a dual-versioning L1 data cache for optimistic concurrency 

    Seyedi, Azam; Armejach, Adrià; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Hur, Ibrahim; Valero Cortés, Mateo (2011)
    External research report
    Open Access
    This paper proposes a novel L1 data cache design with dual-versioning SRAM cells (dvSRAM) for chip multi-processors (CMP) that implement optimistic concurrency proposals. In this new cache architecture, each dvSRAM cell ...
  • Circuit design of a novel adaptable and reliable L1 data cache 

    Seyedi, Azam; Yalcin, Gulay; Unsal, Osman Sabri; Cristal Kestelman, Adrián (2013)
    Conference report
    Open Access
    This paper proposes a novel adaptable and reliable L1 data cache design (Adapcache) with the unique capability of automatically adapting itself for different supply voltage levels and providing the highest reliability. ...