Author retrospective for "Software trace cache"

View/Open
Cita com:
hdl:2117/28191
Document typeConference report
Defense date2014
PublisherAssociation for Computing Machinery (ACM)
Rights accessOpen Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
In superscalar processors, capable of issuing and executing multiple instructions per cycle, fetch performance represents an upper bound to the overall processor performance. Unless there is some form of instruction re-use mechanism, you cannot execute instructions faster than you can fetch them.
Instruction Level Parallelism, represented by wide issue out oforder superscalar processors, was the trending topic during the end of the 90's and early 2000's. It is indeed the most promising way to continue improving processor performance in a way that does not impact application development, unlike current multicore architectures which require parallelizing the applications (a process that is still far from being automated in the general case). Widening superscalar processor issue was the promise of neverending improvements to single thread performance, as identified by Yale N. Patt et al. in the 1997 special issue of IEEE Computer about "Billion transistor processors" [1].
However, instruction fetch performance is limited by the control flow of the program. The basic fetch stage implementation can read instructions from a single cache line, starting from the current fetch address and up to the next control flow instruction. That is one basic block per cycle at most.
Given that the typical basic block size in SPEC integer benchmarks is 4-6 instructions, fetch performance was limited to those same 4-6 instructions per cycle, making 8-wide and 16-wide superscalar processors impractical. It became imperative to find mechanisms to fetch more than 8 instructions per cycle, and that meant fetching more than one basic block per cycle.
CitationAlex Ramirez [et al.]. Author retrospective for "Software trace cache". A: International Conference on Supercomputing. "ICS '14: proceedings of the 28th ACM International conference on Supercomputing". Munich: Association for Computing Machinery (ACM), 2014, p. 45-47.
ISBN978-1-4503-2642-1
Files | Description | Size | Format | View |
---|---|---|---|---|
p45-ramirez.pdf | 649,4Kb | View/Open |