Author retrospective for "Software trace cache"

Ramírez Bellido, Alejandro; Falcón Samper, Ayose Jesus; Santana Jaria, Oliverio J.; Valero Cortés, Mateo

doi:10.1145/2591635.2594508


p45-ramirez.pdf (649,4Kb)


Document type: Conference proceedings paper

Publication date: 2014

Publisher: Association for Computing Machinery (ACM)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

In superscalar processors, which can issue and execute multiple instructions per cycle, fetch performance is an upper bound on overall processor performance. Unless there is some form of instruction re-use mechanism, you cannot execute instructions faster than you can fetch them. Instruction Level Parallelism, represented by wide-issue out-of-order superscalar processors, was the trending topic in the late 1990s and early 2000s. It is indeed the most promising way to continue improving processor performance without affecting application development, unlike current multicore architectures, which require parallelizing the applications (a process that is still far from being automated in the general case). Widening superscalar processor issue was the promise of never-ending improvements to single-thread performance, as identified by Yale N. Patt et al. in the 1997 special issue of IEEE Computer about "Billion transistor processors" [1]. However, instruction fetch performance is limited by the control flow of the program. The basic fetch stage implementation can read instructions from a single cache line, starting from the current fetch address and continuing up to the next control flow instruction. That is at most one basic block per cycle. Given that the typical basic block size in SPEC integer benchmarks is 4-6 instructions, fetch performance was limited to those same 4-6 instructions per cycle, making 8-wide and 16-wide superscalar processors impractical. It became imperative to find mechanisms to fetch more than 8 instructions per cycle, and that meant fetching more than one basic block per cycle.
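The fetch limit described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `fetch_one_cycle`, the boolean-list program encoding, and the parameters `line_size` and `width` are all assumptions made for illustration. It models a basic fetch stage that reads from a single cache line and stops at the first control flow instruction.

```python
def fetch_one_cycle(is_branch, pc, line_size, width):
    """Return the instruction indices fetched in one cycle, starting at pc.

    is_branch: hypothetical program encoding, one bool per instruction,
               True where the instruction is a control flow instruction.
    line_size: instructions per cache line (assumed value, for illustration).
    width:     maximum instructions the fetch stage can deliver per cycle.
    """
    # Fetch cannot cross the end of the cache line that contains pc.
    line_end = (pc // line_size + 1) * line_size
    limit = min(line_end, pc + width, len(is_branch))

    fetched = []
    for i in range(pc, limit):
        fetched.append(i)
        # Fetch stops after the first control flow instruction,
        # so at most one basic block is delivered per cycle.
        if is_branch[i]:
            break
    return fetched


# Hypothetical program: basic blocks of 5 instructions, each ending in a branch.
program = ([False] * 4 + [True]) * 8

# Even with a 16-wide fetch stage, only one 5-instruction block is delivered.
print(len(fetch_one_cycle(program, pc=0, line_size=16, width=16)))
```

Under these assumptions a 16-wide fetch stage still delivers only one 5-instruction basic block per cycle, which is the bottleneck the abstract describes.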

Citation: Alex Ramirez [et al.]. Author retrospective for "Software trace cache". In: International Conference on Supercomputing. "ICS '14: proceedings of the 28th ACM International conference on Supercomputing". Munich: Association for Computing Machinery (ACM), 2014, p. 45-47.

URI: http://hdl.handle.net/2117/28191

DOI: 10.1145/2591635.2594508

ISBN: 978-1-4503-2642-1

Publisher's version: http://dl.acm.org/citation.cfm?id=2591635.2594508&coll=DL&dl=ACM&CFID=517086675&CFTOKEN=63947307
