dc.contributor.author: Falcón Samper, Ayose Jesús
dc.contributor.author: Ramírez Bellido, Alejandro
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-05-18T07:14:20Z
dc.date.available: 2017-05-18T07:14:20Z
dc.date.issued: 2005
dc.identifier.citation: Falcón, A., Ramírez, A., Valero, M. Effective instruction prefetching via fetch prestaging. In: IEEE International Parallel and Distributed Processing Symposium. "19th IEEE International Parallel and Distributed Processing Symposium: April 4-8, 2005, Denver, Colorado: proceedings". Denver, Colorado: Institute of Electrical and Electronics Engineers (IEEE), 2005, p. 1-10.
dc.identifier.isbn: 0-7695-2312-9
dc.identifier.uri: http://hdl.handle.net/2117/104589
dc.description.abstract: As process technology shrinks and clock rates increase, instruction caches can no longer be accessed in one cycle. The alternatives are smaller caches (with a higher miss rate) or large caches with pipelined access (with a higher branch misprediction penalty). In both cases, the performance obtained is far from that of an ideal large cache with one-cycle access. In this paper we present cache line guided prestaging (CLGP), a novel mechanism that overcomes the limitations of current instruction cache implementations. CLGP employs prefetching to load future cache lines into a set of fast prestage buffers. These buffers are managed efficiently by the CLGP algorithm, which tries to fetch from them as much as possible. The number of fetches served by the main instruction cache is therefore greatly reduced, and so is the negative impact of its access latency on overall performance. With the best CLGP configuration using a 4 KB I-cache, speedups of 3.5% (at 0.09 μm) and 12.5% (at 0.045 μm) are obtained over an equivalent fetch directed prefetching configuration, and 39% (at 0.09 μm) and 48% (at 0.045 μm) over a pipelined instruction cache without prefetching. Moreover, our results show that CLGP with 2.5 KB of total cache budget can obtain performance similar to that of a 64 KB pipelined I-cache without prefetching, that is, equivalent performance at 6.4X our hardware budget.
dc.format.extent: 10 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Microprocessors -- Design and construction
dc.subject.other: Instruction sets
dc.subject.other: Cache storage
dc.subject.other: Pipeline processing
dc.title: Effective instruction prefetching via fetch prestaging
dc.type: Conference report
dc.subject.lemac: Microprocessors -- Design and construction
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/IPDPS.2005.188
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/1419838/
dc.rights.access: Open Access
local.identifier.drac: 2421099
dc.description.version: Postprint (published version)
local.citation.author: Falcón, A.; Ramírez, A.; Valero, M.
local.citation.contributor: IEEE International Parallel and Distributed Processing Symposium
local.citation.pubplace: Denver, Colorado
local.citation.publicationName: 19th IEEE International Parallel and Distributed Processing Symposium: April 4-8, 2005, Denver, Colorado: proceedings
local.citation.startingPage: 1
local.citation.endingPage: 10