SAMIE-LSQ: set-associative multiple-instruction entry load/store queue
dc.contributor.author | Abella Ferrer, Jaume |
dc.contributor.author | González Colás, Antonio María |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.date.accessioned | 2017-02-16T12:31:32Z |
dc.date.available | 2017-02-16T12:31:32Z |
dc.date.issued | 2006 |
dc.identifier.citation | Abella, J., González, A. SAMIE-LSQ: set-associative multiple-instruction entry load/store queue. In: IEEE International Parallel and Distributed Processing Symposium. "Proceeding of the 20th IEEE International Parallel & Distributed Processing Symposium". Ixia, Rodes: IEEE Computer Society, 2006, p. 1-10. |
dc.identifier.isbn | 1-42440054-6 |
dc.identifier.uri | http://hdl.handle.net/2117/101144 |
dc.description.abstract | The load/store queue (LSQ) is one of the most complex parts of contemporary processors. Its latency is critical for processor performance, and it is usually one of the processor hotspots. This paper presents a highly banked, set-associative, multiple-instruction entry LSQ (SAMIE-LSQ) that achieves high performance with small energy requirements. The SAMIE-LSQ classifies memory instructions (loads and stores) based on the address to be accessed, and groups instructions that access the same cache line in the same entry. Our approach relies on the fact that many in-flight memory instructions access the same cache lines. Each SAMIE-LSQ entry has space for several memory instructions accessing the same cache line. This arrangement has a number of advantages. First, it significantly reduces the address comparison activity needed for memory disambiguation, since there are fewer addresses to be compared. It also reduces the activity in the data TLB, the cache tag array and the cache data array. This is achieved by caching the cache line location and address translation in the corresponding SAMIE-LSQ entry once the access of one of the instructions in that entry is performed, so instructions that share an entry can reuse the translation, avoid the tag check and get the data directly from the specific cache way without checking the others. In addition, the delay of the proposed scheme is lower than that required by a conventional LSQ. We show that the SAMIE-LSQ saves 82% of the dynamic energy for the load/store queue, 42% for the L1 data cache and 73% for the data TLB, with a negligible impact on performance (0.6%). |
dc.format.extent | 10 p. |
dc.language.iso | eng |
dc.publisher | IEEE Computer Society |
dc.subject | UPC subject areas::Computer science::Computer architecture |
dc.subject.lcsh | Cache memory |
dc.subject.lcsh | Microprocessors |
dc.subject.other | Delay |
dc.subject.other | Power dissipation |
dc.subject.other | Cooling |
dc.subject.other | Pipelines |
dc.subject.other | Computer architecture |
dc.subject.other | Concrete |
dc.subject.other | Costs |
dc.title | SAMIE-LSQ: set-associative multiple-instruction entry load/store queue |
dc.type | Conference report |
dc.subject.lemac | Memòria cau |
dc.subject.lemac | Microprocessadors |
dc.contributor.group | Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors |
dc.identifier.doi | 10.1109/IPDPS.2006.1639290 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://ieeexplore.ieee.org/document/1639290/ |
dc.rights.access | Open Access |
local.identifier.drac | 2358229 |
dc.description.version | Postprint (published version) |
local.citation.author | Abella, J.; González, A. |
local.citation.contributor | IEEE International Parallel and Distributed Processing Symposium |
local.citation.pubplace | Ixia, Rodes |
local.citation.publicationName | Proceeding of the 20th IEEE International Parallel & Distributed Processing Symposium |
local.citation.startingPage | 1 |
local.citation.endingPage | 10 |
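
The grouping idea described in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design: the 64-byte line size, the function names (`line_of`, `group_by_line`, `translations_needed`) and the sample addresses are all assumptions made for illustration. The sketch only shows how in-flight accesses that fall in the same cache line end up sharing one entry, so that one address translation per entry could be reused by the other instructions in it.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache line size; the abstract does not state one


def line_of(addr: int) -> int:
    """Return the cache line number that a byte address falls in."""
    return addr // LINE_BYTES


def group_by_line(addrs):
    """Group in-flight memory accesses (by position) under their cache line.

    Each key stands in for one multiple-instruction entry (hypothetical model).
    """
    entries = defaultdict(list)
    for i, addr in enumerate(addrs):
        entries[line_of(addr)].append(i)
    return dict(entries)


def translations_needed(addrs):
    """Count translations if the first access in each entry translates
    and the remaining accesses in that entry reuse the cached result."""
    return len(group_by_line(addrs))


# Hypothetical in-flight loads/stores (byte addresses).
addrs = [0x1000, 0x1008, 0x1010, 0x2000, 0x2004, 0x3000]
print(group_by_line(addrs))
print(translations_needed(addrs), "translations instead of", len(addrs))
```

With these sample addresses, six accesses map to three cache lines, so under the assumptions above only three translations would be performed rather than six. The real savings reported in the abstract depend on the workload and on the banked, set-associative organisation, which this sketch does not model.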