A two level load/store queue based on execution locality

dc.contributor.author: Pericàs Gleim, Miquel
dc.contributor.author: Cristal Kestelman, Adrián
dc.contributor.author: Cazorla, Francisco
dc.contributor.author: González García, Rubén
dc.contributor.author: Veidenbaum, Alexander V
dc.contributor.author: Jiménez, Daniel A.
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-06-27T07:08:37Z
dc.date.available: 2017-06-27T07:08:37Z
dc.date.issued: 2008
dc.description.abstract: Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential components of applications. To overcome this limitation, it is necessary to design processors with many lower-performance cores for TLP and some high-performance cores designed to execute sequential algorithms. Such cores will need to address the memory wall by implementing kilo-instruction windows. Large-window processors require large Load/Store Queues that would be too slow if implemented using current CAM-based designs. This paper proposes an Epoch-based Load Store Queue (ELSQ), a new design based on Execution Locality. It is integrated into a large-window processor that has a fast, out-of-order core operating only on L1/L2 cache hits and N slower cores that process L2 misses and their dependent instructions. The large LSQ is coupled with the slow cores and is partitioned into N small and local LSQs, one per core. We evaluate ELSQ in a large-window environment, finding that it enables high performance at low power. By exploiting locality among loads and stores, ELSQ outperforms even an idealized central LSQ when implemented on top of a decoupled processor design.
dc.description.peerreviewed: Peer Reviewed
dc.description.version: Postprint (published version)
dc.format.extent: 12 p.
dc.identifier.citation: Pericàs, M., Cristal, A., Cazorla, F., González, R., Veidenbaum, A.V., Jiménez, D. A., Valero, M. A two level load/store queue based on execution locality. In: International Symposium on Computer Architecture. "ISCA 2008 Proceedings: 35th International Symposium on Computer Architecture: 21-25 June 2008, Beijing, China". Beijing: Institute of Electrical and Electronics Engineers (IEEE), 2008, p. 25-36.
dc.identifier.doi: 10.1109/ISCA.2008.10
dc.identifier.isbn: 978-0-7695-3174-8
dc.identifier.uri: https://hdl.handle.net/2117/105883
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/4556713/
dc.rights.access: Open Access
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Cache memory
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.lemac: Memòria cau
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.subject.other: Parallel processing
dc.subject.other: Cache storage
dc.title: A two level load/store queue based on execution locality
dc.type: Conference report
dspace.entity.type: Publication
local.citation.author: Pericàs, M.; Cristal, A.; Cazorla, F.; González, R.; Veidenbaum, A.V.; Jiménez, D. A.; Valero, M.
local.citation.contributor: International Symposium on Computer Architecture
local.citation.endingPage: 36
local.citation.publicationName: ISCA 2008 Proceedings: 35th International Symposium on Computer Architecture: 21-25 June 2008, Beijing, China
local.citation.pubplace: Beijing
local.citation.startingPage: 25
local.identifier.drac: 2364391

Files

Original bundle

Name: 04556713.pdf
Size: 623 KB
Format: Adobe Portable Document Format