A two level load/store queue based on execution locality

Pericàs Gleim, Miquel; Cristal Kestelman, Adrián; Cazorla, Francisco; González García, Rubén; Veidenbaum, Alexander V; Jiménez, Daniel A.; Valero Cortés, Mateo

doi:10.1109/ISCA.2008.10

dc.contributor.author	Pericàs Gleim, Miquel
dc.contributor.author	Cristal Kestelman, Adrián
dc.contributor.author	Cazorla, Francisco
dc.contributor.author	González García, Rubén
dc.contributor.author	Veidenbaum, Alexander V
dc.contributor.author	Jiménez, Daniel A.
dc.contributor.author	Valero Cortés, Mateo
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2017-06-27T07:08:37Z
dc.date.available	2017-06-27T07:08:37Z
dc.date.issued	2008
dc.identifier.citation	Pericàs, M., Cristal, A., Cazorla, F., González, R., Veidenbaum, A.V., Jiménez, D. A., Valero, M. A two level load/store queue based on execution locality. In: International Symposium on Computer Architecture. "ISCA 2008 Proceedings: 35th International Symposium on Computer Architecture: 21-25 June 2008, Beijing, China". Beijing: Institute of Electrical and Electronics Engineers (IEEE), 2008, p. 25-36.
dc.identifier.isbn	978-0-7695-3174-8
dc.identifier.uri	http://hdl.handle.net/2117/105883
dc.description.abstract	Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential components of applications. To overcome this limitation, it is necessary to design processors with many lower-performance cores for TLP and some high-performance cores designed to execute sequential algorithms. Such cores will need to address the memory wall by implementing kilo-instruction windows. Large-window processors require large Load/Store Queues (LSQs) that would be too slow if implemented using current CAM-based designs. This paper proposes an Epoch-based Load Store Queue (ELSQ), a new design based on Execution Locality. It is integrated into a large-window processor that has a fast, out-of-order core operating only on L1/L2 cache hits and N slower cores that process L2 misses and their dependent instructions. The large LSQ is coupled with the slow cores and is partitioned into N small, local LSQs, one per core. We evaluate ELSQ in a large-window environment, finding that it enables high performance at low power. By exploiting locality among loads and stores, ELSQ outperforms even an idealized central LSQ when implemented on top of a decoupled processor design.
dc.format.extent	12 p.
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh	Cache memory
dc.subject.lcsh	Parallel processing (Electronic computers)
dc.subject.other	Parallel processing
dc.subject.other	Cache storage
dc.title	A two level load/store queue based on execution locality
dc.type	Conference report
dc.subject.lemac	Memòria cau
dc.subject.lemac	Processament en paral·lel (Ordinadors)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1109/ISCA.2008.10
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://ieeexplore.ieee.org/document/4556713/
dc.rights.access	Open Access
local.identifier.drac	2364391
dc.description.version	Postprint (published version)
local.citation.author	Pericàs, M.; Cristal, A.; Cazorla, F.; González, R.; Veidenbaum, A.V.; Jiménez, D. A.; Valero, M.
local.citation.contributor	International Symposium on Computer Architecture
local.citation.pubplace	Beijing
local.citation.publicationName	ISCA 2008 Proceedings: 35th International Symposium on Computer Architecture: 21-25 June 2008, Beijing, China
local.citation.startingPage	25
local.citation.endingPage	36

Files in this item

Name: 04556713.pdf
Size: 623.0 KB
Format: PDF

This item appears in the following collections

Conference papers/presentations [784]
Conference papers/presentations [1,954]

UPCommons. The UPC open knowledge portal
