Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Gibert Codina, Enric; Sánchez, Jesús; González Colás, Antonio María

doi:10.1109/CGO.2003.1191545

Document type: Conference proceedings paper

Publication date: 2003

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Clustering is a common technique to deal with wire delays. Fully-distributed architectures, where the register file, the functional units and the cache memory are partitioned, are particularly effective at dealing with these constraints and are also very scalable. However, the distribution of the data cache introduces a new problem: memory instructions may reach the cache in an order different from the sequential program order, thus possibly corrupting its contents. In this paper, two local scheduling mechanisms that guarantee the serialization of aliased memory instructions are proposed and evaluated: the construction of memory dependent chains (MDC solution), and two transformations (store replication and load-store synchronization) applied to the original data dependence graph (DDGT solution). These solutions do not require any extra hardware. The proposed scheduling techniques are evaluated for a clustered VLIW processor with a word-interleaved cache (although these techniques can also be used for any other distributed cache configuration). Results for the Mediabench benchmark suite demonstrate the effectiveness of such techniques. In particular, the DDGT solution increases the proportion of local accesses by 16% compared to MDC, and stall time is reduced by 32% since load instructions can be freely scheduled in any cluster. However, the MDC solution reduces compute time and often outperforms DDGT. Finally, the impact of both techniques on an architecture with attraction buffers is studied and evaluated.
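As a rough illustration of the two ideas the abstract names, the sketch below shows one way a word-interleaved cache could map addresses to clusters, and one way aliased memory operations could be grouped into chains. It is not taken from the paper: the function names, the modulo mapping, the 4-cluster / 4-byte-word defaults, and the use of union-find over may-alias pairs are all assumptions made for this example. It also assumes a scheduler would place every operation of a chain in the same cluster, in program order, so that aliased accesses stay serialized.

```python
def home_cluster(address, num_clusters=4, word_bytes=4):
    """Cluster whose cache module would hold the word at `address`.

    Assumed mapping: consecutive words go to consecutive clusters.
    """
    return (address // word_bytes) % num_clusters


def build_chains(num_ops, may_alias_pairs):
    """Group memory operations (numbered 0..num_ops-1 in program order)
    into chains: operations linked by may-alias pairs share a chain.

    Hypothetical helper; a scheduler could then keep each chain in a
    single cluster so its members reach the cache in program order.
    """
    parent = list(range(num_ops))

    def find(x):
        # Walk to the representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in may_alias_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the earliest operation as the chain representative.
            parent[max(ra, rb)] = min(ra, rb)

    chains = {}
    for op in range(num_ops):
        chains.setdefault(find(op), []).append(op)
    return sorted(chains.values())
```

Under these assumptions, `build_chains(5, [(0, 2), (2, 4)])` puts operations 0, 2 and 4 in one chain and leaves 1 and 3 unconstrained, which hints at the trade-off described in the abstract: chained operations lose the freedom to be scheduled in the cluster returned by `home_cluster` for their own address.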

Citation: Gibert, E., Sánchez, J., González, A. Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache. In: International Symposium on Code Generation and Optimization. "International Symposium on Code Generation and Optimization, CGO 2003: 23-26 March 2003, San Francisco, California". San Francisco, CA: Institute of Electrical and Electronics Engineers (IEEE), 2003, p. 193-203.

URI: http://hdl.handle.net/2117/100451

DOI: 10.1109/CGO.2003.1191545

ISBN: 0-7695-1913-X

Publisher version: http://ieeexplore.ieee.org/document/1191545/

Files	Description	Size	Format
01191545.pdf		226.2 Kb	PDF
