Inherently workload-balanced clustered microarchitecture
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposing goals, since improving one usually degrades the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, communication penalties are minimized when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, a 15% speedup is achieved on average for FP programs.
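The ring idea in the abstract can be illustrated with a small sketch. It is not the steering scheme from the paper: the function names, the hop-count measure, and the rule "place each consumer in the cluster right after its producer" are assumptions made only for illustration. Under those assumptions, a chain of dependent instructions walks around the ring, so every producer-consumer pair is one hop apart while the instructions spread evenly over the clusters.

```python
from collections import Counter


def ring_hops(producer: int, consumer: int, num_clusters: int) -> int:
    """Hops from producer to consumer on a unidirectional ring (hypothetical measure)."""
    return (consumer - producer) % num_clusters


def steer_to_neighbor(producer: int, num_clusters: int) -> int:
    """Assumed rule: a consumer goes to the cluster right after its producer."""
    return (producer + 1) % num_clusters


def steer_chain(length: int, num_clusters: int, start: int = 0) -> list[int]:
    """Assign a chain of dependent instructions to clusters, one after another."""
    clusters = [start]
    for _ in range(length - 1):
        clusters.append(steer_to_neighbor(clusters[-1], num_clusters))
    return clusters


if __name__ == "__main__":
    num_clusters = 8
    placement = steer_chain(16, num_clusters)
    # Instructions assigned to each cluster.
    print(Counter(placement))
    # Hop count between each producer and its consumer.
    print([ring_hops(p, c, num_clusters) for p, c in zip(placement, placement[1:])])
```

In this toy model, the same placement rule that keeps each value's forwarding distance at one hop also rotates work around the ring, which is the sense in which the two goals stop conflicting.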
Citation: Abella, J., Gonzalez, A. Inherently workload-balanced clustered microarchitecture. In: IEEE International Parallel and Distributed Processing Symposium. "19th IEEE International Parallel and Distributed Processing Symposium: April 4-8, 2005, Denver, Colorado: proceedings". Denver, Colorado: Institute of Electrical and Electronics Engineers (IEEE), 2005, p. 1-10.