Effective usage of vector registers in decoupled vector architectures
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
The paper presents a study of the impact of reducing the vector register size in a decoupled vector architecture. In traditional in-order vector architectures, long vector registers have typically been the norm. The authors present data showing that, even for highly vectorizable codes, only a small fraction of the elements of a long vector register are actually used. They also show that reducing the register size in a traditional vector architecture, in an attempt to reduce hardware cost and maximize register utilization, results in severe performance degradation. However, when they combine the decoupling technique with the vector register reduction, the resulting architecture tolerates the register size cuts very well. They simulate a selection of Perfect Club and SPECfp92 programs using a trace-driven approach and compare the execution time of a conventional vector architecture with that of a decoupled vector architecture using different register sizes. Halving the register size and using decoupling provides speedups between 1.04 and 1.49 over a traditional in-order vector machine. Even when the register length is reduced to 1/4 of the original size (and in some cases to 1/8), the decoupled machine performs better than a conventional vector model. Moreover, they observe that the resulting decoupled machine with short registers tolerates long memory latencies very well.
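To make the notion of register utilization concrete, the sketch below computes the fraction of vector register element slots that are actually filled when loops are strip-mined onto registers of a given maximum vector length. It is a hypothetical illustration, not the authors' trace-driven simulator: the function names, the assumption that every strip occupies one full-length register, and the sample loop lengths are all assumptions made for this example.

```python
def strip_mine(n, mvl):
    """Split a loop of n elements into strips of at most mvl elements.

    Hypothetical helper: models classic strip mining, where all strips
    are full except possibly the last one.
    """
    strips = [mvl] * (n // mvl)
    if n % mvl:
        strips.append(n % mvl)
    return strips


def register_utilization(vector_lengths, mvl):
    """Fraction of allocated register element slots actually used.

    Assumption (for illustration only): each strip occupies one whole
    vector register of mvl elements, regardless of its actual length.
    """
    used = 0
    allocated = 0
    for n in vector_lengths:
        for vl in strip_mine(n, mvl):
            used += vl          # elements that carry useful data
            allocated += mvl    # slots reserved in the register file
    return used / allocated if allocated else 0.0


if __name__ == "__main__":
    # Hypothetical loop trip counts, mixing short and long vectors.
    loops = [10, 20, 130]
    for mvl in (128, 64, 32):
        print(mvl, round(register_utilization(loops, mvl), 3))
```

With short trip counts, a long register (here 128 elements) leaves most of its slots empty, while shorter registers raise utilization at the cost of more strips, and therefore more vector instructions. That overhead is the kind of cost the abstract reports a conventional machine suffers from and a decoupled machine largely tolerates.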
Citation: Villa, L., Espasa, R., Valero, M. Effective usage of vector registers in decoupled vector architectures. In: Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. "Proceedings of the 6th EUROMICRO Workshop on Parallel and Distributed Processing, PDP'98: University of Madrid: January 21-23, 1998, Madrid, Spain". Madrid: Institute of Electrical and Electronics Engineers (IEEE), 1998, p. 495-501.