Optimizing resource utilization with software-based temporal multi-threading (sTMT)
Document type: Conference report
Rights access: Restricted access - publisher's policy
Compute and memory access units are two of the most important resources to manage in current and future multi-/many-core architectures. Memory bandwidth and computational capacity must be exploited jointly to achieve the best system performance. Coarse-grain multi-threading, also known as temporal multi-threading (TMT), is a well-known technique that improves overall resource utilization by time-multiplexing a small number of hardware threads, switching to another thread whenever the running one hits a high-latency event such as a memory miss. As a result, the processor does not stall on memory misses and more memory operations are in flight at once, improving overall processor resource utilization. In this paper, we propose a software-based implementation of TMT that supports an unbounded number of threads and enables a flexible combination of multiple computational kernels. Our TMT implementation is based on micro-threads that combine fast cooperative and preemptive context switches to overcome some intrinsic limitations of current hardware TMT implementations, such as the small, fixed number of hardware threads available. Our proposal is demonstrated with an implementation on the Cell/B.E., which is evaluated using heterogeneous mixes of memory-bound and CPU-bound kernels. Experimental results show that the proposed technique reduces the execution time of several benchmarks by up to 78%.
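To make the switch-on-miss idea concrete, here is a minimal Python simulation. It is a sketch under stated assumptions, not the authors' Cell/B.E. implementation: each micro-thread is a generator that yields when it issues a memory load, and a single-core scheduler resumes another runnable micro-thread instead of stalling. The kernel shape, the cycle costs, and the names `kernel` and `run` are hypothetical; loads are assumed to be fully pipelined, and the preemptive switches described above are not modeled.

```python
import heapq
from typing import Generator, List, Tuple

# Hypothetical cost model: ("compute", cycles) occupies the core;
# ("load", latency) is a high-latency memory operation completing in the background.
Op = Tuple[str, int]


def kernel(iterations: int, compute: int, latency: int) -> Generator[Op, None, None]:
    """Hypothetical memory-bound kernel: one load followed by some computation."""
    for _ in range(iterations):
        yield ("load", latency)     # switch point: the micro-thread gives up the core
        yield ("compute", compute)  # uses the loaded data, runs on the core


def run(threads: List[Generator[Op, None, None]]) -> Tuple[int, int]:
    """Simulate one core that switches micro-threads cooperatively on loads.

    Returns (total_cycles, peak_number_of_in_flight_loads).
    """
    now = 0
    # Min-heap of (time the micro-thread becomes runnable, thread id).
    ready = [(0, tid) for tid in range(len(threads))]
    heapq.heapify(ready)
    in_flight: List[int] = []  # completion times of outstanding loads
    peak = 0
    while ready:
        t, tid = heapq.heappop(ready)
        now = max(now, t)  # if no micro-thread is runnable yet, the core stalls
        # Resume this micro-thread until it blocks on a load or finishes.
        for kind, cycles in threads[tid]:
            if kind == "compute":
                now += cycles
            else:
                # Issue the load, track it as in flight, and switch away.
                in_flight = [c for c in in_flight if c > now]
                in_flight.append(now + cycles)
                peak = max(peak, len(in_flight))
                heapq.heappush(ready, (now + cycles, tid))
                break  # the generator stays suspended until rescheduled
    return now, peak


if __name__ == "__main__":
    seq_cycles, _ = run([kernel(4, 10, 30)])
    tmt_cycles, peak = run([kernel(4, 10, 30) for _ in range(4)])
    # Four kernels run back to back vs. four interleaved micro-threads.
    print(4 * seq_cycles, tmt_cycles, peak)  # 640 190 4
```

With four micro-threads, each 30-cycle load is hidden behind the other three threads' 10-cycle compute phases, so after the first miss the simulated core never stalls and up to four loads are in flight at once.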
Citation: Beltran, V.; Ayguade, E. Optimizing resource utilization with software-based temporal multi-threading (sTMT). In: International Conference on High Performance Computing. "19th International Conference on High Performance Computing". Pune: 2013, p. 1-10.