On the efficiency of reductions in µ-SIMD media extensions
Document typeConference report
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Rights accessOpen Access
Many important multimedia applications contain a significant fraction of reduction operations. Although, in general, multimedia applications are characterized for having high amounts of Data Level Parallelism, reductions and accumulations are difficult to parallelize and show a poor tolerance to increases in the latency of the instructions. This is specially significant for µ-SIMD extensions such as MMX or AltiVec. To overcome the problem of reductions in µ-SIMD ISAs, designers tend to include more and more complex instructions able to deal with the most common forms of reductions in multimedia. As long as the number of processor pipeline stages grows, the number of cycles needed to execute these multimedia instructions increases with every processor generation, severely compromising performance. The paper presents an in-depth discussion of how reductions/accumulations are performed in current µ-SIMD architectures and evaluates the performance trade-offs for near-future highly aggressive superscalar processors with three different styles of µ-SIMD extensions. We compare a MMX-like alternative to a MDMX-like extension that has packed accumulators to attack the reduction problem, and we also compare it to MOM, a matrix register ISA. We show that while packed accumulators present several advantages, they introduce artificial recurrences that severely degrade performance for processors with high number of registers and long latency operations. On the other hand, the paper demonstrates that longer SIMD media extensions such as MOM can take great advantage of accumulators by exploiting the associative parallelism implicit in reductions.
CitationCorbal, J., Espasa, R., Valero, M. On the efficiency of reductions in µ-SIMD media extensions. A: International Conference on Parallel Architectures and Compilation Techniques. "2001 International Conference on Parallel Architectures and Compilation Techniques: 8-12 September 2001 Barcelona, Catalunya, Spain: proceedings". Barcelona: Institute of Electrical and Electronics Engineers (IEEE), 2001, p. 83-94.