dc.contributor.author: Corbal San Adrián, Jesús
dc.contributor.author: Espasa Sans, Roger
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.identifier.citation: Corbal, J., Espasa, R., Valero, M. On the efficiency of reductions in µ-SIMD media extensions. In: International Conference on Parallel Architectures and Compilation Techniques. "2001 International Conference on Parallel Architectures and Compilation Techniques: 8-12 September 2001 Barcelona, Catalunya, Spain: proceedings". Barcelona: Institute of Electrical and Electronics Engineers (IEEE), 2001, p. 83-94.
dc.description.abstract: Many important multimedia applications contain a significant fraction of reduction operations. Although multimedia applications are generally characterized by high amounts of data-level parallelism, reductions and accumulations are difficult to parallelize and tolerate increases in instruction latency poorly. This is especially significant for µ-SIMD extensions such as MMX or AltiVec. To overcome the problem of reductions in µ-SIMD ISAs, designers tend to include increasingly complex instructions able to deal with the most common forms of reductions in multimedia. As the number of processor pipeline stages grows, the number of cycles needed to execute these multimedia instructions increases with every processor generation, severely compromising performance. The paper presents an in-depth discussion of how reductions/accumulations are performed in current µ-SIMD architectures and evaluates the performance trade-offs for near-future, highly aggressive superscalar processors with three different styles of µ-SIMD extensions. We compare an MMX-like alternative with an MDMX-like extension that has packed accumulators to attack the reduction problem, and we also compare it with MOM, a matrix register ISA. We show that while packed accumulators present several advantages, they introduce artificial recurrences that severely degrade performance for processors with a high number of registers and long-latency operations. On the other hand, the paper demonstrates that longer SIMD media extensions such as MOM can take great advantage of accumulators by exploiting the associative parallelism implicit in reductions.
dc.format.extent: 12 p.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Multimedia systems
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Pipeline processing
dc.subject.other: Instruction sets
dc.title: On the efficiency of reductions in µ-SIMD media extensions
dc.type: Conference report
dc.subject.lemac: Sistemes multimèdia
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Open Access
dc.description.version: Postprint (published version)
local.citation.author: Corbal, J.; Espasa, R.; Valero, M.
local.citation.contributor: International Conference on Parallel Architectures and Compilation Techniques
local.citation.publicationName: 2001 International Conference on Parallel Architectures and Compilation Techniques: 8-12 September 2001 Barcelona, Catalunya, Spain: proceedings
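
The abstract's point about recurrences and associative parallelism in reductions can be illustrated with a small sketch. This is not code from the paper and does not model MMX, MDMX, or MOM; the function names, the `lanes` parameter, and the 16-element input are assumptions made only for illustration. The first function accumulates into a single value, so every addition depends on the previous one. The second splits the work across several independent partial accumulators and combines them at the end, which is valid here because integer addition is associative and commutative.

```python
def reduce_single_accumulator(values):
    """Sum with one accumulator: each add depends on the previous one."""
    acc = 0
    for v in values:
        acc += v  # serial dependency chain (a recurrence) through acc
    return acc


def reduce_partial_accumulators(values, lanes=4):
    """Sum with `lanes` independent partial accumulators, combined at the end.

    `lanes` is a hypothetical parameter for this sketch, not a value
    taken from the paper.
    """
    partial = [0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v  # the chains do not depend on each other
    # Final combine: relies on integer addition being associative and commutative.
    total = 0
    for p in partial:
        total += p
    return total


data = list(range(1, 17))  # hypothetical 16-element input
single = reduce_single_accumulator(data)
split = reduce_partial_accumulators(data, lanes=4)
```

Both functions return the same sum for integer input. With floating-point data, reordering the additions may change the rounding, so the two results would not necessarily match bit for bit.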


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.