Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture
dc.contributor.author | Álvarez Mesa, Mauricio |
dc.contributor.author | Ramírez Bellido, Alejandro |
dc.contributor.author | Valero Cortés, Mateo |
dc.contributor.author | Azevedo, Arnaldo |
dc.contributor.author | Meenderinck, Cor |
dc.contributor.author | Juurlink, Ben |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.date.accessioned | 2010-07-01T10:19:11Z |
dc.date.available | 2010-07-01T10:19:11Z |
dc.date.created | 2009-04-23 |
dc.date.issued | 2009-04-23 |
dc.identifier.citation | Álvarez, M. [et al.]. Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture. In: 2009 Colombian Computing Conference. "Cuarto Congreso Colombiano de Computación, 4CCC: abril 23-25, 2009, Bucaramanga, Colombia". Bucaramanga: 2009, p. 108-117. |
dc.identifier.isbn | 978-958-8166-43-8 |
dc.identifier.uri | http://hdl.handle.net/2117/7947 |
dc.description.abstract | This paper presents a study of the performance scalability of a macroblock-level parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. We have implemented this parallelization on a cache coherent Non-uniform Memory Access (cc-NUMA) shared memory multiprocessor (SMP) and compared the results with the theoretical expectations. Three different scheduling techniques were analyzed: static, dynamic and dynamic with tail-submit. A dynamic scheduling approach with a tail-submit optimization presents the best performance obtaining a maximum speed-up of 9.5 using 24 processors. A detailed profiling analysis showed that thread synchronization is one of the limiting factors for achieving a better parallel scalability. The paper includes an evaluation of the impact of using blocking synchronization APIs like POSIX threads and POSIX real-time extensions. Results showed that macroblock-level parallelism as a very fine-grain form of Thread-Level Parallelism (TLP) is highly affected by the thread synchronization overhead generated by these APIs. Other synchronization methods, possibly with hardware support, are required in order to make MB-level parallelization more scalable. |
dc.format.extent | 10 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Computer science::Computer architecture::Parallel architectures |
dc.subject.lcsh | cc-NUMA multiprocessor architecture |
dc.subject.lcsh | H.264 |
dc.title | Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture |
dc.type | Conference report |
dc.subject.lemac | Multiprocessors -- Evaluation |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://serverlab.unab.edu.co:8080/wikimedia/memorias/fullpapers/108.pdf |
dc.rights.access | Restricted access - publisher's policy |
local.identifier.drac | 2574254 |
dc.description.version | Postprint (published version) |
local.citation.author | Álvarez, M.; Ramírez, A.; Valero, M.; Azevedo, A.; Meenderinck, C.; Juurlink, B. |
local.citation.contributor | Colombian Computing Conference |
local.citation.pubplace | Bucaramanga |
local.citation.publicationName | Cuarto Congreso Colombiano de Computación, 4CCC: abril 23-25, 2009, Bucaramanga, Colombia |
local.citation.startingPage | 108 |
local.citation.endingPage | 117 |