Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture
Performance evaluation of macroblock-level....pdf (309,1Kb) (Restricted access)
Cite as:
hdl:2117/7947
Document type: Conference report
Defense date: 2009-04-23
Rights access: Restricted access - publisher's policy
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
This paper presents a study of the performance scalability of a macroblock-level parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. We have implemented this parallelization on a cache-coherent Non-Uniform Memory Access (cc-NUMA) shared-memory multiprocessor (SMP) and compared the results with the theoretical expectations. Three different scheduling techniques were analyzed: static, dynamic, and dynamic with tail-submit. Dynamic scheduling with the tail-submit optimization performs best, reaching a maximum speed-up of 9.5 using 24 processors. A detailed profiling analysis showed that thread synchronization is one of the limiting factors for achieving better parallel scalability. The paper includes an evaluation of the impact of using blocking synchronization APIs such as POSIX threads and POSIX real-time extensions. Results showed that macroblock-level parallelism, as a very fine-grain form of Thread-Level Parallelism (TLP), is highly affected by the thread synchronization overhead generated by these APIs. Other synchronization methods, possibly with hardware support, are required in order to make MB-level parallelization more scalable.
Citation: Álvarez, M. [et al.]. Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture. In: 2009 Colombian Computing Conference. "Cuarto Congreso Colombiano de Computación, 4CCC: abril 23-25, 2009, Bucaramanga, Colombia". Bucaramanga: 2009, p. 108-117.
ISBN: 978-958-8166-43-8
Files | Description | Size | Format | View |
---|---|---|---|---|
Performance evaluation of macroblock-level....pdf | | 309,1Kb | | Restricted access |