
dc.contributor.author: Oro, David
dc.contributor.author: Fernández, Carles
dc.contributor.author: Segura, Carlos
dc.contributor.author: Martorell Bofill, Xavier
dc.contributor.author: Hernando Pericás, Francisco Javier
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.identifier.citation: Oro, D. [et al.]. Accelerating boosting-based face detection on GPUs. A: International Conference on Parallel Processing. "ICPP 2012: the 41st International Conference on Parallel Processing: Pittsburgh, Pennsylvania, USA, 10-13 September 2012". Pittsburgh, Pennsylvania: 2012, p. 309-318.
dc.description.abstract: The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As it happens with any graphics workloads, these algorithms benefit from data-level parallelism. Existing parallelization efforts strictly focus on mapping different divide and conquer strategies into multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date are still struggling to effectively handle real-time face detection under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields a low ALU occupancy in architectures such as GPUs, which heavily rely on large SIMD widths for maximizing data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. Particularly, the usage of concurrent kernel execution in combination with cascades generated with the GentleBoost algorithm solves the problem of GPU underutilization, and achieves a 5X speedup in 1080p videos on average over the fastest known implementations, while slightly improving the accuracy. Finally, we also studied the parallelization of the cascade training process and its scalability under SMP platforms. The proposed parallelization strategy exploits both task and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.
dc.format.extent: 10 p.
dc.subject: Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la imatge i del senyal vídeo
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh: Image processing -- Digital techniques
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.title: Accelerating boosting-based face detection on GPUs
dc.type: Conference report
dc.subject.lemac: Imatges -- Processament -- Tècniques digitals
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Restricted access - publisher's policy
dc.description.version: Postprint (published version)
local.citation.author: Oro, D.; Fernández, C.; Segura, C.; Martorell, X.; Hernando, J.
local.citation.contributor: International Conference on Parallel Processing
local.citation.pubplace: Pittsburgh, Pennsylvania
local.citation.publicationName: ICPP 2012: the 41st International Conference on Parallel Processing: Pittsburgh, Pennsylvania, USA, 10-13 September 2012


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.