Audiovisual event detection towards scene understanding
dc.contributor.author | Canton Ferrer, Cristian |
dc.contributor.author | Butko, Taras |
dc.contributor.author | Segura, C. |
dc.contributor.author | Giró Nieto, Xavier |
dc.contributor.author | Nadeu Camprubí, Climent |
dc.contributor.author | Hernando Pericás, Francisco Javier |
dc.contributor.author | Casas Pla, Josep Ramon |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2014-07-31T08:41:18Z |
dc.date.created | 2009 |
dc.date.issued | 2009 |
dc.identifier.citation | Canton, C. [et al.]. Audiovisual event detection towards scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition. "2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops: CVPR workshops 2009: Miami Beach, Florida, USA: 20-25 June 2009". Institute of Electrical and Electronics Engineers (IEEE), 2009, p. 840-847. |
dc.identifier.isbn | 978-1-4244-3994-2 |
dc.identifier.uri | http://hdl.handle.net/2117/23653 |
dc.description.abstract | Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multiperson tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. A multimodal data fusion at score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows the performance of the presented algorithms to be assessed. This dataset is made publicly available for research purposes. |
dc.format.extent | 8 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.subject | UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing |
dc.subject | UPC subject areas::Computer science |
dc.subject.lcsh | Human face recognition (Computer science) |
dc.subject.other | Audio signal processing |
dc.subject.other | Face recognition |
dc.subject.other | Motion estimation |
dc.subject.other | Object detection |
dc.subject.other | Sensor fusion |
dc.subject.other | Transforms |
dc.subject.other | Video signal processing |
dc.title | Audiovisual event detection towards scene understanding |
dc.type | Conference report |
dc.subject.lemac | Reconeixement facial (Informàtica) |
dc.contributor.group | Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.1109/CVPRW.2009.5204264 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05204264 |
dc.rights.access | Restricted access - publisher's policy |
local.identifier.drac | 2416071 |
dc.description.version | Postprint (published version) |
dc.date.lift | 10000-01-01 |
local.citation.author | Canton, C.; Butko, T.; Segura, C.; Giro, X.; Nadeu, C.; Hernando, J.; Casas, J. |
local.citation.contributor | IEEE Conference on Computer Vision and Pattern Recognition |
local.citation.publicationName | 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops: CVPR workshops 2009: Miami Beach, Florida, USA: 20-25 June 2009 |
local.citation.startingPage | 840 |
local.citation.endingPage | 847 |
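
The abstract names two score-level fusion approaches, a weighted mean and a fuzzy integral, without giving their formulation. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes per-modality scores in [0, 1] and uses the Choquet form of the fuzzy integral, which is one common variant; the record does not say which variant the paper uses. The function names, the example scores, and the fuzzy measure values are all hypothetical.

```python
from typing import Dict, FrozenSet, Sequence


def weighted_mean_fusion(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fuse per-modality scores with a weighted arithmetic mean."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def choquet_fusion(scores: Sequence[float],
                   measure: Dict[FrozenSet[int], float]) -> float:
    """Fuse scores with a Choquet fuzzy integral (assumed variant).

    `measure` maps each needed coalition of source indices to its fuzzy
    measure; the full set of sources should map to 1.0.
    """
    # Visit sources from the lowest score to the highest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fused = 0.0
    previous = 0.0
    for k, i in enumerate(order):
        # Coalition of all sources whose score is at least scores[i].
        coalition = frozenset(order[k:])
        fused += (scores[i] - previous) * measure[coalition]
        previous = scores[i]
    return fused


# Hypothetical scores for one event class: index 0 = audio, index 1 = video.
example_scores = [0.8, 0.4]
example_weights = [0.7, 0.3]

# An additive measure built from the same weights.
additive_measure = {
    frozenset({0}): 0.7,
    frozenset({1}): 0.3,
    frozenset({0, 1}): 1.0,
}

# A non-additive measure: each source alone counts for less than its share.
non_additive_measure = {
    frozenset({0}): 0.5,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}

mean_score = weighted_mean_fusion(example_scores, example_weights)
choquet_additive = choquet_fusion(example_scores, additive_measure)
choquet_non_additive = choquet_fusion(example_scores, non_additive_measure)
```

With an additive measure the Choquet integral reduces to the weighted mean, so the first two results agree. A non-additive measure lets the fusion account for interaction between modalities, which is the usual reason for preferring a fuzzy integral over a plain weighted mean.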