Acoustic event detection based on feature-level fusion of audio and video modalities

Butko, Taras; Canton Ferrer, Cristian; Segura Perales, Carlos; Giró Nieto, Xavier; Nadeu Camprubí, Climent; Hernando Pericás, Francisco Javier; Casas Pla, Josep Ramon

doi:10.1155/2011/485738

dc.contributor.author	Butko, Taras
dc.contributor.author	Canton Ferrer, Cristian
dc.contributor.author	Segura Perales, Carlos
dc.contributor.author	Giró Nieto, Xavier
dc.contributor.author	Nadeu Camprubí, Climent
dc.contributor.author	Hernando Pericás, Francisco Javier
dc.contributor.author	Casas Pla, Josep Ramon
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2011-10-23T09:26:48Z
dc.date.available	2011-10-23T09:26:48Z
dc.date.created	2011-03-15
dc.date.issued	2011-03-15
dc.identifier.citation	Butko, T. [et al.]. Acoustic event detection based on feature-level fusion of audio and video modalities. "Eurasip journal on advances in signal processing", 15 March 2011, vol. 2011, p. 1-11.
dc.identifier.issn	1687-6172
dc.identifier.uri	http://hdl.handle.net/2117/13630
dc.description	Research article
dc.description.abstract	Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, which are mostly due to temporal overlaps. Indeed, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed in our work. The experimental results show that information from both the microphone array and video cameras is useful to improve the detection rate of isolated as well as spontaneously generated acoustic events.
dc.format.extent	11 p.
dc.language.iso	eng
dc.publisher	HINDAWI
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Image and video signal processing
dc.subject.lcsh	Acoustic event detection
dc.title	Acoustic event detection based on feature-level fusion of audio and video modalities
dc.type	Article
dc.subject.lemac	Acoustic signal -- Detection
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.contributor.group	Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo
dc.identifier.doi	10.1155/2011/485738
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://www.hindawi.com/journals/asp/2011/485738/
dc.rights.access	Open Access
local.identifier.drac	5391480
dc.description.version	Postprint (published version)
local.citation.author	Butko, T.; Canton-Ferrer, C.; Segura, C.; Giro, X.; Nadeu, C.; Hernando, J.; Casas, J.
local.citation.publicationName	Eurasip journal on advances in signal processing
local.citation.volume	2011
local.citation.startingPage	1
local.citation.endingPage	11

Files in this item

Name: 485738.pdf
Size: 2.194 MB
Format: PDF

