Audiovisual event detection towards scene understanding
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Restricted access - publisher's policy
Abstract: Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at the score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows assessment of the performance of the presented algorithms. The dataset is made publicly available for research purposes.
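The score-level fusion by weighted mean described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the example scores, and the modality weight are hypothetical, and in practice the weight would be tuned on a development set.

```python
import numpy as np

def weighted_mean_fusion(audio_scores, video_scores, w_audio=0.6):
    """Fuse per-class scores from two modalities by weighted mean.

    Hypothetical sketch: each classifier outputs one score per
    acoustic-event class, and the fused score is a convex
    combination of the two modality scores.
    """
    audio = np.asarray(audio_scores, dtype=float)
    video = np.asarray(video_scores, dtype=float)
    # Convex combination of the per-class scores.
    fused = w_audio * audio + (1.0 - w_audio) * video
    # Decision: class with the highest fused score.
    return int(np.argmax(fused)), fused

# Toy example with three event classes (made-up scores):
label, fused = weighted_mean_fusion([0.2, 0.7, 0.1], [0.5, 0.3, 0.2])
```

The fuzzy-integral approach mentioned in the abstract generalizes this idea by letting the weight given to each modality depend on which modalities agree, via a fuzzy measure over subsets of information sources.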
Citation: Canton, C. [et al.]. Audiovisual event detection towards scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition. "2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops: CVPR workshops 2009: Miami Beach, Florida, USA: 20-25 June 2009". Institute of Electrical and Electronics Engineers (IEEE), 2009, p. 840-847.