Detection of overlapped acoustic events using fusion of audio and video modalities
Document type: Conference report
Rights access: Open Access
Acoustic event detection (AED) may help to describe acoustic scenes, and may also contribute to improving the robustness of speech technologies. Even when the number of considered events is not large, detection becomes a difficult task in scenarios where the acoustic events (AEs) are produced rather spontaneously and often overlap in time with speech. In this work, fusion of audio and video information is performed at either the feature or the decision level, and the results are compared for different degrees of signal overlap. The largest improvement over an audio-only baseline system was obtained with the feature-level fusion technique. Furthermore, a significant recognition rate improvement is observed when the AEs overlap with loud speech, mainly because the video modality remains unaffected by the interfering sound.
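The two fusion strategies the abstract compares can be sketched as follows. All variable names, feature dimensions, posterior values, and the weighting scheme below are illustrative assumptions for exposition, not details taken from the paper:

```python
# Hypothetical per-frame features; dimensions and values are illustrative only.
audio_feat = [0.2, 0.7, 0.1]   # e.g. spectro-temporal audio features
video_feat = [0.9, 0.3]        # e.g. motion/appearance video features

# Feature-level fusion: concatenate the modality vectors and feed the
# joint vector to a single classifier.
fused_feat = audio_feat + video_feat   # 5-dimensional joint feature

# Decision-level fusion: classify each modality separately, then combine
# the per-modality class posteriors, here with a simple weighted sum.
audio_post = [0.6, 0.3, 0.1]   # P(event class | audio), illustrative
video_post = [0.1, 0.6, 0.3]   # P(event class | video), illustrative
w = 0.5                        # audio weight, a tunable assumption
fused_post = [w * a + (1 - w) * v for a, v in zip(audio_post, video_post)]
decision = max(range(len(fused_post)), key=fused_post.__getitem__)
```

The intuition behind the paper's result is visible even in this toy form: when loud speech corrupts the audio posteriors, the video posteriors are unaffected, so the combined decision degrades more gracefully than an audio-only one.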
Citation: Butko, T.; Nadeu, C. Detection of overlapped acoustic events using fusion of audio and video modalities. In: "VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop". 2010, p. 165-168.