This paper describes a system that identifies people in broadcast TV shows in a purely unsupervised manner. The system outputs the identities of people who both appear and talk, and who can be identified using information present in the show itself (in our case, text containing person names). Three types of monomodal technologies are used: speech diarization, video diarization, and text detection / named entity recognition. These technologies are combined using a linear programming approach in which some restrictions are imposed.
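As a rough illustration of what a linear programming fusion of this kind can look like, the block below writes a generic name-to-cluster assignment as a linear program. It is a sketch under assumptions, not the formulation used in the paper: the symbols $x_{ij}$, $s_{ij}$, $C$ and $N$, the objective, and both constraints are hypothetical. Here $C$ stands for the set of speaker or face clusters produced by diarization, $N$ for the set of detected person names, and $s_{ij}$ for an assumed score measuring how strongly name $j$ co-occurs in time with cluster $i$.

```latex
% Hypothetical formulation; every symbol here is an assumption, not taken from the paper.
\begin{align}
\max_{x}\quad & \sum_{i \in C} \sum_{j \in N} s_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j \in N} x_{ij} \le 1 \quad \forall i \in C
  && \text{(each cluster receives at most one name)} \\
& x_{ij} = 0 \quad \text{if cluster } i \text{ and name } j \text{ never co-occur} \\
& 0 \le x_{ij} \le 1 \quad \forall i \in C,\ j \in N
\end{align}
```

In such a sketch, the restrictions are the constraint rows: they rule out assignments that one modality contradicts, while the objective rewards assignments that the modalities jointly support.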
Citation: India, M., Varas, D., Vilaplana, V., Morros, J.R., Hernando, J. UPC system for the 2015 MediaEval multimodal person discovery in broadcast TV task. In: MediaEval Multimedia Benchmark Workshop. "MediaEval 2015 Multimedia Benchmark Workshop". Wurzen: 2015.