dc.contributor.author: Surís Coll-Vinent, Dídac
dc.contributor.author: Duarte, Amanda
dc.contributor.author: Salvador Aguilera, Amaia
dc.contributor.author: Torres Viñals, Jordi
dc.contributor.author: Giró Nieto, Xavier
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2019-02-14T10:06:24Z
dc.date.available: 2019-02-14T10:06:24Z
dc.date.issued: 2019
dc.identifier.citation: Surís, D. [et al.]. Cross-modal embeddings for video and audio retrieval. In: Women in Computer Vision Workshop. "Computer Vision, ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018: proceedings, part IV". Berlin: Springer, 2019, p. 711-716.
dc.identifier.isbn: 978-3-030-11018-5
dc.identifier.other: https://imatge.upc.edu/web/publications/cross-modal-embeddings-video-and-audio-retrieval
dc.identifier.uri: http://hdl.handle.net/2117/129095
dc.description.abstract: In this work, we explore the multi-modal information provided by the YouTube-8M dataset by projecting the audio and visual features into a common feature space to obtain joint audio-visual embeddings. These joint embeddings are used to retrieve audio samples that fit a given silent video well, and to retrieve images that match a given audio query. The results in terms of Recall@K obtained over a subset of YouTube-8M videos show the potential of this unsupervised approach for cross-modal feature learning.
dc.format.extent: 6 p.
dc.language.iso: eng
dc.publisher: Springer
dc.subject: UPC subject areas::Telecommunication engineering::Signal processing::Image and video signal processing
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Image processing
dc.subject.other: Cross-modal
dc.subject.other: Retrieval
dc.subject.other: YouTube-8M
dc.title: Cross-modal embeddings for video and audio retrieval
dc.type: Conference report
dc.subject.lemac: Machine learning
dc.subject.lemac: Neural networks (Computer science)
dc.subject.lemac: Images -- Processing
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.contributor.group: Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo
dc.identifier.doi: 10.1007/978-3-030-11018-5_62
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://link.springer.com/chapter/10.1007/978-3-030-11018-5_62
dc.rights.access: Open Access
drac.iddocument: 23845936
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TEC2016-75976-R
upcommons.citation.author: Surís, D.; Duarte, A.; Salvador, A.; Torres, J.; Giró, X.
upcommons.citation.contributor: Women in Computer Vision Workshop
upcommons.citation.pubplace: Berlin
upcommons.citation.published: true
upcommons.citation.publicationName: Computer Vision, ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018: proceedings, part IV
upcommons.citation.startingPage: 711
upcommons.citation.endingPage: 716
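The abstract reports retrieval quality as Recall@K in a shared audio-visual embedding space. As an illustration only, the Python sketch below computes Recall@K for audio-to-video retrieval under stated assumptions: row i of each embedding list comes from the same video, the embeddings have already been projected into the common space, and cosine similarity is used as the ranking score. The function names `cosine` and `recall_at_k` and the toy vectors are hypothetical and are not taken from the paper, whose exact projection and scoring may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(queries: List[Vector], targets: List[Vector], k: int) -> float:
    """Fraction of queries whose paired target (same index) ranks in the top k.

    Hypothetical helper: assumes queries[i] and targets[i] are embeddings of
    the same video, already projected into the shared feature space.
    """
    if len(queries) != len(targets):
        raise ValueError("queries and targets must be paired one-to-one")
    hits = 0
    for i, q in enumerate(queries):
        # Rank every candidate target by similarity to this query, best first.
        ranked = sorted(range(len(targets)),
                        key=lambda j: cosine(q, targets[j]),
                        reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy data (hypothetical): audio queries retrieving their paired videos.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[0.9, 0.1], [0.2, 0.8], [-1.0, 0.5]]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(audio, video, k):.3f}")
# prints R@1 = 0.667, R@2 = 0.667, R@3 = 1.000
```

Swapping the two arguments gives the reverse direction (video queries retrieving audio), which mirrors the two retrieval tasks named in the abstract.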


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.