Full-network embedding in a multimodal embedding pipeline
| dc.contributor.author | Vilalta Arias, Armand |
| dc.contributor.author | Garcia Gasulla, Dario |
| dc.contributor.author | Parés Pont, Ferran |
| dc.contributor.author | Moreno Vázquez, Jonatan |
| dc.contributor.author | Ayguadé Parra, Eduard |
| dc.contributor.author | Labarta Mancho, Jesús José |
| dc.contributor.author | Cortés García, Claudio Ulises |
| dc.contributor.author | Suzumura, Toyotaro |
| dc.contributor.group | Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic |
| dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
| dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Intel·ligència Artificial |
| dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
| dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
| dc.contributor.other | Barcelona Supercomputing Center |
| dc.date.accessioned | 2020-07-09T08:59:13Z |
| dc.date.available | 2020-07-09T08:59:13Z |
| dc.date.issued | 2017 |
| dc.description.abstract | The current state-of-the-art for image annotation and image retrieval tasks is obtained through deep neural networks, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation scheme. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale representation of images, which results in richer characterizations. To measure the influence of the Full-Network embedding, we evaluate its performance on three different datasets, and compare the results with the original multimodal embedding generation scheme when using a one-layer image embedding, and with the rest of the state-of-the-art. Results for image annotation and image retrieval tasks indicate that the Full-Network embedding is consistently superior to the one-layer embedding. These results motivate the integration of the Full-Network embedding on any multimodal embedding generation scheme, something feasible thanks to the flexibility of the approach. |
| dc.description.peerreviewed | Peer Reviewed |
| dc.description.sponsorship | This work is partially supported by the Joint Study Agreement no. W156463 under the IBM/BSC Deep Learning Center agreement, by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project, by the Generalitat de Catalunya (contract 2014-SGR-1051), and by the Core Research for Evolutional Science and Technology (CREST) program of Japan Science and Technology Agency (JST). |
| dc.description.version | Postprint (published version) |
| dc.format.extent | 9 p. |
| dc.identifier.citation | Vilalta, A. [et al.]. Full-network embedding in a multimodal embedding pipeline. In: Workshop on Semantic Deep Learning. "Proceedings of the 2nd Workshop on Semantic Deep Learning (SemDeep-2): September 19, 2017, Montpellier, France". Stroudsburg, PA: Association for Computational Linguistics, 2017, p. 24-32. |
| dc.identifier.other | https://arxiv.org/abs/1707.09872 |
| dc.identifier.uri | https://hdl.handle.net/2117/192716 |
| dc.language.iso | eng |
| dc.publisher | Association for Computational Linguistics |
| dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/ |
| dc.relation.projectid | info:eu-repo/grantAgreement/AGAUR/V PRI/2014 SGR 1051 |
| dc.relation.publisherversion | https://www.aclweb.org/anthology/W17-7304/ |
| dc.rights.access | Open Access |
| dc.rights.licensename | Attribution 4.0 International |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
| dc.subject | Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic |
| dc.subject.lcsh | Machine learning |
| dc.subject.lcsh | Image data mining |
| dc.subject.lcsh | Image analysis |
| dc.subject.lemac | Aprenentatge automàtic |
| dc.subject.lemac | Imatges -- Anàlisi |
| dc.subject.other | Artificial intelligence |
| dc.subject.other | Deep learning |
| dc.subject.other | Transfer learning |
| dc.subject.other | Multimodal embedding |
| dc.subject.other | Image retrieval |
| dc.subject.other | Image annotation |
| dc.subject.other | Semantic deep learning |
| dc.title | Full-network embedding in a multimodal embedding pipeline |
| dc.type | Conference lecture |
| dspace.entity.type | Publication |
| local.citation.author | Vilalta, A.; García-Gasulla, D.; Parés, F.; Moreno, J.; Ayguadé, E.; Labarta, J.; Cortés, U.; Suzumura, T. |
| local.citation.contributor | Workshop on Semantic Deep Learning |
| local.citation.endingPage | 32 |
| local.citation.publicationName | Proceedings of the 2nd Workshop on Semantic Deep Learning (SemDeep-2): September 19, 2017, Montpellier, France |
| local.citation.pubplace | Stroudsburg, PA |
| local.citation.startingPage | 24 |
| local.identifier.drac | 28845399 |
Files
Original bundle
- Name: W17-7304.pdf
- Size: 230.91 KB
- Format: Adobe Portable Document Format

