
dc.contributor.author: Ellebracht, Lily Delores
dc.contributor.author: Ramisa Ayats, Arnau
dc.contributor.author: Shantharam Madhyastha, Pranava Swaroop
dc.contributor.author: Cordero Rama, Jose Alejandro
dc.contributor.author: Moreno-Noguer, Francesc
dc.contributor.author: Quattoni, Ariadna Julieta
dc.contributor.other: Institut de Robòtica i Informàtica Industrial
dc.date.accessioned: 2016-03-15T15:21:34Z
dc.date.available: 2016-03-15T15:21:34Z
dc.date.issued: 2015
dc.identifier.citation: Ellebracht, L., Ramisa, A., Shantharam, P., Cordero, J., Moreno-Noguer, F., Quattoni, A. Semantic tuples for evaluation of image sentence generation. In: Workshop on Vision and Language. "Proceedings of the 4th Workshop on Vision and Language, 2015, Lisbon". Lisboa: 2015, p. 18-28.
dc.identifier.uri: http://hdl.handle.net/2117/84419
dc.description.abstract: The automatic generation of image captions has received considerable attention. The problem of evaluating caption generation systems, however, has been much less explored. We propose a novel evaluation approach based on comparing the underlying visual semantics of the candidate and ground-truth captions. To this end, we have defined a semantic representation for visually descriptive language and have augmented a subset of the Flickr-8K dataset with semantic annotations. Our evaluation metric (BAST) can be used not only to compare systems but also to perform error analysis and gain a better understanding of the types of mistakes a system makes. To compute BAST, we need to predict the semantic representation of the automatically generated captions. We use the Flickr-ST dataset to train classifiers that predict semantic tuples (STs), so that evaluation can be fully automated.
dc.format.extent: 11 p.
dc.language.iso: eng
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: UPC subject areas::Computer science::Automation and control
dc.subject.other: computer vision
dc.subject.other: natural language processing
dc.title: Semantic tuples for evaluation of image sentence generation
dc.type: Conference report
dc.contributor.group: Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI
dc.contributor.group: Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
dc.description.peerreviewed: Peer Reviewed
dc.subject.inspec: INSPEC classification::Pattern recognition::Computer vision
dc.relation.publisherversion: https://www.cs.cmu.edu/~ark/EMNLP-2015/proceedings/VL/pdf/VL06.pdf
dc.rights.access: Open Access
local.identifier.drac: 17548529
dc.description.version: Postprint (author's final draft)
local.citation.author: Ellebracht, L.; Ramisa, A.; Shantharam, P.; Cordero, J.; Moreno-Noguer, F.; Quattoni, A.
local.citation.contributor: Workshop on Vision and Language
local.citation.pubplace: Lisboa
local.citation.publicationName: Proceedings of the 4th Workshop on Vision and Language, 2015, Lisbon.
local.citation.startingPage: 18
local.citation.endingPage: 28



Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain