Enhancing text spotting with a language model and visual context information

Cite as:
hdl:2117/183084
Document type: Conference report
Defense date: 2018
Publisher: IOS Press
Rights access: Open Access
Abstract
This paper addresses the problem of detecting and recognizing text in images acquired ‘in the wild’. This is a severely under-constrained problem that must cope with a number of challenges, including large occlusions, changing lighting conditions, cluttered backgrounds, and different font types and sizes. To address this problem, we leverage recent and successful developments at the intersection of machine learning and natural language understanding. In particular, we initially rely on off-the-shelf deep networks that have already been trained on large amounts of data and that provide a series of text hypotheses per input image. The outputs of this network are then combined with different priors obtained from both the semantic interpretation of the image and a scene-based language model. As a result of this combination, the performance of the original network is consistently boosted. We validate our approach on the ICDAR’17 shared task dataset.
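The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rerank`, the dictionary-based priors, the product-of-probabilities scoring rule, and all numeric values are assumptions chosen only to show how recognizer hypotheses might be re-ranked with a language-model prior and a visual-context prior.

```python
import math

def rerank(hypotheses, lm_prior, visual_prior, eps=1e-9):
    """Re-rank (word, probability) hypotheses with two priors.

    Hypothetical scoring rule: sum of log-probabilities from the
    recognizer, a language-model prior, and a visual-context prior.
    Words missing from a prior fall back to a small constant `eps`.
    """
    scored = []
    for word, p_net in hypotheses:
        p_lm = lm_prior.get(word, eps)
        p_vis = visual_prior.get(word, eps)
        log_score = (
            math.log(max(p_net, eps)) + math.log(p_lm) + math.log(p_vis)
        )
        scored.append((word, log_score))
    # Highest combined score first.
    return sorted(scored, key=lambda ws: ws[1], reverse=True)

# Made-up example: the recognizer slightly prefers "bat", but both
# priors (for instance, an image of a pub) favour "bar".
hypotheses = [("bar", 0.40), ("bat", 0.45), ("car", 0.15)]
lm_prior = {"bar": 0.5, "bat": 0.2, "car": 0.3}
visual_prior = {"bar": 0.6, "bat": 0.1, "car": 0.3}

ranking = rerank(hypotheses, lm_prior, visual_prior)
```

In this toy setting the priors overturn the recognizer's top choice, which is the kind of correction the abstract attributes to the combination. The paper itself should be consulted for the actual models and weighting.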
Citation: Sabir, A.; Moreno-Noguer, F.; Padro, L. Enhancing text spotting with a language model and visual context information. In: Congrés Internacional de l’Associació Catalana d’Intel·ligència Artificial. "Artificial intelligence research and development : proceedings of the 21th International Conference of the Catalan Association for Artificial Intelligence". IOS Press, 2018, p. 271-280.
ISBN: 978-1-61499-917-1
Publisher version: http://ebooks.iospress.nl/publication/50417
Collections
- IRI - Institut de Robòtica i Informàtica Industrial, CSIC-UPC - Ponències/Comunicacions de congressos [463]
- GPLN - Grup de Processament del Llenguatge Natural - Ponències/Comunicacions de congressos [187]
- Departament de Ciències de la Computació - Ponències/Comunicacions de congressos [1.118]
- ROBiri - Grup de Robòtica de l'IRI - Ponències/Comunicacions de congressos [172]
Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain