Analyzing distances in word embeddings and their relation with seme analysis

Document type: Conference report
Defense date: 2019
Publisher: IOS Press
Rights access: Open Access
Abstract
Word embeddings have recently become a fundamental tool of Natural Language Processing, with applications to tasks like machine translation or image annotation. The high-dimensional space defined by these embeddings is typically explored and exploited through distance-based operations. In this paper we work on the problem of finding related words in a text embedding. This relationship can be of different kinds; we focus on semantic relations like synonymy and antonymy. We explore the idea of using the distance between norms instead of, as other authors have done before, the vector that unites them. We present different norms, some of them well known in the literature and others not so widely used, and we also introduce a new one together with its theoretical mathematical framework. We also explain why they work properly or not, and compare their performance on the two most widely used embeddings, GloVe and Word2Vec.
Citation: Gijón, M.; Vilalta, A.; Garcia-Gasulla, D. Analyzing distances in word embeddings and their relation with seme analysis. In: International Conference of the Catalan Association for Artificial Intelligence. "Proceedings of the 22nd International Conference of the Catalan Association for Artificial Intelligence". IOS Press, 2019, p. 407-416.
ISBN: 978-1-64368-015-6
Publisher version: http://ebooks.iospress.nl/volumearticle/52866
Files | Size |
---|---|
Manuel_TFM_CCIA_19.pdf | 155,1Kb |
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder