The semanticity of catalan words: quantitative linguistics in the era of large language models

dc.contributor.author: Català Roig, Neus
dc.contributor.author: Casas Fernández, Bernardino
dc.contributor.author: Hernández Fernández, Antonio
dc.contributor.group: Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
dc.contributor.group: Universitat Politècnica de Catalunya. LQMC - Lingüística Quantitativa, Matemàtica i Computacional
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.contributor.other: Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació
dc.date.accessioned: 2024-08-02T06:54:24Z
dc.date.available: 2024-08-02T06:54:24Z
dc.date.issued: 2024-07
dc.description.abstract: The emergence of Large Language Models (LLMs) like ChatGPT has significantly transformed both theoretical and applied linguistics, raising a profound debate within the field. These models, such as the GPT (Generative Pre-trained Transformer) series, have revolutionized the way linguists approach language analysis and comprehension. In contrast to traditional Quantitative Linguistics (QL) and conventional linguistic laws like Zipf's laws (Zipf, 1949), LLMs leverage massive datasets to generate linguistic patterns, syntactic structures, and semantic nuances in a comprehensive manner. In light of this, we recently introduced a novel quantitative concept called "semanticity", which establishes a connection between a word's potential meanings and its position within the linguistic network. To explore this notion, we conduct a comprehensive analysis of Catalan using extensive written corpora, leveraging the resources of the official dictionary (DIEC2). Our findings reveal that the semanticity of words provides a straightforward and quantitative classification for content and function words and for other word types in Catalan, allowing for the integration of both their semantic and syntactic attributes into this single quantitative parameter.
dc.description.peerreviewed: Peer Reviewed
dc.description.version: Postprint (published version)
dc.format.extent: 20 p.
dc.identifier.citation: Català Roig, N.; Casas, B.; Hernandez Fernandez, A. The semanticity of catalan words: quantitative linguistics in the era of large language models. In: "Tejiendo palabras: explorando la lengua, la lingüística y el proceso de traducción en la era de la inteligencia artificial". Madrid: Dykinson, 2024, p. 249-268.
dc.identifier.isbn: 9788411709231
dc.identifier.uri: https://hdl.handle.net/2117/413308
dc.language.iso: eng
dc.publisher: Dykinson
dc.relation.publisherversion: https://www.dykinson.com/libros/tejiendo-palabras-explorando-la-lengua-la-linguistica-y-el-proceso-de-traduccion-en-la-era-de-la-inteligencia-artificial/9788411709231/
dc.rights.access: Open Access
dc.rights.licensename: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
dc.subject.lcsh: Semantic networks (Information theory)
dc.subject.lcsh: Catalan language -- Semantics
dc.subject.lcsh: Linguistic models
dc.subject.lemac: Xarxes semàntiques (Teoria de la informació)
dc.subject.lemac: Català -- Semàntica
dc.subject.lemac: Models lingüístics
dc.subject.other: Semanticity
dc.subject.other: Catalan
dc.subject.other: Quantitative linguistics
dc.subject.other: DIEC2
dc.subject.other: CTILC
dc.subject.other: Linguistic networks
dc.subject.other: Content words
dc.subject.other: Function words
dc.title: The semanticity of catalan words: quantitative linguistics in the era of large language models
dc.type: Part of book or chapter of book
dspace.entity.type: Publication
local.citation.author: Català Roig, N.; Casas, B.; Hernandez Fernandez, A.
local.citation.endingPage: 268
local.citation.publicationName: Tejiendo palabras: explorando la lengua, la lingüística y el proceso de traducción en la era de la inteligencia artificial
local.citation.pubplace: Madrid
local.citation.startingPage: 249
local.identifier.drac: 39521013

Files

Original bundle

Name: semanticity_catalan_words.pdf
Size: 476.06 KB
Format: Adobe Portable Document Format
Description: Chapter