The semanticity of catalan words: quantitative linguistics in the era of large language models
| dc.contributor.author | Catala Roig, Neus |
| dc.contributor.author | Casas Fernández, Bernardino |
| dc.contributor.author | Hernández Fernández, Antonio |
| dc.contributor.group | Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group |
| dc.contributor.group | Universitat Politècnica de Catalunya. LQMC - Lingüística Quantitativa, Matemàtica i Computacional |
| dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
| dc.contributor.other | Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació |
| dc.date.accessioned | 2024-08-02T06:54:24Z |
| dc.date.available | 2024-08-02T06:54:24Z |
| dc.date.issued | 2024-07 |
| dc.description.abstract | The emergence of Large Language Models (LLMs) such as ChatGPT has significantly transformed both theoretical and applied linguistics and has sparked a profound debate within the field. Models such as the GPT (Generative Pre-trained Transformer) series have revolutionized the way linguists approach language analysis and comprehension. In contrast to traditional Quantitative Linguistics (QL) and conventional linguistic laws such as Zipf's laws (Zipf, 1949), LLMs leverage massive datasets to generate linguistic patterns, syntactic structures, and semantic nuances comprehensively. In light of this, we recently introduced a novel quantitative concept called "semanticity", which establishes a connection between a word's potential meanings and its position within the linguistic network. To explore this notion, we conduct a comprehensive analysis of Catalan using extensive written corpora and the resources of the official dictionary (DIEC2). Our findings reveal that the semanticity of words provides a straightforward quantitative classification of content words, function words, and other word types in Catalan, integrating both their semantic and syntactic attributes into a single quantitative parameter. |
| dc.description.peerreviewed | Peer Reviewed |
| dc.description.version | Postprint (published version) |
| dc.format.extent | 20 p. |
| dc.identifier.citation | Català Roig, N.; Casas, B.; Hernández Fernández, A. The semanticity of catalan words: quantitative linguistics in the era of large language models. In: "Tejiendo palabras: explorando la lengua, la lingüística y el proceso de traducción en la era de la inteligencia artificial". Madrid: Dykinson, 2024, p. 249-268. |
| dc.identifier.isbn | 9788411709231 |
| dc.identifier.uri | https://hdl.handle.net/2117/413308 |
| dc.language.iso | eng |
| dc.publisher | Dykinson |
| dc.relation.publisherversion | https://www.dykinson.com/libros/tejiendo-palabras-explorando-la-lengua-la-linguistica-y-el-proceso-de-traduccion-en-la-era-de-la-inteligencia-artificial/9788411709231/ |
| dc.rights.access | Open Access |
| dc.rights.licensename | Attribution-NonCommercial 4.0 International |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ |
| dc.subject | UPC subject areas::Computer science::Artificial intelligence |
| dc.subject.lcsh | Semantic networks (Information theory) |
| dc.subject.lcsh | Catalan language -- Semantics |
| dc.subject.lcsh | Linguistic models |
| dc.subject.lemac | Xarxes semàntiques (Teoria de la informació) |
| dc.subject.lemac | Català -- Semàntica |
| dc.subject.lemac | Models lingüístics |
| dc.subject.other | Semanticity |
| dc.subject.other | Catalan |
| dc.subject.other | Quantitative linguistics |
| dc.subject.other | DIEC2 |
| dc.subject.other | CTILC |
| dc.subject.other | Linguistic networks |
| dc.subject.other | Content words |
| dc.subject.other | Function words |
| dc.title | The semanticity of catalan words: quantitative linguistics in the era of large language models |
| dc.type | Part of book or chapter of book |
| dspace.entity.type | Publication |
| local.citation.author | Català Roig, N.; Casas, B.; Hernández Fernández, A. |
| local.citation.endingPage | 268 |
| local.citation.publicationName | Tejiendo palabras: explorando la lengua, la lingüística y el proceso de traducción en la era de la inteligencia artificial |
| local.citation.pubplace | Madrid |
| local.citation.startingPage | 249 |
| local.identifier.drac | 39521013 |
Files
Original bundle
- Name: semanticity_catalan_words.pdf
- Size: 476.06 KB
- Format: Adobe Portable Document Format
- Description: Chapter