A unified formal framework for factorial and probabilistic topic modelling
Cite as:
hdl:2117/396668
Document type: Article
Defense date: 2023-10-21
Publisher: Multidisciplinary Digital Publishing Institute (MDPI)
Rights access: Open Access
Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution 4.0 International
Abstract
Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses several families of methods, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying their shared elements and representing them in a homogeneous notation. The paper presents 12 different methods within this framework, enabling straightforward comparison of how flexible each approach is and how realistic its assumptions are. This is the first stage of a broader analysis aimed at relating all method families to this common framework, understanding their strengths and weaknesses comprehensively, and establishing general application guidelines. In addition, an experimental setup reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion of the presented methods and outlines future research directions.
Citation: Gibert, K.; Hernández, Y. A unified formal framework for factorial and probabilistic topic modelling. "Mathematics", 21 October 2023, vol. 11, no. 20, article 4375.
ISSN: 2227-7390
Publisher version: https://www.mdpi.com/2227-7390/11/20/4375
Files | Description | Size | Format | View |
---|---|---|---|---|
mathematics-11-04375-v2.pdf | | 700.2 KB | PDF | View/Open |