The Self-focus category: motivation reflected on topical coverage in Wikipedia
Document type: Master thesis
Rights access: Open Access
“Wikipedia is a free web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation”: this is how the English-language edition's article on Wikipedia begins its definition. It means that Wikipedia can be modified at any time, by anyone, from anywhere. These foundations, together with its success in attracting participation, make Wikipedia an excellent social object of study which, being a technological construct, can also be approached with techniques from natural language processing, information retrieval or data mining. However, current research clearly lacks software that supports an integrated approach. Taking this into account, we carry out an in-depth characterization of Wikipedia with the end goal of understanding which elements and structures make up its data and how they can be obtained with an analytical tool. We start from the existing API called wikAPIdia, which we extend with new functionality until it is ready for use in multiple scenarios and problems in the social sciences. Looking for a practical case to test it, we review the current state of the art on editor motivation and topical coverage in the repository. This leads us to consider the aim of understanding Wikipedia from the perspective that each language edition has a different cultural configuration. Phrased as a question: “Is there a national or self-representative motivation which is reflected in the content and thus shapes each language edition differently?” Autoreferentiality is the concept we introduce to analyse this hypothetical higher interest in local content. We identify and collect articles on heterogeneous topics, which may refer to local history, sports teams or pop culture, but which still maintain a semantic relation to the context of the editors.
Later, we propose a multidimensional analysis of these articles based on features that can be significant indicators, in order to reach common conclusions and evaluate the language editions through an index of Autoreferentiality. Finally, we point out the impact of this content and the risk of not considering its existence in the design of applications based on user-generated content.