Detection of fake profiles in online social networks (OSNs)
Tutor / director / evaluator: León Abarca, Olga
Document type: Master thesis
Rights access: Open Access
Nowadays, communication between people is heavily influenced by Online Social Networks (OSNs). An immense number of personal, professional and political thoughts are shared online every single day. OSNs are therefore attractive to cyber criminals who try to exploit their weaknesses and vulnerabilities. Fake accounts on OSNs have become a basic threat used in different online attacks. Although some of these attacks are relatively harmless, such as generating fake accounts for “likes” on Facebook, followers on Twitter and views on YouTube, others are more serious and can be dangerous. Influencing trending topics and spreading spam advertisements and false political content are just some examples of how attackers are able to wreak havoc online by using fake profiles. With the increasing number of security and privacy threats, some OSNs have adopted security measures to stop the mass creation of fake accounts. However, those measures are often rendered ineffective by the many tools available on underground marketplaces that allow people to acquire fake accounts cheaply. In this regard, this thesis aims to detect fake profiles on a very popular OSN, Twitter, with the help of machine learning algorithms. The first key contribution is the research on the appropriate machine learning techniques. Support Vector Machine, Random Forest and k-Nearest Neighbour were the three supervised learning algorithms tested. In addition to them, one clustering algorithm was tested, namely k-Means. The next contribution is the acquisition of labelled data related to real and fake profiles on Twitter for the training phase. After analysing the behaviour of the users and their tweeting activity, an extensive dataset of 12 features was created. This dataset, named the Fake profile's detection dataset, plays a key role in distinguishing fake accounts from real ones and is used as input to the machine learning algorithms. The results have been analysed in five different scenarios.
The classifiers achieve an accuracy of around 92% in separating the fake profiles from the real ones, and the clustering algorithm is able to detect all fake profiles. Finally, for testing purposes, some of the followers of Donald Trump were analysed with the already trained models.
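
As a rough illustration of the supervised approach described above, the following minimal sketch applies a k-Nearest Neighbour vote to a handful of profile feature vectors. It is not the implementation used in the thesis. The three features (followers, following, tweets per day), the toy values and the function names are all assumptions made for this example; the abstract does not list the 12 features of the actual dataset.

```python
import math
from collections import Counter

# Hypothetical toy profiles: (followers, following, tweets_per_day).
# These values are invented for illustration and are NOT taken from
# the thesis dataset, whose 12 features are not listed in the abstract.
TRAIN = [
    ((1200.0, 300.0, 4.0), "real"),
    ((850.0, 410.0, 2.5), "real"),
    ((430.0, 380.0, 1.0), "real"),
    ((12.0, 2100.0, 45.0), "fake"),
    ((5.0, 1800.0, 60.0), "fake"),
    ((30.0, 2500.0, 38.0), "fake"),
]


def fit_scaler(rows):
    """Return per-feature (min, max) bounds for min-max scaling."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Scale one feature vector to [0, 1] per feature using the bounds."""
    return tuple(
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, bounds)
    )


def knn_predict(train, query, k=3):
    """Label a query profile by majority vote of its k nearest neighbours."""
    bounds = fit_scaler([features for features, _ in train])
    scaled_query = scale(query, bounds)
    # Euclidean distance from the query to every training profile.
    distances = sorted(
        (math.dist(scale(features, bounds), scaled_query), label)
        for features, label in train
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(TRAIN, (20.0, 2000.0, 50.0)))
    print(knn_predict(TRAIN, (900.0, 350.0, 3.0)))
```

Features are scaled before distances are computed because raw counts such as followers would otherwise dominate the Euclidean distance. Any real evaluation would also need a held-out test set to estimate accuracy.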