Show simple item record

dc.contributor: Romero Moral, Óscar
dc.contributor: Calders, Toon
dc.contributor: Tran, Nam-Luc
dc.contributor: Skhiri, Sabri
dc.contributor.author: Leon Saiki, Edgar Isaac Hiroshi
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.description.abstract: In this age, a huge amount of data is generated every day by human interactions with services. Discovering the patterns in these data is very important for making business decisions. Because of the size of the data, this requires very high computational power. Many frameworks have therefore been developed using Central Processing Unit (CPU) implementations to perform this computation, for instance distributed and parallel programming models such as Google's MapReduce. On the other hand, over the last half decade, researchers have started using Graphics Processing Units (GPUs) to process these huge data sets. Unlike a CPU, a GPU can execute many tasks in parallel. To measure GPU performance, EURA NOVA implemented two data mining algorithms (K-Means and Naive Bayes) in its framework to enable task execution in a distributed manner, taking into account the GPU power available in each node. Even though the framework was successfully implemented, its performance was very poor when compared to another parallel CPU framework. This shows that the framework does not use the GPU effectively. Moreover, it contradicts the fact that a GPU can execute many tasks in parallel and should thus be faster than a CPU implementation. This research therefore started with the objective of answering how to improve this performance, specifically the performance of the K-Means implementation. We also added a new data mining implementation, Expectation Maximization, to the framework, taking advantage of each GPU node and of the distributed nodes. Furthermore, we describe some good practices for implementing data mining algorithms on a GPU starting from a sequential design. General-purpose GPU programming is still at a development stage. A well-known library is Thrust, which we used to achieve the above objectives. Finally, we evaluated our solutions by comparing them with other existing CPU frameworks. The results show that we improved the K-Means performance by more than 130x and plugged the Expectation Maximization implementation into EURA NOVA's framework.
dc.publisher: Universitat Politècnica de Catalunya
dc.subject: Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació::Bases de dades
dc.subject.lcsh: Data mining
dc.subject.other: Business Intelligence
dc.subject.other: pattern discovery
dc.subject.other: Naive Bayes
dc.title: Performance analysis and optimization of a distributed processing framework for data mining accelerated with graphics processing units
dc.type: Master thesis
dc.subject.lemac: Mineria de dades
dc.rights.access: Open Access
dc.audience.mediator: Facultat d'Informàtica de Barcelona


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.