Performance analysis and optimization of a distributed processing framework for data mining accelerated with graphics processing units

Leon Saiki, Edgar Isaac Hiroshi

dc.contributor	Romero Moral, Óscar
dc.contributor	Calders, Toon
dc.contributor	Tran, Nam-Luc
dc.contributor	Skhiri, Sabri
dc.contributor.author	Leon Saiki, Edgar Isaac Hiroshi
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.date.accessioned	2015-01-28T10:23:20Z
dc.date.available	2015-01-28T10:23:20Z
dc.date.issued	2014-09-05
dc.identifier.uri	http://hdl.handle.net/2099.1/24799
dc.description.abstract	A huge amount of data is generated every day by human interactions with services. Discovering the patterns in this data is very important for making business decisions, but because of its size, doing so requires very intensive computation. Many frameworks have therefore been developed to perform this computation on Central Processing Units (CPUs), for instance distributed and parallel programming models such as Google's MapReduce. Over roughly the last five years, researchers have also started using Graphics Processing Units (GPUs) to process these large data sets. Unlike a CPU, a GPU can execute many tasks in parallel. To measure GPU performance, EURA NOVA implemented two data mining algorithms (K-Means and Naive Bayes) in its framework, which executes tasks in a distributed manner while taking into account the GPU power available on each node. Although the framework was implemented successfully, its performance was very poor compared with another parallel CPU framework. This shows that the framework does not use the GPU effectively, and it contradicts the expectation that a GPU, which can execute many tasks in parallel, should be faster than a CPU implementation. This research therefore set out to answer how to improve that performance, specifically the performance of the K-Means implementation. We also added a new data mining implementation, Expectation Maximization, to the framework, taking advantage of the GPU on each node and of the distributed nodes. Furthermore, we describe some good practices for implementing data mining algorithms on a GPU starting from a sequential design. General-purpose GPU programming is still at a development stage; Thrust is a well-known library for it, and we used it to achieve the above objectives. Finally, we evaluated our solutions by comparing them with other existing CPU frameworks. The results show that we improved the K-Means performance by more than 130x and plugged the Expectation Maximization implementation into EURA NOVA's framework.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.subject	UPC subject areas::Computer science::Information systems::Databases
dc.subject.lcsh	Data mining
dc.subject.other	Business Intelligence
dc.subject.other	GPU
dc.subject.other	pattern discovery
dc.subject.other	K-Means
dc.subject.other	Naive Bayes
dc.subject.other	performance
dc.title	Performance analysis and optimization of a distributed processing framework for data mining accelerated with graphics processing units
dc.type	Master thesis
dc.subject.lemac	Data mining
dc.identifier.slug	101342
dc.rights.access	Open Access
dc.date.updated	2015-01-23T16:37:28Z
dc.audience.educationlevel	Master's
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	Erasmus Mundus Master's Degree in Information Technologies for Business Intelligence (2012 curriculum)

Files in this item

Name: 101342.pdf
Size: 2.999 MB
Format: PDF


This item appears in the following collections

Erasmus Mundus Master's Degree in Information Technologies for Business Intelligence (IT4BI) [18]


UPCommons. UPC's open knowledge portal
