Scalability study of Deep Learning algorithms in high performance computer infrastructures
Cite as:
hdl:2117/106390
Carried out at/with: Barcelona Supercomputing Center
Document type: Official master's final project
Date: 2017-04-28
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
Deep learning algorithms base their success on building high learning capacity
models with millions of parameters that are tuned in a data-driven fashion.
These models are trained by processing millions of examples, so that the development
of more accurate algorithms is usually limited by the throughput
of the computing devices on which they are trained.
This project shows how the training of a state-of-the-art neural network
for computer vision can be parallelized on a distributed GPU cluster, the
Minotauro GPU cluster at the Barcelona Supercomputing Center, using the
TensorFlow framework.
In this project, two approaches to distributed training are used: synchronous
and mixed-asynchronous. The effect of distributing the training
process is addressed from two points of view. First, the scalability of
the task and its performance in the distributed setting are analyzed. Second,
the impact of the distributed training methods on the final accuracy of the
models is studied.
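The synchronous approach mentioned above can be illustrated with a minimal sketch in plain Python (no TensorFlow, and not the thesis's actual code): each simulated worker computes a gradient on its own data shard, and the model is updated only after all gradients have been averaged, which is why synchronous training can preserve single-node accuracy. All names and values here are illustrative.

```python
# Sketch of synchronous data-parallel SGD: every worker's gradient is
# collected and averaged before a single shared update is applied.

def gradient(w, shard):
    # Gradient of mean squared error for a 1-D model y = w * x,
    # computed on one worker's shard of (x, y) pairs.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def synchronous_step(w, shards, lr=0.05):
    # Synchronous update: wait for all workers, average their
    # gradients, then apply one update to the shared parameters.
    grads = [gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

# Two simulated workers, data drawn from the line y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards)
print(round(w, 3))  # converges to 3.0
```

In an asynchronous variant, each worker would instead apply its gradient to the shared parameters as soon as it finishes, which increases throughput but lets updates be computed from stale parameters.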
The results show improvements in both areas. On the one hand,
the experiments show that the network can be trained substantially faster:
training time decreases from 106 hours to 16 hours with mixed-asynchronous
training and to 12 hours with synchronous training. On the other hand,
increasing the number of GPUs in one node raises the throughput (images
per second) in a near-linear way. Moreover, with the synchronous methods
the accuracy of single-node training is maintained.
Subjects: Computational grids (Computer systems), Machine learning, Neural networks (Computer science)
Degree: MASTER'S DEGREE IN INFORMATICS ENGINEERING (2012 syllabus)
Collections
Files | Description | Size | Format
---|---|---|---
122772.pdf | | 8.376 MB | PDF