Scaling a convolutional neural network for classification of adjective noun pairs with TensorFlow on GPU clusters

Campos, Víctor; Sastre, Francesc; Yagües, Maurici; Torres Viñals, Jordi; Giró Nieto, Xavier

doi:10.1109/CCGRID.2017.110


Document type: Conference proceedings text

Publication date: 2017

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Project: PROCESADO DE INFORMACION HETEROGENEA Y SEÑALES EN GRAFOS PARA BIG DATA. APLICACION EN CRIBADO DE ALTO RENDIMIENTO, TELEDETECCION, MULTIMEDIA Y HCI. (MINECO-TEC2013-43935-R)

Abstract

Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications such as computer vision in both academia and multiple industry areas. The progress made in recent years cannot be understood without taking into account the technological advancements seen in key domains such as High Performance Computing, more specifically in the Graphics Processing Unit (GPU) domain. This kind of deep neural network needs massive amounts of data to effectively train the millions of parameters it contains, and this training can take days or weeks depending on the computer hardware used. In this work, we present how the training of a deep neural network can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two different points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the training times and final accuracy of the models is studied. We used TensorFlow on top of the GPU cluster of servers with 2 K80 GPU cards at Barcelona Supercomputing Center (BSC). The results show an improvement in both areas of focus. On the one hand, the experiments show promising results for training a neural network faster: the training time is decreased from 106 hours to 16 hours in our experiments. On the other hand, we can observe how increasing the number of GPUs in one node raises the throughput, in images per second, in a near-linear way. Moreover, an additional distributed speedup of 10.3 is achieved with 16 nodes, taking as baseline the speedup of one node.
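To put the figures quoted in the abstract in perspective, the short sketch below works through the arithmetic. It is only an illustration: the function names are hypothetical, and the efficiency measure (speedup divided by the number of nodes) is a common textbook definition assumed here, not one the abstract states. The abstract also does not say which hardware configuration produced the 16-hour training time, so the two quantities are computed separately.

```python
# Illustrative arithmetic on the figures quoted in the abstract.
# Function names and the efficiency definition are assumptions for this sketch.

def speedup(baseline: float, improved: float) -> float:
    """Ratio of a baseline duration to an improved duration."""
    return baseline / improved


def parallel_efficiency(speedup_value: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved (assumed definition)."""
    return speedup_value / workers


# Training time reported as reduced from 106 hours to 16 hours.
time_speedup = speedup(106.0, 16.0)

# Distributed speedup of 10.3 reported with 16 nodes, relative to one node.
node_efficiency = parallel_efficiency(10.3, 16)

print(f"Training-time speedup: {time_speedup:.3f}x")
print(f"Scaling efficiency on 16 nodes: {node_efficiency:.1%}")
```

Under these assumptions, the reduction in training time corresponds to a speedup of roughly 6.6x, and a 10.3x speedup on 16 nodes corresponds to roughly 64% of ideal linear scaling.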

Citation: Campos, V., Sastre, F., Yagües, M., Torres, J., Giró, X. Scaling a convolutional neural network for classification of adjective noun pairs with TensorFlow on GPU clusters. In: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. "2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing: 14-17 May 2017, Madrid, Spain: proceedings". Madrid: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 677-682.

URI: http://hdl.handle.net/2117/107501

DOI: 10.1109/CCGRID.2017.110

ISBN: 978-1-5090-6610-0

Publisher's version: http://dl.acm.org/citation.cfm?id=3101207


Files	Description	Size	Format
Scaling+a+Convolutional+Neural+Network+for.pdf		308.6 KB	PDF
