
dc.contributor.author: Ramirez-Gargallo, Guillem
dc.contributor.author: Garcia-Gasulla, Marta
dc.contributor.author: Mantovani, Filippo
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2019-04-15T15:27:24Z
dc.date.available: 2019-04-15T15:27:24Z
dc.date.issued: 2019
dc.identifier.citation: Ramirez-Gargallo, G.; Garcia-Gasulla, M.; Mantovani, F. TensorFlow on state-of-the-art HPC clusters: a machine learning use case. In: "2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)". 2019, p. 1-8.
dc.identifier.uri: http://hdl.handle.net/2117/131762
dc.description.abstract: The recent rapid growth of the data-flow programming paradigm enabled the development of specific architectures, e.g., for machine learning. The best-known example is the Tensor Processing Unit (TPU) by Google. Standard data centers, however, still cannot foresee large partitions dedicated to machine-learning-specific architectures. Within data centers, High-Performance Computing (HPC) clusters are highly parallel machines targeting a broad class of compute-intensive workflows; as such, they can be used for tackling machine learning challenges. On top of this, HPC architectures are rapidly changing, including accelerators and instruction sets other than the classical x86 CPUs. In this blurry scenario, identifying the best hardware/software configurations to efficiently support machine learning workloads on HPC clusters is not trivial. In this paper, we consider the workflow of TensorFlow for image recognition. We highlight the strong dependency of the performance in the training phase on the availability of arithmetic libraries optimized for the underlying architecture. Following the example of Intel leveraging the MKL libraries to improve TensorFlow performance, we plugged the Arm Performance Libraries into TensorFlow and tested it on an HPC cluster based on Marvell ThunderX2 CPUs. We also performed a scalability study on three state-of-the-art HPC clusters based on different CPU architectures: x86 Intel Skylake, Arm-v8 Marvell ThunderX2, and PowerPC IBM Power9.
dc.format.extent: 8 p.
dc.language.iso: eng
dc.publisher: IEEE
dc.subject: UPC subject areas::Computer science
dc.subject.lcsh: High performance computing
dc.subject.other: TensorFlow
dc.subject.other: High Performance Computing
dc.subject.other: Parallel Computing
dc.subject.other: Machine Learning
dc.subject.other: Image Recognition
dc.subject.other: Training
dc.subject.other: Arm
dc.subject.other: Power9
dc.subject.other: x86
dc.subject.other: Clusters
dc.title: TensorFlow on state-of-the-art HPC clusters: a machine learning use case
dc.type: Conference lecture
dc.subject.lemac: Supercomputers
dc.identifier.doi: 10.1109/CCGRID.2019.00067
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/8752892
dc.rights.access: Open Access
dc.description.version: Postprint (author's final draft)
local.citation.publicationName: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
local.citation.startingPage: 1
local.citation.endingPage: 8


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without the permission of the copyright holder.