Show simple item record

dc.contributor.author  González Tallada, Marc
dc.contributor.other  Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned  2017-05-15T14:01:12Z
dc.date.issued  2016
dc.identifier.citation  González, M. Coarse grain parallelization of deep neural networks. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. "ACM SIGPLAN Notices (Vol. 51, Issue 8, August 2016, Article No. 1)". Barcelona: Association for Computing Machinery (ACM), 2016, p. 1-12.
dc.identifier.issn  0362-1340
dc.identifier.uri  http://hdl.handle.net/2117/104446
dc.description.abstract  Deep neural networks (DNN) have recently achieved extraordinary results in domains like computer vision and speech recognition. An essential element of this success has been the introduction of high performance computing (HPC) techniques in the critical step of training the neural network. This paper describes the implementation and analysis of a network-agnostic and convergence-invariant coarse-grain parallelization of the DNN training algorithm. The coarse-grain parallelization is achieved by exploiting batch-level parallelism. This strategy does not depend on specialized or optimized libraries, so the optimization is immediately available for accelerating DNN training. The proposal is compatible with multi-GPU execution without altering the algorithm's convergence rate. The parallelization has been implemented in Caffe, a state-of-the-art DNN framework. The paper describes the code transformations required for the parallelization and identifies the performance-limiting factors of the approach. We show competitive performance results for two state-of-the-art computer vision datasets, MNIST and CIFAR-10. In particular, on a 16-core Xeon E5-2667v2 at 3.30 GHz we observe speedups of 8x over the sequential execution, at performance levels similar to those obtained by the GPU-optimized Caffe version on an NVIDIA K40 GPU.
dc.format.extent  12 p.
dc.language.iso  eng
dc.publisher  Association for Computing Machinery (ACM)
dc.subject  Àrees temàtiques de la UPC::Informàtica::Programació
dc.subject.lcsh  Neural networks (Computer science)
dc.subject.lcsh  Parallel programming (Computer science)
dc.subject.other  Performance
dc.subject.other  Coarse-grain parallelism
dc.subject.other  Shared memory algorithms
dc.subject.other  Deep learning
dc.subject.other  OpenMP
dc.subject.other  Stochastic gradient descent
dc.title  Coarse grain parallelization of deep neural networks
dc.type  Conference lecture
dc.subject.lemac  Xarxes neuronals (Informàtica)
dc.subject.lemac  Programació en paral·lel (Informàtica)
dc.contributor.group  Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi  10.1145/2851141.2851158
dc.description.peerreviewed  Peer Reviewed
dc.relation.publisherversion  http://dl.acm.org/citation.cfm?doid=2851141.2851158
dc.rights.access  Restricted access - publisher's policy
local.identifier.drac  19856113
dc.description.version  Postprint (published version)
dc.date.lift  10000-01-01
local.citation.author  González, M.
local.citation.contributor  ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
local.citation.pubplace  Barcelona
local.citation.publicationName  ACM SIGPLAN Notices (Vol. 51, Issue 8, August 2016, Article No. 1)
local.citation.startingPage  1
local.citation.endingPage  12
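
The abstract above turns on a single idea: split each training mini-batch across threads, reduce the per-thread gradients, and apply one weight update, so the parallel step is numerically the same step the sequential algorithm would take (this is what makes the scheme convergence-invariant and network-agnostic). Below is a minimal OpenMP sketch of that batch-level strategy on a toy linear least-squares model; sgd_step, sample_gradient and the toy data are illustrative inventions for this record, not the paper's Caffe implementation.

// Minimal sketch of batch-level parallel SGD (hypothetical, not from the paper).
// Build: g++ -fopenmp -O2 sketch.cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct Sample { std::vector<float> x; float y; };

// Per-sample gradient of the squared loss 0.5 * (w.x - y)^2.
static std::vector<float> sample_gradient(const std::vector<float>& w,
                                          const Sample& s) {
    float pred = 0.0f;
    for (std::size_t j = 0; j < w.size(); ++j) pred += w[j] * s.x[j];
    const float err = pred - s.y;
    std::vector<float> g(w.size());
    for (std::size_t j = 0; j < w.size(); ++j) g[j] = err * s.x[j];
    return g;
}

// One SGD step over a mini-batch. Per-sample gradients are computed in
// parallel (batch-level parallelism); the reduction plus a single sequential
// weight update keeps the result identical to the sequential step.
void sgd_step(std::vector<float>& w, const std::vector<Sample>& batch, float lr) {
    const std::size_t n = w.size();
    std::vector<float> grad(n, 0.0f);

    #pragma omp parallel
    {
        std::vector<float> local(n, 0.0f);          // thread-private accumulator

        #pragma omp for nowait                      // samples split across threads
        for (long i = 0; i < (long)batch.size(); ++i) {
            const std::vector<float> g = sample_gradient(w, batch[i]);
            for (std::size_t j = 0; j < n; ++j) local[j] += g[j];
        }

        #pragma omp critical                        // merge partial gradients
        for (std::size_t j = 0; j < n; ++j) grad[j] += local[j];
    }

    for (std::size_t j = 0; j < n; ++j)             // single update, as in serial SGD
        w[j] -= lr * grad[j] / (float)batch.size();
}

int main() {
    // Toy data drawn from y = 2*x0 + 1*x1.
    std::vector<Sample> batch = { {{1, 0}, 2}, {{0, 1}, 1}, {{1, 1}, 3}, {{2, 1}, 5} };
    std::vector<float> w(2, 0.0f);
    for (int step = 0; step < 500; ++step) sgd_step(w, batch, 0.1f);
    std::printf("w = [%f, %f]\n", w[0], w[1]);      // approaches [2, 1]
    return 0;
}

The critical-section merge is just the simplest correct reduction; for real models with millions of weights one would use a tree or array reduction instead so the merge does not serialize, but the convergence argument is unchanged.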

