An oracle for guiding large-scale model/hybrid parallel training of convolutional neural networks
Document type: Conference paper
Publication date: 2021
Publisher: European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication, or transformation is prohibited without the authorization of the rights holder.
Abstract
Deep Neural Network (DNN) frameworks use distributed training to shorten time to convergence and to alleviate memory capacity limitations when training large models and/or using high-dimensional inputs. With the steady growth of datasets and model sizes, model/hybrid parallelism is expected to play an important role in the future of distributed DNN training. We analyze the compute, communication, and memory requirements of Convolutional Neural Networks (CNNs) to understand the trade-offs that different parallelism approaches impose on performance and scalability. We use this model-driven analysis as the basis for an oracle utility that helps detect the limitations and bottlenecks of different parallelism approaches at scale. We evaluate the oracle on six parallelization strategies, with four CNN models and multiple datasets (2D and 3D), on up to 1024 GPUs. The results demonstrate that the oracle has an average accuracy of about 86.74% when compared to empirical results, and as high as 97.57% for.
Citation: Kahira, A. [et al.]. An oracle for guiding large-scale model/hybrid parallel training of convolutional neural networks. In: International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems. "ACACES 2021 poster abstracts: September 15, 2021, Fiuggi, Italy". European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC), 2021, p. 37-40. ISBN 978-88-905806-8-0.
ISBN: 978-88-905806-8-0
Collections
- Doctorate in Computer Architecture - Conference papers/presentations [282]
- Computer Sciences - Conference papers/presentations [559]
- CAP - Grup de Computació d'Altes Prestacions (High Performance Computing Group) - Conference papers/presentations [784]
- Department of Computer Architecture - Conference papers/presentations [1,945]
Files | Description | Size | Format | View |
---|---|---|---|---|
Kahira et al.pdf | | 674.8 KB | PDF | View/Open |