Improving the performance of a deep learning framework on high-performance computing (HPC) systems



Author's email

martillopart@gmail.com


Document type

Official Master's Degree Final Project

Access conditions

Open access

License

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.


Abstract

In this work we aim to improve the performance of the PyTorch 1.13.1 library in High-Performance Computing (HPC) applications running on Central Processing Units (CPUs). PyTorch is an open-source framework intended to ease the burden of programming neural networks for Machine Learning (ML). Since its creation by Facebook, PyTorch has offered parallelism features. However, these options are fixed, meaning that they can’t be changed during the training and inference of a neural network. During these processes, some sections of a network contain more parallelizable tasks than others. Therefore, if PyTorch’s parallelism parameters can only be set once, the program will run below its best efficiency for most of its execution. A solution to this problem is to change PyTorch’s parallelism settings dynamically, according to the structure of the neural network at each layer. To showcase this, we selected a type of neural network called Long Short-Term Memory (LSTM), which has varying widths. In other words, we chose a network with sections where many operations can run in parallel and sections where only a few operations, or none, can. The network we’ve used is currently the state of the art for Natural Language Processing (NLP), making it a well-suited network to study for HPC applications. Such applications include tasks like machine translation, text generation and next-word prediction. In the present study, we’ve developed a use case where an LSTM network performs inference. In the use case, the network ran under three different settings: without any parallelism configuration, with a fixed parallelism configuration, and with dynamically tuned parallelism. The objective of this work is to show that, by approaching parallelism dynamically, many PyTorch applications can see a substantial improvement in performance.
Because of that, many companies that currently use these technologies in their products, such as Google, Facebook or Tesla, could see their costs reduced by a large margin. Moreover, by improving the inference performance of PyTorch 1.13.1, we could deploy deep learning models on devices that previously couldn’t run them, such as mobile and edge devices. In addition, there is not only a substantial economic interest behind our work but also an environmental one: by improving the performance of deep learning frameworks in training and inference, we can reduce the carbon footprint generated during these processes. This is especially important for HPC applications, where the environmental cost of computing is very high. To showcase these results, we programmed several use cases in Python, in an HPC environment offered by the Barcelona Supercomputing Center (BSC). In the end, we achieved a 10% improvement in performance, and created simple guidelines to outperform 78% of PyTorch’s configurations.
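
The idea of tuning parallelism per section of a network, as described in the abstract, can be sketched roughly as follows. This is not the thesis's implementation: the helper names (`choose_threads`, `thread_setting`, `run_sections`) and the rule "use no more threads than there are independent operations" are assumptions made purely for illustration. The thread getter and setter are passed in as plain callables so the sketch runs without PyTorch installed.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple


def choose_threads(parallel_ops: int, max_threads: int) -> int:
    """Hypothetical heuristic: at least 1 thread, at most max_threads,
    and never more threads than independent operations in the section."""
    return max(1, min(parallel_ops, max_threads))


@contextmanager
def thread_setting(
    n: int,
    get_threads: Callable[[], int],
    set_threads: Callable[[int], None],
) -> Iterator[None]:
    """Temporarily switch the runtime to n threads, then restore the old value."""
    previous = get_threads()
    set_threads(n)
    try:
        yield
    finally:
        set_threads(previous)


def run_sections(
    sections: List[Tuple[int, Callable[[], None]]],
    max_threads: int,
    get_threads: Callable[[], int],
    set_threads: Callable[[int], None],
) -> List[int]:
    """Run each (parallel_ops, fn) section under its own thread count.

    Returns the thread count chosen for each section, in order.
    """
    used = []
    for parallel_ops, fn in sections:
        n = choose_threads(parallel_ops, max_threads)
        with thread_setting(n, get_threads, set_threads):
            fn()  # e.g. the forward pass of one layer
        used.append(n)
    return used
```

With PyTorch available, `torch.get_num_threads` and `torch.set_num_threads`, which control intra-op parallelism on CPU, could be passed as `get_threads` and `set_threads`. Whether switching pays off for a given layer depends on the cost of the switch itself, so any such policy would need to be measured rather than assumed.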


Degree

MASTER'S DEGREE IN TECHNOLOGY AND ENGINEERING MANAGEMENT (2016 Plan)
