Adapting deep neural networks to a low-power environment

Mañas Sánchez, Oscar


126470.pdf (2.57 MB)


Tutor / director: González Colás, Antonio María

Document type: Bachelor's final thesis (Treball Final de Grau)

Date: 2017

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Abstract

These days, working with deep neural networks goes hand in hand with the use of GPUs. Once a deep neural network has been trained for hours, days, or even weeks on a desktop GPU, it is deployed in the field, where it runs inference computations, which are far less expensive than training. This fact, together with the mobile nature of deep learning applications, makes running inference locally on a mobile device a very attractive possibility. Mobile applications that perform tasks involving deep neural networks already exist, but they rely on remote servers to run the most expensive computations. This is not ideal: the user's privacy may be compromised, and the algorithm's performance may suffer from latency on a poor network connection. In this project, the possibility of running inference natively on a mobile GPU is explored. One of the main applications of deep learning is object recognition, which encompasses different problems such as classification, identification and detection. We select a very deep neural network called Faster R-CNN, which solves a detection problem, and optimize it to run natively on a mobile platform. This gives a mobile device, such as a smartphone or a tablet, the ability to identify objects and their locations in images, potentially improving the performance of applications that currently combine the device camera with deep neural networks. However, mobile devices are limited in power, memory, and compute capability. This makes power- and memory-hungry applications such as deep neural networks hard to deploy and calls for careful software design. As a result, mobile platforms present both an opportunity and a challenge for machine learning systems.
A preliminary profiling of the network on the Nvidia Jetson TX1 module, a state-of-the-art platform used in modern smartphones and handheld consoles, shows that the convolutional and fully-connected layers account for most of the forward-pass execution time, up to 88.16% of the total. The network parameters take 548.3 MB of space, 87.2% of which belong to the fully-connected layers. Hence, the main performance and energy bottlenecks are in the convolutional and fully-connected layers. To overcome these bottlenecks, two optimizations are proposed. First, we use half-precision (16-bit) floating-point numbers instead of single-precision (32-bit) ones; this halves the required memory bandwidth, improving performance and saving energy. Second, we implement a neuron-pruning technique that removes up to 80% of the neurons in the fully-connected layers; pruning reduces both the memory footprint of the network model and the number of floating-point operations, lowering energy consumption and execution time. To evaluate these optimizations, thorough experiments are carried out on an Nvidia Jetson TX1 module. Results show that, combining all the optimizations, we obtain on average a speedup of 1.55x, an energy reduction of 31.3%, a 2.26x improvement in energy-delay product, and an 86% reduction in memory footprint.
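The two optimizations can be illustrated with a small NumPy sketch. This is not the thesis's implementation. The layer sizes (512 inputs, 256 hidden neurons, 64 outputs), the L2-norm ranking used to choose which neurons to prune, and the helper names `prune_hidden_neurons`, `fc_head`, `total_bytes` and `edp_improvement` are all hypothetical. The sketch shows three things. Removing a hidden neuron shrinks two weight matrices at once. Casting to FP16 halves parameter storage. The reported energy-delay figure follows from the reported speedup and energy reduction.

```python
import numpy as np

# Hypothetical sizes for a two-layer fully-connected head; these are NOT
# the actual Faster R-CNN layer dimensions used in the thesis.
IN_FEATURES, HIDDEN, OUT = 512, 256, 64


def prune_hidden_neurons(w1, b1, w2, prune_ratio):
    """Drop a fraction of the hidden neurons between two FC layers.

    Assumed criterion (illustration only): rank neurons by the L2 norm of
    their incoming weights and keep the largest ones. Removing hidden
    neuron i deletes row i of w1, entry i of b1 and column i of w2.
    """
    n_keep = max(1, int(round(w1.shape[0] * (1.0 - prune_ratio))))
    scores = np.linalg.norm(w1, axis=1)
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return w1[keep], b1[keep], w2[:, keep]


def fc_head(x, w1, b1, w2, b2):
    """Forward pass: ReLU(w1 @ x + b1), then a linear output layer."""
    h = np.maximum(w1 @ x + b1, 0)
    return w2 @ h + b2


def total_bytes(*arrays):
    """Storage taken by a set of parameter arrays, in bytes."""
    return sum(a.nbytes for a in arrays)


def edp_improvement(speedup, energy_reduction):
    """Energy-delay product gain (E*T)_base / (E*T)_opt = s / (1 - r)."""
    return speedup / (1.0 - energy_reduction)


rng = np.random.default_rng(0)
w1 = rng.standard_normal((HIDDEN, IN_FEATURES), dtype=np.float32)
b1 = np.zeros(HIDDEN, dtype=np.float32)
w2 = rng.standard_normal((OUT, HIDDEN), dtype=np.float32)
b2 = np.zeros(OUT, dtype=np.float32)
fp32_bytes = total_bytes(w1, b1, w2, b2)

# 1) Neuron pruning: remove 80% of the hidden neurons.
pw1, pb1, pw2 = prune_hidden_neurons(w1, b1, w2, prune_ratio=0.8)

# 2) Half precision: store the pruned parameters as FP16 (2 bytes each).
half = [a.astype(np.float16) for a in (pw1, pb1, pw2, b2)]
pruned_fp16_bytes = total_bytes(*half)

# The pruned FP16 head still maps IN_FEATURES inputs to OUT outputs.
x = rng.standard_normal(IN_FEATURES).astype(np.float16)
y = fc_head(x, *half)

print(f"parameter bytes: {fp32_bytes} -> {pruned_fp16_bytes}")
# Consistency check on the abstract's figures: 1.55 / (1 - 0.313).
print(f"EDP improvement: {edp_improvement(1.55, 0.313):.2f}x")  # prints "EDP improvement: 2.26x"
```

With these toy sizes, pruning 80% of the hidden neurons and switching to FP16 shrinks the parameters from 591,104 to 58,982 bytes, roughly a 90% reduction. The savings for the real network depend on its actual layer dimensions and on how much of the model lives in the pruned layers.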

Subjects: Neural networks (Computer science), Machine learning

Degree: Bachelor's Degree in Informatics Engineering (2010 curriculum)

URI: http://hdl.handle.net/2117/106673

Collection: Facultat d'Informàtica de Barcelona - Grau en Enginyeria Informàtica (Pla 2010)


UPCommons. Open knowledge portal of the UPC
