Show simple item record

dc.contributor: González Colás, Antonio María
dc.contributor.author: Mañas Sánchez, Oscar
dc.date.accessioned: 2017-07-21T07:57:58Z
dc.date.available: 2017-07-21T07:57:58Z
dc.date.issued: 2017
dc.identifier.uri: http://hdl.handle.net/2117/106673
dc.description.abstract: These days, working with deep neural networks goes hand in hand with the use of GPUs. Once a deep neural network has been trained for hours, days, or even weeks on a desktop GPU, it is deployed in the field, where it runs inference computations, which are far less expensive than training. This fact, together with the mobile nature of deep learning applications, makes it very attractive to run inference locally on a mobile device. Mobile applications that perform tasks involving deep neural networks already exist, but they rely on remote servers to run the most expensive computations. This is not ideal, because the user's privacy may be compromised, or the algorithm's performance may suffer from latency on a poor network connection. In this project, the possibility of running inference natively on a mobile GPU is explored. One of the main applications of deep learning is object recognition, which encompasses problems such as classification, identification, and detection. In this project, we select a very deep neural network called Faster R-CNN, which solves a detection problem, and optimize it to run natively on a mobile platform. This gives a mobile device, such as a smartphone or a tablet, the capability of identifying objects and their locations in images, potentially improving the performance of applications that currently combine the device camera with deep neural networks. However, mobile devices have limited power, memory, and compute capability, which makes power- and memory-hungry applications such as deep neural networks hard to deploy and requires careful software design. As a result, mobile presents both an opportunity and a challenge for machine learning systems.
A preliminary profiling of the network on the Nvidia Jetson TX1 module, a state-of-the-art platform used in modern smartphones and handheld consoles, shows that the convolutional and fully-connected layers take most of the forward-pass execution time, up to 88.16% of the total. The network parameters occupy 548.3 MBytes, 87.2% of which belong to fully-connected layers. Hence, the main performance and energy bottlenecks are in the convolutional and fully-connected layers. To overcome these bottlenecks, two optimizations are proposed. First, we use half-precision floating-point numbers instead of single-precision; this halves the memory bandwidth, improving performance and providing energy savings. Second, we implement a neuron-pruning technique to remove up to 80% of the neurons in the fully-connected layers; pruning reduces the memory footprint of the network model and the number of FP operations, lowering both energy consumption and execution time. To evaluate these optimizations, thorough experimentation is carried out on an Nvidia Jetson TX1 module. Results show that, combining all the optimizations, we obtain, on average, a speedup of 1.55x, an energy reduction of 31.3%, an improvement in energy-delay of 2.26x, and a memory-footprint reduction of 86%.
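The two optimizations described in the abstract (half-precision storage and neuron pruning of fully-connected layers) can be sketched in a few lines of NumPy. This is an illustrative toy, not the thesis's actual Caffe implementation: the layer dimensions, the L1-norm ranking criterion, and the `prune_neurons` helper are all assumptions made for the example.

```python
import numpy as np

def prune_neurons(W, b, keep_ratio=0.2):
    """Keep only the strongest neurons of a fully-connected layer.

    Neurons (rows of W) are ranked by the L1 norm of their incoming
    weights (an assumed criterion for this sketch); the weakest ones are
    removed entirely, shrinking both the model size and the number of FP
    operations in the forward pass.
    """
    importance = np.abs(W).sum(axis=1)                 # per-neuron L1 norm
    n_keep = max(1, int(round(keep_ratio * W.shape[0])))
    keep = np.sort(np.argsort(importance)[-n_keep:])   # neuron indices kept
    return W[keep], b[keep]

rng = np.random.default_rng(0)
# Hypothetical FC layer with 4096 neurons and 9216 inputs (single precision).
W = rng.standard_normal((4096, 9216)).astype(np.float32)
b = rng.standard_normal(4096).astype(np.float32)

# Optimization 1: half precision halves the storage and memory traffic.
W16 = W.astype(np.float16)
# Optimization 2: drop 80% of the neurons (keep_ratio=0.2).
W16_pruned, b16_pruned = prune_neurons(W16, b.astype(np.float16))

print(f"{W.nbytes / 2**20:.0f} MB -> {W16_pruned.nbytes / 2**20:.0f} MB")
```

Combining both steps on this toy layer shrinks its weights by roughly an order of magnitude, illustrating how the thesis's reported 86% memory-footprint reduction can arise from these two techniques.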
dc.language.iso: eng
dc.publisher: Universitat Politècnica de Catalunya
dc.subject: Àrees temàtiques de la UPC::Informàtica
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Machine learning
dc.subject.other: aprenentatge profund
dc.subject.other: xarxes neuronals profundes
dc.subject.other: detecció d'objectes
dc.subject.other: faster r-cnn
dc.subject.other: caffe
dc.subject.other: mitja precisió
dc.subject.other: poda
dc.subject.other: poda de pesos
dc.subject.other: poda de neurones
dc.subject.other: deep learning
dc.subject.other: deep neural networks
dc.subject.other: object detection
dc.subject.other: half-precision
dc.subject.other: pruning
dc.subject.other: weight-pruning
dc.subject.other: neuron-pruning
dc.title: Adapting deep neural networks to a low-power environment
dc.title.alternative: Adaptació de xarxes neuronals profundes a un entorn de baix consum
dc.type: Bachelor thesis
dc.subject.lemac: Xarxes neuronals (Informàtica)
dc.subject.lemac: Aprenentatge automàtic
dc.identifier.slug: 126470
dc.rights.access: Open Access
dc.date.updated: 2017-07-05T04:00:09Z
dc.audience.educationlevel: Grau



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.