
dc.contributor: Ayguadé Parra, Eduard
dc.contributor: Llosa Espuny, José Francisco
dc.contributor.author: Noguera Vall, Ferran
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.description.abstract: In recent years, neural networks have grown in popularity, mostly thanks to advances in the field of high-performance computing. Nevertheless, some factors still limit the usage of neural networks, in particular storage requirements and computational cost. The aim of this project is to radically reduce storage demand and to provide direction for accelerating the execution of neural networks. Within the scope of this thesis, two compression algorithms have been developed. These algorithms share a common basis: both exploit the error tolerance of neural networks. Because of this property, the weight matrix can be divided into blocks, which simplifies the problem while barely affecting accuracy. The first algorithm groups the weights inside every block using different clustering techniques: arithmetic mean and K-Means. To decide which clustering method to apply to each block, the standard deviation is used, among other criteria. The user can specify a trade-off between accuracy and compression. This method underperformed, obtaining a compression rate of 10.57 for AlexNet, which is far from the state of the art. The main issue is that meaningless weights are merged with significant ones, causing a significant drop in accuracy. The second algorithm tackles the problem of accuracy loss by pruning all the unimportant weights. After pruning, quantization is applied. For both steps, pruning and quantization, two options have been explored, which are effective for different kinds of neural networks. Of the possible combinations of pruning and quantization, one is selected by trial and error. The first pruning technique focuses on removing as many weights as possible, while the second considers blocks to a greater extent. The two types of quantization allow three values per block and five values per block, respectively.
This algorithm performed very well, obtaining a compression rate of 57.15 for AlexNet with minimal accuracy loss.
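The block-wise ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the square block size, the standard-deviation threshold, the magnitude-based pruning rule, and the choice of {negative mean, 0, positive mean} as the "three values per block" are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def split_into_blocks(matrix, size):
    """Yield non-overlapping size x size tiles of a 2-D list (hypothetical helper)."""
    for r in range(0, len(matrix), size):
        for c in range(0, len(matrix[0]), size):
            yield [row[c:c + size] for row in matrix[r:r + size]]


def cluster_block_by_mean(block, std_threshold):
    """First-algorithm-style sketch: if the weights in a block are close
    together (low standard deviation), replace them all with their
    arithmetic mean. Otherwise return a copy unchanged; a finer method
    such as K-Means would be applied to such blocks (not shown here)."""
    flat = [w for row in block for w in row]
    if pstdev(flat) <= std_threshold:
        m = mean(flat)
        return [[m for _ in row] for row in block]
    return [row[:] for row in block]


def prune_and_quantize_block(block, prune_threshold):
    """Second-algorithm-style sketch: prune weights whose magnitude is at
    most prune_threshold (assumed pruning rule), then map the survivors to
    one shared positive value and one shared negative value, so the block
    holds at most three distinct values (assumed quantization scheme)."""
    flat = [w for row in block for w in row]
    positives = [w for w in flat if w > prune_threshold]
    negatives = [w for w in flat if w < -prune_threshold]
    pos_value = mean(positives) if positives else 0.0
    neg_value = mean(negatives) if negatives else 0.0

    def quantize(w):
        if w > prune_threshold:
            return pos_value
        if w < -prune_threshold:
            return neg_value
        return 0.0  # pruned weight

    return [[quantize(w) for w in row] for row in block]


if __name__ == "__main__":
    weights = [[0.9, -0.05, 0.5, 0.5],
               [1.1, -0.8, 0.5, 0.5]]
    for block in split_into_blocks(weights, 2):
        print(cluster_block_by_mean(block, std_threshold=0.01))
        print(prune_and_quantize_block(block, prune_threshold=0.1))
```

In a sketch like this, a block with at most three distinct values could be stored as those values plus a small per-weight index, which is where the storage saving would come from. The actual encoding used in the thesis is not stated in this record.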
dc.publisher: Universitat Politècnica de Catalunya
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Artificial intelligence
dc.subject.other: aprenentatge profund
dc.subject.other: compressió de la matriu de pesos
dc.subject.other: compressió de xarxes neuronals
dc.subject.other: algoritmes de clustering
dc.subject.other: mitjana aritmètica
dc.subject.other: acceleració de xarxes neuronals
dc.subject.other: consum d'energia
dc.subject.other: xarxes neuronals convolutionals
dc.subject.other: capa densament connectada
dc.subject.other: deep learning
dc.subject.other: neural networks
dc.subject.other: weight matrix compression
dc.subject.other: neural network compression
dc.subject.other: clustering algorithms
dc.subject.other: arithmetic mean
dc.subject.other: matrix compression
dc.subject.other: artificial intelligence
dc.subject.other: convolutional neural networks
dc.subject.other: fully-connected layer
dc.title: Neural network compression
dc.type: Master thesis
dc.subject.lemac: Xarxes neuronals (Informàtica)
dc.subject.lemac: Aprenentatge automàtic
dc.subject.lemac: Intel·ligència artificial
dc.rights.access: Open Access
dc.audience.mediator: Facultat d'Informàtica de Barcelona

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.