Neural network compression
Document type: Master thesis
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
In recent years, neural networks have grown in popularity, mostly thanks to advances in the field of high performance computing. Nevertheless, some factors still limit their usage, in particular storage requirements and computational cost. The aim of this project is to radically reduce storage demand and to provide direction for accelerating the execution of neural networks. Within the scope of this thesis, two compression algorithms have been developed. These algorithms share a common basis: both exploit the error tolerance of neural networks. Because of this property, the weight matrix can be divided into blocks, which simplifies the problem while barely affecting accuracy. The first algorithm groups the weights inside every block using one of two clustering techniques: arithmetic mean or K-Means. To decide which clustering method to apply to each block, the standard deviation is used, among other criteria. The user can specify a trade-off between accuracy and compression. This method underperformed, obtaining a compression rate of 10.57 for AlexNet, which is far from the state of the art. The main issue is that meaningless weights are merged with significant ones, causing a significant drop in accuracy. The second algorithm addresses this accuracy loss by pruning all the unimportant weights. After pruning, quantization is applied. For both steps, pruning and quantization, two options have been explored, each effective for different kinds of neural networks. Of the possible combinations of pruning and quantization, one is selected by trial and error. The first pruning technique focuses on removing as many weights as possible, while the second takes the block structure into account to a greater extent. The two types of quantization allow three values per block and five values per block, respectively. This algorithm performed very well, obtaining a compression rate of 57.15 for AlexNet with minimal accuracy loss.
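The two block-based ideas described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the block size, the thresholds, the evenly spread K-Means initialisation, the magnitude-based pruning rule and the three-value codebook {-a, 0, +a} are all assumptions made for illustration; the abstract states only that blocks are clustered by mean or K-Means depending on criteria such as the standard deviation, and that pruning is followed by a quantization allowing three or five values per block.

```python
from statistics import mean, pstdev

def split_blocks(weights, block_size):
    """Split a flat list of weights into consecutive blocks."""
    return [weights[i:i + block_size] for i in range(0, len(weights), block_size)]

def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on scalars; returns the assigned centroid per value."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centroids spread evenly over the block's range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Empty clusters keep their previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return [min(centroids, key=lambda c: abs(v - c)) for v in values]

def cluster_blocks(weights, block_size=4, std_threshold=0.05, k=2):
    """Sketch of the first idea: low-spread blocks collapse to their mean,
    other blocks are clustered with k-means (assumed decision rule)."""
    out = []
    for block in split_blocks(weights, block_size):
        if pstdev(block) <= std_threshold:
            out.extend([mean(block)] * len(block))
        else:
            out.extend(kmeans_1d(block, k))
    return out

def prune_and_quantize(weights, block_size=4, prune_threshold=0.05):
    """Sketch of the second idea: prune small weights, then allow at most
    three distinct values per block."""
    out = []
    for block in split_blocks(weights, block_size):
        # Assumed pruning rule: zero out weights below a fixed magnitude.
        kept = [w if abs(w) >= prune_threshold else 0.0 for w in block]
        survivors = [abs(w) for w in kept if w != 0.0]
        # Assumed codebook {-a, 0, +a}, with a the mean surviving magnitude.
        a = mean(survivors) if survivors else 0.0
        out.extend(0.0 if w == 0.0 else (a if w > 0 else -a) for w in kept)
    return out
```

In this sketch, a block such as `[0.01, -0.5, 0.7, -0.02]` passed to `prune_and_quantize` keeps only the two large weights and maps them to a shared magnitude, so the block can be stored as one scale plus a small code per weight. How the thesis actually encodes blocks, and how it chooses between the three-value and five-value variants, is not specified in the abstract.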
Subjects: Neural networks (Computer science), Machine learning, Artificial intelligence, Xarxes neuronals (Informàtica), Aprenentatge automàtic, Intel·ligència artificial
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)