Neural network compression

Noguera Vall, Ferran

dc.contributor	Ayguadé Parra, Eduard
dc.contributor	Llosa Espuny, José Francisco
dc.contributor.author	Noguera Vall, Ferran
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2021-04-30T09:09:23Z
dc.date.available	2021-04-30T09:09:23Z
dc.date.issued	2021-01
dc.identifier.uri	http://hdl.handle.net/2117/344886
dc.description.abstract	In recent years, neural networks have grown in popularity, mostly thanks to advances in high-performance computing. Nevertheless, some factors still limit their use, in particular storage requirements and computational cost. The aim of this project is to radically reduce storage demand and to provide direction for accelerating the execution of neural networks. Within the scope of this thesis, two compression algorithms have been developed. They share a common basis: both exploit the error tolerance of neural networks, a property that allows the weight matrix to be divided into blocks, simplifying the problem while barely affecting accuracy. The first algorithm groups the weights inside every block using one of two clustering techniques: arithmetic mean or K-Means. Standard deviation, among other criteria, is used to decide which clustering method to apply to which block. The user can specify a trade-off between accuracy and compression. This method underperformed, obtaining a compression rate of 10.57 for AlexNet, which is far from the state of the art. The main issue is that meaningless weights are merged with significant ones, causing a significant drop in accuracy. The second algorithm tackles the accuracy loss by pruning all unimportant weights. After pruning, quantization is applied. For each step, pruning and quantization, two options have been explored, which are effective for different kinds of neural networks. Of the possible combinations of pruning and quantization, one is selected by trial and error. The first pruning technique focuses on removing as many weights as possible, while the second takes the block structure into account to a greater extent. The two types of quantization allow three values per block and five values per block, respectively. This algorithm performed very well, obtaining a compression rate of 57.15 for AlexNet with minimal accuracy loss.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.subject.lcsh	Neural networks (Computer science)
dc.subject.lcsh	Machine learning
dc.subject.lcsh	Artificial intelligence
dc.subject.other	aprenentatge profund
dc.subject.other	compressió de la matriu de pesos
dc.subject.other	compressió de xarxes neuronals
dc.subject.other	algoritmes de clustering
dc.subject.other	quantització
dc.subject.other	K-Means
dc.subject.other	mitjana aritmètica
dc.subject.other	acceleració de xarxes neuronals
dc.subject.other	consum d'energia
dc.subject.other	xarxes neuronals convolucionals
dc.subject.other	capa densament connectada
dc.subject.other	deep learning
dc.subject.other	neural networks
dc.subject.other	weight matrix compression
dc.subject.other	neural network compression
dc.subject.other	clustering algorithms
dc.subject.other	quantization
dc.subject.other	arithmetic mean
dc.subject.other	matrix compression
dc.subject.other	AlexNet
dc.subject.other	LeNet
dc.subject.other	CIFAR-10
dc.subject.other	MNIST
dc.subject.other	ImageNet
dc.subject.other	artificial intelligence
dc.subject.other	convolutional neural networks
dc.subject.other	fully-connected layer
dc.title	Neural network compression
dc.type	Master thesis
dc.subject.lemac	Xarxes neuronals (Informàtica)
dc.subject.lemac	Aprenentatge automàtic
dc.subject.lemac	Intel·ligència artificial
dc.identifier.slug	156456
dc.rights.access	Open Access
dc.date.updated	2021-02-05T07:29:28Z
dc.audience.educationlevel	Màster
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
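
As a rough illustration of the second algorithm summarized in the abstract (block-wise pruning followed by quantization to three values per block), the sketch below may help. It is not the thesis implementation. The function names, the block size, the `keep_fraction` parameter, the magnitude-based pruning criterion, and the choice of {-a, 0, +a} as the three values are all assumptions made for this example.

```python
import numpy as np


def split_into_blocks(weights, block_size):
    """Yield (row_slice, col_slice) pairs that tile a 2-D weight matrix."""
    rows, cols = weights.shape
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            yield slice(r, r + block_size), slice(c, c + block_size)


def prune_and_quantize(weights, block_size=4, keep_fraction=0.5):
    """Block-wise magnitude pruning followed by three-value quantization.

    Assumptions (not taken from the thesis): pruning keeps the
    largest-magnitude weights of each block, and the three values per
    block are {-a, 0, +a}, where a is the mean magnitude of the
    surviving weights in that block.
    """
    out = np.zeros(weights.shape, dtype=np.float64)
    for rs, cs in split_into_blocks(weights, block_size):
        block = weights[rs, cs].astype(np.float64)
        magnitudes = np.abs(block)
        # Keep at least one weight per block.
        n_keep = max(1, int(round(keep_fraction * block.size)))
        # The n_keep-th largest magnitude acts as the pruning threshold.
        threshold = np.sort(magnitudes, axis=None)[-n_keep]
        mask = magnitudes >= threshold
        # One shared magnitude per block; the sign of each weight is kept.
        a = magnitudes[mask].mean()
        out[rs, cs] = np.where(mask, np.sign(block) * a, 0.0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 8))
    q = prune_and_quantize(w, block_size=4, keep_fraction=0.5)
    print("non-zero weights before:", np.count_nonzero(w))
    print("non-zero weights after: ", np.count_nonzero(q))
```

Under these assumptions, each block can be stored as a single magnitude plus a two-bit code per weight, which is where the storage saving would come from. The actual compression rate depends on the block size, the encoding, and the pruning and quantization variants chosen in the thesis.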

Files in this item

Name: 156456.pdf
Size: 2.203 MB
Format: PDF

This item appears in the following collections

Master in Innovation and Research in Informatics - MIRI [454]

UPCommons. Open knowledge portal of the UPC
