Convolutional neural networks for malware classification
Tutor / director / evaluatorBéjar Alonso, Javier
Document typeMaster thesis
Rights accessRestricted access - author's decision
According to AV vendors malicious software has been growing exponentially last years. One of the main reasons for these high volumes is that in order to evade detection, malware authors started using polymorphic and metamorphic techniques. As a result, traditional signature-based approaches to detect malware are being insufficient against new malware and the categorization of malware samples had become essential to know the basis of the behavior of malware and to fight back cybercriminals. During the last decade, solutions that fight against malicious software had begun using machine learning approaches. Unfortunately, there are few opensource datasets available for the academic community. One of the biggest datasets available was released last year in a competition hosted on Kaggle with data provided by Microsoft for the Big Data Innovators Gathering (BIG 2015). This thesis presents two novel and scalable approaches using Convolutional Neural Networks (CNNs) to assign malware to its corresponding family. On one hand, the first approach makes use of CNNs to learn a feature hierarchy to discriminate among samples of malware represented as gray-scale images. On the other hand, the second approach uses the CNN architecture introduced by Yoon Kim  to classify malware samples according their x86 instructions. The proposed methods achieved an improvement of 93.86% and 98,56% with respect to the equal probability benchmark.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder