Network traffic classification : from theory to practice

Carela Español, Valentín

doi:10.5821/dissertation-2117-95495

dc.contributor	Barlet Ros, Pere
dc.contributor	Solé Pareta, Josep
dc.contributor.author	Carela Español, Valentín
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2014-11-04T12:29:23Z
dc.date.available	2014-11-04T12:29:23Z
dc.date.issued	2014-10-31
dc.identifier.citation	Carela Español, V. Network traffic classification : from theory to practice. Tesi doctoral, UPC, Departament d'Arquitectura de Computadors, 2014. DOI 10.5821/dissertation-2117-95495.
dc.identifier.uri	http://hdl.handle.net/2117/95495
dc.description.abstract	Since its inception, the Internet has been in constant transformation. The analysis and monitoring of data networks try to shed some light on this huge black box of interconnected computers. In particular, the classification of network traffic has become crucial for understanding the Internet. In recent years, the research community has proposed many solutions to accurately identify and classify network traffic. However, the continuous evolution of Internet applications and their techniques to avoid detection make their identification a very challenging task, which is far from being completely solved. This thesis addresses the network traffic classification problem from a more practical point of view, bridging the gap between the real-world requirements of the network industry and the research carried out. The first block of this thesis aims to facilitate the deployment of existing techniques in production networks. To achieve this goal, we study the viability of using NetFlow, a monitoring protocol already implemented in most routers, as input to our classification technique. Since the application of packet sampling has become almost mandatory in large networks, we also study its impact on the classification and propose a method to improve the accuracy in this scenario. Our results show that it is possible to achieve high accuracy with both sampled and unsampled NetFlow data, despite the limited information provided by NetFlow. Once the classification solution is deployed, it is important to maintain its accuracy over time. Current network traffic classification techniques have to be regularly updated to adapt them to traffic changes. The second block of this thesis focuses on this issue, with the goal of automatically maintaining the classification solution without human intervention.
Using the knowledge gained in the first block, we propose a classification solution that combines several techniques using only Sampled NetFlow as input for the classification. Then, we show that classification models suffer from temporal and spatial obsolescence and, therefore, we design an autonomic retraining system that is able to automatically update the models and keep the classifier accurate over time. Going one step further, we then introduce the use of stream-based Machine Learning techniques for network traffic classification. In particular, we propose a classification solution based on Hoeffding Adaptive Trees. Apart from the features of stream-based techniques (i.e., processing one instance at a time and inspecting it only once, with a predefined amount of memory and a bounded amount of time), our technique is able to automatically adapt to changes in the traffic by using only NetFlow data as input for the classification. The third block of this thesis aims to be a first step towards the impartial validation of state-of-the-art classification techniques. The wide range of techniques, datasets, and ground-truth generators makes the comparison of different traffic classifiers a very difficult task. To achieve this goal, we evaluate the reliability of different Deep Packet Inspection (DPI)-based techniques commonly used in the literature for ground-truth generation. The results we obtain show that some well-known DPI techniques present several limitations that make them unsuitable as ground-truth generators in their current state. In addition, we publish some of the datasets used in our evaluations to address the lack of publicly available datasets and make the comparison and validation of existing techniques easier.
dc.format.extent	170 p.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc/3.0/es/
dc.rights.uri	http://creativecommons.org/licenses/by-nc/3.0/es/
dc.source	TDX (Tesis Doctorals en Xarxa)
dc.subject	Àrees temàtiques de la UPC::Informàtica
dc.title	Network traffic classification : from theory to practice
dc.type	Doctoral thesis
dc.subject.lemac	Dades -- Transmissió
dc.identifier.doi	10.5821/dissertation-2117-95495
dc.identifier.dl	B 25597-2014
dc.rights.access	Open Access
dc.description.version	Postprint (published version)
dc.identifier.tdx	http://hdl.handle.net/10803/283573

Files in this item

Name: TVCE1de1.pdf
Size: 11.39 MB
Format: PDF


This item appears in the following collections

Departament d'Arquitectura de Computadors [361]
All theses [5,475]


UPCommons. Open knowledge portal of the UPC
