Anomaly detection model selection using minimum description length
Document type: Master thesis
Rights access: Restricted access - confidentiality agreement
Detecting objects that deviate significantly from the rest of a dataset is a complex task that requires advanced techniques. A great variety of anomaly detection algorithms has been proposed in recent years, but none has been proven to be the best. We present a proxy technique, based on the minimum description length (MDL) principle, for predicting the outlier detection performance of compression-based algorithms on a given dataset. We analyse the correlation between how well an algorithm compresses the data and its performance in anomaly detection (AD). The results show a clear relationship between the total compressed size of a dataset and the outlier detection performance of an MDL-based algorithm. This relationship allows the compressed size to be used as a proxy for selecting the most effective AD algorithm for a specific application.
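
The idea of using compressed size as a selection proxy can be illustrated with a minimal sketch. It is not the method evaluated in the thesis: as a stand-in for a compression-based detector it uses a toy two-part histogram code, and the names `histogram_code_lengths` and `select_by_mdl`, the `precision` parameter, the candidate bin counts, and the sample data are all assumptions made for illustration. Each candidate model is charged a crude model cost plus the cost of encoding the data, the candidate with the smallest total is selected, and the per-point code length is used as the anomaly score.

```python
import math
from collections import Counter


def histogram_code_lengths(values, n_bins, precision=0.01):
    """Toy two-part code: return (total_bits, per_point_bits).

    Illustrative stand-in for a compression-based detector,
    not the algorithm studied in the thesis.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back if all values are equal
    # Bin index of each value; the maximum is clamped into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = Counter(bins)
    n = len(values)
    # Laplace-smoothed bin probabilities keep every code length finite.
    prob = {b: (counts.get(b, 0) + 1) / (n + n_bins) for b in range(n_bins)}
    # Data cost: bits to name the bin plus bits to locate the value
    # inside the bin at the assumed fixed precision.
    within_bin_bits = math.log2(width / precision)
    per_point_bits = [-math.log2(prob[b]) + within_bin_bits for b in bins]
    # Crude model cost (an assumption): one count per bin, log2(n + 1) bits each.
    model_bits = n_bins * math.log2(n + 1)
    return model_bits + sum(per_point_bits), per_point_bits


def select_by_mdl(values, candidate_bins):
    """Pick the candidate with the smallest total description length."""
    return min(candidate_bins, key=lambda k: histogram_code_lengths(values, k)[0])


# Hypothetical data: 50 clustered points and one obvious outlier.
data = [10.0 + 0.1 * i for i in range(50)] + [100.0]
best = select_by_mdl(data, [1, 2, 4, 8, 16, 32])
_, scores = histogram_code_lengths(data, best)
# Points that are expensive to encode are the anomaly candidates.
most_anomalous = max(range(len(data)), key=lambda i: scores[i])
print(best, data[most_anomalous])
```

In this sketch no labels are needed to choose among the candidates, which is the practical appeal of a size-based proxy; whether the smallest total size also gives the best detection performance is the empirical question the thesis examines.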