Multidimensional scaling for Big Data
Tutor / director / evaluator: Delicado Useros, Pedro Francisco
Document type: Master thesis
Rights access: Open Access
We present a set of algorithms for Multidimensional Scaling (MDS) to be used with large datasets. MDS is a statistical tool for dimensionality reduction that takes as input an n x n distance matrix. When n is large, classical algorithms run into computational problems and the MDS configuration cannot be obtained. In this thesis we address these problems by means of three algorithms: Divide and Conquer MDS, Fast MDS and MDS based on Gower interpolation. The main idea of these methods is to partition the dataset into small pieces on which classical methods can work. To check the performance of the algorithms and to compare them, we carry out a simulation study. The study indicates that Fast MDS and MDS based on Gower interpolation are appropriate when n is large, and that Divide and Conquer MDS is the method that best captures the variance of the original data.
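As a rough illustration of the classical step that the three algorithms apply to each small piece, the following is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It is not taken from the thesis: the function name `classical_mds`, the parameter `k` and the synthetic test data are assumptions made for this example, and the partitioning and recombination steps of the three algorithms are not shown.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS sketch.

    D is an n x n matrix of pairwise distances; returns an n x k configuration.
    """
    n = D.shape[0]
    # Centering matrix J = I - (1/n) * 1 1^T
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centred matrix of inner products
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the k largest eigenvalues and their eigenvectors
    idx = np.argsort(eigvals)[::-1][:k]
    # Clip small negative eigenvalues caused by rounding error
    lam = np.clip(eigvals[idx], 0.0, None)
    return eigvecs[:, idx] * np.sqrt(lam)


# Synthetic data (an assumption for illustration): 50 points in 3 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y = classical_mds(D, k=3)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

Here `D_hat` holds the pairwise distances of the recovered configuration, which can be compared with `D`. The sketch builds and decomposes full n x n matrices, so its memory use grows quadratically with n; with 8-byte floats, n = 100,000 already requires about 80 GB for the distance matrix alone, which is the kind of limit that motivates working on small pieces.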