Scalability, memory issues and challenges in mining large data sets
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Data mining is an active field of research and development that aims to automatically extract "knowledge" by analysing data sets. Knowledge can be defined in different ways, such as discovering (structured, frequent, approximate, etc.) patterns in data, grouping/clustering/bi-clustering data according to one or more criteria, finding association rules, etc. Such knowledge is then fed back to decision support systems, enabling end-users (actors) to make more informed decisions, which in economic terms could lead to advantages over traditional decision support systems. It should be noted, however, that data mining algorithms and frameworks were proposed prior to the "Big Data" explosion. While data mining algorithms have treated efficiency and computational complexity as important requirements, they did not take into account features of Big Data such as very large size, the velocity with which data is generated, variety, etc. These features now pose issues and challenges to data mining algorithms and frameworks. In this paper we analyse some of the issues in mining large data sets, such as scalability and in-memory needs. We also show some computational results that point to such issues.
(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Citation: Kolici, V., Xhafa, F., Barolli, L., Lala, A. Scalability, memory issues and challenges in mining large data sets. In: International Conference on Intelligent Networking and Collaborative Systems. "2014 International Conference on Intelligent Networking and Collaborative Systems: IEEE INCoS 2014: 10–12 September 2014, University of Salerno, Salerno, Italy: proceedings". Salerno: Institute of Electrical and Electronics Engineers (IEEE), 2014, p. 268-273.