Apache Mahout’s k-Means vs. fuzzy k-Means performance evaluation
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use the knowledge obtained from data within short times, on limited budgets, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because the 'old' data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks, fully implemented on Cloud platforms, is being developed. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks is yet to be evaluated, and their potential limitations are yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis for two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Citation: Xhafa, F., Bogza, A., Caballé, S., Barolli, L. Apache Mahout’s k-Means vs. fuzzy k-Means performance evaluation. In: International Conference on Intelligent Networking and Collaborative Systems. "2016 International Conference on Intelligent Networking and Collaborative Systems, IEEE INCoS 2016, 7-9 September 2016, Ostrava, Czech Republic: proceedings". Ostrava: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 110-116.