K-means vs Mini Batch K-means: a comparison
Cite as: hdl:2117/23414
Document type: Research report
Publication date: 2013-05-17
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
Mini Batch K-means (Sculley, 2010) has been proposed as an alternative to the K-means algorithm for clustering massive datasets. Its advantage is a lower computational cost: each iteration uses a fixed-size subsample rather than the whole dataset. This strategy reduces the number of distance computations per iteration at the cost of lower cluster quality. This paper reports empirical experiments on artificial datasets with controlled characteristics to assess how much cluster quality is lost when applying this algorithm. The goal is to obtain guidelines on the circumstances in which the algorithm is best applied and on the maximum gain in computational time that can be achieved without compromising the overall quality of the partition.
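The subsampling strategy described in the abstract can be sketched as follows. This is a minimal illustration and not the code used in the report: it assumes the per-centre learning-rate update from Sculley (2010), and the function names (`mini_batch_kmeans`, `nearest_centre`, `squared_distance`), the parameter values, and the synthetic two-cluster data are all hypothetical choices made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centre(point, centres):
    """Index of the centre closest to `point`."""
    return min(range(len(centres)),
               key=lambda j: squared_distance(point, centres[j]))


def mini_batch_kmeans(data, k, batch_size, iterations, seed=0):
    """Illustrative mini-batch K-means (assumed update rule, see lead-in).

    Returns the final centres and the number of point-to-centre
    distance computations performed.
    """
    rng = random.Random(seed)
    # Initialise the centres with k examples drawn at random.
    centres = [list(p) for p in rng.sample(data, k)]
    counts = [0] * k  # examples assigned to each centre so far
    distance_computations = 0

    for _ in range(iterations):
        # Use a fixed-size subsample instead of the whole dataset.
        batch = rng.sample(data, batch_size)
        # Assign every batch example to its nearest centre first ...
        assignments = [nearest_centre(p, centres) for p in batch]
        distance_computations += batch_size * k
        # ... then move each centre towards its examples with a per-centre
        # learning rate that shrinks as the centre sees more data.
        for point, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centres[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centres[j], point)]

    return centres, distance_computations


# Hypothetical synthetic data: two Gaussian blobs of 500 points each.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0.0, 0.5), demo_rng.gauss(0.0, 0.5)) for _ in range(500)]
data += [(demo_rng.gauss(5.0, 0.5), demo_rng.gauss(5.0, 0.5)) for _ in range(500)]

k, batch_size, iterations = 2, 50, 20
centres, mini_batch_cost = mini_batch_kmeans(data, k, batch_size, iterations)
# Standard K-means compares every example with every centre in each iteration.
full_batch_cost = len(data) * k * iterations
print(centres)
print(mini_batch_cost, full_batch_cost)
```

With these assumed settings each mini-batch iteration computes `batch_size * k` distances instead of `len(data) * k`, which is the saving the abstract refers to. The sketch does not measure cluster quality, which is the question the report's experiments address.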
Citation: Bejar, J. "K-means vs Mini Batch K-means: a comparison". 2013.
Part of: LSI-13-8-R
External repository URL: http://www.lsi.upc.edu/~techreps/files/R13-8.zip
Files | Size |
---|---|
R13-8.pdf | 400.6 KB |