K-means vs Mini Batch K-means: a comparison

Béjar Alonso, Javier


Document type: Research report

Publication date: 2013-05-17

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Except where otherwise noted, the contents of this work are licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Abstract

Mini Batch K-means (Sculley, 2010) has been proposed as an alternative to the K-means algorithm for clustering massive datasets. Its advantage is a lower computational cost: each iteration uses a fixed-size subsample rather than the whole dataset. This strategy reduces the number of distance computations per iteration at the cost of lower cluster quality. The purpose of this paper is to perform empirical experiments on artificial datasets with controlled characteristics to assess how much cluster quality is lost when applying this algorithm. The goal is to obtain guidelines on the circumstances in which the algorithm is best applied and on the maximum gain in computational time that can be achieved without compromising the overall quality of the partition.
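The subsampling idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the code used in the report: the function names (`mini_batch_kmeans`, `inertia`), the two-blob toy data, and all parameter values are assumptions chosen for illustration. The update rule, a per-center learning rate of 1/count, follows the commonly described form of the mini-batch algorithm.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def mini_batch_kmeans(data, k, batch_size, n_iter, seed=0):
    """Illustrative mini-batch K-means (hypothetical helper, not the report's code)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]  # initialise from k data points
    counts = [0] * k                                  # points seen so far per center
    for _ in range(n_iter):
        batch = rng.sample(data, batch_size)          # fixed-size subsample
        # Assign the whole batch first, using the centers as they stand now:
        # this costs batch_size * k distance computations instead of n * k.
        labels = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]                     # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers

def inertia(data, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in data)

# Toy data (assumed): two 2-D Gaussian blobs around (0, 0) and (5, 5).
_rng = random.Random(42)
data = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(200)]
data += [(_rng.gauss(5, 1), _rng.gauss(5, 1)) for _ in range(200)]

centers = mini_batch_kmeans(data, k=2, batch_size=20, n_iter=50)
quality = inertia(data, centers)
```

With these assumed settings, each iteration performs 20 × 2 distance computations, where a full K-means pass over the same data would perform 400 × 2. Comparing `quality` against the inertia of a full K-means run on the same data is one way to measure the quality loss the report sets out to quantify.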

Citation: Bejar, J. "K-means vs Mini Batch K-means: a comparison". 2013.

Part of: LSI-13-8-R

URI: http://hdl.handle.net/2117/23414

External repository URL: http://www.lsi.upc.edu/~techreps/files/R13-8.zip

Files	Description	Size	Format
R13-8.pdf		400.6 KB	PDF
