On-line sampling methods for discovering association rules

Domingo Soriano, Carlos; Gavaldà Mestre, Ricard; Watanabe, Osamu

File: R99-4(1).pdf (226.5 KB)

Document type: Research report

Publication date: 1999-02

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Association rule discovery is one of the prototypical problems in data mining. In this problem, the input database is assumed to be very large, and most algorithms are designed to minimize the number of scans of the database. Enumerating association rules is usually an expensive task because of the size of the input database. One proposed approach for reducing the running time of this process is random sampling. Of course, any implementation of an algorithm that uses sampling must solve the problem of determining which sample size is appropriate. Previous research on sampling for association rule mining has addressed this problem and concluded that, in general, the theoretically obtained sample size bounds are far from what is observed in practice. In this paper, we try to reduce this gap between theory and practice. We propose two on-line sampling algorithms for association rule mining. Our algorithms maintain the same theoretical guarantees as previous approaches while using a much smaller number of transactions in most cases. In the experiments we report, this improvement is often by an order of magnitude.
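To make the sample-size question concrete, here is a minimal Python sketch of the kind of worst-case bound the abstract contrasts with practice. It is a generic illustration based on the standard two-sided Hoeffding inequality, not the on-line algorithms proposed in the report. The names `hoeffding_sample_size` and `estimate_support`, and the parameters `epsilon` (additive error) and `delta` (failure probability), are assumptions introduced for this example.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Number of uniformly sampled transactions that suffices for an
    estimated itemset frequency to be within +/- epsilon of the true
    frequency with probability at least 1 - delta.

    Uses the two-sided Hoeffding bound n >= ln(2/delta) / (2 * epsilon^2).
    This is a fixed, worst-case bound chosen before looking at any data.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_support(transactions, itemset, n, rng):
    """Estimate the support of `itemset` from n transactions drawn
    uniformly at random, with replacement, from `transactions`
    (assumed here to be a non-empty list of sets of items)."""
    target = frozenset(itemset)
    hits = sum(1 for _ in range(n) if target <= rng.choice(transactions))
    return hits / n


if __name__ == "__main__":
    # Hypothetical toy database: half of the transactions contain {a, b}.
    db = [frozenset("ab"), frozenset("a"), frozenset("abc"), frozenset("c")]
    n = hoeffding_sample_size(epsilon=0.05, delta=0.05)
    print(n, estimate_support(db, {"a", "b"}, n, random.Random(0)))
```

With `epsilon=0.01` and `delta=0.05` the bound asks for 18,445 transactions regardless of the data. A static bound of this kind cannot take advantage of easy inputs, which is the gap between theory and practice that the abstract describes.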

Citation: Domingo, C., Gavaldà, R., Watanabe, O. "On-line sampling methods for discovering association rules". 1999.

Part of: LSI-99-4-R

URI: http://hdl.handle.net/2117/91378

Collections: Departament de Ciències de la Computació - Reports de recerca

UPCommons. Open knowledge portal of the UPC
