Frequent sets, sequences, and taxonomies: new, efficient algorithmic proposals
Document type: Research report
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
We describe efficient algorithmic proposals for three fundamental problems in data mining: association rules, episodes in sequences, and generalized association rules over hierarchical taxonomies. The association rule discovery problem consists of identifying the frequent itemsets in a database and then forming conditional implication rules among them. For this task, we introduce a new algorithmic proposal that substantially reduces the number of transactions processed. The resulting algorithm, called Ready-and-Go, discovers frequent sets efficiently. For the discovery of patterns in sequences of events in ordered collections of data, we propose applying the appropriate variant of that algorithm, and we additionally introduce a new framework for formalizing the concept of interesting episodes. Finally, we adapt our algorithm to the generalization of the frequent sets problem in which the data is organized in taxonomic hierarchies, and here we also contribute a new heuristic that, under certain natural conditions, improves performance.
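As an illustration of the problem statement only, the two stages named in the abstract (finding frequent itemsets, then forming implication rules among them) can be sketched with a plain levelwise search. This is not the Ready-and-Go algorithm, whose details are not given here. The function names `frequent_itemsets` and `rules`, the absolute support threshold `min_support`, and the confidence threshold `min_confidence` are assumptions made for this sketch.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Baseline levelwise search (hypothetical helper, not Ready-and-Go).

    Returns a dict mapping each frequent itemset (a frozenset) to the
    number of transactions that contain it.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]  # candidates of size 1
    while level:
        # Count, for every candidate, the transactions that contain it.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Build candidates one item larger; keep only those whose
        # subsets of the previous size are all frequent.
        k = len(level[0]) + 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Form rules lhs -> rhs from frequent itemsets (hypothetical helper).

    The confidence of lhs -> rhs is support(lhs | rhs) / support(lhs).
    """
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = supp / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf))
    return out
```

A call such as `rules(frequent_itemsets(db, 3), 0.75)`, where `db` is a list of sets of items, returns the rules as `(lhs, rhs, confidence)` triples. The sketch scans every transaction at every level, which is the kind of cost the report's proposal aims to reduce.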
Citation: Baixeries, J., Casas, G., Balcazar, J. L. "Frequent sets, sequences, and taxonomies: new, efficient algorithmic proposals". 2000.
Is part of: LSI-00-78-R