Research reports
http://hdl.handle.net/2117/3488
2015-12-02T04:11:40Z
Spectral learning of transducers over continuous sequences
http://hdl.handle.net/2117/20362
Recasens, Adria; Quattoni, Ariadna Julieta
In this paper we present a spectral algorithm for learning weighted finite state transducers (WFSTs) over paired input-output sequences, where the input is continuous and the output discrete. WFSTs are an important tool for modeling paired input-output sequences and have numerous applications in real-world problems. Recently, Balle et al. (2011) proposed a spectral method for learning WFSTs that overcomes some well-known limitations of gradient-based or EM optimization, which can be computationally expensive and suffer from local optima. Their algorithm can model distributions where both inputs and outputs are sequences from a discrete alphabet. However, many real-world problems require modeling paired sequences where the inputs are continuous rather than discrete. Modeling continuous sequences with spectral methods has been studied in the context of HMMs (Song et al. 2010), where a spectral algorithm for this case was derived. In this paper we follow that line of work and propose a spectral learning algorithm for modeling paired input-output sequences where the inputs are continuous and the outputs are discrete. Our approach generalizes the class of weighted finite state transducers over discrete input-output sequences to a class where transitions are linear combinations of elementary transitions, and the weights of these linear combinations are determined by dynamic features of the continuous input sequence. At its core, the algorithm is simple and scalable to large data sets. We present experiments on a real task that validate the effectiveness of the proposed approach.
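The model class described in this abstract, where each transition is a linear combination of elementary transitions weighted by features of the continuous input, can be illustrated with a minimal sketch. It covers only how such a model would score a paired sequence, not the spectral learning step. The function names, the array layout, and the assumption that feature vectors are precomputed are all hypothetical and not taken from the paper.

```python
import numpy as np

def transition_operator(elementary, feature_weights):
    """Combine K elementary operators into one: sum_k w_k * A_k.

    elementary: array of shape (K, n, n), one n-by-n operator per feature.
    feature_weights: array of shape (K,), features of the current input
    (assumed here to be computed elsewhere).
    """
    return np.tensordot(feature_weights, elementary, axes=1)

def score(alpha0, alpha_inf, elementary_by_output, features, outputs):
    """Score a paired sequence as alpha0^T A(x_1, y_1) ... A(x_T, y_T) alpha_inf.

    elementary_by_output maps each discrete output symbol to its (K, n, n)
    stack of elementary operators (a hypothetical layout).
    """
    state = alpha0
    for phi_t, y_t in zip(features, outputs):
        # The operator for step t depends on the output symbol y_t and on
        # the feature vector phi_t of the continuous input.
        state = state @ transition_operator(elementary_by_output[y_t], phi_t)
    return float(state @ alpha_inf)
```

With a one-hot feature vector the combined operator reduces to a single elementary operator, which recovers the discrete-input case as a special instance of this sketch.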
2013-10-11T10:15:39Z
Frequent sets, sequences and taxonomies: new efficient algorithmic proposals
http://hdl.handle.net/2117/14824
Baixeries i Juvillà, Jaume; Casas Garriga, Gemma; Balcázar Navarro, José Luis
We describe efficient algorithmic proposals to approach three fundamental problems in data mining: association rules, episodes in sequences, and generalized association rules over hierarchical taxonomies. The association rule discovery problem aims at identifying frequent itemsets in a database and then forming conditional implication rules among them. For this association task, we introduce a new algorithmic proposal that substantially reduces the number of processed transactions. The resulting algorithm, called Ready-and-Go, is used to discover frequent sets efficiently. Then, for the discovery of patterns in sequences of events in ordered collections of data, we propose to apply the appropriate variant of that algorithm, and additionally we introduce a new framework for formalizing the concept of interesting episodes. Finally, we adapt our algorithm to the generalization of the frequent sets problem where data is organized in taxonomic hierarchies, and here we additionally contribute a new heuristic that, under certain natural conditions, improves performance.
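To make the association rule task concrete, here is a minimal sketch of frequent itemset discovery followed by rule confidence. It uses a plain level-wise (Apriori-style) search, which is a stand-in and not the Ready-and-Go algorithm from the report; in particular it does not reduce the number of processed transactions. All function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # level 1 candidates
    frequent = {}
    while current:
        # Count each candidate with one pass over all transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join: merge frequent k-sets that differ in exactly one item.
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1}
        # Prune: keep a candidate only if all its k-subsets are frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, len(c) - 1))]
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

A rule such as `{"a"} -> {"b"}` is then kept or discarded by comparing `confidence(...)` against a user-chosen threshold.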
2012-01-26T10:47:30Z
An integer linear programming representation for data-center power-aware management
http://hdl.handle.net/2117/11061
Berral García, Josep Lluís; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
This work shows how to represent a grid data-center scheduling problem that takes advantage of virtualization and consolidation techniques as an integer linear programming problem covering all three of these factors. Although integer linear programming (ILP) is computationally hard, correctly specifying its constraints and objective function can help find optimal integer solutions in relatively short time. ILP solutions can thus help designers and system managers not only to apply them in schedulers but also to create new heuristics and holistic functions that approximate the optimal solutions more quickly.
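As a rough illustration of what such a formulation can look like, the sketch below assumes a toy consolidation model that is not taken from the report: binary variables assign each virtual machine to exactly one host, each host has a capacity constraint, and the objective is the total power of hosts that run at least one machine. Instead of calling an ILP solver, it simply enumerates every assignment, which is only viable for tiny instances. The names `solve_placement`, `vm_demand`, `host_capacity`, and `host_power` are hypothetical.

```python
from itertools import product

def solve_placement(vm_demand, host_capacity, host_power):
    """Exhaustively search VM-to-host assignments for minimum power.

    Each tuple from product() places every VM on exactly one host, which
    plays the role of the assignment constraints of the assumed ILP.
    Demands are assumed to be positive. Returns (cost, assignment), or
    (None, None) if no assignment satisfies the capacities.
    """
    hosts = range(len(host_capacity))
    best_cost, best_assign = None, None
    for assign in product(hosts, repeat=len(vm_demand)):
        load = [0] * len(host_capacity)
        for vm, h in enumerate(assign):
            load[h] += vm_demand[vm]
        # Capacity constraint: skip assignments that overload any host.
        if any(load[h] > host_capacity[h] for h in hosts):
            continue
        # Objective: pay the power cost of every host that is switched on.
        cost = sum(host_power[h] for h in hosts if load[h] > 0)
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign
```

For realistic instance sizes, the same variables, constraints, and objective would be handed to a dedicated ILP solver rather than enumerated.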
2011-01-17T11:21:12Z