DSpace Collection:
http://hdl.handle.net/2117/3487
Sun, 20 Apr 2014 11:36:23 GMT
webmaster.bupc@upc.edu
Universitat Politècnica de Catalunya. Servei de Biblioteques i Documentació

Characterizing functional dependencies in formal concept analysis with pattern structures
http://hdl.handle.net/2117/21485
Title: Characterizing functional dependencies in formal concept analysis with pattern structures
Authors: Baixeries i Juvillà, Jaume; Kaytoue, Mehdi; Napoli, Amedeo
Abstract: Computing functional dependencies from a relation is an important database topic, with many applications in database management, reverse engineering and query optimization. Although the problem has been investigated in depth in those fields, it also has strong links with the mathematical framework of Formal Concept Analysis (FCA). Considering the discovery of functional dependencies, it is indeed known that a relation can be expressed as the binary relation of a formal context, whose implications are equivalent to those dependencies. However, this leads to a new data representation that is quadratic in the number of objects w.r.t. the original data. Here, we present an alternative avoiding such a data representation and show how to characterize functional dependencies using the formalism of pattern structures, an extension of classical FCA to handle complex data. We also show how another class of dependencies can be characterized with that framework, namely, degenerated multivalued dependencies. Finally, we discuss and compare the performance of our new approach in a series of experiments on classical benchmark datasets.
Date: Fri, 07 Feb 2014 20:12:07 GMT
Keywords: Association rules, Attribute implications, Data dependencies, Pattern structures, Formal concept analysis

Spectral learning of weighted automata: a forward-backward perspective
http://hdl.handle.net/2117/21075
Title: Spectral learning of weighted automata: a forward-backward perspective
Authors: Balle Pigem, Borja de; Carreras Pérez, Xavier; Luque, Franco M.; Quattoni, Ariadna Julieta
Abstract: In recent years we have seen the development of efficient, provably correct algorithms for learning Weighted Finite Automata (WFA). Most of these algorithms avoid the known hardness results by defining parameters beyond the number of states that can be used to quantify the complexity of learning automata under a particular distribution. One such class of methods is the so-called spectral algorithms, which measure learning complexity in terms of the smallest singular value of some Hankel matrix. However, despite their simplicity and wide applicability to real problems, their impact in application domains remains marginal to date. One of the goals of this paper is to remedy this situation by presenting a derivation of the spectral method for learning WFA that, without sacrificing rigor and mathematical elegance, puts emphasis on providing intuitions on the inner workings of the method and does not assume a strong background in formal algebraic methods. In addition, our algorithm overcomes some of the shortcomings of previous work and is able to learn from statistics of substrings. To illustrate the approach we present experiments on a real application of the method to natural language parsing.
Date: Fri, 20 Dec 2013 11:07:41 GMT
Keywords: Spectral learning, Weighted finite automata, Dependency parsing

The Evolution of the exponent of Zipf's law in language ontogeny
http://hdl.handle.net/2117/19413
Title: The Evolution of the exponent of Zipf's law in language ontogeny
Authors: Baixeries i Juvillà, Jaume; Elvevag, Brita; Ferrer Cancho, Ramon
Abstract: It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power law with a constant exponent of one in both adults and children needs to be revised.
Date: Mon, 27 May 2013 14:04:20 GMT

The parameters of Menzerath-Altmann law in genomes
http://hdl.handle.net/2117/19025
Title: The parameters of Menzerath-Altmann law in genomes
Authors: Baixeries i Juvillà, Jaume; Hernández Fernández, Antonio; Forns, Núria; Ferrer Cancho, Ramon
Abstract: The relationship between the size of the whole and the size of the parts in language and music is known to follow the Menzerath-Altmann law at many levels of description (morphemes, words, sentences, …). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g. the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component yields a better fit than a pure power law in plants, mammals, ray-finned fishes and amphibians. The parameters of the Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent, with the exception of birds and cartilaginous fishes.
Date: Fri, 26 Apr 2013 18:45:28 GMT

Learning probabilistic automata : a study in state distinguishability
http://hdl.handle.net/2117/18260
Title: Learning probabilistic automata : a study in state distinguishability
Authors: Balle Pigem, Borja de; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
Abstract: Known algorithms for learning PDFA can only be shown to run in time polynomial in the so-called distinguishability μ of the target machine, besides the number of states and the usual accuracy and confidence parameters. We show that the dependence on μ is necessary in the worst case for every algorithm whose structure resembles existing ones. As a technical tool, a new variant of Statistical Queries termed L∞-queries is defined. We show how to simulate L∞-queries using classical Statistical Queries and show that known PAC algorithms for learning PDFA are in fact statistical query algorithms. Our results include a lower bound: every algorithm to learn PDFA with queries using a reasonable tolerance must make Ω(1/μ^(1−c)) queries for every c > 0. Finally, an adaptive algorithm that PAC-learns w.r.t. another measure of complexity is described. This yields better efficiency in many cases, while retaining the same inevitable worst-case behavior. Our algorithm requires fewer input parameters than previously existing ones, and has a better sample bound.
Date: Wed, 13 Mar 2013 13:49:57 GMT

A graphical tool for describing the temporal evolution of clusters in financial stock markets
http://hdl.handle.net/2117/18232
Title: A graphical tool for describing the temporal evolution of clusters in financial stock markets
Authors: Arratia Quesada, Argimiro Alejandro; Cabaña, Ana Alejandra
Date: Tue, 12 Mar 2013 16:27:48 GMT

Energy-efficient and multifaceted resource management for profit-driven virtualized data centers
http://hdl.handle.net/2117/16067
Title: Energy-efficient and multifaceted resource management for profit-driven virtualized data centers
Authors: Goiri Presa, Íñigo; Berral García, Josep Lluís; Fitó, Josep Oriol; Julià Massó, Ferran; Nou Castell, Ramon; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
Abstract: Since virtualization was introduced in data centers, it has opened new opportunities for resource management. Nowadays, it is not just used as a tool for consolidating underused nodes and saving power; it also allows new solutions to well-known challenges, such as heterogeneity management. Virtualization helps to encapsulate Web-based applications or HPC jobs in virtual machines (VMs) and treat each of them as a single entity that can be managed in an easier and more efficient way. We propose a new scheduling policy that models and manages a virtualized data center. It focuses on the allocation of VMs in data center nodes according to multiple facets to optimize the provider's profit. In particular, it considers energy efficiency, virtualization overheads, and SLA violation penalties, and supports outsourcing to external providers. The proposed approach is compared to other common scheduling policies, demonstrating that a provider can improve its benefit by 30% and save power while handling other challenges, such as resource outsourcing, in a better and more intuitive way than other typical approaches do.
Date: Sat, 16 Jun 2012 10:58:35 GMT

Random models of Menzerath-Altmann law in genomes
http://hdl.handle.net/2117/14563
Title: Random models of Menzerath-Altmann law in genomes
Authors: Baixeries i Juvillà, Jaume; Hernández Fernández, Antonio; Ferrer Cancho, Ramon
Abstract: Recently, a random breakage model has been proposed to explain the negative correlation between mean chromosome length and chromosome number that is found in many groups of species and is consistent with Menzerath–Altmann law, a statistical law that defines the dependency between the mean size of the whole and the number of parts in quantitative linguistics. Here, the central assumption of the model, namely that genome size is independent from chromosome number, is reviewed. This assumption is shown to be unrealistic from the perspective of chromosome structure and the statistical analysis of real genomes. A general class of random models, including that random breakage model, is analyzed. For any model within this class, a power law with an exponent of −1 is predicted for the expectation of the mean chromosome size as a function of chromosome length, a functional dependency that is not supported by real genomes. The random breakage model and variants keeping genome size and chromosome number independent raise no serious objection to the relevance of correlations consistent with Menzerath–Altmann law across taxonomic groups and the possibility of a connection between human language and genomes through that law.
Date: Mon, 16 Jan 2012 11:48:19 GMT

Size of the whole versus number of parts in genomes
http://hdl.handle.net/2117/13368
Title: Size of the whole versus number of parts in genomes
Authors: Hernández Fernández, Antonio; Baixeries i Juvillà, Jaume; Forns, Núria; Ferrer Cancho, Ramon
Abstract: It is known that chromosome number tends to decrease as genome size increases in angiosperm plants. Here the relationship between number of parts (the chromosomes) and size of the whole (the genome) is studied for other groups of organisms from different kingdoms. Two major results are obtained. First, we find relationships of the kind "the more parts the smaller the whole", as in angiosperms, but also relationships of the kind "the more parts the larger the whole". Second, these dependencies are not linear in general. The implications of the dependencies between genome size and chromosome number are two-fold. First, they indicate that arguments against the relevance of the finding of negative correlations consistent with Menzerath-Altmann law (a linguistic law that relates the size of the parts with the size of the whole) in genomes are seriously flawed. Second, they expose the weakness of a recent model of chromosome lengths based upon random breakage that assumes that chromosome number and genome size are independent.
Date: Wed, 28 Sep 2011 08:53:18 GMT

Estimating the horizon of predictability in time-series predictions using inductive modelling tools
http://hdl.handle.net/2117/12055
Title: Estimating the horizon of predictability in time-series predictions using inductive modelling tools
Authors: López Herrera, Josefina; Cellier, François E.; Cembrano Gennari, Gabriela
Abstract: This paper deals with the assessment of how far into the future a time series can be safely predicted using inductive modelling and extrapolation techniques. Three different time series are considered: one representing the water demand of the city of Barcelona, another characterizing the water demand of a section of the city of Rotterdam, and a third describing weather data for the city of Tucson. Fuzzy inductive reasoning (FIR) is used to predict future values of these time series on the basis of their own past. FIR predictions come with two different built-in measures of confidence that can be used to obtain a quantitative estimate of how far into the future a time series can be predicted.
Date: Thu, 24 Mar 2011 18:15:27 GMT

Horn query learning with multiple refinement
http://hdl.handle.net/2117/10845
Title: Horn query learning with multiple refinement
Authors: Sierra Santibáñez, Josefina; Santibáñez Velilla, Josefina
Abstract: In this paper we try to understand the heuristics that underlie the decisions made by the Horn query learning algorithm proposed in [1]. We take advantage of our explicit representation of such heuristics in order to present an alternative termination proof for the algorithm, as well as to justify its decisions by showing that they always guarantee that the negative examples in the sequence maintained by the algorithm violate different clauses in the target formula. Finally, we propose a new algorithm that allows multiple refinement when we can prove that such a refinement does not affect the independence of the negative examples in the sequence maintained by the algorithm.
Date: Thu, 30 Dec 2010 09:01:50 GMT
Keywords: Horn clauses, Learning (artificial intelligence), Query processing

Mining frequent closed rooted trees
http://hdl.handle.net/2117/6835
Title: Mining frequent closed rooted trees
Authors: Balcázar Navarro, José Luis; Bifet Figuerol, Albert Carles; Lozano Bojados, Antoni
Abstract: Many knowledge representation mechanisms are based on tree-like structures, thus symbolizing the fact that certain pieces of information are related in one sense or another. There exists a well-studied process of closure-based data mining in the itemset framework: we consider the extension of this process into trees. We focus mostly on the case where labels on the nodes are nonexistent or unreliable, and discuss algorithms for closure-based mining that only rely on the root of the tree and the link structure.
We provide a notion of intersection that leads to a deeper understanding of the notion of support-based closure, in terms of an actual closure operator.
We describe combinatorial characterizations and some properties of ordered trees, discuss their applicability to unordered trees, and rely on them to design efficient algorithms for mining frequent closed subtrees both in the ordered and the unordered settings. Empirical validations and comparisons with alternative algorithms are provided.
Date: Tue, 30 Mar 2010 09:00:52 GMT
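
The abstract above builds on closure-based mining in the itemset framework. As a rough illustration of that starting point only (not of the tree algorithms in the paper), the following Python sketch computes the standard itemset closure: the intersection of all transactions that contain a given itemset. The function names `closure` and `support` and the toy `transactions` data are assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def closure(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Intersection of all transactions containing `itemset`.

    The result is the largest superset of `itemset` with the same support.
    """
    items = frozenset(itemset)
    covering = [t for t in transactions if items <= t]
    if not covering:
        # Assumed convention: an unsupported itemset closes to the set of all items.
        return items | frozenset().union(*transactions)
    return frozenset.intersection(*covering)


# Hypothetical toy dataset: each transaction is a set of single-letter items.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]

closed_a = closure({"a"}, transactions)   # every transaction with "a" also has "b"
closed_c = closure({"c"}, transactions)   # {"c"} is already closed
```

An itemset is closed when it equals its own closure. In this toy data `{"a"}` is not closed, because adding `"b"` does not reduce its support, while `{"c"}` is. The abstract describes carrying this idea over to trees through a notion of tree intersection.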