DSpace Collection:
http://hdl.handle.net/2117/3487
2014-09-17T23:57:52Z
Compression as a universal principle of animal behavior
http://hdl.handle.net/2117/24008
Title: Compression as a universal principle of animal behavior
Authors: Ferrer Cancho, Ramon; Hernández Fernández, Antonio; Lusseau, David; Agoramoorthy, Govindasamy; Hsu, Minna J.; Semple, Stuart
Abstract: A key aim in biology and psychology is to identify fundamental principles underpinning the behavior of animals, including humans. Analyses of human language and the behavior of a range of non-human animal species have provided evidence for a common pattern underlying diverse behavioral phenomena: Words follow Zipf’s law of brevity (the tendency of more frequently used words to be shorter), and conformity to this general pattern has been seen in the behavior of a number of other animals. It has been argued that the presence of this law is a sign of efficient coding in the information theoretic sense. However, no strong direct connection has been demonstrated between the law and compression, the information theoretic principle of minimizing the expected length of a code. Here, we show that minimizing the expected code length implies that the length of a word cannot increase as its frequency increases. Furthermore, we show that the mean code length or duration is significantly small in human language, and also in the behavior of other species in all cases where agreement with the law of brevity has been found. We argue that compression is a general principle of animal behavior that reflects selection for efficiency of coding.
2014-09-09T09:27:18Z
Characterizing functional dependencies in formal concept analysis with pattern structures
http://hdl.handle.net/2117/21485
Title: Characterizing functional dependencies in formal concept analysis with pattern structures
Authors: Baixeries i Juvillà, Jaume; Kaytoue, Mehdi; Napoli, Amedeo
Abstract: Computing functional dependencies from a relation is an important database topic, with many applications in database management, reverse engineering and query optimization. Whereas it has been deeply investigated in those fields, strong links exist with the mathematical framework of Formal Concept Analysis (FCA). Considering the discovery of functional dependencies, it is indeed known that a relation can be expressed as the binary relation of a formal context, whose implications are equivalent to those dependencies. However, this leads to a new data representation that is quadratic in the number of objects w.r.t. the original data. Here, we present an alternative that avoids such a data representation and show how to characterize functional dependencies using the formalism of pattern structures, an extension of classical FCA to handle complex data. We also show how another class of dependencies can be characterized with that framework, namely, degenerated multivalued dependencies. Finally, we discuss and compare the performance of our new approach in a series of experiments on classical benchmark datasets.
2014-02-07T20:12:07Z
Spectral learning of weighted automata: a forward-backward perspective
http://hdl.handle.net/2117/21075
Title: Spectral learning of weighted automata: a forward-backward perspective
Authors: Balle Pigem, Borja de; Carreras Pérez, Xavier; Luque, Franco M.; Quattoni, Ariadna Julieta
Abstract: In recent years we have seen the development of efficient provably correct algorithms for learning Weighted Finite Automata (WFA). Most of these algorithms avoid the known hardness results by defining parameters beyond the number of states that can be used to quantify the complexity of learning automata under a particular distribution. One such class of methods is the so-called spectral algorithms, which measure learning complexity in terms of the smallest singular value of some Hankel matrix. However, despite their simplicity and wide applicability to real problems, their impact in application domains remains marginal to date. One of the goals of this paper is to remedy this situation by presenting a derivation of the spectral method for learning WFA that, without sacrificing rigor and mathematical elegance, puts emphasis on providing intuitions on the inner workings of the method and does not assume a strong background in formal algebraic methods. In addition, our algorithm overcomes some of the shortcomings of previous work and is able to learn from statistics of substrings. To illustrate the approach we present experiments on a real application of the method to natural language parsing.
2013-12-20T11:07:41Z
The Evolution of the exponent of Zipf's law in language ontogeny
http://hdl.handle.net/2117/19413
Title: The Evolution of the exponent of Zipf's law in language ontogeny
Authors: Baixeries i Juvillà, Jaume; Elvevag, Brita; Ferrer Cancho, Ramon
Abstract: It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.
2013-05-27T14:04:20Z
The parameters of Menzerath-Altmann law in genomes
http://hdl.handle.net/2117/19025
Title: The parameters of Menzerath-Altmann law in genomes
Authors: Baixeries i Juvillà, Jaume; Hernández Fernández, Antonio; Forns, Núria; Ferrer Cancho, Ramon
Abstract: The relationship between the size of the whole and the size of the parts in language and music is known to follow the Menzerath-Altmann law at many levels of description (morphemes, words, sentences, …). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g. the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component yields a better fit than a pure power law in plants, mammals, ray-finned fishes and amphibians. The parameters of the Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent with the exception of birds and cartilaginous fishes.
2013-04-26T18:45:28Z
Learning probabilistic automata : a study in state distinguishability
http://hdl.handle.net/2117/18260
Title: Learning probabilistic automata : a study in state distinguishability
Authors: Balle Pigem, Borja de; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
Abstract: Known algorithms for learning PDFA can only be shown to run in time polynomial in the so-called distinguishability μ of the target machine, besides the number of states and the usual accuracy and confidence parameters. We show that the dependence on μ is necessary in the worst case for every algorithm whose structure resembles existing ones. As a technical tool, a new variant of Statistical Queries termed L∞-queries is defined. We show how to simulate L∞-queries using classical Statistical Queries and show that known PAC algorithms for learning PDFA are in fact statistical query algorithms. Our results include a lower bound: every algorithm to learn PDFA with queries using a reasonable tolerance must make Ω(1/μ^(1−c)) queries for every c>0. Finally, an adaptive algorithm that PAC-learns w.r.t. another measure of complexity is described. This yields better efficiency in many cases, while retaining the same inevitable worst-case behavior. Our algorithm requires fewer input parameters than previously existing ones, and has a better sample bound.
2013-03-13T13:49:57Z
A graphical tool for describing the temporal evolution of clusters in financial stock markets
http://hdl.handle.net/2117/18232
Title: A graphical tool for describing the temporal evolution of clusters in financial stock markets
Authors: Arratia Quesada, Argimiro Alejandro; Cabaña, Ana Alejandra
2013-03-12T16:27:48Z
Energy-efficient and multifaceted resource management for profit-driven virtualized data centers
http://hdl.handle.net/2117/16067
Title: Energy-efficient and multifaceted resource management for profit-driven virtualized data centers
Authors: Goiri Presa, Íñigo; Berral García, Josep Lluís; Fitó, Josep Oriol; Julià Massó, Ferran; Nou Castell, Ramon; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
Abstract: Since virtualization was introduced in data centers, it has opened new opportunities for resource management. Nowadays, it is not just used as a tool for consolidating underused nodes and saving power; it also allows new solutions to well-known challenges, such as heterogeneity management. Virtualization helps to encapsulate Web-based applications or HPC jobs in virtual machines (VMs) and treat them as a single entity that can be managed more easily and efficiently. We propose a new scheduling policy that models and manages a virtualized data center. It focuses on the allocation of VMs in data center nodes according to multiple facets to optimize the provider’s profit. In particular, it considers energy efficiency, virtualization overheads, and SLA violation penalties, and supports outsourcing to external providers. The proposed approach is compared to other common scheduling policies, demonstrating that a provider can improve its benefit by 30% and save power while handling other challenges, such as resource outsourcing, in a better and more intuitive way than other typical approaches do.
2012-06-16T10:58:35Z
Random models of Menzerath-Altmann law in genomes
http://hdl.handle.net/2117/14563
Title: Random models of Menzerath-Altmann law in genomes
Authors: Baixeries i Juvillà, Jaume; Hernández Fernández, Antonio; Ferrer Cancho, Ramon
Abstract: Recently, a random breakage model has been proposed to explain the negative correlation between mean chromosome length and chromosome number that is found in many groups of species and is consistent with Menzerath–Altmann law, a statistical law that defines the dependency between the mean size of the whole and the number of parts in quantitative linguistics. Here, the central assumption of the model, namely that genome size is independent of chromosome number, is reviewed. This assumption is shown to be unrealistic from the perspective of chromosome structure and the statistical analysis of real genomes. A general class of random models, including that random breakage model, is analyzed. For any model within this class, a power law with an exponent of −1 is predicted for the expectation of the mean chromosome size as a function of chromosome number, a functional dependency that is not supported by real genomes. The random breakage model and variants keeping genome size and chromosome number independent raise no serious objection to the relevance of correlations consistent with Menzerath–Altmann law across taxonomic groups and the possibility of a connection between human language and genomes through that law.
2012-01-16T11:48:19Z
Size of the whole versus number of parts in genomes
http://hdl.handle.net/2117/13368
Title: Size of the whole versus number of parts in genomes
Authors: Hernández Fernández, Antonio; Baixeries i Juvillà, Jaume; Forns, Núria; Ferrer Cancho, Ramon
Abstract: It is known that chromosome number tends to decrease as genome size increases in angiosperm plants. Here the relationship between number of parts (the chromosomes) and size of the whole (the genome) is studied for other groups of organisms from different kingdoms. Two major results are obtained. First, we find relationships of the kind "the more parts the smaller the whole", as in angiosperms, but also relationships of the kind "the more parts the larger the whole". Second, these dependencies are not linear in general. The implications of the dependencies between genome size and chromosome number are two-fold. First, they indicate that arguments against the relevance of the finding of negative correlations consistent with Menzerath-Altmann law (a linguistic law that relates the size of the parts with the size of the whole) in genomes are seriously flawed. Second, they reveal the weakness of a recent model of chromosome lengths based upon random breakage that assumes that chromosome number and genome size are independent.
2011-09-28T08:53:18Z
Estimating the horizon of predictability in time-series predictions using inductive modelling tools
http://hdl.handle.net/2117/12055
Title: Estimating the horizon of predictability in time-series predictions using inductive modelling tools
Authors: López Herrera, Josefina; Cellier, François E.; Cembrano Gennari, Gabriela
Abstract: This paper deals with the assessment of how far into the future a time series can be safely predicted using inductive modelling and extrapolation techniques. Three different time series are considered: one representing the water demand of the city of Barcelona, another characterizing the water demand of a section of the city of Rotterdam, and a third describing weather data for the city of Tucson. Fuzzy inductive reasoning (FIR) is used to predict future values of these time series on the basis of their own past. FIR predictions come with two different built-in measures of confidence that can be used to obtain a quantitative estimate of how far into the future a time series can be predicted.
2011-03-24T18:15:27Z
Horn query learning with multiple refinement
http://hdl.handle.net/2117/10845
Title: Horn query learning with multiple refinement
Authors: Sierra Santibáñez, Josefina; Santibáñez Velilla, Josefina
Abstract: In this paper we try to understand the heuristics that underlie the decisions made by the Horn query learning algorithm proposed in [1]. We take advantage of our explicit representation of such heuristics in order to present an alternative termination proof for the algorithm, as well as to justify its decisions by showing that they always guarantee that the negative examples in the sequence maintained by the algorithm violate different clauses in the target formula. Finally, we propose a new algorithm that allows multiple refinement when we can prove that such a refinement does not affect the independence of the negative examples in the sequence maintained by the algorithm.
2010-12-30T09:01:50Z
Mining frequent closed rooted trees
http://hdl.handle.net/2117/6835
Title: Mining frequent closed rooted trees
Authors: Balcázar Navarro, José Luis; Bifet Figuerol, Albert Carles; Lozano Bojados, Antoni
Abstract: Many knowledge representation mechanisms are based on tree-like structures, thus symbolizing the fact that certain pieces of information are related in one sense or another. There exists a well-studied process of closure-based data mining in the itemset framework: we consider the extension of this process into trees. We focus mostly on the case where labels on the nodes are nonexistent or unreliable, and discuss algorithms for closure-based mining that only rely on the root of the tree and the link structure.
We provide a notion of intersection that leads to a deeper understanding of the notion of support-based closure, in terms of an actual closure operator.
We describe combinatorial characterizations and some properties of ordered trees, discuss their applicability to unordered trees, and rely on them to design efficient algorithms for mining frequent closed subtrees both in the ordered and the unordered settings. Empirical validations and comparisons with alternative algorithms are provided.
2010-03-30T09:00:52Z