DSpace Community:
http://hdl.handle.net/2117/3486
Thu, 17 Apr 2014 10:34:17 GMT
webmaster.bupc@upc.edu
Universitat Politècnica de Catalunya. Servei de Biblioteques i Documentació

Characterizing functional dependencies in formal concept analysis with pattern structures
http://hdl.handle.net/2117/21485
Title: Characterizing functional dependencies in formal concept analysis with pattern structures
Authors: Baixeries i Juvillà, Jaume; Kaytoue, Mehdi; Napoli, Amedeo
Abstract: Computing functional dependencies from a relation is an important database topic, with many applications in database management, reverse engineering and query optimization. Whereas it has been deeply investigated in those fields, strong links exist with the mathematical framework of Formal Concept Analysis. Considering the discovery of functional dependencies, it is indeed known that a relation can be expressed as the binary relation of a formal context, whose implications are equivalent to those dependencies. However, this leads to a new data representation that is quadratic in the number of objects w.r.t. the original data. Here, we present an alternative avoiding such a data representation and show how to characterize functional dependencies using the formalism of pattern structures, an extension of classical FCA to handle complex data. We also show how another class of dependencies can be characterized with that framework, namely, degenerated multivalued dependencies. Finally, we discuss and compare the performance of our new approach in a series of experiments on classical benchmark datasets.
Date: Fri, 07 Feb 2014 20:12:07 GMT
Keywords: Association rules, Attribute implications, Data dependencies, Pattern structures, Formal concept analysis

Spectral learning of sequence taggers over continuous sequences
http://hdl.handle.net/2117/21208
Title: Spectral learning of sequence taggers over continuous sequences
Authors: Recasens, Adria; Quattoni, Ariadna Julieta
Abstract: In this paper we present a spectral algorithm for learning weighted finite-state sequence taggers (WFSTs) over paired input-output sequences, where the input is continuous and the output discrete. WFSTs are an important tool for modelling paired input-output sequences and have numerous applications in real-world problems. Our approach is based on generalizing the class of weighted finite-state sequence taggers over discrete input-output sequences to a class where transitions are linear combinations of elementary transitions and the weights of the linear combination are determined by dynamic features of the continuous input sequence. The resulting learning algorithm is efficient and accurate.
Date: Fri, 10 Jan 2014 11:34:20 GMT

Unsupervised spectral learning of finite-state transducers
http://hdl.handle.net/2117/21077
Title: Unsupervised spectral learning of finite-state transducers
Authors: Bailly, Raphaël; Carreras Pérez, Xavier; Quattoni, Ariadna Julieta
Abstract: Finite-State Transducers (FST) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently Balle et al. presented a spectral algorithm for learning FST from samples of aligned input-output sequences. In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low rank Hankel matrix satisfying constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective which is based on minimizing a nuclear norm penalty subject to linear constraints and can be solved efficiently.
Date: Fri, 20 Dec 2013 11:42:49 GMT
Keywords: Finite State Transducers, Spectral Learning

Unsupervised spectral learning of WCFG as low-rank matrix completion
http://hdl.handle.net/2117/21076
Title: Unsupervised spectral learning of WCFG as low-rank matrix completion
Authors: Bailly, Raphaël; Carreras Pérez, Xavier; Luque, Franco M.; Quattoni, Ariadna Julieta
Abstract: We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.
Date: Fri, 20 Dec 2013 11:28:14 GMT

Spectral learning of weighted automata: a forward-backward perspective
http://hdl.handle.net/2117/21075
Title: Spectral learning of weighted automata: a forward-backward perspective
Authors: Balle Pigem, Borja de; Carreras Pérez, Xavier; Luque, Franco M.; Quattoni, Ariadna Julieta
Abstract: In recent years we have seen the development of efficient, provably correct algorithms for learning Weighted Finite Automata (WFA). Most of these algorithms avoid the known hardness results by defining parameters beyond the number of states that can be used to quantify the complexity of learning automata under a particular distribution. One such class of methods are the so-called spectral algorithms that measure learning complexity in terms of the smallest singular value of some Hankel matrix. However, despite their simplicity and wide applicability to real problems, their impact in application domains remains marginal to this date. One of the goals of this paper is to remedy this situation by presenting a derivation of the spectral method for learning WFA that, without sacrificing rigor and mathematical elegance, puts emphasis on providing intuitions on the inner workings of the method and does not assume a strong background in formal algebraic methods. In addition, our algorithm overcomes some of the shortcomings of previous work and is able to learn from statistics of substrings. To illustrate the approach we present experiments on a real application of the method to natural language parsing.
Date: Fri, 20 Dec 2013 11:07:41 GMT
Keywords: Spectral learning, Weighted finite automata, Dependency parsing

A joint model for 2D and 3D pose estimation from a single image
http://hdl.handle.net/2117/20946
Title: A joint model for 2D and 3D pose estimation from a single image
Authors: Simo-Serra, Edgar; Quattoni, Ariadna Julieta; Torras, Carme; Moreno-Noguer, Francesc
Abstract: We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Real experimentation demonstrates competitive results, and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.
Date: Tue, 10 Dec 2013 10:35:59 GMT
Keywords: Deformable models, Detectors, Estimation, Joints, Shape, Solid modeling, Three-dimensional displays

Spectral learning of transducers over continuous sequences
http://hdl.handle.net/2117/20362
Title: Spectral learning of transducers over continuous sequences
Authors: Recasens, Adria; Quattoni, Ariadna Julieta
Abstract: In this paper we present a spectral algorithm for learning weighted finite state transducers (WFSTs) over paired input-output sequences, where the input is continuous and the output discrete. WFSTs are an important tool for modeling paired input-output sequences and have numerous applications in real-world problems. Recently, Balle et al. (2011) proposed a spectral method for learning WFSTs that overcomes some of the well-known limitations of gradient-based or EM optimizations, which can be computationally expensive and suffer from local optima issues. Their algorithm can model distributions where both inputs and outputs are sequences from a discrete alphabet. However, many real-world problems require modeling paired sequences where the inputs are not discrete but continuous sequences. Modelling continuous sequences with spectral methods has been studied in the context of HMMs (Song et al., 2010), where a spectral algorithm for this case was derived. In this paper we follow that line of work and propose a spectral learning algorithm for modelling paired input-output sequences where the inputs are continuous and the outputs are discrete. Our approach is based on generalizing the class of weighted finite state transducers over discrete input-output sequences to a class where transitions are linear combinations of elementary transitions and the weights of this linear combination are determined by dynamic features of the continuous input sequence. At its core, the algorithm is simple and scalable to large data sets. We present experiments on a real task that validate the effectiveness of the proposed approach.
Date: Fri, 11 Oct 2013 10:15:39 GMT

A count invariant for Lambek calculus with additives and bracket modalities
http://hdl.handle.net/2117/20347
Title: A count invariant for Lambek calculus with additives and bracket modalities
Authors: Valentín Fernández Gallart, José Oriol; Serret, Daniel; Morrill, Glyn
Abstract: The count invariance of van Benthem (1991) is that for a sequent to be a theorem of the Lambek calculus, for each atom, the number of positive occurrences equals the number of negative occurrences. (The same is true for multiplicative linear logic.) The count invariance provides for extensive pruning of the sequent proof search space. In this paper we generalize count invariance to categorial grammar (or linear logic) with additives and bracket modalities. We define by mutual recursion two counts, minimum count and maximum count, and we prove that if a multiplicative-additive sequent is a theorem, then for every atom, the minimum count is less than or equal to zero and the maximum count is greater than or equal to zero; in the case of a purely multiplicative sequent, minimum count and maximum count coincide in such a way as to together reconstitute the van Benthem count criterion. We then define in the same way a bracket count providing a count check for bracket modalities. This allows for efficient pruning of the sequent proof search space in parsing categorial grammar with additives and bracket modalities.
Date: Wed, 09 Oct 2013 12:02:14 GMT
Keywords: Categorial grammar, Lambek calculus, Linear logic, Multiplicative linear logic, Mutual recursion, Proof search

Closures and partial implications in educational data mining
http://hdl.handle.net/2117/20280
Title: Closures and partial implications in educational data mining
Authors: García Sáiz, Diego; Zorrilla Pantaleón, Marta Elena; Balcázar Navarro, José Luis
Abstract: Educational Data Mining (EDM) is a growing field of use of data analysis techniques. Specifically, we consider partial implications. The main problems are, first, that a support threshold is absolutely necessary but setting it "right" is extremely difficult; and, second, that, very often, large amounts of partial implications are found, beyond what an EDM user would be able to manually inspect. Our program yacaree, recently developed, is an associator that tackles both problems. In an EDM context, our program has been shown to be competitive with respect to the amount of partial implications output. But "finding few rules" is not the same as "finding the right rules". We extend the evaluation with a deeper quantitative analysis and a subjective evaluation on EDM datasets, eliciting the opinion of the instructors of the courses under analysis to assess the pertinence of the rules found by different association miners.
Date: Thu, 03 Oct 2013 11:22:13 GMT
Keywords: Closure lattices, Partial implications, Association rules

Iterator-based algorithms in self-tuning discovery of partial implications
http://hdl.handle.net/2117/20269
Title: Iterator-based algorithms in self-tuning discovery of partial implications
Authors: Balcázar Navarro, José Luis; García Sáiz, Diego; de la Dehesa, Javier
Abstract: We describe the internal algorithmics of our recent implementation of a closure-based self-tuning associator: yacaree. This system is designed so as not to request the user to specify any threshold. In order to avoid the need of a support threshold, we introduce an algorithm that constructs closed sets in order of decreasing support; we are not aware of any similar previous algorithm. In order not to overwhelm the user with large quantities of partial implications, our system filters the output according to a recently studied lattice-closure-based notion of confidence boost, and self-adjusts the threshold for that rule quality measure as well. As a consequence, the necessary algorithmics interact in complicated ways. In order to control this interaction, we have resorted to a well-known, powerful conceptual tool, called Iterators: this notion allows us to distribute control among the various algorithms at play in a relatively simple manner, leading to a fully operative, open-source, efficient system for discovery of partial implications in relational data.
Date: Thu, 03 Oct 2013 08:44:41 GMT
Keywords: Association mining, Parameter-free mining, Iterators, Python

Spectral learning in non-deterministic dependency parsing
http://hdl.handle.net/2117/20170
Title: Spectral learning in non-deterministic dependency parsing
Authors: Luque, Franco M.; Quattoni, Ariadna Julieta; Balle Pigem, Borja de; Carreras Pérez, Xavier
Abstract: In this paper we study spectral learning methods for non-deterministic split head-automata grammars, a powerful hidden-state formalism for dependency parsing. We present a learning algorithm that, like other spectral methods, is efficient and non-susceptible to local minima. We show how this algorithm can be formulated as a technique for inducing hidden structure from distributions computed by forward-backward recursions. Furthermore, we also present an inside-outside algorithm for the parsing model that runs in cubic time, hence maintaining the standard parsing costs for context-free grammars.
Description: Best Paper Award of EACL 2012
Date: Fri, 20 Sep 2013 10:50:52 GMT

A kernel for time series classification: application to atmospheric pollutants
http://hdl.handle.net/2117/19435
Title: A kernel for time series classification: application to atmospheric pollutants
Authors: Arias Vicente, Marta; Troncoso, Alicia; Riquelme, José C.
Abstract: In this paper a kernel for time-series data is presented. The main idea of the kernel is that it is designed to recognize as similar time series that may be slightly shifted with respect to one another. Namely, it tries to focus on the shape of the time-series and ignores the fact that the series may not be perfectly aligned. The proposed kernel has been validated on several datasets based on the UCR time-series repository [1]. A comparison with the well-known Dynamic Time Warping (DTW) distance and Euclidean distance shows that the proposed kernel outperforms the Euclidean distance and is competitive with respect to the DTW distance while having a much lower computational cost.
Date: Wed, 29 May 2013 07:54:57 GMT
Keywords: Atmospheric pollutants, Computational costs, Data sets, Dynamic time warping, Euclidean distance, Time series classifications, Time-series data

The Evolution of the exponent of Zipf's law in language ontogeny
http://hdl.handle.net/2117/19413
Title: The Evolution of the exponent of Zipf's law in language ontogeny
Authors: Baixeries i Juvillà, Jaume; Elvevag, Brita; Ferrer Cancho, Ramon
Abstract: It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.
Date: Mon, 27 May 2013 14:04:20 GMT

Empowering automatic data-center management with machine learning
http://hdl.handle.net/2117/19370
Title: Empowering automatic data-center management with machine learning
Authors: Berral García, Josep Lluís; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
Abstract: The Cloud as a computing paradigm has become crucial for most Internet business models. Managing and optimizing its performance on a moment-by-moment basis is not easy given the amount and diversity of elements involved (hardware, applications, workloads, customer needs...). Here we show how a combination of scheduling algorithms and data mining techniques helps improve the performance and profitability of a data-center running virtualized web-services. We model the data-center's main resources (CPU, memory, IO), quality of service (viewed as response time), and workloads (incoming streams of requests) from past executions. We show how these models help scheduling algorithms make better decisions about job and resource allocation, aiming for a balance between throughput, quality of service, and power consumption.
Date: Wed, 22 May 2013 11:19:56 GMT
The parameters of Menzerath-Altmann law in genomes
http://hdl.handle.net/2117/19025
Title: The parameters of Menzerath-Altmann law in genomes
Authors: Baixeries i Juvillà, Jaume; Hernández Fernández, Antonio; Forns, Núria; Ferrer Cancho, Ramon
Abstract: The relationship between the size of the whole and the size of the parts in language and music is known to follow the Menzerath-Altmann law at many levels of description (morphemes, words, sentences, …). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g. the longer a word (in syllables), the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component yields a better fit than a pure power law in plants, mammals, ray-finned fishes and amphibians. The parameters of the Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent, with the exception of birds and cartilaginous fishes.
Date: Fri, 26 Apr 2013 18:45:28 GMT