Journal articles
http://hdl.handle.net/2117/3487
2016-09-28T22:46:59Z
http://hdl.handle.net/2117/89435
Gelada vocal sequences follow Menzerath's linguistic law
Gustison, Morgan; Semple, Stuart; Ferrer Cancho, Ramon; Bergman, Thore
Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
2016-09-01T06:54:58Z
http://hdl.handle.net/2117/88535
The scaling of the minimum sum of edge lengths in uniformly random trees
Esteban Ángeles, Juan Luis; Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
The minimum linear arrangement problem on a network consists of finding the minimum sum of edge lengths that can be achieved when the vertices are arranged linearly. Although there are algorithms to solve this problem on trees in polynomial time, they have remained theoretical and have not been implemented in practical contexts to our knowledge. Here we use one of those algorithms to investigate the growth of this sum as a function of the size of the tree in uniformly random trees. We show that this sum is bounded above by its value in a star tree. We also show that the mean edge length grows logarithmically in optimal linear arrangements, in stark contrast to the linear growth that is expected on optimal arrangements of star trees or on random linear arrangements.
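The quantity described in the abstract above can be illustrated on very small trees. The following is a minimal brute-force sketch, not the polynomial-time algorithm the abstract refers to: it tries every ordering of the vertices, so it is only usable for a handful of vertices. The function names `min_sum_edge_lengths` and `tree_from_pruefer` are hypothetical, and the enumeration of labeled trees through Prüfer sequences is an illustrative choice, not something taken from the paper.

```python
from itertools import permutations, product


def min_sum_edge_lengths(n, edges):
    """Brute-force minimum linear arrangement cost of a graph on vertices 0..n-1."""
    best = None
    for positions in permutations(range(n)):
        # positions[v] is the place of vertex v on the line
        total = sum(abs(positions[u] - positions[v]) for u, v in edges)
        if best is None or total < best:
            best = total
    return best


def tree_from_pruefer(seq):
    """Decode a Pruefer sequence into the edge list of a labeled tree."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # attach the smallest remaining leaf to the next sequence entry
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges


n = 5
star = [(0, leaf) for leaf in range(1, n)]
path = [(i, i + 1) for i in range(n - 1)]
star_cost = min_sum_edge_lengths(n, star)
path_cost = min_sum_edge_lengths(n, path)

# Largest optimal cost over all labeled trees on n vertices
worst_cost = max(
    min_sum_edge_lengths(n, tree_from_pruefer(seq))
    for seq in product(range(n), repeat=n - 2)
)
print(star_cost, path_cost, worst_cost)
```

For five vertices the largest optimal cost over all labeled trees coincides with the cost of the star tree, which is consistent with the upper bound stated in the abstract for this small case.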
2016-07-06T09:32:34Z
http://hdl.handle.net/2117/86042
Estimaciones de promedios en análisis
Boza Rocho, Santiago; Soria de Diego, Javier
In analysis there is a series of classical results in which the computation of averages plays a relevant role, such as generalized convergence methods, the Fundamental Theorem of Calculus, or digital image processing. To better understand these ideas, we briefly develop some of the most useful techniques for obtaining good estimates of these averages. In particular, we mention Hardy's inequalities, properties of convolution, maximal operators, approximations of the identity, covering lemmas, decreasing rearrangements, etc. At the end of the article we also present some interesting problems that remain unsolved.
2016-04-21T10:18:15Z
http://hdl.handle.net/2117/85273
Liberating language research from dogmas of the 20th century
Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
A commentary on the article “Large-scale evidence of dependency length minimization in 37 languages” by Futrell, Mahowald & Gibson (PNAS 2015 112 (33) 10336-10341).
2016-04-06T09:32:25Z
http://hdl.handle.net/2117/84070
GeoSRS: a hybrid social recommender system for geolocated data
Capdevila Pujol, Joan; Arias Vicente, Marta; Arratia Quesada, Argimiro Alejandro
We present GeoSRS, a hybrid recommender system for a popular location-based social network (LBSN), in which users are able to write short reviews on the places of interest they visit. Using state-of-the-art text mining techniques, our system recommends locations to users using as source the whole set of text reviews in addition to their geographical location. To evaluate our system, we have collected our own data sets by crawling the social network Foursquare. To do this efficiently, we propose the use of a parallel version of the Quadtree technique, which may be applicable to crawling/exploring other spatially distributed sources. Finally, we study the performance of GeoSRS on our collected data set and conclude that by combining sentiment analysis and text modeling, GeoSRS generates more accurate recommendations. The performance of the system improves as more reviews are available, which further motivates the use of large-scale crawling techniques such as the Quadtree.
2016-03-09T14:36:56Z
http://hdl.handle.net/2117/82987
An efficient closed frequent itemset miner for the MOA stream mining system
Quadrana, Massimo; Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke and Ng [J. Intell. Inf. Syst. 31(3) (2008), 191–215] for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
2016-02-16T08:02:25Z
http://hdl.handle.net/2117/82703
Structural ambiguity in Montague Grammar and categorial grammar
Morrill, Glyn
We give a type logical categorial grammar for the syntax and semantics of Montague's seminal fragment, which includes ambiguities of quantification and intensionality and their interactions, and we present the analyses assigned by a parser/theorem prover CatLog to the examples in the first half of Chapter 7 of the classic text Introduction to Montague Semantics of Dowty, Wall and Peters (1981).
2016-02-09T10:42:01Z
http://hdl.handle.net/2117/79981
Absolute-type shaft encoding using LFSR sequences with a prescribed length
Fuertes Armengol, José Mª; Balle Pigem, Borja de; Ventura Capell, Enric
Maximal-length binary sequences have existed for a long time. They have many interesting properties, and one of them is that, when taken in blocks of n consecutive positions, they form 2^n - 1 different codes in a closed circular sequence. This property can be used to measure absolute angular positions as the circle can be divided into as many parts as different codes can be retrieved. This paper describes how a closed binary sequence with an arbitrary length can be effectively designed with the minimal possible block length using linear feedback shift registers. Such sequences can be used to measure a specified exact number of angular positions using the minimal possible number of sensors that linear methods allow.
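The window property mentioned in the abstract above can be checked on a small case. The sketch below covers only the classical maximal-length case, not the arbitrary-length construction the paper describes. It assumes a 4-bit Fibonacci-style register whose feedback is the XOR of bits 0 and 1 (corresponding to the primitive polynomial x^4 + x + 1); the function names `lfsr_period` and `circular_windows` are hypothetical.

```python
def lfsr_period(taps, n, seed=1):
    """Return 2**n - 1 output bits of an n-bit linear feedback shift register.

    taps: bit positions of the state that are XORed to form the feedback bit.
    seed: any nonzero initial state.
    """
    state = seed
    bits = []
    for _ in range(2 ** n - 1):
        bits.append(state & 1)  # output the lowest bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        # shift right and feed the new bit in at the top
        state = (state >> 1) | (feedback << (n - 1))
    return bits


def circular_windows(bits, n):
    """All blocks of n consecutive bits, reading the sequence as a closed circle."""
    length = len(bits)
    return [tuple(bits[(i + j) % length] for j in range(n)) for i in range(length)]


N = 4
bits = lfsr_period(taps=(0, 1), n=N)
windows = circular_windows(bits, N)
print(len(bits), len(set(windows)))  # prints "15 15"
```

Every one of the 15 circular positions yields a different 4-bit code, so reading 4 adjacent sensors is enough to identify the position in this assumed configuration. The all-zero block never appears, which is why the count is 2^n - 1 rather than 2^n.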
2015-11-26T17:41:12Z
http://hdl.handle.net/2117/78645
A multi-scale smoothing kernel for measuring time-series similarity
Troncoso, Alicia; Arias Vicente, Marta; Riquelme Santos, José Cristóbal
In this paper a kernel for time-series data is introduced so that it can be used for any data mining task that relies on a similarity or distance metric. The main idea of our kernel is that it should recognize as highly similar time-series that are essentially the same but may be slightly perturbed from each other: for example, if one series is shifted with respect to the other or if it is slightly misaligned. Namely, our kernel tries to focus on the shape of the time-series and ignores small perturbations such as misalignments or shifts. First, a recursive formulation of the kernel directly based on its definition is proposed. Then it is shown how to efficiently compute the kernel using an equivalent matrix-based formulation. To validate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository and the KDD challenge of 2007 with the purpose of validating the kernel-derived distance over shifted time-series. Also, the kernel has been applied to the original UCR time-series to analyze its potential in time-series classification in conjunction with Support Vector Machines. Finally, two real-world applications related to ozone concentration in the atmosphere and electricity demand have been considered.
2015-11-02T14:11:48Z
http://hdl.handle.net/2117/77862
Zipf's law for word frequencies: Word forms versus lemmas in long texts
Corral, Alvaro; Boleda Torrent, Gemma; Ferrer Cancho, Ramon
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with different levels of morphological complexity. In all cases Zipf's law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf's law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable, tending to increase substantially after the transformation.
2015-10-19T08:26:11Z