Journal articles
http://hdl.handle.net/2117/3487
Mon, 24 Oct 2016 16:03:39 GMT

On graph combinatorics to improve eigenvector-based measures of centrality in directed networks
http://hdl.handle.net/2117/90335
Arratia Quesada, Argimiro Alejandro; Marijuan López, Carlos
We present a combinatorial study on the rearrangement of links in the structure of directed networks for the purpose of improving the valuation of a vertex or group of vertices as established by an eigenvector-based centrality measure. We build our topological classification starting from unidirectional rooted trees and up to more complex hierarchical structures such as acyclic digraphs, bidirectional and cyclical rooted trees (obtained by closing cycles on unidirectional trees). We analyze different modifications on the structure of these networks and study their effect on the valuation given by the eigenvector-based scoring functions, with particular focus on alpha-centrality and PageRank.
© 2016. This version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Thu, 29 Sep 2016 14:12:42 GMT
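
As a rough illustration of the kind of question this abstract raises, the sketch below computes PageRank by power iteration on a small rooted tree and then again after closing a cycle. The five-vertex tree, the choice of links pointing toward the root, the damping factor 0.85, and the uniform treatment of dangling vertices are all assumptions made for the example; none of them comes from the paper.

```python
def pagerank(n, edges, damping=0.85, iters=200):
    """Power-iteration PageRank on vertices 0..n-1; edges are (source, target)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Mass held by dangling vertices (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if not out[u])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for u in range(n):
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


# Hypothetical rooted tree on 5 vertices, links pointing toward the root 0.
tree = [(1, 0), (2, 0), (3, 1), (4, 1)]
before = pagerank(5, tree)

# Close a cycle: the root now links back to the leaf 3.
after = pagerank(5, tree + [(0, 3)])

print("leaf 3 before:", round(before[3], 4), "after:", round(after[3], 4))
```

In this toy case the added link raises the score of vertex 3 and lowers that of the root, which is the type of effect of a structural modification that the abstract describes studying in general.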

Gelada vocal sequences follow Menzerath's linguistic law
http://hdl.handle.net/2117/89435
Gustison, Morgan; Semple, Stuart; Ferrer Cancho, Ramon; Bergman, Thore
Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that inter-call intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression: the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
Thu, 01 Sep 2016 06:54:58 GMT

The scaling of the minimum sum of edge lengths in uniformly random trees
http://hdl.handle.net/2117/88535
Esteban Ángeles, Juan Luis; Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
The minimum linear arrangement problem on a network consists of finding the minimum sum of edge lengths that can be achieved when the vertices are arranged linearly. Although there are algorithms to solve this problem on trees in polynomial time, they have remained theoretical and have not been implemented in practical contexts to our knowledge. Here we use one of those algorithms to investigate the growth of this sum as a function of the size of the tree in uniformly random trees. We show that this sum is bounded above by its value in a star tree. We also show that the mean edge length grows logarithmically in optimal linear arrangements, in stark contrast to the linear growth that is expected on optimal arrangements of star trees or on random linear arrangements.
Wed, 06 Jul 2016 09:32:34 GMT
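
To make the quantity in this abstract concrete, the following sketch computes the minimum sum of edge lengths by exhaustive search over all linear arrangements. This brute-force search is an assumption chosen for clarity and is only feasible for very small trees; it is not one of the polynomial-time algorithms the abstract refers to. The helper names and the six-vertex example tree are hypothetical.

```python
from itertools import permutations


def min_sum_edge_lengths(n, edges):
    """Minimum over all linear arrangements of the sum of |pos(u) - pos(v)|."""
    best = None
    for perm in permutations(range(n)):
        # perm[v] is the position assigned to vertex v.
        total = sum(abs(perm[u] - perm[v]) for u, v in edges)
        if best is None or total < best:
            best = total
    return best


def star(n):
    """Star tree: vertex 0 is joined to every other vertex."""
    return [(0, v) for v in range(1, n)]


def path(n):
    """Path tree: vertex v is joined to vertex v + 1."""
    return [(v, v + 1) for v in range(n - 1)]


# A hypothetical tree on 6 vertices that is neither a star nor a path.
other = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5)]

print(min_sum_edge_lengths(6, star(6)),
      min_sum_edge_lengths(6, path(6)),
      min_sum_edge_lengths(6, other))
```

For these small cases the star tree gives the largest minimum, in line with the upper bound stated in the abstract.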

Estimaciones de promedios en análisis (Estimates of averages in analysis)
http://hdl.handle.net/2117/86042
Boza Rocho, Santiago; Soria de Diego, Javier
There are a number of classical results in Analysis in which the computation of averages plays an important role, such as generalized convergence methods, the Fundamental Theorem of Calculus, or digital image processing. To better understand these ideas, we briefly develop some of the most useful techniques for obtaining good estimates of such averages. In particular, we mention Hardy's inequalities, properties of convolution, maximal operators, approximations of the identity, covering lemmas, decreasing rearrangements, etc. At the end of the article we also present some interesting problems that remain unsolved.
Thu, 21 Apr 2016 10:18:15 GMT

Liberating language research from dogmas of the 20th century
http://hdl.handle.net/2117/85273
Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
A commentary on the article “Large-scale evidence of dependency length minimization in 37 languages” by Futrell, Mahowald & Gibson (PNAS 2015 112 (33) 10336-10341).
Wed, 06 Apr 2016 09:32:25 GMT

GeoSRS: a hybrid social recommender system for geolocated data
http://hdl.handle.net/2117/84070
Capdevila Pujol, Joan; Arias Vicente, Marta; Arratia Quesada, Argimiro Alejandro
All rights reserved. We present GeoSRS, a hybrid recommender system for a popular location-based social network (LBSN), in which users are able to write short reviews on the places of interest they visit. Using state-of-the-art text mining techniques, our system recommends locations to users, drawing on the whole set of text reviews in addition to their geographical location. To evaluate our system, we have collected our own data sets by crawling the social network Foursquare. To do this efficiently, we propose the use of a parallel version of the Quadtree technique, which may be applicable to crawling/exploring other spatially distributed sources. Finally, we study the performance of GeoSRS on our collected data set and conclude that by combining sentiment analysis and text modeling, GeoSRS generates more accurate recommendations. The performance of the system improves as more reviews are available, which further motivates the use of large-scale crawling techniques such as the Quadtree.
Wed, 09 Mar 2016 14:36:56 GMT

An efficient closed frequent itemset miner for the MOA stream mining system
http://hdl.handle.net/2117/82987
Quadrana, Massimo; Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke and Ng [J. Intell. Inf. Syst. 31(3) (2008), 191–215] for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a byproduct, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
Tue, 16 Feb 2016 08:02:25 GMT
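
For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The sketch below illustrates that definition by brute force on a tiny batch of transactions. It is not IncMine, it does not use MOA, and it does not handle streams; the function name, the transactions, and the support threshold are hypothetical.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets that are frequent and closed."""
    items = sorted(set().union(*transactions))
    support = {}
    # Enumerate every candidate itemset and count the transactions containing it.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                support[candidate] = count
    # Keep an itemset only if no proper superset has the same support.
    # A superset with equal support is itself frequent, so checking the
    # frequent itemsets is enough.
    closed = {}
    for itemset, count in support.items():
        if not any(itemset < other and support[other] == count for other in support):
            closed[itemset] = count
    return closed


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
closed = closed_frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Here {b} and {c} are frequent but not closed, because {a, b} and {a, c} occur in exactly the same transactions.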

Structural ambiguity in Montague Grammar and categorial grammar
http://hdl.handle.net/2117/82703
Morrill, Glyn
We give a type logical categorial grammar for the syntax and semantics of Montague's seminal fragment, which includes ambiguities of quantification and intensionality and their interactions, and we present the analyses assigned by a parser/theorem prover CatLog to the examples in the first half of Chapter 7 of the classic text Introduction to Montague Semantics of Dowty, Wall and Peters (1981).
Tue, 09 Feb 2016 10:42:01 GMT

Absolute-type shaft encoding using LFSR sequences with a prescribed length
http://hdl.handle.net/2117/79981
Fuertes Armengol, José Mª; Balle Pigem, Borja de; Ventura Capell, Enric
Maximal-length binary sequences have existed for a long time. They have many interesting properties, and one of them is that, when taken in blocks of n consecutive positions, they form 2^n - 1 different codes in a closed circular sequence. This property can be used to measure absolute angular positions as the circle can be divided into as many parts as different codes can be retrieved. This paper describes how a closed binary sequence with an arbitrary length can be effectively designed with the minimal possible block length using linear feedback shift registers. Such sequences can be used to measure a specified exact number of angular positions using the minimal possible number of sensors that linear methods allow.
Thu, 26 Nov 2015 17:41:12 GMT
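
The block property of maximal-length sequences mentioned in this abstract can be checked directly for a small case. The sketch below assumes n = 4 and the recurrence s[k+4] = s[k] XOR s[k+1], which corresponds to the primitive polynomial x^4 + x + 1; the seed and the function names are hypothetical. It covers only the classical maximal-length case, not the paper's construction for arbitrary prescribed lengths.

```python
def lfsr_sequence(n, taps, seed):
    """Bits s[0..2**n - 2] of the recurrence s[k+n] = XOR of s[k+t] for t in taps."""
    bits = list(seed)
    while len(bits) < 2 ** n - 1:
        k = len(bits) - n
        nxt = 0
        for t in taps:
            nxt ^= bits[k + t]
        bits.append(nxt)
    return bits


def circular_windows(bits, n):
    """All blocks of n consecutive positions, reading the sequence as a closed circle."""
    m = len(bits)
    return [tuple(bits[(i + j) % m] for j in range(n)) for i in range(m)]


seq = lfsr_sequence(4, taps=(0, 1), seed=(0, 0, 0, 1))
codes = circular_windows(seq, 4)

print("".join(map(str, seq)))
print(len(set(codes)), "distinct codes out of", len(codes), "positions")
```

Each of the 15 angular positions then reads a different 4-bit code, so four sensors are enough to tell the positions apart; the all-zero code never appears.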

A multiscale smoothing kernel for measuring time-series similarity
http://hdl.handle.net/2117/78645
Troncoso, Alicia; Arias Vicente, Marta; Riquelme Santos, José Cristóbal
In this paper a kernel for time-series data is introduced so that it can be used for any data mining task that relies on a similarity or distance metric. The main idea of our kernel is that it should recognize as highly similar time series that are essentially the same but may be slightly perturbed from each other: for example, if one series is shifted with respect to the other or if it is slightly misaligned. Namely, our kernel tries to focus on the shape of the time series and ignores small perturbations such as misalignments or shifts. First, a recursive formulation of the kernel directly based on its definition is proposed. Then it is shown how to efficiently compute the kernel using an equivalent matrix-based formulation. To validate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository and the KDD challenge of 2007 with the purpose of validating the kernel-derived distance over shifted time series. Also, the kernel has been applied to the original UCR time series to analyze its potential in time-series classification in conjunction with Support Vector Machines. Finally, two real-world applications related to ozone concentration in the atmosphere and electricity demand have been considered.
Mon, 02 Nov 2015 14:11:48 GMT