LARCA  Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge
http://hdl.handle.net/2117/3486
Mon, 24 Jul 2017 08:02:11 GMT

The entropy of words: learnability and expressivity across more than 1000 languages
http://hdl.handle.net/2117/106703
Bentz, Chris; Alikaniotis, Dimitrios; Cysouw, Michael; Ferrer Cancho, Ramon
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and the language sciences more generally. Information theory gives us the tools to measure precisely the average amount of choice associated with words: the word entropy. Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style and estimation method, as well as non-independence of words in co-text. We present two main findings. Firstly, word entropies display relatively narrow, unimodal distributions. There is no language in our sample with a unigram entropy of less than six bits/word. We argue that this is in line with information-theoretic models of communication. Languages are held in a narrow range by two fundamental pressures: word learnability and word expressivity, with a potential bias towards expressivity. Secondly, there is a strong linear relationship between unigram entropies and entropy rates. The entropy difference between words with and without co-textual information is narrowly distributed around ca. three bits/word. In other words, knowing the preceding text reduces the uncertainty of words by roughly the same amount across languages of the world.
Fri, 21 Jul 2017 10:48:54 GMT
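To make the notion of unigram word entropy in the abstract above concrete, here is a minimal Python sketch of the naive plug-in (maximum likelihood) estimator, H = -Σ p(w) log2 p(w), with p(w) taken as relative word frequency. The choice of estimator, the function name `unigram_entropy` and the toy sentence are assumptions made for illustration; the abstract itself stresses that results depend on the estimation method and text size, and this is not necessarily the estimator the authors use.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Plug-in estimate of unigram word entropy in bits/word.

    Probabilities are relative frequencies, so the estimate is
    biased downwards for short texts (an assumption of this sketch).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy example (hypothetical text, not taken from the corpora).
tokens = "the cat saw the dog and the dog saw the cat".split()
print(f"{unigram_entropy(tokens):.3f} bits/word")
```

A text in which four word types occur equally often yields exactly 2 bits/word, which is a convenient sanity check for the estimator.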

Disclosure day on relativity: a science activity beyond the classroom
http://hdl.handle.net/2117/106559
Aragoneses, Andrés; Salán Ballesteros, Maria Núria; Hernández Fernández, Antonio
An important goal for students in engineering education is the ability to present and defend a project in front of a technical audience. We have designed an activity that helps students develop independent learning and communication skills while introducing them to the dynamics of a conference. In this activity, students prepare and present a poster at a popular physics conference on relativity. The activity is shown to provide them with communication skills that are among the generic skills at the core of Universitat Politècnica de Catalunya (UPC) degrees and that are relevant to most of an engineer's duties.
Tue, 18 Jul 2017 08:45:53 GMT

Random crossings in dependency trees
http://hdl.handle.net/2117/106079
Ferrer Cancho, Ramon
It has been hypothesized that the rather small number of crossings in real syntactic dependency trees is a side effect of pressure for dependency length minimization. Here we answer a related important research question: what would be the expected number of crossings if the natural order of a sentence were lost and replaced by a random ordering? We show that this number depends only on the number of vertices of the dependency tree (the sentence length) and the second moment about zero of vertex degrees. The expected number of crossings is minimum for a star tree (crossings are impossible) and maximum for a linear tree (the number of crossings is of the order of the square of the sequence length).
Mon, 03 Jul 2017 08:13:04 GMT
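The dependence on sentence length and on the second moment of degrees can be illustrated with a short Python sketch. It assumes a uniformly random ordering of the vertices, under which any two edges that share no vertex cross with probability 1/3; counting such edge pairs in a tree with n vertices gives an expected number of crossings of (n/6)(n - 1 - <k^2>), where <k^2> is the mean squared degree. The function names and the Monte Carlo check are illustrative assumptions, not the paper's code.

```python
import random
from itertools import combinations


def count_crossings(edges, position):
    """Count pairs of edges whose arcs cross in a linear arrangement."""
    spans = [tuple(sorted((position[u], position[v]))) for u, v in edges]
    crossings = 0
    for (a, b), (c, d) in combinations(spans, 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the other arc.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


def mean_crossings(edges, n, trials=2000, seed=0):
    """Monte Carlo estimate of the expected crossings under random orderings."""
    rng = random.Random(seed)
    order = list(range(n))
    total = 0
    for _ in range(trials):
        rng.shuffle(order)
        position = {v: i for i, v in enumerate(order)}
        total += count_crossings(edges, position)
    return total / trials


def expected_crossings(edges, n):
    """Closed form (n/6)(n - 1 - <k^2>), assuming uniformly random orderings."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    second_moment = sum(k * k for k in degree) / n
    return n / 6 * (n - 1 - second_moment)


n = 8
star = [(0, v) for v in range(1, n)]       # hub 0 linked to every other vertex
path = [(v, v + 1) for v in range(n - 1)]  # linear tree

for name, tree in (("star", star), ("linear", path)):
    print(name, expected_crossings(tree, n), mean_crossings(tree, n))
```

For the star tree every pair of edges shares the hub, so both the closed form and the simulation give zero crossings, matching the statement that crossings are impossible in that case.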

A correction on Shiloach's algorithm for minimum linear arrangement of trees
http://hdl.handle.net/2117/106035
Esteban Ángeles, Juan Luis; Ferrer Cancho, Ramon
More than 30 years ago, Shiloach published an algorithm to solve the minimum linear arrangement problem for undirected trees. Here we fix a small error in the original version of the algorithm and discuss its effect on subsequent literature. We also improve some aspects of the notation.
Fri, 30 Jun 2017 11:25:05 GMT

Using the Marshall-Olkin extended Zipf distribution in graph generation
http://hdl.handle.net/2117/105744
Duarte López, Ariel; Prat Pérez, Arnau; Pérez Casany, Marta
Being able to generate large synthetic graphs resembling those found in the real world is of high importance for the design of new graph algorithms and benchmarks. In this paper, we first compare several probability models in terms of goodness of fit when used to model the degree distribution of real graphs. Second, after confirming that the MOEZipf model is the one that gives the best fits, we present a method to generate MOEZipf distributions. The method is shown to work well in practice when implemented in a scalable synthetic graph generator.
Fri, 23 Jun 2017 06:39:52 GMT

Machine learning assists the classification of reports by citizens on disease-carrying mosquitoes
http://hdl.handle.net/2117/105694
Rodríguez García, Antonio; Bartumeus, Frederic; Gavaldà Mestre, Ricard
Mosquito Alert (www.mosquitoalert.com/en) is an expert-validated citizen science platform for tracking and controlling disease-carrying mosquitoes. Citizens download a free app and use their phones to send reports of presumed sightings of two worldwide disease vector mosquito species (the Asian tiger mosquito and the yellow fever mosquito). These reports are then supervised by a team of entomologists and, once validated, added to a database. As the platform prepares to scale to much larger geographical areas and user bases, the expert validation by entomologists becomes the main bottleneck. In this paper we describe the use of machine learning on the citizen reports to automatically validate a fraction of them, thereby allowing the entomologists either to deal with larger report streams or to concentrate on those that are more strategic, such as reports from new areas (so that early warning protocols are activated) or from areas with high epidemiological risks (so that control actions to reduce mosquito populations are activated). The current prototype flags a third of the reports as “almost certainly positive” with high confidence. It is currently being integrated into the main workflow of the Mosquito Alert platform.
Wed, 21 Jun 2017 09:38:34 GMT

Grammar logicised: relativisation
http://hdl.handle.net/2117/105058
Morrill, Glyn
Many variants of categorial grammar assume an underlying logic which is associative and linear. In relation to left extraction, the former property is challenged by island domains, which involve nonassociativity, and the latter property is challenged by parasitic gaps, which involve nonlinearity. We present a version of type logical grammar including ‘structural inhibition’ for nonassociativity and ‘structural facilitation’ for nonlinearity and we give an account of relativisation including islands and parasitic gaps and their interaction.
Wed, 31 May 2017 08:49:12 GMT

An alternative to CARMA models via iterations of Ornstein–Uhlenbeck processes
http://hdl.handle.net/2117/104495
Arratia Quesada, Argimiro Alejandro; Cabaña, Ana Alejandra; Cabaña Perez, Enrique
We present a new construction of continuous ARMA processes based on iterating an Ornstein–Uhlenbeck operator $\mathrm{OU}_\kappa$ that maps a random variable $y(t)$ onto $\mathrm{OU}_\kappa y(t) = \int_{-\infty}^{t} e^{-\kappa(t-s)}\,dy(s)$. This construction resembles the procedure to build an AR(p) from an AR(1) and yields a parsimonious model for continuous autoregression, with fewer parameters to compute than the known CARMA obtained as a solution of a system of stochastic differential equations. We show properties of this operator, give a state space representation of the iterated Ornstein–Uhlenbeck process and show how to estimate the parameters of the model.
Tue, 16 May 2017 10:42:41 GMT
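As a rough illustration of what iterating the operator means, the Python sketch below applies a discretized version of the integral $\int e^{-\kappa(t-s)}\,dy(s)$ to a sampled path and then composes it for several values of κ. Everything here is an assumption made for illustration: the grid step `dt`, the starting point at zero (the lower limit of minus infinity is truncated at the first sample), the simple recursion x_i = e^{-κ dt} x_{i-1} + (y_i - y_{i-1}), and the names `ou_operator` and `iterated_ou`. It is not the state space representation or the estimation procedure of the paper.

```python
import math
import random


def ou_operator(y, kappa, dt):
    """Discretized OU operator applied to a path y sampled every dt.

    Uses x_i = exp(-kappa * dt) * x_{i-1} + (y_i - y_{i-1}),
    started at x_0 = 0 (the integral is truncated at the first sample).
    """
    decay = math.exp(-kappa * dt)
    x = [0.0]
    for i in range(1, len(y)):
        x.append(decay * x[-1] + (y[i] - y[i - 1]))
    return x


def iterated_ou(y, kappas, dt):
    """Apply the discretized operator once per kappa, in the given order."""
    for kappa in kappas:
        y = ou_operator(y, kappa, dt)
    return y


# Usage: drive the iteration with a simulated Wiener path (hypothetical values).
rng = random.Random(0)
dt = 0.01
w = [0.0]
for _ in range(1000):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
x = iterated_ou(w, kappas=[0.5, 2.0], dt=dt)
print(len(x), x[-1])
```

With a single κ and a Wiener path as input, the recursion is the familiar discretization of an Ornstein–Uhlenbeck process; applying it p times mirrors, loosely, how an AR(p) is built by composing AR(1) filters.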

Positive isometric averaging operators on ℓ²(ℤ,µ)
http://hdl.handle.net/2117/104392
Boza Rocho, Santiago; Soria de Diego, Javier
© 2016 Springer International Publishing. We show that positive isometric averaging operators on the sequence space (Formula presented.) are determined by very subtle arithmetic conditions on (Formula presented.) (even for very simple examples), contrary to what happens in the continuous case (Formula presented.), where any possible average value is realized by a suitable positive isometry.
Mon, 15 May 2017 07:32:34 GMT

Nonuniform complexity classes specified by lower and upper bounds
http://hdl.handle.net/2117/104347
Balcázar Navarro, José Luis; Gabarró Vallès, Joaquim
We characterize, in terms of oracle Turing machines, the classes defined by exponential lower bounds on some nonuniform complexity measures. We then use the same methods to give a new characterization of classes defined by polynomial and polylog upper bounds, obtaining a unified approach to deal with upper and lower bounds. The main measures are the initial index, the context-free cost and the Boolean circuit size. We interpret our results by discussing a trade-off between oracle information and computed information for oracle Turing machines.
Fri, 12 May 2017 08:15:24 GMT