Journal articles
http://hdl.handle.net/2117/3487
2017-02-25T18:18:58Z

Learning definite Horn formulas from closure queries
http://hdl.handle.net/2117/101185
Arias Vicente, Marta; Balcázar Navarro, José Luis; Tîrnauca, Cristina
A definite Horn theory is a set of n-dimensional Boolean vectors whose characteristic function is expressible as a definite Horn formula, that is, as a conjunction of definite Horn clauses. The class of definite Horn theories is known to be learnable under different query learning settings, such as learning from membership and equivalence queries or learning from entailment. We propose yet a different type of query: the closure query. Closure queries are a natural extension of membership queries and also a variant, appropriate in the context of definite Horn formulas, of the so-called correction queries. We present an algorithm that learns conjunctions of definite Horn clauses in polynomial time, using closure and equivalence queries, and show how it relates to the canonical Guigues–Duquenne basis for implicational systems. We also show how the different query models mentioned relate to each other by either showing full-fledged reductions by means of query simulation (where possible), or by showing their connections in the context of particular algorithms that use them for learning definite Horn formulas.
2017-02-17T12:58:08Z
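As a rough illustration of the notion of closure used in this abstract, the sketch below computes the closure of a set of variables under a conjunction of definite Horn clauses by forward chaining. It assumes that a closure query returns exactly this set for the target formula; the abstract does not spell out the definition, and the names `closure`, `Clause` and `target` are hypothetical. It is not the learning algorithm of the paper.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A definite Horn clause: a set of antecedent variables implying one consequent.
Clause = Tuple[FrozenSet[str], str]


def closure(variables: Iterable[str], clauses: Iterable[Clause]) -> Set[str]:
    """Forward chaining: keep adding consequents until no clause can fire."""
    closed = set(variables)
    clause_list = list(clauses)
    changed = True
    while changed:
        changed = False
        for body, head in clause_list:
            # A clause fires when all its antecedents are already in the set.
            if head not in closed and body <= closed:
                closed.add(head)
                changed = True
    return closed


# Hypothetical target formula: (a -> b) AND (b AND c -> d).
target = [(frozenset({"a"}), "b"), (frozenset({"b", "c"}), "d")]

print(sorted(closure({"a"}, target)))       # ['a', 'b']
print(sorted(closure({"a", "c"}, target)))  # ['a', 'b', 'c', 'd']
```

Under this assumption, a membership-style question ("is this set a model of the target?") can be answered by checking whether the set equals its own closure.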

Compression and the origins of Zipf's law for word frequencies
http://hdl.handle.net/2117/100379
Ferrer Cancho, Ramon
Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf's law in human languages could originate from pressure for easy and fast communication.
2017-01-31T12:44:17Z

Crossings as a side effect of dependency lengths
http://hdl.handle.net/2117/100375
Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
The syntactic structure of sentences exhibits a striking regularity: dependencies tend to not cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, that is, sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language.
2017-01-31T12:01:36Z
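The two quantities contrasted in this abstract, crossings and dependency lengths, can be made concrete with a small sketch. It assumes a sentence is given as a list of dependencies, each a pair of word positions, and that two dependencies cross when exactly one endpoint of one lies strictly between the endpoints of the other. The function names are hypothetical and this is not the statistical test used in the paper.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]  # a dependency between two word positions


def count_crossings(edges: List[Edge]) -> int:
    """Number of pairs of dependencies that cross when drawn above the sentence."""
    spans = [tuple(sorted(edge)) for edge in edges]
    crossings = 0
    for (a, b), (c, d) in combinations(spans, 2):
        # The arcs interleave: one starts inside the other and ends outside it.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


def total_dependency_length(edges: List[Edge]) -> int:
    """Sum of the distances (in words) between the two ends of each dependency."""
    return sum(abs(i - j) for i, j in edges)


crossing_example = [(1, 3), (2, 4)]        # two interleaved arcs
planar_example = [(1, 2), (2, 3), (3, 4)]  # a chain of adjacent words

print(count_crossings(crossing_example), total_dependency_length(crossing_example))  # 1 4
print(count_crossings(planar_example), total_dependency_length(planar_example))      # 0 3
```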

The infochemical core
http://hdl.handle.net/2117/100367
Hernández Fernández, Antonio; Ferrer Cancho, Ramon
Vocalizations, and less often gestures, have been the object of linguistic research for decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit acoustic signals of an optimal frequency to achieve successful communication. Here, we investigate the distribution of infochemicals across species when they are ranked by their degree or the number of species with which they are associated (because they produce them or are sensitive to them). We evaluate the quality of the fit of different functions to the dependency between degree and rank by means of a penalty for the number of parameters of the function. Surprisingly, a double Zipf (a Zipf distribution with two regimes, each with a different exponent) is the model yielding the best fit although it is the function with the largest number of parameters. This suggests that the worldwide repertoire of infochemicals contains a core which is shared by many species and is reminiscent of the core vocabularies found for human language in dictionaries or large corpora.
2017-01-31T11:10:12Z

A reply to Kubota and Levine on gapping
http://hdl.handle.net/2117/99487
Valentín Fernández Gallart, José Oriol; Morrill, Glyn
In a series of papers Kubota and Levine give an account of gapping and determiner gapping in terms of hybrid type logical grammar, including anomalous scopal interactions with auxiliaries and negative quantifiers. We make three observations: i) under the counterpart assumptions that Kubota and Levine make, the existent displacement type logical grammar account of gapping already accounts for the scopal interactions, ii) Kubota and Levine overgenerate determiner-verb order permutations in determiner gapping conjuncts whereas the immediate adaptation of their proposal to displacement type logical grammar does not do so, and iii) Kubota and Levine do not capture simplex gapping as a special case of complex gapping, but require distinct lexical entries for the two cases; we show how a generalisation of displacement type logical grammar allows both simplex and discontinuous gapping under a single type assignment.
2017-01-17T14:00:39Z

Rehabilitation profiles of older adult stroke survivors admitted to intermediate care units: A multicentre study
http://hdl.handle.net/2117/98769
Pérez, Laura M.; Inzitari, Marco; Quinn, Terence J.; Montaner, Joan; Gavaldà Mestre, Ricard; Duarte, Esther; Coll Planas, Laura; Cerdá, Mercé; Santaeugenia, Sebastia; Closa, Conxita; Gallofre, Miquel
Background:
Stroke is a major cause of disability in older adults, but the evidence around post-acute treatment is limited and heterogeneous. We aimed to identify profiles of older adult stroke survivors admitted to intermediate care geriatric rehabilitation units.
Methods:
We performed a cohort study, enrolling stroke survivors aged 65 years or older, admitted to 9 intermediate care units in Catalonia-Spain. To identify potential profiles, we included age, caregiver presence, comorbidity, pre-stroke and post-stroke disability, cognitive impairment and stroke severity in a cluster analysis. We also proposed a practical decision tree for patient classification in clinical practice. We analyzed differences between profiles in functional improvement (Barthel index), relative functional gain (Montebello index), length of hospital stay (LOS), rehabilitation efficiency (functional improvement by LOS), and new institutionalization using multivariable regression models (for continuous and dichotomous outcomes).
Results:
Among 384 patients (79.1±7.9 years, 50.8% women), we identified 3 complexity profiles: a) Lower Complexity with Caregiver (LCC), b) Moderate Complexity without Caregiver (MCN), and c) Higher Complexity with Caregiver (HCC). The decision tree showed high agreement with cluster analysis (96.6%). Using either linear (continuous outcomes) or logistic regression, both LCC and MCN, compared to HCC, showed statistically significantly higher chances of functional improvement (OR = 4.68, 95%CI = 2.54–8.63 and OR = 3.0, 95%CI = 1.52–5.87, respectively, for Barthel index improvement ≥20), relative functional gain (OR = 4.41, 95%CI = 1.81–10.75 and OR = 3.45, 95%CI = 1.31–9.04, respectively, for top vs. lower tertiles), and rehabilitation efficiency (OR = 7.88, 95%CI = 3.65–17.03 and OR = 3.87, 95%CI = 1.69–8.89, respectively, for top vs. lower tertiles). In relation to LOS, the MCN cluster had a lower chance of shorter LOS than LCC (OR = 0.41, 95%CI = 0.23–0.75) and HCC (OR = 0.37, 95%CI = 0.19–0.73), for LOS lower vs. higher tertiles.
Conclusion:
Our data suggest that post-stroke rehabilitation profiles could be identified using routine assessment tools and showed differential recovery. If confirmed, these findings might help to develop tailored interventions to optimize recovery of older stroke patients.
2016-12-22T15:35:29Z

The meaning-frequency law in Zipfian optimization models of communication
http://hdl.handle.net/2117/95871
Ferrer Cancho, Ramon
According to Zipf’s meaning-frequency law, words that are more frequent tend to have more meanings. Here it is shown that a linear dependency between the frequency of a form and its number of meanings is found in a family of models of Zipf’s law for word frequencies. This is evidence for a weak version of the meaning-frequency law. Interestingly, that weak law (a) is not an inevitable property of the assumptions of the family and (b) is found at least in the narrow regime where those models exhibit Zipf’s law for word frequencies.
https://arxiv.org/abs/1409.7275
2016-11-09T10:03:01Z

On graph combinatorics to improve eigenvector-based measures of centrality in directed networks
http://hdl.handle.net/2117/90335
Arratia Quesada, Argimiro Alejandro; Marijuan López, Carlos
We present a combinatorial study on the rearrangement of links in the structure of directed networks for the purpose of improving the valuation of a vertex or group of vertices as established by an eigenvector-based centrality measure. We build our topological classification starting from unidirectional rooted trees and up to more complex hierarchical structures such as acyclic digraphs, bidirectional and cyclical rooted trees (obtained by closing cycles on unidirectional trees). We analyze different modifications on the structure of these networks and study their effect on the valuation given by the eigenvector-based scoring functions, with particular focus on alpha-centrality and PageRank.
© 2016. This version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
2016-09-29T14:12:42Z
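For readers unfamiliar with the scoring functions named in this abstract, the following is a minimal sketch of PageRank computed by power iteration on a small unidirectional rooted tree. It uses the textbook formulation with a damping factor of 0.85 and uniform redistribution of the score of vertices without out-links. These choices, the function name `pagerank` and the example graph are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power iteration; every vertex must appear as a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by vertices with no out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                # Each vertex shares its score equally among its out-links.
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical rooted tree with all links pointing towards the root.
tree = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"]}
ranks = pagerank(tree)
print(max(ranks, key=ranks.get))  # root
```

Re-running the function after adding or redirecting a link in `tree` gives a simple way to observe how a rearrangement changes the valuation of a vertex.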

Gelada vocal sequences follow Menzerath's linguistic law
http://hdl.handle.net/2117/89435
Gustison, Morgan; Semple, Stuart; Ferrer Cancho, Ramon; Bergman, Thore
Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that, the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
2016-09-01T06:54:58Z

The scaling of the minimum sum of edge lengths in uniformly random trees
http://hdl.handle.net/2117/88535
Esteban Ángeles, Juan Luis; Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
The minimum linear arrangement problem on a network consists of finding the minimum sum of edge lengths that can be achieved when the vertices are arranged linearly. Although there are algorithms to solve this problem on trees in polynomial time, they have remained theoretical and have not been implemented in practical contexts to our knowledge. Here we use one of those algorithms to investigate the growth of this sum as a function of the size of the tree in uniformly random trees. We show that this sum is bounded above by its value in a star tree. We also show that the mean edge length grows logarithmically in optimal linear arrangements, in stark contrast to the linear growth that is expected on optimal arrangements of star trees or on random linear arrangements.
2016-07-06T09:32:34Z
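The quantity studied in this abstract can be illustrated on very small trees. The sketch below computes the sum of edge lengths of a linear arrangement and finds its minimum by brute force over all orderings. This exhaustive search is only a stand-in for illustration: it is not one of the polynomial-time algorithms the abstract refers to, and the function names and example trees are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def sum_edge_lengths(edges: List[Edge], position: Dict[int, int]) -> int:
    """Sum over edges of the distance between the positions of their endpoints."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def min_sum_edge_lengths(n: int, edges: List[Edge]) -> Optional[int]:
    """Exhaustive search over all n! arrangements; feasible only for tiny n."""
    best = None
    for order in permutations(range(n)):
        position = {vertex: place for place, vertex in enumerate(order)}
        total = sum_edge_lengths(edges, position)
        if best is None or total < best:
            best = total
    return best


star = [(0, leaf) for leaf in range(1, 6)]  # 6 vertices, vertex 0 is the hub
path = [(i, i + 1) for i in range(5)]       # 6 vertices in a chain

print(min_sum_edge_lengths(6, star))  # 9
print(min_sum_edge_lengths(6, path))  # 5
```

On these two examples the star tree yields the larger minimum, which is consistent with the upper bound stated in the abstract.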