LARCA: Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge (Relational Algorithmics, Complexity and Learning Laboratory)
http://hdl.handle.net/2117/3486
2017-02-21T16:58:49Z

Learning definite Horn formulas from closure queries
http://hdl.handle.net/2117/101185
Arias Vicente, Marta; Balcázar Navarro, José Luis; Tîrnauca, Cristina
A definite Horn theory is a set of n-dimensional Boolean vectors whose characteristic function is expressible as a definite Horn formula, that is, as a conjunction of definite Horn clauses. The class of definite Horn theories is known to be learnable under different query learning settings, such as learning from membership and equivalence queries or learning from entailment. We propose yet a different type of query: the closure query. Closure queries are a natural extension of membership queries and also a variant, appropriate in the context of definite Horn formulas, of the so-called correction queries. We present an algorithm that learns conjunctions of definite Horn clauses in polynomial time, using closure and equivalence queries, and show how it relates to the canonical Guigues–Duquenne basis for implicational systems. We also show how the different query models mentioned relate to each other, either by giving full-fledged reductions by means of query simulation (where possible) or by showing their connections in the context of particular algorithms that use them for learning definite Horn formulas.
2017-02-17T12:58:08Z
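The closure operator that closure queries are named after can be computed by forward chaining: repeatedly fire every definite Horn clause whose body is already satisfied until nothing new is derivable. A minimal sketch (the function name and the (body, head) clause encoding are illustrative, not taken from the paper):

```python
def closure(attrs, clauses):
    """Forward-chain a set of attributes to its closure under a set of
    definite Horn clauses, each given as a (body, head) pair where body
    is a frozenset of attributes and head is a single attribute."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            # A clause fires when its whole body is already in the closure.
            if head not in closed and body <= closed:
                closed.add(head)
                changed = True
    return closed
```

A closure query would return this closed set for a queried attribute set, which is strictly more informative than the yes/no answer of a membership query.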

Generalising discontinuity
http://hdl.handle.net/2117/100910
Morrill, Glyn; Merenciano Saladrigas, Josep Maria
This paper makes two generalisations of the categorial calculus of discontinuity. In the first, we introduce unary modalities which mediate between continuous and discontinuous strings. In the second, each of the modes of adjunction of the proposal to date (concatenation, juxtaposition and interpolation) is augmented with variants. Linguistic illustration and motivation are provided, and we show how adherence to a discipline of sorting renders the generalisations tractable within a particularly efficient logic programming paradigm.
2017-02-13T11:51:52Z

Computational coverage of type logical grammar: The Montague test
http://hdl.handle.net/2117/100544
Morrill, Glyn; Valentín Fernández Gallart, José Oriol
It is nearly half a century since Montague made his contributions to the field of logical semantics. In this time, computational linguistics has taken an almost entirely statistical turn and mainstream linguistics has adopted an almost entirely non-formal methodology. But in a minority approach reaching back before the linguistic revolution, and to the origins of computing, type logical grammar (TLG) has continued championing the flags of symbolic computation and logical rigor in discrete grammar. In this paper, we aim to concretise a measure of progress for computational grammar in the form of the Montague Test: the challenge of providing a computational cover grammar of the Montague fragment. We formulate this Montague Test and show how the challenge is met by the type logical parser/theorem-prover CatLog2.
2017-02-03T11:52:28Z

Compression and the origins of Zipf's law for word frequencies
http://hdl.handle.net/2117/100379
Ferrer Cancho, Ramon
Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model, but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine-tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf's law in human languages could originate from pressure for easy and fast communication.
2017-01-31T12:44:17Z

Crossings as a side effect of dependency lengths
http://hdl.handle.net/2117/100375
Ferrer Cancho, Ramon; Gómez Rodríguez, Carlos
The syntactic structure of sentences exhibits a striking regularity: dependencies tend to not cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, that is, sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language.
2017-01-31T12:01:36Z
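The notion of crossing can be made concrete: two dependency edges drawn above the sentence cross exactly when one endpoint of one edge lies strictly between the endpoints of the other. A minimal sketch of counting crossings (the head-array encoding of the dependency structure is an assumption for illustration, not the paper's representation):

```python
def count_crossings(heads):
    """Count crossing dependencies in a sentence.

    heads[i] is the 0-based position of the head of word i, or None
    for the root. Sorted edges (i, j) and (k, l) cross iff
    i < k < j < l or k < i < l < j."""
    edges = [tuple(sorted((i, h)))
             for i, h in enumerate(heads) if h is not None]
    crossings = 0
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            (i, j), (k, l) = edges[a], edges[b]
            if i < k < j < l or k < i < l < j:
                crossings += 1
    return crossings
```

A projective (non-crossing) tree yields 0, so this count is the quantity the two hypotheses make predictions about.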

The infochemical core
http://hdl.handle.net/2117/100367
Hernández Fernández, Antonio; Ferrer Cancho, Ramon
Vocalizations, and less often gestures, have been the object of linguistic research for decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit acoustic signals of an optimal frequency to achieve successful communication. Here, we investigate the distribution of infochemicals across species when they are ranked by their degree or the number of species with which they are associated (because they produce them or are sensitive to them). We evaluate the quality of the fit of different functions to the dependency between degree and rank by means of a penalty for the number of parameters of the function. Surprisingly, a double Zipf (a Zipf distribution with two regimes, each with a different exponent) is the model yielding the best fit although it is the function with the largest number of parameters. This suggests that the worldwide repertoire of infochemicals contains a core which is shared by many species and is reminiscent of the core vocabularies found for human language in dictionaries or large corpora.
2017-01-31T11:10:12Z
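A standard way to penalize a fit by its number of parameters is the Akaike information criterion; the paper's exact criterion may differ, so the following is only an illustrative sketch of how a more complex model (like the double Zipf) can still win despite its extra parameters:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better.
    Trades goodness of fit against the number of free parameters."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """fits maps a model name to (log_likelihood, n_params);
    returns the name of the model with the smallest AIC."""
    return min(fits, key=lambda name: aic(*fits[name]))
```

With hypothetical numbers, a 3-parameter fit at log-likelihood -990 beats a 1-parameter fit at -1000, mirroring the abstract's finding that the penalty does not automatically rule out the larger model.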

Learning probability distributions generated by finite-state machines
http://hdl.handle.net/2117/100347
Castro Rabal, Jorge; Gavaldà Mestre, Ricard
We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference-in-the-limit and PAC formal models. The methods we review are state-merging and state-splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample.
2017-01-31T09:07:39Z
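The Hankel matrix the high-level algorithm works from has one entry per (prefix, suffix) pair, holding the probability of their concatenation; with a finite sample those entries can only be estimated empirically. A minimal sketch of building such an empirical Hankel matrix (the function name and the length cutoff on prefixes and suffixes are illustrative choices, not from the paper):

```python
from collections import Counter
from itertools import product


def empirical_hankel(sample, alphabet, max_len=1):
    """Empirical Hankel matrix of a distribution over strings.

    Rows are prefixes u and columns suffixes v, both ranging over all
    strings of length up to max_len (including the empty string);
    entry (u, v) estimates the probability of the string u + v."""
    counts = Counter(sample)
    total = len(sample)
    strings = [""] + ["".join(p) for n in range(1, max_len + 1)
                      for p in product(alphabet, repeat=n)]
    return [[counts[u + v] / total for v in strings] for u in strings]
```

The spectral method would then take a low-rank factorization (e.g. an SVD) of this matrix to recover automaton parameters; the sketch stops at the estimation step.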

Fast calculation of entropy with Zhang's estimator
http://hdl.handle.net/2117/100157
Lozano Bojados, Antoni; Casas Fernández, Bernardino; Bentz, Chris; Ferrer Cancho, Ramon
Entropy is a fundamental property of a repertoire. Here, we present an efficient algorithm to estimate the entropy of types with the help of Zhang’s estimator. The algorithm takes advantage of the fact that the number of different frequencies in a text is in general much smaller than the number of types. We justify the convenience of the algorithm by means of an analysis of the statistical properties of texts from more than 1000 languages. Our work opens up various possibilities for future research.
2017-01-27T08:06:04Z
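The speedup rests on iterating over the frequency spectrum (distinct frequencies, each weighted by the number of types sharing it) instead of over all types. The idea is illustrated here with the plain plug-in (maximum-likelihood) estimator rather than Zhang's estimator, whose exact form is not reproduced; the grouping trick applies the same way:

```python
import math
from collections import Counter


def plugin_entropy_by_frequency(tokens):
    """Plug-in entropy (in nats) computed over the frequency spectrum:
    loop over distinct frequencies, usually far fewer than the number
    of types, weighting each by how many types share that frequency."""
    n = len(tokens)
    # freq_of_freq[f] = number of types that occur exactly f times.
    freq_of_freq = Counter(Counter(tokens).values())
    return -sum(m * (f / n) * math.log(f / n)
                for f, m in freq_of_freq.items())
```

For a text where many types share the same frequency (the typical Zipfian situation), the inner sum runs over a small set of distinct frequencies rather than the full vocabulary.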

Semblant cerca semblant? (Does like seek like?)
http://hdl.handle.net/2117/100086
Sanou Gozalo, Eduard; Arias Vicente, Marta; Ferrer Cancho, Ramon; Hernández Fernández, Antonio
In a course of the computer science degree, the programming project has changed from individual work to team work, in principle in pairs (pair programming). Students have full freedom to form teams, with minimal intervention from the teaching staff. The analysis of the pairs formed indicates that students do not tend to associate with peers of similar academic performance, perhaps because general cognitive parameters do not govern the choice of academic partner.
2017-01-26T07:35:57Z

Contributions to the formalization of order-like dependencies using FCA
http://hdl.handle.net/2117/100076
Codocedo, Victor; Baixeries i Juvillà, Jaume; Kaytoue, Mehdi; Napoli, Amedeo
Functional Dependencies (FDs) play a key role in many fields of the relational database model, one of the most widely used database systems. FDs have also been applied in data analysis, data quality, knowledge discovery and the like, but in a very limited scope, because of their fixed semantics. To overcome this limitation, many generalizations have been defined to relax the crisp definition of FDs. FDs and a few of their generalizations have been characterized with Formal Concept Analysis, which reveals itself to be an interesting unified framework for characterizing dependencies, that is, understanding and computing them in a formal way. In this paper, we extend this work by taking into account order-like dependencies. Such dependencies, well defined in the database field, consider an ordering on the domain of each attribute, and not simply an equality relation as with standard FDs. © 2016, CEUR-WS. All rights reserved.
2017-01-25T18:39:50Z
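The fixed equality semantics of a plain FD X → Y can be checked directly: any two tuples that agree on X must also agree on Y. A minimal sketch (the dict-based relation encoding and function name are illustrative); an order-like dependency would replace the equality test on Y with a comparison of orderings:

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in a
    relation given as a list of dicts: tuples with equal lhs values
    must have equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight of key, else returns
        # the previously stored value for comparison.
        if seen.setdefault(key, val) != val:
            return False
    return True
```

This single-pass check runs in linear time in the number of tuples, which is why relaxing the equality relation (as order-like dependencies do) is the interesting part rather than the checking itself.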