Journal articles
http://hdl.handle.net/2117/3487
2016-02-07T15:15:44Z
http://hdl.handle.net/2117/79981
Absolute-type shaft encoding using LFSR sequences with a prescribed length
Fuertes Armengol, José Mª; Balle Pigem, Borja de; Ventura Capell, Enric
Maximal-length binary sequences have been known for a long time. They have many interesting properties, and one of them is that, when taken in blocks of n consecutive positions, they form 2^n − 1 different codes in a closed circular sequence. This property can be used to measure absolute angular positions, as the circle can be divided into as many parts as different codes can be retrieved. This paper describes how a closed binary sequence with an arbitrary length can be effectively designed with the minimal possible block length using linear feedback shift registers. Such sequences can be used to measure a specified exact number of angular positions using the minimal possible number of sensors that linear methods allow.
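The window property the abstract relies on can be demonstrated with a small sketch (not the paper's construction, which targets arbitrary prescribed lengths): a 4-bit Fibonacci LFSR with a primitive feedback polynomial generates a period-15 sequence in which every circular window of 4 bits is distinct, so 15 absolute angular positions can be read with 4 sensors. The tap choice below corresponds to x^4 + x^3 + 1.

```python
# Illustrative sketch: a maximal-length sequence from a 4-bit LFSR, checking
# that every window of n = 4 consecutive bits of the closed sequence is unique.

def lfsr_sequence(taps, state, length):
    """Generate `length` output bits from a Fibonacci LFSR.
    `taps` are 0-based state positions XORed to form the feedback bit."""
    bits = []
    for _ in range(length):
        bits.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return bits

n = 4
# Primitive polynomial => the period is the maximum 2^n - 1 = 15.
seq = lfsr_sequence(taps=[0, 3], state=[1, 0, 0, 0], length=2**n - 1)

# Read all n-bit windows of the circular sequence; each equals one LFSR state,
# and the LFSR cycles through all 15 nonzero states, so all windows differ.
windows = {tuple((seq * 2)[i:i + n]) for i in range(len(seq))}
print(len(seq), len(windows))  # 15 15
```

Each window is exactly the register state at that step, which is why distinctness of states carries over to distinctness of windows.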
2015-11-26T17:41:12Z
http://hdl.handle.net/2117/78645
A multi-scale smoothing kernel for measuring time-series similarity
Troncoso, Alicia; Arias Vicente, Marta; Riquelme Santos, José Cristóbal
In this paper a kernel for time-series data is introduced so that it can be used for any data mining task that relies on a similarity or distance metric. The main idea of our kernel is that it should recognize as highly similar time-series that are essentially the same but may be slightly perturbed from each other: for example, if one series is shifted with respect to the other, or if it is slightly misaligned. Namely, our kernel tries to focus on the shape of the time-series and ignores small perturbations such as misalignments or shifts. First, a recursive formulation of the kernel directly based on its definition is proposed. Then it is shown how to compute the kernel efficiently using an equivalent matrix-based formulation. To validate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository and the KDD challenge of 2007 with the purpose of validating the kernel-derived distance over shifted time-series. Also, the kernel has been applied to the original UCR time-series to analyze its potential in time-series classification in conjunction with Support Vector Machines. Finally, two real-world applications related to ozone concentration in the atmosphere and electricity demand have been considered.
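The core intuition, comparing series after smoothing so that small shifts matter less, can be sketched as follows. This is not the authors' kernel (which has a recursive and a matrix-based formulation); it is a simplified multi-scale illustration with an invented moving-average smoother and scale set.

```python
# Hedged sketch of the multi-scale smoothing idea: compare two series after
# smoothing them at several scales, so small shifts/misalignments matter less.

import numpy as np

def smooth(x, width):
    """Moving-average smoothing with a window of `width` samples."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def multiscale_similarity(x, y, widths=(1, 3, 5, 9)):
    """Average of normalized inner products of the smoothed series per scale."""
    total = 0.0
    for w in widths:
        xs, ys = smooth(x, w), smooth(y, w)
        total += np.dot(xs, ys) / (np.linalg.norm(xs) * np.linalg.norm(ys))
    return total / len(widths)

t = np.linspace(0, 2 * np.pi, 200)
base = np.sin(t)
shifted = np.roll(base, 5)                        # same shape, slightly shifted
noise = np.random.default_rng(0).normal(size=200)  # unrelated series

print(multiscale_similarity(base, shifted))  # close to 1
print(multiscale_similarity(base, noise))    # much lower
```

A shifted copy of a series stays highly similar under this measure, while an unrelated series does not, which is exactly the behavior the abstract asks of a shape-focused kernel.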
2015-11-02T14:11:48Z
http://hdl.handle.net/2117/77862
Zipf's law for word frequencies: Word forms versus lemmas in long texts
Corral, Alvaro; Boleda Torrent, Gemma; Ferrer Cancho, Ramon
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with different levels of morphological complexity. In all cases Zipf's law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf's law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable, tending to increase substantially after the transformation.
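A minimal version of the analysis in this abstract, counting word-form frequencies and estimating a power-law exponent, can be sketched as below. The toy corpus is invented, and the log-log least-squares fit on the rank-frequency curve is only a rough stand-in for the distribution fitting used in studies like this one.

```python
# Sketch: count word-form frequencies and estimate the exponent alpha in the
# rank-frequency relation f(r) ~ r^(-alpha) via least squares in log-log space.

import math
from collections import Counter

text = ("the quick brown fox jumps over the lazy dog " * 50 +
        "the cat and the dog and the fox " * 30).split()

freqs = sorted(Counter(text).values(), reverse=True)
ranks = range(1, len(freqs) + 1)

# Least-squares slope of log f against log r gives -alpha.
xs = [math.log(r) for r in ranks]
ys = [math.log(f) for f in freqs]
m = len(xs)
mx, my = sum(xs) / m, sum(ys) / m
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
alpha = -slope
print(round(alpha, 2))  # positive for any frequency-skewed corpus
```

For lemma-based analysis one would map each word form to its lemma before counting; the paper's question is how stable alpha and the low-frequency cut-off are under that mapping.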
2015-10-19T08:26:11Z
http://hdl.handle.net/2117/28349
Displacement logic for anaphora
Morrill, Glyn; Valentín Fernández Gallart, José Oriol
The displacement calculus of Morrill, Valentín and Fadda (2011) [25] aspires to replace the calculus of Lambek (1958) [13] as the foundation of categorial grammar by accommodating intercalation as well as concatenation while remaining free of structural rules and enjoying Cut-elimination and its good corollaries. Jäger (2005) [11] proposes a type logical treatment of anaphora with syntactic duplication using limited contraction. Morrill and Valentín (2010) [24] apply (modal) displacement calculus to anaphora with lexical duplication and propose an extension with negation as failure, in conjunction with additives, to capture binding conditions. In this paper we present an account of anaphora developing characteristics and employing machinery from both of these proposals.
2015-06-19T09:15:45Z
http://hdl.handle.net/2117/28306
The placement of the head that minimizes online memory: a complex systems approach
Ferrer Cancho, Ramon
It is well known that the length of a syntactic dependency determines its online memory cost. Thus, the problem of the placement of a head and its dependents (complements or modifiers) that minimizes online memory is equivalent to the problem of the minimum linear arrangement of a star tree. However, how that length is translated into cognitive cost is not known. This study shows that the online memory cost is minimized when the head is placed at the center, regardless of the function that transforms length into cost, provided only that this function is strictly monotonically increasing. Online memory defines a quasi-convex adaptive landscape with a single central minimum if the number of elements is odd and two central minima if that number is even. We discuss various aspects of the dynamics of word order of subject (S), verb (V) and object (O) from a complex systems perspective and suggest that word orders tend to evolve by swapping adjacent constituents from an initial or early SOV configuration that is attracted towards a central word order by online memory minimization. We also suggest that the stability of SVO is due to at least two factors, the quasi-convex shape of the adaptive landscape in the online memory dimension and online memory adaptations that avoid regression to SOV. Although OVS is also optimal for placing the verb at the center, its low frequency is explained by its long distance to the seminal SOV in the permutation space.
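The central claim of the abstract, that the head position minimizing total cost is the center for any strictly increasing cost function, is easy to check numerically. The sketch below uses three illustrative cost functions (linear, quadratic, exponential); they are examples, not the paper's choices.

```python
# Numeric check: placing the head at the center minimizes the total online
# memory cost sum_i g(|head - dependent_i|) for any strictly increasing g.

def total_cost(head_pos, n, g):
    """Total cost with the head at slot `head_pos` of n slots (0..n-1)
    and dependents occupying all the remaining slots."""
    return sum(g(abs(head_pos - i)) for i in range(n) if i != head_pos)

n = 7  # odd number of elements: a single central minimum is expected
for g in (lambda d: d, lambda d: d ** 2, lambda d: 2 ** d):
    costs = [total_cost(p, n, g) for p in range(n)]
    best = min(range(n), key=costs.__getitem__)
    assert best == n // 2  # the center wins for every increasing g
print("center minimizes for all three cost functions")
```

With an even number of elements the same computation yields two central minima, matching the quasi-convex landscape described in the abstract.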
2015-06-15T11:41:48Z
http://hdl.handle.net/2117/28305
Reply to the commentary "Be careful when assuming the obvious", by P. Alday
Ferrer Cancho, Ramon
Here we respond to some comments by Alday concerning headedness in linguistic theory and the validity of the assumptions of a mathematical model for word order. For brevity, we focus only on two assumptions: the unit of measurement of dependency length and the monotonicity of the cost of a dependency as a function of its length. We also review the implicit psychological bias in Alday's comments. Nevertheless, Alday is indicating the path for linguistic research with his unusual concerns about parsimony from multiple dimensions.
2015-06-15T11:27:58Z
http://hdl.handle.net/2117/28279
The risks of mixing dependency lengths from sequences of different length
Ferrer Cancho, Ramon; Liu, Haitao
Mixing dependency lengths from sequences of different length is a common practice in language research. However, the empirical distribution of dependency lengths of sentences of the same length differs from that of sentences of varying length. The distribution of dependency lengths depends on sentence length for real sentences and also under the null hypothesis that dependencies connect vertices located in random positions of the sequence. This suggests that certain results, such as the distribution of syntactic dependency lengths mixing dependencies from sentences of varying length, could be a mere consequence of that mixing. Furthermore, differences in the global averages of dependency length (mixing lengths from sentences of varying length) for two different languages do not simply imply a priori that one language optimizes dependency lengths better than the other because those differences could be due to differences in the distribution of sentence lengths and other factors.
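The null hypothesis mentioned in the abstract, dependencies connecting vertices at random positions, can be simulated in a few lines. The sketch below is an illustration of why pooling is risky, not the paper's analysis: the expected length of a random dependency in a sentence of n words is (n + 1)/3, so sentences of different lengths contribute systematically different length distributions.

```python
# Simulation of the null model: a dependency links two distinct random
# positions of a sequence of length n; mean length grows with n.

import random

def mean_random_dependency_length(n, trials=20000, rng=random.Random(42)):
    """Average |i - j| over random pairs of distinct positions in 1..n."""
    total = 0
    for _ in range(trials):
        i, j = rng.sample(range(1, n + 1), 2)
        total += abs(i - j)
    return total / trials

for n in (5, 10, 20):
    print(n, round(mean_random_dependency_length(n), 2))
# Expected value is (n + 1) / 3: longer sequences give longer dependencies
# even when dependencies are placed completely at random.
```

This is the sense in which a pooled distribution over sentences of varying length can be an artifact of the mixing rather than a property of syntax.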
2015-06-11T11:35:43Z
http://hdl.handle.net/2117/28273
Beyond description: Comment on “Approaching human language with complex networks” by Cong and Liu
Ferrer Cancho, Ramon
2015-06-11T10:03:57Z
http://hdl.handle.net/2117/28269
A categorial type logic
Morrill, Glyn
In logical categorial grammar [23,11] syntactic structures are categorial proofs and semantic structures are intuitionistic proofs, and the syntax-semantics interface comprises a homomorphism from syntactic proofs to semantic proofs. Thereby, logical categorial grammar embodies in a pure logical form the principles of compositionality, lexicalism, and parsing as deduction. Interest has focused on multimodal versions, but the advent of the (dis)placement calculus of Morrill, Valentín and Fadda [21] suggests that the role of structural rules can be reduced, and this facilitates computational implementation. In this paper we specify a comprehensive formalism of (dis)placement logic for the parser/theorem prover CatLog, integrating categorial logic connectives proposed to date, and illustrate with a cover grammar of the Montague fragment.
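The displacement calculus itself is beyond a short sketch, but the basic categorial mechanism it extends can be shown: forward application (X/Y, Y ⇒ X) and backward application (Y, Y\X ⇒ X) over lexical type assignments. This is plain AB categorial grammar, a far weaker system than the paper's (dis)placement logic, and the tiny lexicon is invented for illustration.

```python
# Types: an atom is a string; X/Y is ('/', X, Y); Y\X is ('\\', Y, X).
NP, S = 'NP', 'S'
lexicon = {
    'John':  NP,
    'Mary':  NP,
    'loves': ('/', ('\\', NP, S), NP),   # (NP\S)/NP: a transitive verb
}

def reduce_types(types):
    """Repeatedly apply forward/backward application to adjacent types."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            a, b = types[i], types[i + 1]
            if isinstance(a, tuple) and a[0] == '/' and a[2] == b:
                types[i:i + 2] = [a[1]]      # forward: X/Y, Y => X
                changed = True
                break
            if isinstance(b, tuple) and b[0] == '\\' and b[1] == a:
                types[i:i + 2] = [b[2]]      # backward: Y, Y\X => X
                changed = True
                break
    return types

print(reduce_types([lexicon[w] for w in ['John', 'loves', 'Mary']]))  # ['S']
```

A string is grammatical when its types reduce to the single sentence type S; the displacement calculus adds, among other things, types that wrap around material, which plain adjacency-based application cannot express.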
2015-06-11T08:22:10Z
http://hdl.handle.net/2117/28256
Adaptively learning probabilistic deterministic automata from data streams
Balle Pigem, Borja de; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class in the restrictive data stream scenario: Unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We also present extensions of the algorithm that (1) reduce to a minimum the need for guessing parameters of the target distribution and (2) are able to adapt to changes in the input distribution, relearning new models when needed. We provide rigorous PAC-like bounds for all of the above. Our algorithm makes key use of stream sketching techniques for reducing memory and processing time, and is modular in that it can use different tests for state equivalence and for change detection in the stream.
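The streaming learner itself is involved; the sketch below only shows the model class being learned. A PDFA pairs deterministic transitions with a per-state distribution over symbols plus a stop probability, so the probability of a string is the product of the probabilities along its unique path. The two-state automaton is invented for illustration.

```python
# Minimal PDFA: deterministic transitions plus per-state emission/stop
# probabilities; string probability follows the unique accepting path.

delta = {('q0', 'a'): 'q0', ('q0', 'b'): 'q1',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q0'}
probs = {'q0': {'a': 0.5, 'b': 0.3, None: 0.2},   # None = stop probability
         'q1': {'a': 0.1, 'b': 0.6, None: 0.3}}

def string_probability(s, start='q0'):
    """Probability that the PDFA generates exactly the string s."""
    p, state = 1.0, start
    for sym in s:
        p *= probs[state][sym]
        state = delta[(state, sym)]
    return p * probs[state][None]

print(string_probability('ab'))  # 0.5 * 0.3 * 0.3 = 0.045
```

The learning problem the paper solves is recovering such a table of transitions and probabilities from a stream of sampled strings, with sublinear memory and amortized constant time per item.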
2015-06-10T12:27:22Z