LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge
http://hdl.handle.net/2117/3486
Sun, 07 Feb 2016 11:09:23 GMT
http://hdl.handle.net/2117/82062
Approximating the expressive power of logics in finite models
Arratia Quesada, Argimiro Alejandro; Ortiz, Carlos E.
We present a probability logic (essentially a first-order language extended with quantifiers that count the fraction of elements in a model that satisfy a first-order formula) which, on the one hand, captures uniform circuit classes such as AC0 and TC0 over arithmetic models, namely finite structures with a linear order and arithmetic relations, and whose semantics, on the other hand, can be closely approximated, with respect to our arithmetic models, by interpreting its formulas on finite structures where all relations (including the order) are restricted to be “modular” (i.e. to act subject to an integer modulo). In order to give a precise measure of the proximity between satisfaction of a formula in an arithmetic model and satisfaction of the same formula in the “approximate” model, we define approximate formulas and develop a notion of approximate truth. We also indicate how to enhance the expressive power of our probability logic so as to capture polynomial-time decidable queries.
There are various motivations for this work. As of today, there is no known logical description of any computational complexity class below NP that does not require a built-in linear order. Moreover, it is widely recognized that many model-theoretic techniques for showing definability in logics on finite structures become almost useless when order is present. Hence, if we want to obtain significant lower-bound results in computational complexity via logical descriptions, we ought to find ways of bypassing the ordering restriction. With this work we take steps towards understanding how well we can approximate, without a true order, the expressive power of logics that capture complexity classes on ordered structures.
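As an informal illustration of the counting quantifiers mentioned above, the sketch below evaluates a "fraction quantifier" on a toy finite model. The structure, formula and threshold are invented for this example; the paper's logic is interpreted over richer arithmetic models with order and arithmetic relations.

```python
# Hypothetical illustration of a fraction-counting quantifier over a finite model.
# All names and the toy model are invented for this sketch.

def fraction_quantifier(universe, phi, threshold):
    """True iff at least `threshold` of the elements of `universe` satisfy phi."""
    elements = list(universe)
    satisfying = sum(1 for x in elements if phi(x))
    return satisfying / len(elements) >= threshold

# A toy model: universe {0..9}; phi(x) = "x is even".
universe = range(10)
is_even = lambda x: x % 2 == 0

print(fraction_quantifier(universe, is_even, 0.5))   # True: half the elements are even
print(fraction_quantifier(universe, is_even, 0.6))   # False: fewer than 60% are even
```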
Tue, 26 Jan 2016 13:41:06 GMT
http://hdl.handle.net/2117/82017
The robustness of periodic orchestrations in uncertain evolving environments
Castro Rabal, Jorge; Gabarró Vallès, Joaquim; Serna Iglesias, María José; Stewart, Alan
A framework for assessing the robustness of long-duration repetitive orchestrations in uncertain evolving environments is proposed. The model assumes that service-based evaluation environments are stable over short time-frames only; over longer periods service-based environments evolve as demand fluctuates and contention for shared resources varies.
The behaviour of a short-duration orchestration E in a stable environment is assessed by an uncertainty profile U and a corresponding zero-sum angel-daemon game Gamma(U).
Here the angel-daemon approach is extended to assess evolving environments by means of a subfamily of stochastic games. These games are called strategy-oblivious because their transition probabilities are strategy-independent. It is shown that the value of a strategy-oblivious stochastic game is well defined and that it can be computed by solving a linear system. Finally, the proposed stochastic framework is used to assess the evolution of the Gabrmn IT system.
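A hedged sketch of the linear-system computation mentioned above: when transitions do not depend on the players' strategies, each state's value decouples into the stage-game value plus the expected continuation value. The discounted formulation, payoffs and transition matrix below are invented for illustration; the paper's exact game model may differ.

```python
# Sketch: value of a strategy-oblivious stochastic game (discounted, assumed).
# Strategy-independent transitions give  v = g + beta * P v,  i.e.
# (I - beta*P) v = g, a plain linear system.

def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

beta = 0.9                      # discount factor (assumed)
g = [1.0, 0.0]                  # stage-game values in states 0 and 1 (invented)
P = [[0.5, 0.5],                # strategy-independent transition matrix (invented)
     [0.2, 0.8]]

A = [[(1.0 if i == j else 0.0) - beta * P[i][j] for j in range(2)] for i in range(2)]
v = solve_linear(A, g)
print(v)                        # state values of the toy game
```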
Tue, 26 Jan 2016 08:45:30 GMT
http://hdl.handle.net/2117/81250
Pattern Structures and Concept Lattices for Data Mining and Knowledge Processing
Kaytoue, Mehdi; Codocedo, Victor; Buzmakov, Aleksey; Baixeries i Juvillà, Jaume
This article presents recent advances in Formal Concept Analysis (2010-2015), especially those dealing with complex data (numbers, graphs, sequences, etc.) in domains such as databases (functional dependencies), data mining (local pattern discovery), information retrieval and information fusion. As these advances are mainly published in artificial intelligence and FCA-dedicated venues, a dissemination towards the data mining and machine learning communities is worthwhile.
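The core FCA machinery behind these advances can be sketched on a toy formal context (invented for this example): the derivation operators map an attribute set to the objects sharing it and back, and their composition is a closure operator whose fixed points are the intents of formal concepts.

```python
# Minimal Formal Concept Analysis sketch over an invented binary context.
context = {                      # object -> set of attributes it has
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}

def extent(attrs):
    """Objects having every attribute in `attrs` (the ' operator on attributes)."""
    return {o for o, atts in context.items() if attrs <= atts}

def intent(objs):
    """Attributes shared by every object in `objs` (the ' operator on objects)."""
    if not objs:
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objs))

# The composition intent(extent(.)) is a closure operator: closing {"b"}
# adds "a", because every object with "b" also has "a".
closed = intent(extent({"b"}))
print(sorted(closed))            # ['a', 'b']  -- an intent of a formal concept
```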
Mon, 11 Jan 2016 18:47:36 GMT
http://hdl.handle.net/2117/79981
Absolute-type shaft encoding using LFSR sequences with a prescribed length
Fuertes Armengol, José Mª; Balle Pigem, Borja de; Ventura Capell, Enric
Maximal-length binary sequences have been known for a long time. They have many interesting properties, and one of them is that, when taken in blocks of n consecutive positions, they form 2^n - 1 different codes in a closed circular sequence. This property can be used to measure absolute angular positions, as the circle can be divided into as many parts as different codes can be retrieved. This paper describes how a closed binary sequence with an arbitrary length can be effectively designed with the minimal possible block length using linear feedback shift registers. Such sequences can be used to measure a specified exact number of angular positions using the minimal possible number of sensors that linear methods allow.
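The block-coding property described above can be illustrated with a small sketch (taps and parameters chosen for illustration; the paper's contribution, designing closed sequences of an arbitrary prescribed length, is not reproduced here). With a primitive feedback polynomial, an n-bit LFSR cycles through all 2^n - 1 nonzero states, so every window of n consecutive output bits in the circular sequence is distinct.

```python
# Maximal-length Fibonacci LFSR sketch. Taps (4, 1) correspond to the
# primitive polynomial x^4 + x + 1, giving a period of 2**4 - 1 = 15.

def lfsr_sequence(n, taps, seed=1):
    """One full period of output bits from an n-bit Fibonacci LFSR."""
    state, bits = seed, []
    for _ in range(2 ** n - 1):
        bits.append(state & 1)                 # output the low bit
        fb = 0
        for t in taps:                         # XOR of the tapped positions
            fb ^= (state >> (t - 1)) & 1
        state = (state >> 1) | (fb << (n - 1)) # shift in the feedback bit
    return bits

n = 4
seq = lfsr_sequence(n, taps=(4, 1))
# Read the circular sequence in windows of n bits: all codes are distinct,
# so each window identifies one absolute angular position.
codes = {tuple(seq[(i + j) % len(seq)] for j in range(n)) for i in range(len(seq))}
print(len(seq), len(codes))   # 15 15
```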
Thu, 26 Nov 2015 17:41:12 GMT
http://hdl.handle.net/2117/79345
Non-crossing dependencies: Least effort, not grammar
Ferrer Cancho, Ramon
The use of null hypotheses (in a statistical sense) is common in the hard sciences but not in theoretical linguistics. Here the null hypothesis that the low frequency of syntactic dependency crossings is expected under an arbitrary ordering of words is rejected. It is shown that this would require star dependency structures, which are both unrealistic and too restrictive. The hypothesis of the limited resources of the human brain is revisited. Stronger null hypotheses, taking into account actual dependency lengths for the likelihood of crossings, are presented. These hypotheses suggest that crossings are likely to decrease when dependencies are shortened. A hypothesis based on pressure to reduce dependency lengths is more parsimonious than a principle of minimization of crossings or a grammatical ban that is totally dissociated from the general and non-linguistic principle of economy.
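The quantity under study, crossings among syntactic dependencies, can be computed directly from word positions. The toy sentence below is invented: two dependencies cross exactly when their spans properly interleave.

```python
# Count crossings among dependency edges, each edge a pair of word positions.

def crossings(edges):
    """Number of pairs of dependencies whose spans properly interleave."""
    spans = [tuple(sorted(e)) for e in edges]
    count = 0
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (a, b), (c, d) = spans[i], spans[j]
            if a < c < b < d or c < a < d < b:   # proper interleaving
                count += 1
    return count

# Words at positions 0..3: edges (0, 2) and (1, 3) interleave, so they cross;
# edges (0, 1) and (2, 3) are disjoint, so they do not.
print(crossings([(0, 2), (1, 3)]))   # 1
print(crossings([(0, 1), (2, 3)]))   # 0
```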
Tue, 17 Nov 2015 09:45:11 GMT
http://hdl.handle.net/2117/79017
Entailment among probabilistic implications
Atserias, Albert; Balcázar Navarro, José Luis
We study a natural variant of the implicational fragment of propositional logic. Its formulas are pairs of conjunctions of positive literals, related by an implication-like connective; the semantics of this sort of implication is defined in terms of a threshold on the conditional probability of the consequent given the antecedent: we are dealing with what the data analysis community calls the confidence of partial implications or association rules. Existing studies of redundancy among these partial implications have so far characterized only entailment from one premise and entailment from two premises. By exploiting a previously noted alternative view of this entailment in terms of linear programming duality, we characterize exactly the cases of entailment from arbitrary numbers of premises. As a result, we obtain decision algorithms of better complexity; additionally, for each potential case of entailment, we identify a critical confidence threshold and show that it is, in fact, intrinsic to each set of premises and antecedent of the conclusion.
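The basic semantic notion used here, the confidence of a partial implication, is just a conditional frequency over a dataset of transactions. A minimal sketch (transactions invented for illustration):

```python
# Confidence of a partial implication A -> B over a transaction dataset.

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = |{t : A ∪ B ⊆ t}| / |{t : A ⊆ t}|."""
    covers = [t for t in transactions if antecedent <= t]
    if not covers:
        return None                          # undefined when A never occurs
    hits = [t for t in covers if consequent <= t]
    return len(hits) / len(covers)

data = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
# Three transactions contain "a"; two of them also contain "b".
print(confidence(data, {"a"}, {"b"}))        # 0.666...
```

A partial implication then "holds at threshold gamma" when its confidence is at least gamma, which is the satisfaction notion the entailment results are about.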
Wed, 11 Nov 2015 12:38:12 GMT
http://hdl.handle.net/2117/78645
A multi-scale smoothing kernel for measuring time-series similarity
Troncoso, Alicia; Arias Vicente, Marta; Riquelme Santos, José Cristóbal
In this paper a kernel for time-series data is introduced so that it can be used for any data mining task that relies on a similarity or distance measure. The main idea of our kernel is that it should recognize as highly similar time series that are essentially the same but may be slightly perturbed with respect to each other: for example, if one series is shifted relative to the other, or if it is slightly misaligned. That is, our kernel tries to focus on the shape of the time series and ignores small perturbations such as misalignments or shifts. First, a recursive formulation of the kernel directly based on its definition is proposed. Then it is shown how to compute the kernel efficiently using an equivalent matrix-based formulation. To validate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository and the KDD challenge of 2007 with the purpose of validating the kernel-derived distance over shifted time series. Also, the kernel has been applied to the original UCR time series to analyze its potential in time-series classification in conjunction with Support Vector Machines. Finally, two real-world applications related to ozone concentration in the atmosphere and electricity demand have been considered.
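This is not the paper's kernel, but the underlying idea can be sketched simply: comparing smoothed versions of two series at several scales makes the similarity tolerant to small shifts. The scales, bandwidth and toy series below are invented.

```python
# Toy multi-scale smoothing similarity: average RBF similarities between
# moving-average-smoothed copies of the two series at several window widths.
import math

def smooth(x, w):
    """Moving average with window half-width w (truncated at the borders)."""
    return [sum(x[max(0, i - w): i + w + 1]) / len(x[max(0, i - w): i + w + 1])
            for i in range(len(x))]

def multiscale_kernel(x, y, scales=(0, 1, 2), gamma=0.5):
    """Mean of RBF similarities between x and y smoothed at each scale."""
    total = 0.0
    for w in scales:
        xs, ys = smooth(x, w), smooth(y, w)
        d2 = sum((a - b) ** 2 for a, b in zip(xs, ys))
        total += math.exp(-gamma * d2)
    return total / len(scales)

a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 1, 0, 0, 0]          # same shape as a, shifted by one position
c = [1, 0, 1, 0, 1, 0]          # a genuinely different series
print(multiscale_kernel(a, b) > multiscale_kernel(a, c))   # True
```

The shifted copy scores higher than the structurally different series, which is the qualitative behaviour the paper's kernel is designed to have (with a proper recursive/matrix formulation rather than this ad hoc sketch).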
Mon, 02 Nov 2015 14:11:48 GMT
http://hdl.handle.net/2117/77870
An agent-based model of the emergence and transmission of a language system for the expression of logical combinations
Sierra Santibáñez, Josefina
This paper presents an agent-based model of the emergence and transmission of a language system for the expression of logical combinations of propositions. The model assumes the agents have some cognitive capacities for invention, adoption, repair, induction and adaptation, a common vocabulary for basic categories, and the ability to construct complex concepts using recursive combinations of basic categories and logical categories. It also supposes the agents initially do not have a vocabulary for logical categories (i.e. logical connectives), nor grammatical constructions for expressing logical combinations of basic categories through language. The results of the experiments we have performed show that a language system for the expression of logical combinations emerges as a result of a process of self-organisation of the agents’ linguistic interactions. Such a language system is concise, because it only uses words and grammatical constructions for three logical categories (i.e. and, or, not). It is also expressive, since it allows the communication of logical combinations of categories of the same complexity as propositional logic formulas, using linguistic devices such as syntactic categories, word order and auxiliary words. Furthermore, it is easy to learn and reliably transmitted across generations, according to the results of our experiments.
Mon, 19 Oct 2015 10:27:18 GMT
http://hdl.handle.net/2117/77862
Zipf's law for word frequencies: Word forms versus lemmas in long texts
Corral, Alvaro; Boleda Torrent, Gemma; Ferrer Cancho, Ramon
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with different levels of morphological complexity. In all cases Zipf's law is fulfilled, in the sense that a power-law distribution of word or lemma frequencies is valid for several orders of magnitude. We investigate the extent to which the word-lemma transformation preserves two parameters of Zipf's law: the exponent and the low-frequency cut-off. We are not able to demonstrate a strict invariance of the tail, as for a few texts both exponents deviate significantly, but we conclude that the exponents are very similar, despite the remarkable transformation that going from words to lemmas represents, considerably affecting all ranges of frequencies. In contrast, the low-frequency cut-offs are less stable, tending to increase substantially after the transformation.
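The exponent in question can be estimated in its crudest form by regressing log frequency on log rank. The toy "corpus" below is invented, and real analyses, including this paper's, use much longer texts and more careful power-law fitting of the tail.

```python
# Rough Zipf-exponent estimate: least-squares slope of log(freq) vs. log(rank).
import math
from collections import Counter

def zipf_exponent(words):
    """Slope of log frequency against log rank; Zipf's law predicts about -1."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)

# A toy text whose word-form frequencies decay roughly as 1/rank.
words = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3 + ["a"] * 2 + ["in"] * 2
print(round(zipf_exponent(words), 2))
```

Running the same estimator on word forms and on their lemmas is the kind of comparison the paper carries out (with proper fitting of the exponent and low-frequency cut-off).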
Mon, 19 Oct 2015 08:26:11 GMT
http://hdl.handle.net/2117/77791
ALOJA-ML: a framework for automating characterization and knowledge discovery in Hadoop deployments
Berral García, Josep Lluís; Poggi, Nicolas; Carrera Pérez, David; Call, Aaron; Reinauer, Rob; Green, Daron
This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and to guide performance tuning; here we detail the approach, the efficacy of the model, and initial results.
The ALOJA-ML project is the latest phase of a long-term collaboration between BSC and Microsoft, to automate the characterization of cost-effectiveness on Big Data deployments, focusing on Hadoop.
Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices.
Recently, the ALOJA project presented an open, vendor-neutral repository featuring over 16,000 Hadoop executions. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameter tunings, and Cloud services.
Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automation of modeling procedures to allow a systematic analysis of large and resource-constrained search spaces.
ALOJA-ML provides such an automated system allowing knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters.
The resulting empirically derived performance models can be used to forecast the execution behavior of various workloads; they allow a priori prediction of the execution times for new configurations and HW choices, and they offer a route to model-based anomaly detection. In addition, these models can guide the benchmarking exploration efficiently, by automatically prioritizing candidate future benchmark tests.
Insights from ALOJA-ML's models can be used to reduce the operational time on clusters, speed-up the data acquisition and knowledge discovery process, and importantly, reduce running costs.
In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big Data applications.
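The prediction task at the heart of ALOJA-ML, forecasting execution time from configuration features, can be sketched with a toy model. The records, feature names and the nearest-neighbour regressor below are invented for illustration; the project evaluates several learned models over its real benchmark repository.

```python
# Toy sketch of predicting Hadoop execution time from configuration features
# with a k-nearest-neighbour regressor over invented benchmark records.

def knn_predict(records, query, k=2):
    """Mean execution time of the k configurations closest to `query`."""
    def dist(cfg):
        return sum((cfg[f] - query[f]) ** 2 for f in query)
    nearest = sorted(records, key=lambda r: dist(r["config"]))[:k]
    return sum(r["time_s"] for r in nearest) / k

# Invented benchmark repository: (mappers, compression on/off) -> observed time.
records = [
    {"config": {"mappers": 4,  "compress": 0}, "time_s": 620.0},
    {"config": {"mappers": 8,  "compress": 0}, "time_s": 410.0},
    {"config": {"mappers": 8,  "compress": 1}, "time_s": 380.0},
    {"config": {"mappers": 16, "compress": 1}, "time_s": 300.0},
]

# Predict the time of a configuration close to ones already benchmarked.
print(knn_predict(records, {"mappers": 8, "compress": 1}))   # 395.0
```

The same predict-from-configuration loop is what enables a priori estimates for unseen configurations and the prioritization of future benchmark runs.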
Thu, 15 Oct 2015 17:25:03 GMT