Reports de recerca
http://hdl.handle.net/2117/3488
2017-01-19T15:04:55ZAn efficient closed frequent itemset miner for the MOA stream mining system
http://hdl.handle.net/2117/99416
An efficient closed frequent itemset miner for the MOA stream mining system
Quadrana, Massimo; Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
2017-01-17T10:16:22ZQuadrana, MassimoBifet Figuerol, Albert CarlesGavaldà Mestre, RicardMining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.Modeling cloud resources using machine learning
http://hdl.handle.net/2117/99413
Modeling cloud resources using machine learning
Berral, Josep Ll.; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
Cloud computing is a new Internet infrastructure paradigm where management optimization has become a challenge to be solved, as all current management systems are human-driven or ad-hoc automatic systems that must be tuned manually by experts. Management of cloud resources require accurate information about all the elements involved (host machines, resources, offered services, and clients), and some of this information can only be obtained a posteriori. Here we present the cloud and part of its architecture as a new scenario where data mining and machine learning can be applied to discover information and improve its management thanks to modeling and prediction. As a novel case of study we show in this work the modeling of basic cloud resources using machine learning, predicting resource requirements from context information like amount of load and clients, and also predicting the quality of service from resource planning, in order to feed cloud schedulers. Further, this work is an important part of our ongoing research program, where accurate models and predictors are essential to optimize cloud management autonomic systems.
2017-01-17T10:07:13ZBerral, Josep Ll.Gavaldà Mestre, RicardTorres Viñals, JordiCloud computing is a new Internet infrastructure paradigm where management optimization has become a challenge to be solved, as all current management systems are human-driven or ad-hoc automatic systems that must be tuned manually by experts. Management of cloud resources require accurate information about all the elements involved (host machines, resources, offered services, and clients), and some of this information can only be obtained a posteriori. Here we present the cloud and part of its architecture as a new scenario where data mining and machine learning can be applied to discover information and improve its management thanks to modeling and prediction. As a novel case of study we show in this work the modeling of basic cloud resources using machine learning, predicting resource requirements from context information like amount of load and clients, and also predicting the quality of service from resource planning, in order to feed cloud schedulers. Further, this work is an important part of our ongoing research program, where accurate models and predictors are essential to optimize cloud management autonomic systems.The Plogi and ACi-1 operators on the polynomial time hierarchy
http://hdl.handle.net/2117/97278
The Plogi and ACi-1 operators on the polynomial time hierarchy
Castro Rabal, Jorge; Seara Ojea, Carlos
In a previous paper ([CS-92]) we studied the agreement of operators P_{log^i} and AC^{i-1} acting on NP. In this article we extend this work to other classes of the polynomial time hierarchy. We show that on Sigma_k^p, Pi_k^p, Delta_k^P and Theta_k^P-classes both operators have the same behaviour, but this coincidence does not seem to be true on other classes included in the PH hierarchy: we give a set A such that, relativized to A, P_{log^i}(P_{log^j}(NP)) is different from AC^{i-1}(P_{log^j}(NP)). As a result of these characterizations we show P_{log}(Theta_k^p) = Theta_k^p, an equality that is useful to show lowness properties. In fact, we get easily the Theta-lowness results given by Long and Sheu in their paper [LS-91]. Besides, we clarify the situation of the classes in L_2^{p,Delta} for which their membership to L_2^{p,Theta} was not clear.
2016-11-28T09:01:43ZCastro Rabal, JorgeSeara Ojea, CarlosIn a previous paper ([CS-92]) we studied the agreement of operators P_{log^i} and AC^{i-1} acting on NP. In this article we extend this work to other classes of the polynomial time hierarchy. We show that on Sigma_k^p, Pi_k^p, Delta_k^P and Theta_k^P-classes both operators have the same behaviour, but this coincidence does not seem to be true on other classes included in the PH hierarchy: we give a set A such that, relativized to A, P_{log^i}(P_{log^j}(NP)) is different from AC^{i-1}(P_{log^j}(NP)). As a result of these characterizations we show P_{log}(Theta_k^p) = Theta_k^p, an equality that is useful to show lowness properties. In fact, we get easily the Theta-lowness results given by Long and Sheu in their paper [LS-91]. Besides, we clarify the situation of the classes in L_2^{p,Delta} for which their membership to L_2^{p,Theta} was not clear.Syntactic connectivity
http://hdl.handle.net/2117/97249
Syntactic connectivity
Morrill, Glyn
Type logical grammar presents a paradigm of linguistic
description based on what we may refer to as a
Lambek-van Benthem correspondence: (logical) formulas
as (linguistic) categories. Lexical signs are classified by
category formulas, and the language model projected by
a lexicon is determined by the consequence relation
induced on category formulas by their interpretation.
In this logical model of language, (logical) proofs
correspond to (linguistic) derivations, but such syntax
serves just to calculate what is generated,
not to define it. Although syntax
plays no definitional role linguistically, from a
computational linguistic point of view we are interested
in the process of grammatical reasoning, and we propose
to reinstate syntactic
structure as the trace of such processing.
Addressing the question `What is the essential structure
of the relevant kinds of proofs?' yields a new answer
to the question `What is syntactic structure?' under
the slogan *proof nets as syntactic structures*.
This provides a particularly vivid realisation of
the notion of categorial syntactic connection of Ajdukiewicz (1935)
as a harmonic mutual connectivity of the valencies of the words
making up a sentence.
We offer a general methodology
for the development of proof nets for partially commutative
categorial logics.
2016-11-25T12:22:04ZMorrill, GlynType logical grammar presents a paradigm of linguistic
description based on what we may refer to as a
Lambek-van Benthem correspondence: (logical) formulas
as (linguistic) categories. Lexical signs are classified by
category formulas, and the language model projected by
a lexicon is determined by the consequence relation
induced on category formulas by their interpretation.
In this logical model of language, (logical) proofs
correspond to (linguistic) derivations, but such syntax
serves just to calculate what is generated,
not to define it. Although syntax
plays no definitional role linguistically, from a
computational linguistic point of view we are interested
in the process of grammatical reasoning, and we propose
to reinstate syntactic
structure as the trace of such processing.
Addressing the question `What is the essential structure
of the relevant kinds of proofs?' yields a new answer
to the question `What is syntactic structure?' under
the slogan *proof nets as syntactic structures*.
This provides a particularly vivid realisation of
the notion of categorial syntactic connection of Ajdukiewicz (1935)
as a harmonic mutual connectivity of the valencies of the words
making up a sentence.
We offer a general methodology
for the development of proof nets for partially commutative
categorial logics.Memoisation of categorial proof nets: parallelism in categorial processing
http://hdl.handle.net/2117/97106
Memoisation of categorial proof nets: parallelism in categorial processing
Morrill, Glyn
We introduce a method of memoisation of categorial proof nets.
Exploiting the planarity of non-commutative proof nets, and
unifiability as a correctness criterion, parallelism is simulated
through construction of a proof net matrix of most general unifiers
for modules, in a manner analogous to the Cocke-Younger-Kasami
algorithm for context free grammars.
2016-11-23T11:50:47ZMorrill, GlynWe introduce a method of memoisation of categorial proof nets.
Exploiting the planarity of non-commutative proof nets, and
unifiability as a correctness criterion, parallelism is simulated
through construction of a proof net matrix of most general unifiers
for modules, in a manner analogous to the Cocke-Younger-Kasami
algorithm for context free grammars.Learnability of Kolmogorov-easy circuit expressions via queries
http://hdl.handle.net/2117/96996
Learnability of Kolmogorov-easy circuit expressions via queries
Balcázar Navarro, José Luis; Buhrman, H; Hermo, M
Circuit expressions were introduced to provide a natural link between Computational Learning and certain aspects of Structural Complexity. Upper and lower bounds on the learnability of circuit expressions are known. We study here the case in which the circuit expressions are of low (time-bounded) Kolmogorov complexity. We show that these are polynomial-time learnable from membership queries in the presence of an NP oracle. We also exactly characterize, in terms of advice classes, the sets that have such easy circuit expressions, obtain consequences regarding their lowness, and precisely identify the subclass whose circuit expressions can be learned from membership queries alone, by means of doubly tally polynomial time degrees. The extension of the results to various Kolmogorov complexity bounds is discussed.
2016-11-22T12:03:01ZBalcázar Navarro, José LuisBuhrman, HHermo, MCircuit expressions were introduced to provide a natural link between Computational Learning and certain aspects of Structural Complexity. Upper and lower bounds on the learnability of circuit expressions are known. We study here the case in which the circuit expressions are of low (time-bounded) Kolmogorov complexity. We show that these are polynomial-time learnable from membership queries in the presence of an NP oracle. We also exactly characterize, in terms of advice classes, the sets that have such easy circuit expressions, obtain consequences regarding their lowness, and precisely identify the subclass whose circuit expressions can be learned from membership queries alone, by means of doubly tally polynomial time degrees. The extension of the results to various Kolmogorov complexity bounds is discussed.Incremental processing and acceptability
http://hdl.handle.net/2117/96982
Incremental processing and acceptability
Morrill, Glyn
We present a left to right incremental algorithm for the processing of Lambek categorial grammar by proof net construction. A simple metric of complexity, the profile in time of the number of unresolved valencies, correctly predicts a wide variety of performance phenomena: garden pathing, left to right quantifier scope preference, centre embedding unacceptability, preference for lower attachment, and heavy noun phrase shift.
2016-11-22T11:07:40ZMorrill, GlynWe present a left to right incremental algorithm for the processing of Lambek categorial grammar by proof net construction. A simple metric of complexity, the profile in time of the number of unresolved valencies, correctly predicts a wide variety of performance phenomena: garden pathing, left to right quantifier scope preference, centre embedding unacceptability, preference for lower attachment, and heavy noun phrase shift.Higher-order linear logic programming of categorial deduction
http://hdl.handle.net/2117/96971
Higher-order linear logic programming of categorial deduction
Morrill, Glyn
We show how categorial deduction can be implemented in (higher-order) linear logic programming, thereby realising parsing as deduction for the associative and non-associative Lambek calculi. This provides a method of solution to the parsing problem of Lambek categorial grammar applicable to a variety of its extensions. We illustrate categorial calculi of discontinuity, and calculi with unary bracket operators.
2016-11-22T09:49:46ZMorrill, GlynWe show how categorial deduction can be implemented in (higher-order) linear logic programming, thereby realising parsing as deduction for the associative and non-associative Lambek calculi. This provides a method of solution to the parsing problem of Lambek categorial grammar applicable to a variety of its extensions. We illustrate categorial calculi of discontinuity, and calculi with unary bracket operators.Discontinuity and pied-piping in categorial grammar
http://hdl.handle.net/2117/96884
Discontinuity and pied-piping in categorial grammar
Morrill, Glyn
Discontinuity refers to the phenomena in many natural language constructions whereby signs differ markedly in their prosodic and semantic forms. As such it presents interesting demands on monostratal computational formalisms which aspire to descriptive adequacy. Pied-piping, in particular, is argued by Pollard (1988) to motivate phrase structure-style feature percolation. In the context of categorial grammar, Bach (1984), Moortgat (1988, 1990, 1991) and others have sought to provide categorial operators suited to discontinuity. These attempts encounter certain difficulties with respect to model theory and/or proof theory, difficulties which the current proposals are intended to resolve. These proposals are accompanied by introduction of a new categorial proof format: labelled Fitch-style natural deduction. The associative and non-associative Lambek calculi are sound and complete with respect to interpretation by residuation with respect to prosodic adjunction operations in semigroup and groupoid algebras (Buszkowski 1986; Kandulski 1988). In Moortgat and Morrill (1991) is is shown how to give calculi for families of categorial operators, each defined by residuation with respect to an operation of prosodic adjunction (associative, non-associative, or with interactive axioms). The present paper treats discontinuity in this way, by residuation with respect to three adjunctions: + (associative), (., .) (split-point marking), and W (wrapping) related by the equation (s_1, s_2)Ws_3 = s_1+s_3+s_2. The system is illustrated by reference to particle verbs, discontinuous idioms, quantifier scope and quantifier scope ambiguity, pied-piping, gapping, and object-antecedent reflexivisation.
2016-11-21T10:58:45ZMorrill, GlynDiscontinuity refers to the phenomena in many natural language constructions whereby signs differ markedly in their prosodic and semantic forms. As such it presents interesting demands on monostratal computational formalisms which aspire to descriptive adequacy. Pied-piping, in particular, is argued by Pollard (1988) to motivate phrase structure-style feature percolation. In the context of categorial grammar, Bach (1984), Moortgat (1988, 1990, 1991) and others have sought to provide categorial operators suited to discontinuity. These attempts encounter certain difficulties with respect to model theory and/or proof theory, difficulties which the current proposals are intended to resolve. These proposals are accompanied by introduction of a new categorial proof format: labelled Fitch-style natural deduction. The associative and non-associative Lambek calculi are sound and complete with respect to interpretation by residuation with respect to prosodic adjunction operations in semigroup and groupoid algebras (Buszkowski 1986; Kandulski 1988). In Moortgat and Morrill (1991) is is shown how to give calculi for families of categorial operators, each defined by residuation with respect to an operation of prosodic adjunction (associative, non-associative, or with interactive axioms). The present paper treats discontinuity in this way, by residuation with respect to three adjunctions: + (associative), (., .) (split-point marking), and W (wrapping) related by the equation (s_1, s_2)Ws_3 = s_1+s_3+s_2. The system is illustrated by reference to particle verbs, discontinuous idioms, quantifier scope and quantifier scope ambiguity, pied-piping, gapping, and object-antecedent reflexivisation.Clausal proof nets and discontinuity
http://hdl.handle.net/2117/96810
Clausal proof nets and discontinuity
Morrill, Glyn
We consider the task of theorem proving in Lambek calculi and their generalisation to "multimodal residuation calculi". These form an integral part of categorial logic, a logic of signs stemming from categorial grammar, on the basis of which language processing is essentially theorem proving. The demand of this application is not just for efficient processing of some or other specific calculus, but for methods that will be generally applicable to categorial logics. It is proposed that multimodal cases be treated by dealing with the highest common factor of all the connectives as linear (propositional) validity. The prosodic (sublinear) aspects are encoded in labels, in effect the term-structure of quantified linear logic. The correctness condition on proof nets ("long trip condition") can be implemented by SLD resolution in linear logic with unification on labels/terms limited to one-way matching. A suitable unification strategy is obtained for calculi of discontinuity by normalisation of the ground goal term followed by recursive decent and redex pattern-matching on the head term.
2016-11-18T11:09:27ZMorrill, GlynWe consider the task of theorem proving in Lambek calculi and their generalisation to "multimodal residuation calculi". These form an integral part of categorial logic, a logic of signs stemming from categorial grammar, on the basis of which language processing is essentially theorem proving. The demand of this application is not just for efficient processing of some or other specific calculus, but for methods that will be generally applicable to categorial logics. It is proposed that multimodal cases be treated by dealing with the highest common factor of all the connectives as linear (propositional) validity. The prosodic (sublinear) aspects are encoded in labels, in effect the term-structure of quantified linear logic. The correctness condition on proof nets ("long trip condition") can be implemented by SLD resolution in linear logic with unification on labels/terms limited to one-way matching. A suitable unification strategy is obtained for calculi of discontinuity by normalisation of the ground goal term followed by recursive decent and redex pattern-matching on the head term.