Ponències/Comunicacions de congressos
http://hdl.handle.net/2117/3095
2017-08-18T07:17:43ZAnti-alignments in conformance checking: the dark side of process models
http://hdl.handle.net/2117/103958
Anti-alignments in conformance checking: the dark side of process models
Chatain, Thomas; Carmona Vargas, Josep
Conformance checking techniques asses the suitability of a process model in representing an underlying process, observed through a collection of real executions. These techniques suffer from the wellknown state space explosion problem, hence handling process models exhibiting large or even infinite state spaces remains a challenge. One important metric in conformance checking is to asses the precision of the model with respect to the observed executions, i.e., characterize the ability of the model to produce behavior unrelated to the one observed. By avoiding the computation of the full state space of a model, current techniques only provide estimations of the precision metric, which in some situations tend to be very optimistic, thus hiding real problems a process model may have. In this paper we present the notion of antialignment as a concept to help unveiling traces in the model that may deviate significantly from the observed behavior. Using anti-alignments, current estimations can be improved, e.g., in precision checking. We show how to express the problem of finding anti-alignments as the satisfiability of a Boolean formula, and provide a tool which can deal with large models efficiently.
2017-05-03T11:38:28ZChatain, ThomasCarmona Vargas, JosepConformance checking techniques asses the suitability of a process model in representing an underlying process, observed through a collection of real executions. These techniques suffer from the wellknown state space explosion problem, hence handling process models exhibiting large or even infinite state spaces remains a challenge. One important metric in conformance checking is to asses the precision of the model with respect to the observed executions, i.e., characterize the ability of the model to produce behavior unrelated to the one observed. By avoiding the computation of the full state space of a model, current techniques only provide estimations of the precision metric, which in some situations tend to be very optimistic, thus hiding real problems a process model may have. In this paper we present the notion of antialignment as a concept to help unveiling traces in the model that may deviate significantly from the observed behavior. Using anti-alignments, current estimations can be improved, e.g., in precision checking. We show how to express the problem of finding anti-alignments as the satisfiability of a Boolean formula, and provide a tool which can deal with large models efficiently.Trade-offs between time and memory in a tighter model of CDCL SAT solvers
http://hdl.handle.net/2117/103872
Trade-offs between time and memory in a tighter model of CDCL SAT solvers
Elffers, J.; Johannsen, Jan; Lauria, Massimo; Magnard, Thomas; Nordström, Jakob; Vinyals, Marc
A long line of research has studied the power of conflict- driven clause learning (CDCL) and how it compares to the resolution proof system in which it searches for proofs. It has been shown that CDCL can polynomially simulate resolution even with an adversarially chosen learning scheme as long as it is asserting. However, the simulation only works under the assumption that no learned clauses are ever forgot- ten, and the polynomial blow-up is significant. Moreover, the simulation requires very frequent restarts, whereas the power of CDCL with less frequent or entirely without restarts remains poorly understood. With a view towards obtaining results with tighter relations between CDCL and resolution, we introduce a more fine-grained model of CDCL that cap- tures not only time but also memory usage and number of restarts. We show how previously established strong size-space trade-offs for resolution can be transformed into equally strong trade-offs between time and memory usage for CDCL, where the upper bounds hold for CDCL with- out any restarts using the standard 1UIP clause learning scheme, and the (in some cases tightly matching) lower bounds hold for arbitrarily frequent restarts and arbitrary clause learning schemes.
"The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-40970-2_11
2017-04-28T20:33:22ZElffers, J.Johannsen, JanLauria, MassimoMagnard, ThomasNordström, JakobVinyals, MarcA long line of research has studied the power of conflict- driven clause learning (CDCL) and how it compares to the resolution proof system in which it searches for proofs. It has been shown that CDCL can polynomially simulate resolution even with an adversarially chosen learning scheme as long as it is asserting. However, the simulation only works under the assumption that no learned clauses are ever forgot- ten, and the polynomial blow-up is significant. Moreover, the simulation requires very frequent restarts, whereas the power of CDCL with less frequent or entirely without restarts remains poorly understood. With a view towards obtaining results with tighter relations between CDCL and resolution, we introduce a more fine-grained model of CDCL that cap- tures not only time but also memory usage and number of restarts. We show how previously established strong size-space trade-offs for resolution can be transformed into equally strong trade-offs between time and memory usage for CDCL, where the upper bounds hold for CDCL with- out any restarts using the standard 1UIP clause learning scheme, and the (in some cases tightly matching) lower bounds hold for arbitrarily frequent restarts and arbitrary clause learning schemes.Computing alignments with constraint programming : the acyclic case
http://hdl.handle.net/2117/103063
Computing alignments with constraint programming : the acyclic case
Borrego, Diana; Gómez López, María Teresa; Carmona Vargas, Josep; Martínez Gasca, Rafael
Conformance checking confronts process models with real process executions to detect and measure deviations between modelled and observed behaviour. The core technique for conformance checking is the computation of an alignment. Current approaches for alignment computation rely on a shortest-path technique over the product of the state-space of a model and the observed trace, thus suffering from the well-known state explosion problem. This paper presents a fresh alternative for alignment computation of acyclic process models, that encodes the alignment problem as a Constraint Satisfaction Problem. Since modern solvers for this framework are capable of dealing with large instances, this contribution has a clear potential. Remarkably, our prototype implementation can handle instances that represent a real challenge for current techniques. Main advantages of using Constraint Programming paradigm lie in the possibility to adapt parameters such as the maximum search time, or the maximum misalignment allowed. Moreover, using search and propagation algorithms incorporated in Constraint Programming Solvers permits to find solutions for problems unsolvable with other techniques.
2017-03-30T06:45:05ZBorrego, DianaGómez López, María TeresaCarmona Vargas, JosepMartínez Gasca, RafaelConformance checking confronts process models with real process executions to detect and measure deviations between modelled and observed behaviour. The core technique for conformance checking is the computation of an alignment. Current approaches for alignment computation rely on a shortest-path technique over the product of the state-space of a model and the observed trace, thus suffering from the well-known state explosion problem. This paper presents a fresh alternative for alignment computation of acyclic process models, that encodes the alignment problem as a Constraint Satisfaction Problem. Since modern solvers for this framework are capable of dealing with large instances, this contribution has a clear potential. Remarkably, our prototype implementation can handle instances that represent a real challenge for current techniques. Main advantages of using Constraint Programming paradigm lie in the possibility to adapt parameters such as the maximum search time, or the maximum misalignment allowed. Moreover, using search and propagation algorithms incorporated in Constraint Programming Solvers permits to find solutions for problems unsolvable with other techniques.Fraud detection in energy consumption: a supervised approach
http://hdl.handle.net/2117/101913
Fraud detection in energy consumption: a supervised approach
Coma Puig, Bernat; Carmona Vargas, Josep; Gavaldà Mestre, Ricard; Alcoverro, Santiago; Martín, Victor
Data from utility meters (gas, electricity, water) is a rich source of information for distribution companies, beyond billing. In this paper we present a supervised technique, which primarily but not only feeds on meter information, to detect meter anomalies and customer fraudulent behavior (meter tampering). Our system detects anomalous meter readings on the basis of models built using machine learning techniques on past data. Unlike most previous work, it can incrementally incorporate the result of field checks to grow the database of fraud and non-fraud patterns, therefore increasing model precision over time and potentially adapting to emerging fraud patterns. The full system has been developed with a company providing electricity and gas and already used to carry out several field checks, with large improvements in fraud detection over the previous checks which used simpler techniques.
2017-03-03T12:09:34ZComa Puig, BernatCarmona Vargas, JosepGavaldà Mestre, RicardAlcoverro, SantiagoMartín, VictorData from utility meters (gas, electricity, water) is a rich source of information for distribution companies, beyond billing. In this paper we present a supervised technique, which primarily but not only feeds on meter information, to detect meter anomalies and customer fraudulent behavior (meter tampering). Our system detects anomalous meter readings on the basis of models built using machine learning techniques on past data. Unlike most previous work, it can incrementally incorporate the result of field checks to grow the database of fraud and non-fraud patterns, therefore increasing model precision over time and potentially adapting to emerging fraud patterns. The full system has been developed with a company providing electricity and gas and already used to carry out several field checks, with large improvements in fraud detection over the previous checks which used simpler techniques.Partial match queries in relaxed K-dt trees
http://hdl.handle.net/2117/101520
Partial match queries in relaxed K-dt trees
Duch Brown, Amalia; Lau Laynes-Lozada, Gustavo Salvador
The study of partial match queries on random hierarchical multidimensional data structures dates back to Ph. Flajolet and C. Puech’s 1986 seminal paper on partial match retrieval. It was not until recently that fixed (as opposed to random) partial match queries were studied for random relaxed K-d trees, random standard K-d trees, and random 2-dimensional quad trees. Based on those results it seemed
natural to classify the general form of the cost of fixed partial match queries into two families: that of either random hierarchical structures or perfectly balanced structures, as conjectured by Duch, Lau and Martínez (On the Cost of Fixed Partial Queries in K-d trees Algorithmica, 75(4):684–723, 2016). Here we show that the conjecture just mentioned does not hold by introducing relaxed K-dt trees and providing the average-case analysis for random partial match queries as well as some advances on the average-case analysis for fixed partial match queries on them. In fact this cost –for fixed partial match queries– does not follow the conjectured forms.
2017-02-24T10:27:25ZDuch Brown, AmaliaLau Laynes-Lozada, Gustavo SalvadorThe study of partial match queries on random hierarchical multidimensional data structures dates back to Ph. Flajolet and C. Puech’s 1986 seminal paper on partial match retrieval. It was not until recently that fixed (as opposed to random) partial match queries were studied for random relaxed K-d trees, random standard K-d trees, and random 2-dimensional quad trees. Based on those results it seemed
natural to classify the general form of the cost of fixed partial match queries into two families: that of either random hierarchical structures or perfectly balanced structures, as conjectured by Duch, Lau and Martínez (On the Cost of Fixed Partial Queries in K-d trees Algorithmica, 75(4):684–723, 2016). Here we show that the conjecture just mentioned does not hold by introducing relaxed K-dt trees and providing the average-case analysis for random partial match queries as well as some advances on the average-case analysis for fixed partial match queries on them. In fact this cost –for fixed partial match queries– does not follow the conjectured forms.Mining structured Petri nets for the visualization of process behavior
http://hdl.handle.net/2117/100797
Mining structured Petri nets for the visualization of process behavior
San Pedro Martín, Javier de; Cortadella Fortuny, Jordi
Visualization is essential for understanding the models obtained by process mining. Clear and efficient visual representations make the embedded information more accessible and analyzable. This work presents a novel approach for generating process models with structural properties that induce visually friendly layouts. Rather than generating a single model that captures all behaviors, a set of Petri net models is delivered, each one covering a subset of traces of the log. The models are mined by extracting slices of labelled transition systems with specific properties from the complete state space produced by the process logs. In most cases, few Petri nets are sufficient to cover a significant part of the behavior produced by the log.
2017-02-10T08:35:24ZSan Pedro Martín, Javier deCortadella Fortuny, JordiVisualization is essential for understanding the models obtained by process mining. Clear and efficient visual representations make the embedded information more accessible and analyzable. This work presents a novel approach for generating process models with structural properties that induce visually friendly layouts. Rather than generating a single model that captures all behaviors, a set of Petri net models is delivered, each one covering a subset of traces of the log. The models are mined by extracting slices of labelled transition systems with specific properties from the complete state space produced by the process logs. In most cases, few Petri nets are sufficient to cover a significant part of the behavior produced by the log.Comparing MapReduce and pipeline implementations for counting triangles
http://hdl.handle.net/2117/100579
Comparing MapReduce and pipeline implementations for counting triangles
Pasarella Sánchez, Ana Edelmira; Vidal Serodio, Maria Esther; Zoltan, Cristina
A generalized method to define the Divide & Conquer paradigm in order to have processors acting on its own data and scheduled in a
parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data
and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected
whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implement the Divide & Conquer
paradigm, named pipeline. The main features of pipeline are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting either when the input graph does not fit in memory or is dynamically generated. To evaluate the properties of pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes.
An empirical evaluation is conducted on graphs of different sizes and densities. Observed results suggest that pipeline allows for the implementation of an efficient solution of the problem of counting
triangles in a graph, particularly, in dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.
2017-02-06T08:48:11ZPasarella Sánchez, Ana EdelmiraVidal Serodio, Maria EstherZoltan, CristinaA generalized method to define the Divide & Conquer paradigm in order to have processors acting on its own data and scheduled in a
parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data
and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected
whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implement the Divide & Conquer
paradigm, named pipeline. The main features of pipeline are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting either when the input graph does not fit in memory or is dynamically generated. To evaluate the properties of pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes.
An empirical evaluation is conducted on graphs of different sizes and densities. Observed results suggest that pipeline allows for the implementation of an efficient solution of the problem of counting
triangles in a graph, particularly, in dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.A semantics of business configurations using symbolic graphs
http://hdl.handle.net/2117/100385
A semantics of business configurations using symbolic graphs
Mylonakis Pascual, Nicolás; Orejas Valdés, Fernando; Fiadeiro, José Luiz
In this paper we give graph-semantics to a fundamental part of the semantics of the service modeling language SRML: business configurations. To achieve this goal we use symbolic graph transformation systems. We formalize the semantics using this graph transformation system and illustrating it with a simple running example of a trip booking agent.
2017-01-31T13:43:47ZMylonakis Pascual, NicolásOrejas Valdés, FernandoFiadeiro, José LuizIn this paper we give graph-semantics to a fundamental part of the semantics of the service modeling language SRML: business configurations. To achieve this goal we use symbolic graph transformation systems. We formalize the semantics using this graph transformation system and illustrating it with a simple running example of a trip booking agent.Conditions for compatibility of components: The case of masters and slaves
http://hdl.handle.net/2117/100199
Conditions for compatibility of components: The case of masters and slaves
Beek, Maurice ter; Carmona Vargas, Josep; Kleijn, Jetty
We consider systems composed of reactive components that collaborate through synchronised execution of common actions. These multi-component systems are formally represented as team automata, a model that allows a wide spectrum of synchronisation policies to combine components into higher-level systems. We investigate the
correct-by-construction engineering of such systems of systems from the point of view of correct communications between the components (no message loss or deadlocks due to indefinite waiting). This leads to a proposal for a generic definition of compatibility of components relative to the adopted synchronisation policy. This definition appears
to be particularly appropriate for so-called master-slave synchronisations by which input actions (for `slaves') are driven
by output actions (from `masters').
2017-01-27T12:09:50ZBeek, Maurice terCarmona Vargas, JosepKleijn, JettyWe consider systems composed of reactive components that collaborate through synchronised execution of common actions. These multi-component systems are formally represented as team automata, a model that allows a wide spectrum of synchronisation policies to combine components into higher-level systems. We investigate the
correct-by-construction engineering of such systems of systems from the point of view of correct communications between the components (no message loss or deadlocks due to indefinite waiting). This leads to a proposal for a generic definition of compatibility of components relative to the adopted synchronisation policy. This definition appears
to be particularly appropriate for so-called master-slave synchronisations by which input actions (for `slaves') are driven
by output actions (from `masters').Non-homogenizable classes of finite structures
http://hdl.handle.net/2117/100189
Non-homogenizable classes of finite structures
Atserias, Albert; Torunczyk, Szymon Abram
Homogenization is a powerful way of taming a class of finite structures with several interesting applications in different areas, from Ramsey theory in combinatorics to constraint satisfaction problems (CSPs) in computer science, through (finite) model theory. A few sufficient conditions for a class of finite structures to allow homogenization are known, and here we provide a necessary condition. This lets us show that certain natural classes are not homogenizable: 1) the class of locally consistent systems of linear equations over the two-element field or any finite Abelian group, and 2) the class of finite structures that forbid homomorphisms from a specific MSO-definable class of structures of treewidth two. In combination with known results, the first example shows that, up to pp-interpretability, the CSPs that are solvable by local consistency methods are distinguished from the rest by the fact that their classes of locally consistent instances are homogenizable. The second example shows that, for MSO-definable classes of forbidden patterns, treewidth one versus two is the dividing line to homogenizability.
2017-01-27T10:51:13ZAtserias, AlbertTorunczyk, Szymon AbramHomogenization is a powerful way of taming a class of finite structures with several interesting applications in different areas, from Ramsey theory in combinatorics to constraint satisfaction problems (CSPs) in computer science, through (finite) model theory. A few sufficient conditions for a class of finite structures to allow homogenization are known, and here we provide a necessary condition. This lets us show that certain natural classes are not homogenizable: 1) the class of locally consistent systems of linear equations over the two-element field or any finite Abelian group, and 2) the class of finite structures that forbid homomorphisms from a specific MSO-definable class of structures of treewidth two. In combination with known results, the first example shows that, up to pp-interpretability, the CSPs that are solvable by local consistency methods are distinguished from the rest by the fact that their classes of locally consistent instances are homogenizable. The second example shows that, for MSO-definable classes of forbidden patterns, treewidth one versus two is the dividing line to homogenizability.