Book chapters
http://hdl.handle.net/2117/3097

LotterySampling: A randomized algorithm for the Heavy Hitters and Top-k problems in data streams
http://hdl.handle.net/2117/386001
Martínez Parra, Conrado; Solera Pardo, Gonzalo
We propose a new randomized count-based algorithm to solve the Heavy Hitters and Top-k problems in data streams. This algorithm, called LotterySampling, uses the intuitive concept of “lottery tickets” to decide which elements to sample. We prove that LotterySampling lies inside the (δ,ϵ)-deficient framework for the Heavy Hitters problem and that its performance is similar to that of the well-known StickySampling algorithm, although the two are very different in nature. More importantly, we define an analogous (δ,ϵ)-deficient framework for the harder Top-k problem and prove that LotterySampling lies inside it as well. Hence, LotterySampling can be used, without any prior assumption on the data distribution, as a probabilistic approximation scheme to find the k most frequent elements and to approximate the frequencies of the reported elements within a factor of 1−ϵ. To the best of our knowledge, this is the first algorithm that gives theoretical guarantees for the Top-k problem on unknown and arbitrary streams, which is the most important contribution of this paper. Its memory usage adapts to the distribution of the stream, increasing or decreasing as the stream becomes less or more skewed. More precisely, the sample size depends, at any given moment, on the unknown k-th highest frequency, but it is independent of the length of the stream and of the number of distinct elements. The user only needs to provide two parameters that determine the quality of the answers, independently of the stream. We compare LotterySampling with other existing probabilistic and deterministic algorithms, showing its strengths and weaknesses.
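The abstract does not spell out the algorithm itself; as a rough illustration of the "lottery ticket" intuition only (and not of the paper's actual LotterySampling procedure), the sketch below draws a random ticket for every arriving element and keeps counters only for the elements holding the m best tickets seen so far. The function name, the parameter m, and the eviction rule are all illustrative assumptions.

```python
import random

def lottery_sketch(stream, m):
    """Toy 'lottery ticket' sampler: keep counters for at most m elements,
    chosen by the largest random tickets drawn so far.
    Illustration of the intuition only, not LotterySampling itself."""
    best_ticket = {}   # element -> best ticket drawn for it while sampled
    counts = {}        # occurrence counters for the sampled elements
    for x in stream:
        t = random.random()             # this occurrence's lottery ticket
        if x in best_ticket:
            counts[x] += 1
            best_ticket[x] = max(best_ticket[x], t)
        elif len(best_ticket) < m:
            best_ticket[x] = t
            counts[x] = 1
        else:
            # evict the currently weakest sampled element if the new ticket beats it
            loser = min(best_ticket, key=best_ticket.get)
            if t > best_ticket[loser]:
                del best_ticket[loser], counts[loser]
                best_ticket[x], counts[x] = t, 1
    return counts

# Frequent elements eventually draw a high ticket and tend to stay sampled.
print(lottery_sketch(["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2, m=4))
```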

A probabilistic model revealing shortcomings in Lua’s hybrid tables
http://hdl.handle.net/2117/385999
Martínez Parra, Conrado; Nicaud, Cyril; Rotondo, Pablo
Lua (Ierusalimschy et al., 1996) is a well-known scripting language, popular among many programmers, most notably in the gaming industry. Remarkably, the only data-structuring mechanism in Lua is an associative array called a table. With Lua 5.0, the reference implementation of Lua introduced hybrid tables, which implement tables with a hash table and a dynamically growing array combined together: the values associated with integer keys are stored in the array part, when suitable. All this is transparent to the user, who sees a single, simple interface for handling tables. In this paper we carry out a theoretical analysis of the performance of Lua’s tables, considering various worst-case and probabilistic scenarios. In particular, we uncover some problematic situations for the simple probabilistic model in which we add a new key with some fixed probability p > 1/2 and delete a key with probability 1 − p: the cost of performing T such operations is proved to be Ω(T log T) with high probability, instead of linear in T.
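To make the probabilistic model concrete, here is a minimal workload generator for the insert-with-probability-p / delete-with-probability-1−p process described above. A plain Python dict stands in for the table; Lua's hybrid array/hash layout (where the Ω(T log T) behaviour arises) is deliberately not reproduced, and reading "delete a key" as deleting a uniformly random existing key is an assumption about the model.

```python
import random

def run_workload(T, p, seed=0):
    """Run T steps of the abstract's model: insert a fresh integer key with
    probability p (p > 1/2), otherwise delete a uniformly random existing key.
    A Python dict stands in for the table; Lua's internals are not modeled."""
    rng = random.Random(seed)
    table, next_key = {}, 1
    for _ in range(T):
        if not table or rng.random() < p:
            table[next_key] = True                  # insertion of a new integer key
            next_key += 1
        else:
            del table[rng.choice(list(table))]      # deletion of a random key
    return len(table)

# With p = 0.6 the table grows linearly in expectation (~(2p - 1) * T surviving keys).
print(run_workload(T=10_000, p=0.6))
```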

All power structures are achievable in basic weighted games
http://hdl.handle.net/2117/385961
Freixas Bosch, Josep; Pons Vallès, Montserrat
A major problem in decision making is designing voting systems that are as simple as possible and able to reflect a given hierarchy of power among their members. It is known that in the class of weighted games all hierarchies are achievable except two of them. However, many weighted games are either improper, or do not admit a minimum representation in integers, or do not assign a minimum weight of 1 to the weakest non-null players. These factors prevent obtaining a good representation. The purpose of the paper is to prove that for each achievable hierarchy for weighted games there is a handy weighted game fulfilling these three desirable properties. A representation of this type is ideal for the design of a weighted game with a given hierarchy. Moreover, the subclass of weighted games with these properties is considerably smaller than the class of weighted games.
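For readers unfamiliar with the terminology, the short sketch below encodes a weighted game [q; w1, ..., wn] and checks one of the properties mentioned above (properness, i.e. no two disjoint coalitions can both win). These are standard definitions from simple-game theory, not material taken from the chapter, and the example game is chosen for illustration only.

```python
from itertools import combinations

def wins(weights, quota, coalition):
    """A coalition wins in the weighted game [quota; w1, ..., wn]
    iff its total weight reaches the quota (standard definition)."""
    return sum(weights[i] for i in coalition) >= quota

def is_proper(weights, quota):
    """A simple game is proper if the complement of a winning coalition never wins,
    i.e. no two disjoint coalitions can both win."""
    players = range(len(weights))
    for r in range(len(weights) + 1):
        for c in combinations(players, r):
            if wins(weights, quota, c) and wins(weights, quota, set(players) - set(c)):
                return False
    return True

# [3; 2, 1, 1]: player 0 is strictly more powerful than players 1 and 2,
# weights are integers, the weakest non-null players get weight 1, and the game is proper.
print(is_proper([2, 1, 1], 3))   # True
```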

The decline of the Buchholz tiebreaker system: a preferable alternative
http://hdl.handle.net/2117/382119
Freixas Bosch, Josep
We propose a simple method for breaking ties in sport competitions with a large number of competitors and a relatively small number of rounds. Such methods are common in many games, including Chess, Go, Bridge and Scrabble, among others. Tie-breaking methods decide, in strict order, the prizes to be received. One of the most commonly used methods is the well-known Buchholz method, based on the arithmetic mean of the scores obtained by the opponents. The alternative method that we propose in this paper, which is quite close to the median of the scores obtained by the opponents, is also a weighted average of the opponents’ scores, with weights based on the binomial distribution. The main objective of the article is to compare the proposed method with Buchholz’s, highlighting its many advantages.
Unfortunately, even today Buchholz’s method and its variants are routinely used as the first and second tiebreaker criteria. They were used as the first and second criteria in the rapid and blitz chess world championships held in December 2021. We believe that Buchholz’s method should be replaced by the one proposed here as soon as possible.
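The abstract states that Buchholz is the arithmetic mean of the opponents' scores and that the alternative is a binomially weighted average close to the median; the exact weights are not given here, so the sketch below uses an illustrative weighting C(n-1, i) / 2^(n-1) over the sorted opponent scores purely as an assumption about how such a scheme might look, to contrast its outlier sensitivity with Buchholz's.

```python
from math import comb

def buchholz(opponent_scores):
    """Buchholz tiebreaker: the arithmetic mean of the opponents' scores
    (often reported as their sum; the ranking it induces is the same)."""
    return sum(opponent_scores) / len(opponent_scores)

def binomial_weighted(opponent_scores):
    """Illustrative alternative: a weighted average of the sorted opponents' scores
    with binomial weights C(n-1, i) / 2^(n-1), which concentrates mass near the median.
    The actual weights proposed in the paper are not specified in the abstract."""
    xs = sorted(opponent_scores)
    n = len(xs)
    weights = [comb(n - 1, i) / 2 ** (n - 1) for i in range(n)]
    return sum(w * x for w, x in zip(weights, xs))

scores = [0.0, 4.5, 5.0, 5.5, 9.0]   # one very weak and one very strong opponent
print(buchholz(scores))              # 4.8     (pulled down by the 0.0 outlier)
print(binomial_weighted(scores))     # 4.9375  (stays closer to the median, 5.0)
```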

Decomposed process discovery and conformance checking
http://hdl.handle.net/2117/178875
Carmona Vargas, Josep
Decomposed process discovery and decomposed conformance checking are the decomposed variants of the two monolithic fundamental problems in process mining (van der Aalst 2011): automated process discovery, which considers the problem of discovering a process model from an event log (Leemans 2009), and conformance checking, which addresses the problem of analyzing the adequacy of a process model with respect to observed behavior (Munoz-Gama 2009).
The term decomposed in the two definitions mainly describes how the two problems are tackled operationally: to cope with their computational complexity, the initial problem is split into smaller subproblems that can be solved individually and often more efficiently.
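As a generic illustration of this splitting idea (not the specific decomposition scheme of this entry), the sketch below projects an event log onto overlapping activity subsets so that each fragment can be discovered or checked separately; the example log and the hand-picked fragments are hypothetical.

```python
def project_log(log, activities):
    """Project each trace onto a subset of activities, dropping events outside it.
    Each projected log can then be mined or checked against a model fragment on its own."""
    keep = set(activities)
    return [[a for a in trace if a in keep] for trace in log]

# A tiny event log: each trace is a sequence of activity names.
log = [
    ["register", "check", "decide", "pay"],
    ["register", "decide", "check", "pay"],
    ["register", "check", "decide", "reject"],
]

# Hand-picked, overlapping fragments; a real decomposition would derive these from the model.
fragments = [("register", "check", "decide"), ("decide", "pay", "reject")]
for acts in fragments:
    print(acts, "->", project_log(log, acts))
```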

Petri net analysis using boolean manipulation
http://hdl.handle.net/2117/134309
Pastor Llorens, Enric; Roig Mansilla, Oriol; Cortadella, Jordi; Badia Sala, Rosa Maria
This paper presents a novel analysis approach for bounded Petri nets. The net behavior is modeled by boolean functions, thus reducing reasoning about Petri nets to boolean calculation. The state explosion problem is managed by using Binary Decision Diagrams (BDDs), which are capable of representing large sets of markings in small data structures. The ability of Petri nets to model systems, the flexibility and generality of boolean algebras, and the efficient implementation of BDDs provide a general environment to handle a large variety of problems. Examples are presented that show how all the reachable states (10^18) of a Petri net can be efficiently calculated and represented with a small BDD (10^3 nodes). Properties requiring an exhaustive analysis of the state space can be verified in time polynomial in the size of the BDD.
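A minimal sketch of the underlying reachability computation, under the assumption of a safe (1-bounded) net: markings are sets of marked places, and sets of markings are manipulated explicitly. In the paper these sets would be encoded as characteristic boolean functions stored as BDDs, which is what keeps 10^18 markings in a 10^3-node structure; the explicit sets below only illustrate the fixpoint iteration.

```python
def reachable_markings(transitions, initial):
    """Breadth-first reachability for a safe (1-bounded) Petri net.
    Markings are frozensets of marked places; the paper represents such sets
    symbolically as boolean functions (BDDs) instead of enumerating them."""
    def fire(marking, pre, post):
        if pre <= marking:                          # transition enabled
            return frozenset((marking - pre) | post)
        return None

    seen, frontier = {frozenset(initial)}, [frozenset(initial)]
    while frontier:
        m = frontier.pop()
        for pre, post in transitions:
            nxt = fire(m, frozenset(pre), frozenset(post))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# A two-place cycle: t1 moves the token p1 -> p2, t2 moves it back.
ts = [({"p1"}, {"p2"}), ({"p2"}, {"p1"})]
print(reachable_markings(ts, {"p1"}))   # {frozenset({'p1'}), frozenset({'p2'})}
```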

A compositional method for the synthesis of asynchronous communication mechanisms
http://hdl.handle.net/2117/133432
Costa Gorgônio, Kyller; Cortadella, Jordi; Xia, Fei
Asynchronous data communication mechanisms (ACMs) have been extensively studied as data connectors between independently timed concurrent processes. In previous work, an automatic ACM synthesis method based on the generation of the reachability graph and the theory of regions was proposed. In this paper, we propose a new synthesis method based on the composition of Petri net modules, avoiding the exploration of the reachability graph. The behavior of ACMs is formally defined and correctness properties are specified in CTL. Model checking is used to verify the correctness of the Petri net models. The algorithms to generate the Petri net models are presented. Finally, a method to automatically generate C++ source code from the Petri net model is described.

Nature inspired meta-heuristics for grid scheduling: single and multi-objective optimization approaches
http://hdl.handle.net/2117/128951
Abraham, Ajith; Liu, Hongbo; Grosan, Crina; Xhafa Xhafa, Fatos
In this chapter, we review a few important concepts from Grid computing related to scheduling problems and their resolution using heuristic and meta-heuristic approaches. Scheduling problems are at the heart of any Grid-like computational system. Different types of scheduling, based on different criteria such as static vs. dynamic environment, multi-objectivity, adaptivity, etc., are identified. Then, heuristic and meta-heuristic methods for scheduling in Grids are presented. The chapter reveals the complexity of the scheduling problem in Computational Grids compared to scheduling in classical parallel and distributed systems, and shows the usefulness of heuristic and meta-heuristic approaches for the design of efficient Grid schedulers.
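To make the kind of problem concrete, here is a minimal sketch of one classical heuristic of the genre the chapter surveys, the min-min list scheduler for independent tasks on heterogeneous machines; this particular heuristic and the expected-time-to-compute matrix below are given only as an example, not as the chapter's own method.

```python
def min_min(etc):
    """Min-min heuristic for mapping independent tasks to machines.
    etc[t][m] is the expected time to compute task t on machine m.
    Repeatedly assign the (task, machine) pair with the smallest completion time."""
    n_tasks, n_machines = len(etc), len(etc[0])
    ready = [0.0] * n_machines            # machine ready times
    unassigned = set(range(n_tasks))
    schedule = {}
    while unassigned:
        t, m = min(((t, m) for t in unassigned for m in range(n_machines)),
                   key=lambda tm: ready[tm[1]] + etc[tm[0]][tm[1]])
        schedule[t] = m
        ready[m] += etc[t][m]
        unassigned.remove(t)
    return schedule, max(ready)           # task-to-machine mapping and makespan

# Three tasks on two heterogeneous machines.
etc = [[3.0, 5.0],
       [2.0, 4.0],
       [6.0, 1.0]]
print(min_min(etc))   # ({2: 1, 1: 0, 0: 0}, 5.0)
```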

Parallel algorithms for two processors precedence constraint scheduling
http://hdl.handle.net/2117/104935
Serna Iglesias, María José
The final publication is available at link.springer.com

Randomized parallel approximations to max flow
http://hdl.handle.net/2117/104934
Serna Iglesias, María José
The final publication is available at link.springer.com