ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals

http://hdl.handle.net/2117/3092 (last updated Tue, 23 Apr 2024 18:45:39 GMT)

On the expected cost of partial match queries in random Quad-K-d trees
http://hdl.handle.net/2117/405893
Authors: Duch Brown, Amalia; Martínez Parra, Conrado
Quad-K-d trees, introduced by Bereczky et al. (In: Proceedings of the 11th Latin American Theoretical Informatics Conference (LATIN). Lecture Notes in Computer Science, vol. 8392, pp. 743–754, 2014), are a generalization of several well-known hierarchical multidimensional data structures. They provide a unified framework for the analysis of associative queries, and they are especially suitable for investigating the trade-offs between the cost of different operations and the memory needs (each node $x$ of a quad-K-d tree has arity $2^{m(x)}$ for some $m(x)$, $1 \le m(x) \le K$). Here we consider partial match, one of the fundamental associative queries, for several families of quad-K-d trees including, among others, relaxed K-d trees and quadtrees. In particular, we prove that the expected cost $\hat{P}_n$ of a random partial match query that has $s$ out of $K$ specified coordinates in a random quad-K-d tree of size $n$ is $\hat{P}_n \sim \beta \cdot n^{\alpha}$, where $\alpha$ and $\beta$ are constants given in terms of $K$ and $s$ as well as additional parameters that characterize the specific family of quad-K-d trees under consideration. Additionally, we derive a precise asymptotic estimate for the main order term of the expected cost $P_{n,q}$ of a fixed partial match with query $q$ in a random quad-K-d tree of size $n$. The techniques used to derive these costs are those already applied successfully to obtain analogous results for quadtrees and relaxed K-d trees; our results show that the earlier results are particular cases and prove the validity of the conjecture made in Duch et al.
(In: Proceedings of the 12th Latin American Theoretical Informatics Conference (LATIN). Lecture Notes in Computer Science, vol. 9644, pp. 376–389, 2016) for a wider variety of multidimensional data structures.
Posted: Thu, 04 Apr 2024 11:42:58 GMT

Social disruption games in signed networks
http://hdl.handle.net/2117/405654
Authors: Molinero Albareda, Xavier; Riquelme Csori, Fabián; Serna Iglesias, María José
Signed networks describe many real-world relations among users. Positive connections between two users or vertices generally indicate good feelings between them, while negative connections indicate bad feelings. A disruptor cycle in a graph is a cycle containing exactly one negative edge. A signed graph is known to be clusterable if and only if it does not contain a disruptor cycle. In this paper, we study the clusterability of a signed graph from the point of view of game theory by introducing social disruption games on signed graphs. In these games, a coalition wins if the subgraph induced by the coalition is non-clusterable, i.e., it contains a disruptor cycle. Moreover, we study parameters and properties of players and compare them to other subclasses of simple games. In addition, we give some complexity results. In particular, we show that, unlike in other subclasses of simple games, given a social disruption game, computing its length, deciding whether it is proper, and deciding whether it has a dummy player can all be done in polynomial time. However, other problems, such as deciding whether the game is strong or computing known power indices, remain computationally hard.
Posted: Tue, 02 Apr 2024 11:37:20 GMT

The k-Robinson–Foulds dissimilarity measures for comparison of labeled trees
http://hdl.handle.net/2117/405500
Authors: Khayatian, Elahe; Valiente Feruglio, Gabriel Alejandro; Zhang, Louxin
Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees motivates researchers to develop different measures to compare labeled trees. Although the Robinson–Foulds (RF) distance is widely used for comparing species trees, it has certain limitations when applied to labeled trees. This study introduces the k-RF dissimilarity measures, tailored to address the challenges of labeled tree comparison. The RF distance is succinctly expressed as n-RF in the space of labeled trees with n nodes. Like the RF distance, the k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees.
By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees of different sizes or with different labels.
Posted: Thu, 28 Mar 2024 10:55:53 GMT

Polynomial calculus for MaxSAT
http://hdl.handle.net/2117/405497
Authors: Bonacina, Ilario; Bonet Carbonell, M. Luisa; Levy Díaz, Jordi
MaxSAT is the problem of finding an assignment satisfying the maximum number of clauses in a CNF formula. We consider a natural generalization of this problem to generic sets of polynomials and propose a weighted version of Polynomial Calculus to address this problem. Weighted Polynomial Calculus is a natural generalization of MaxSAT-Resolution and weighted Resolution that manipulates polynomials with coefficients in a finite field and weights in either $\mathbb{N}$ or $\mathbb{Z}$. We show the soundness and completeness of these systems via an algorithmic procedure.
Weighted Polynomial Calculus, with weights in $\mathbb{N}$ and coefficients in $\mathbb{F}_2$, can efficiently prove that Tseitin formulas on a connected graph are minimally unsatisfiable. Using weights in $\mathbb{Z}$, it can also efficiently prove that the Pigeonhole Principle is minimally unsatisfiable.
Posted: Thu, 28 Mar 2024 10:32:40 GMT

GMX: Instruction set extensions for fast, scalable, and efficient genome sequence alignment
http://hdl.handle.net/2117/405488
Authors: Doblas Font, Max; Lostes Cazorla, Oscar; Aguado Puig, Quim; Cebry, Nicholas; Fontova Muste, Pau; Batten, Christopher; Marco Sola, Santiago; Moretó Planas, Miquel
Sequence alignment remains a fundamental problem in computer science, with practical applications ranging from pattern matching to computational biology.
The ever-increasing volumes of genomic data produced by modern DNA sequencers motivate improved software and hardware sequence alignment accelerators that scale to longer sequences and high error rates without losing accuracy. Furthermore, the wide variety of use cases requiring sequence alignment demands flexible and efficient solutions that can match or even outperform expensive application-specific accelerators. To address these challenges, we propose GMX, a set of ISA extensions that enable efficient sequence alignment computations based on dynamic programming (DP). GMX extensions provide the basic building-block operations to perform fast tile-wise computations of the DP matrix, reducing the memory footprint and allowing easy integration into widely used algorithms and tools. Furthermore, we provide an efficient hardware implementation that integrates GMX extensions into a RISC-V-based edge system-on-chip (SoC). Compared to widely used software implementations, our hardware-software co-design leveraging GMX extensions obtains speed-ups of 25–265×, scaling to megabyte-long sequences. Compared to domain-specific accelerators (DSAs), we demonstrate that GMX-accelerated implementations demand significantly less memory bandwidth, requiring less area per processing element (PE). As a result, a single GMX-enabled core achieves a throughput per area between 0.35× and 0.52× that of state-of-the-art DSAs, while being more flexible and reusing the core's resources. Post-place-and-route results for a GMX-enhanced SoC in 22 nm technology show that GMX extensions account for only 1.7% of the overall area while consuming just 8.47 mW. We conclude that GMX extensions are versatile and scalable ISA additions that improve the performance of genome analysis tools and other use cases requiring fast and efficient sequence alignment.
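The GMX entry describes computing the DP matrix tile by tile to reduce the memory footprint. The abstract does not specify the GMX instructions themselves, so the following is only a minimal plain-Python sketch of the general tiling idea, assuming unit-cost (Levenshtein) edit distance rather than whatever scoring GMX supports; the function name `tiled_edit_distance` and the `tile` parameter are hypothetical. Each tile is computed from its top boundary row and left boundary column, and only boundary values (about len(b) + tile integers) are kept instead of the full DP matrix.

```python
def tiled_edit_distance(a: str, b: str, tile: int = 4) -> int:
    """Unit-cost edit distance between a and b, evaluated tile by tile.

    Hypothetical illustration of tile-wise DP (not GMX's actual ISA).
    Only tile boundaries are stored: `top` is the DP row just above the
    current tile row (len(b) + 1 values) and `left` is the DP column just
    left of the current tile (tile height + 1 values, corner included).
    """
    if tile < 1:
        raise ValueError("tile must be >= 1")
    n, m = len(a), len(b)
    top = list(range(m + 1))                # DP row 0: D[0][j] = j
    for i0 in range(0, n, tile):
        i1 = min(i0 + tile, n)
        left = list(range(i0, i1 + 1))      # DP column 0: D[i][0] = i
        new_top = [i1] + [0] * m            # becomes DP row i1
        for j0 in range(0, m, tile):
            j1 = min(j0 + tile, m)
            prev = top[j0:j1 + 1]           # D[i0][j0..j1], corner included
            right = [prev[-1]]              # D[i0..i1][j1], filled below
            for r in range(i0, i1):
                cur = [left[r - i0 + 1]] + [0] * (j1 - j0)
                for k in range(j1 - j0):
                    cost = 0 if a[r] == b[j0 + k] else 1
                    cur[k + 1] = min(prev[k + 1] + 1,  # deletion
                                     cur[k] + 1,       # insertion
                                     prev[k] + cost)   # match / substitution
                prev = cur
                right.append(cur[-1])
            new_top[j0 + 1:j1 + 1] = prev[1:]  # bottom edge of the tile
            left = right                       # right edge feeds the next tile
        top = new_top
    return top[m]
```

For example, `tiled_edit_distance("kitten", "sitting", tile=2)` returns 3. Any tile size gives the same result, since tiling only changes the evaluation order and which values are retained, not the recurrence.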
Posted: Thu, 28 Mar 2024 07:34:55 GMT

Weighted, circular and semi-algebraic proofs
http://hdl.handle.net/2117/405462
Authors: Bonacina, Ilario; Bonet Carbonell, M. Luisa; Levy Díaz, Jordi
In recent years there has been increasing interest in studying proof systems stronger than Resolution, with the aim of building more efficient SAT solvers based on them. In defining these proof systems, we try to strike a balance between the power of the proof system (the size of the proofs required to refute a formula) and the difficulty of finding the proofs. In this paper we consider the proof systems circular Resolution, Sherali-Adams, Nullstellensatz and Weighted Resolution, and we study their relative power from a theoretical perspective. We prove that circular Resolution, Sherali-Adams and Weighted Resolution are polynomially equivalent proof systems. We also prove that Nullstellensatz is polynomially equivalent to a restricted version of Weighted Resolution. The equivalences also hold for versions of the systems in which the coefficients/weights are expressed in unary.
Posted: Wed, 27 Mar 2024 12:03:32 GMT

The K-Robinson Foulds measures for labeled trees
http://hdl.handle.net/2117/405052
Authors: Khayatian, Elahe; Valiente Feruglio, Gabriel Alejandro; Zhang, Louxin
Investigating the mutational history of tumor cells is important for understanding the underlying mechanisms of cancer and its evolution. Now that the evolution of tumor cells is modeled using labeled trees, researchers are motivated to propose different measures for comparing mutation trees and other labeled trees. While the Robinson-Foulds distance is widely used for comparing phylogenetic trees, it has weaknesses when applied to labeled trees. Here, k-Robinson-Foulds dissimilarity measures are introduced for labeled tree comparison.
Posted: Thu, 21 Mar 2024 07:49:22 GMT

On the consistency of circuit lower bounds for non-deterministic time
http://hdl.handle.net/2117/403574
Authors: Atserias, Albert; Buss, Sam; Müller, Moritz
We prove the first unconditional consistency result for superpolynomial circuit lower bounds with a relatively strong theory of bounded arithmetic. Namely, we show that the theory $V^0_2$ is consistent with the conjecture that $\mathrm{NEXP} \not\subseteq \mathrm{P/poly}$, i.e., that some problem solvable in non-deterministic exponential time does not have polynomial-size circuits. We suggest this is the best currently available evidence for the truth of the conjecture. Additionally, we establish a magnification result on the hardness of proving circuit lower bounds.
Posted: Fri, 01 Mar 2024 09:03:15 GMT

WFAsic: A high-performance ASIC accelerator for DNA sequence alignment on a RISC-V SoC
http://hdl.handle.net/2117/402007
Authors: Haghi, Abbas; Álvarez Martí, Lluc; Fornt Mas, Jordi; Haro Ruiz, Juan Miguel de; Figueras Bagué, Roger; Doblas Font, Max; Marco Sola, Santiago; Moretó Planas, Miquel
The ever-increasing yields in genome sequence data production pose a computational challenge to current genome sequence analysis tools, jeopardizing the future of personalized medicine. Leveraging hardware accelerators (GPUs, FPGAs, and ASICs) to speed up computationally intensive algorithms such as sequence alignment has become paramount. Recently, the wavefront alignment algorithm (WFA) was introduced, significantly reducing the execution time of sequence alignment. This paper presents the first-ever ASIC accelerator of the WFA integrated into a RISC-V system-on-chip. Our chip greatly accelerates sequence alignment, delivering up to 1076× better performance than the CPU implementation of the WFA running on the chip's RISC-V core.
Posted: Thu, 15 Feb 2024 12:23:48 GMT

New characterizations and a concept of potential for each multinomial (probabilistic) value
http://hdl.handle.net/2117/401322
Authors: Domènech Blázquez, Margarita; Giménez Pradales, José Miguel; Puente del Campo, María Albina
In this paper we focus on multinomial probabilistic values and consider two special classes of players: necessary and nullifying players. By introducing new properties related to these kinds of players, we provide new axiomatic characterizations of each multinomial probabilistic value, giving, in all cases, a set of independent properties that uniquely determine it. Moreover, when the profile defining the value is positive, the multilinear extension is interpreted as a potential function and provides a computational tool.
Posted: Wed, 07 Feb 2024 13:18:01 GMT
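The last entry links multinomial probabilistic values to the multilinear extension of a game. The abstract does not spell out the definitions, so the sketch below assumes the usual one: for a game $v$ on player set $N$ and profile $p = (p_1, \dots, p_n)$, the multinomial probabilistic value of player $i$ is $\psi^p_i(v) = \sum_{S \subseteq N \setminus \{i\}} \prod_{j \in S} p_j \prod_{j \in N \setminus (S \cup \{i\})} (1 - p_j)\,[v(S \cup \{i\}) - v(S)]$. This equals the partial derivative $\partial f / \partial x_i$ of Owen's multilinear extension $f$ evaluated at $p$. That classical identity illustrates one way the multilinear extension can serve as a computational tool; it is not claimed to be the potential construction used in the paper. All function names and the example game are hypothetical, and the brute-force enumeration is exponential in the number of players.

```python
from itertools import combinations
from math import prod


def subsets(players):
    """Yield every subset of `players` as a frozenset."""
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)


def multinomial_value(v, players, p, i):
    """psi_i(v): expected marginal contribution of i when each other player j
    is present independently with probability p[j] (assumed definition)."""
    others = [j for j in players if j != i]
    total = 0.0
    for S in subsets(others):
        weight = prod(p[j] for j in S) * prod(1 - p[j] for j in others if j not in S)
        total += weight * (v(S | {i}) - v(S))
    return total


def multilinear_extension(v, players, x):
    """f(x) = sum over S of v(S) * prod_{j in S} x_j * prod_{j not in S} (1 - x_j)."""
    return sum(
        v(S) * prod(x[j] for j in S) * prod(1 - x[j] for j in players if j not in S)
        for S in subsets(players)
    )


def partial_derivative(v, players, x, i):
    """Exact df/dx_i: f is affine in x_i, so the derivative is f(x_i=1) - f(x_i=0)."""
    at_one, at_zero = dict(x), dict(x)
    at_one[i], at_zero[i] = 1.0, 0.0
    return multilinear_extension(v, players, at_one) - multilinear_extension(v, players, at_zero)


# Usage on a hypothetical weighted majority game: quota 3, weights 2, 1, 1.
weights = {1: 2, 2: 1, 3: 1}


def majority_game(S):
    return 1.0 if sum(weights[j] for j in S) >= 3 else 0.0


players = [1, 2, 3]
half = {j: 0.5 for j in players}        # p_j = 1/2 for every j: Banzhaf value
banzhaf = {i: multinomial_value(majority_game, players, half, i) for i in players}
profile = {1: 0.2, 2: 0.7, 3: 0.5}      # an arbitrary positive profile
psi = {i: multinomial_value(majority_game, players, profile, i) for i in players}
grad = {i: partial_derivative(majority_game, players, profile, i) for i in players}
print(banzhaf)  # {1: 0.75, 2: 0.25, 3: 0.25}
```

Setting every $p_j = 1/2$ recovers the (non-normalized) Banzhaf value, which is why player 1, pivotal in three of the four coalitions of the other players, gets 0.75. For the non-uniform `profile`, `psi` and `grad` agree up to floating-point rounding, which is the identity stated above.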