Articles de revista
http://hdl.handle.net/2117/3093
2017-07-21T08:55:52ZMetaShot: an accurate workflow for taxon classification of host-associated microbiome from shotgun metagenomic data
http://hdl.handle.net/2117/106077
MetaShot: an accurate workflow for taxon classification of host-associated microbiome from shotgun metagenomic data
Fosso, Bruno; Santamaria, Monica; D'Antonio, M.; Lovero, D.; Corrado, G.; Vizza, E.; Passaro, N.; Garbuglia, A.R.; Capobianchi, M.R.; Crescenzi, M.; Valiente Feruglio, Gabriel Alejandro; Pesole, Graziano
Shotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks or are limited to specific life domains (e.g. only bacteria). We present here MetaShot, a workflow for assessing the total microbiome composition from host-associated shotgun sequence data, and show its overall optimal accuracy performance by analyzing both simulated and real datasets.
2017-07-03T07:48:29ZFosso, BrunoSantamaria, MonicaD'Antonio, M.Lovero, D.Corrado, G.Vizza, E.Passaro, N.Garbuglia, A.R.Capobianchi, M.R.Crescenzi, M.Valiente Feruglio, Gabriel AlejandroPesole, GrazianoShotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks or are limited to specific life domains (e.g. only bacteria). We present here MetaShot, a workflow for assessing the total microbiome composition from host-associated shotgun sequence data, and show its overall optimal accuracy performance by analyzing both simulated and real datasets.Large neighborhood search for the most strings with few bad columns problem
http://hdl.handle.net/2117/104908
Large neighborhood search for the most strings with few bad columns problem
Lizárraga Olivas, Evelia; Blesa Aguilera, Maria Josep; Blum, Christian; Raidl, Günther
In this work, we consider the following NP-hard combinatorial optimization problem from computational biology. Given a set of input strings of equal length, the goal is to identify a maximum cardinality subset of strings that differ maximally in a pre-defined number of positions. First of all, we introduce an integer linear programming model for this problem. Second, two variants of a rather simple greedy strategy are proposed. Finally, a large neighborhood search algorithm is presented. A comprehensive experimental comparison among the proposed techniques shows, first, that larger neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search shows to be competitive with the stand-alone application of CPLEX for small- and medium-sized problem instances, it outperforms CPLEX in the context of larger instances.
2017-05-26T10:29:18ZLizárraga Olivas, EveliaBlesa Aguilera, Maria JosepBlum, ChristianRaidl, GüntherIn this work, we consider the following NP-hard combinatorial optimization problem from computational biology. Given a set of input strings of equal length, the goal is to identify a maximum cardinality subset of strings that differ maximally in a pre-defined number of positions. First of all, we introduce an integer linear programming model for this problem. Second, two variants of a rather simple greedy strategy are proposed. Finally, a large neighborhood search algorithm is presented. A comprehensive experimental comparison among the proposed techniques shows, first, that larger neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search shows to be competitive with the stand-alone application of CPLEX for small- and medium-sized problem instances, it outperforms CPLEX in the context of larger instances.Non recursive functions have transcendental generating functions
http://hdl.handle.net/2117/104671
Non recursive functions have transcendental generating functions
Cucker Farkas, Juan Felipe; Gabarró Vallès, Joaquim
Proves that nonprimitive recursive functions have transcendental generating series. This result translates a certain measure of the complexity of a function, the fact of not being primitive recursive, into another measure of the complexity of the generating series associated to the function, the fact of being transcendental.; On démontre que les fonctions qui ne sont pas recursives primitives ont des séries génératrices transcendantes. Ce résultat traduit une certaine mesure de complexité d'une fonction, le fait de ne pas être recursive primitive, dans une autre mesure de la complexité de la série génératrice associée à cette fonction, le fait d'être transcendante.
2017-05-22T09:14:24ZCucker Farkas, Juan FelipeGabarró Vallès, JoaquimProves that nonprimitive recursive functions have transcendental generating series. This result translates a certain measure of the complexity of a function, the fact of not being primitive recursive, into another measure of the complexity of the generating series associated to the function, the fact of being transcendental.
On démontre que les fonctions qui ne sont pas recursives primitives ont des séries génératrices transcendantes. Ce résultat traduit une certaine mesure de complexité d'une fonction, le fait de ne pas être recursive primitive, dans une autre mesure de la complexité de la série génératrice associée à cette fonction, le fait d'être transcendante.Nonuniform complexity classes specified by lower and upper bounds
http://hdl.handle.net/2117/104347
Nonuniform complexity classes specified by lower and upper bounds
Balcázar Navarro, José Luis; Gabarró Vallès, Joaquim
We characterize in terms of oracle Turing machines the classes defined by exponential lower bounds on some nonuniform complexity measures. After, we use the same methods to giue a new characterization of classes defined by polynomial and polylog upper bounds, obtaining an unified approach to deal with upper and lower bounds, The main measures are the initial index, the context-free cosU ond the boolean circuits size. We interpret our results by discussing a trade- off between oracle information and computed information for oracle Turing machines.; NOMS caractérisons en termes de machines de Turing avec oracles les classes définies par des bornes inférieures exponentielles pour des mesures de complexité non uniformes. Nous utilisons ensuite les mêmes méthodes pour donner une nouvelle caractérisation des classes définies par des bornes supérieures polynomiales et polylogarithmiques, obtenanrainsi une approche unifiée pour les bornes inférieures et supérieures. Les mesures principales sont F index initial, le coût grammatical et la taille des circuits booléens. Nous interprétons nos résultats en étudiant, pour les machines de Turing avec oracle, la relation entre l'information due à Voracle et l'information calculée par la machine.
2017-05-12T08:15:24ZBalcázar Navarro, José LuisGabarró Vallès, JoaquimWe characterize in terms of oracle Turing machines the classes defined by exponential lower bounds on some nonuniform complexity measures. After, we use the same methods to giue a new characterization of classes defined by polynomial and polylog upper bounds, obtaining an unified approach to deal with upper and lower bounds, The main measures are the initial index, the context-free cosU ond the boolean circuits size. We interpret our results by discussing a trade- off between oracle information and computed information for oracle Turing machines.
NOMS caractérisons en termes de machines de Turing avec oracles les classes définies par des bornes inférieures exponentielles pour des mesures de complexité non uniformes. Nous utilisons ensuite les mêmes méthodes pour donner une nouvelle caractérisation des classes définies par des bornes supérieures polynomiales et polylogarithmiques, obtenanrainsi une approche unifiée pour les bornes inférieures et supérieures. Les mesures principales sont F index initial, le coût grammatical et la taille des circuits booléens. Nous interprétons nos résultats en étudiant, pour les machines de Turing avec oracle, la relation entre l'information due à Voracle et l'information calculée par la machine.The HOM problem is EXPTIME-complete
http://hdl.handle.net/2117/102817
The HOM problem is EXPTIME-complete
Creus López, Carles; Gascon Caro, Adrian; Godoy Balil, Guillem; Ramos Garrido, Lander
We define a new class of tree automata with constraints and prove decidability of the emptiness problem for this class in exponential time. As a consequence, we obtain several EXPTIME-completeness results for problems on images of regular tree languages under tree homomorphisms, like set inclusion, regularity (HOM problem), and finiteness of set difference. Our result also has implications in term rewriting, since the set of reducible terms of a term rewrite system can be described as the image of a tree homomorphism. In particular, we prove that inclusion of sets of normal forms of term rewrite systems can be decided in exponential time. Analogous consequences arise in the context of XML typechecking, since types are defined by tree automata and some type transformations are homomorphic.
2017-03-23T09:26:34ZCreus López, CarlesGascon Caro, AdrianGodoy Balil, GuillemRamos Garrido, LanderWe define a new class of tree automata with constraints and prove decidability of the emptiness problem for this class in exponential time. As a consequence, we obtain several EXPTIME-completeness results for problems on images of regular tree languages under tree homomorphisms, like set inclusion, regularity (HOM problem), and finiteness of set difference. Our result also has implications in term rewriting, since the set of reducible terms of a term rewrite system can be described as the image of a tree homomorphism. In particular, we prove that inclusion of sets of normal forms of term rewrite systems can be decided in exponential time. Analogous consequences arise in the context of XML typechecking, since types are defined by tree automata and some type transformations are homomorphic.Construct, Merge, Solve and Adapt: Application to the repetition-free longest common subsequence problem
http://hdl.handle.net/2117/102814
Construct, Merge, Solve and Adapt: Application to the repetition-free longest common subsequence problem
Blum, Christian; Blesa Aguilera, Maria Josep
In this paper we present the application of a recently proposed, general, algorithm for combinatorial optimization to the repetition-free longest common subsequence problem. The applied algorithm, which is labelled Construct, Merge, Solve & Adapt, generates sub-instances based on merging the solution components found in randomly constructed solutions. These sub-instances are subsequently solved by means of an exact solver. Moreover, the considered sub-instances are dynamically changing due to adding new solution components at each iteration, and removing existing solution components on the basis of indicators about their usefulness. The results of applying this algorithm to the repetition-free longest common subsequence problem show that the algorithm generally outperforms competing approaches from the literature. Moreover, they show that the algorithm is competitive with CPLEX for small and medium size problem instances, whereas it outperforms CPLEX for larger problem instances.
2017-03-23T07:48:23ZBlum, ChristianBlesa Aguilera, Maria JosepIn this paper we present the application of a recently proposed, general, algorithm for combinatorial optimization to the repetition-free longest common subsequence problem. The applied algorithm, which is labelled Construct, Merge, Solve & Adapt, generates sub-instances based on merging the solution components found in randomly constructed solutions. These sub-instances are subsequently solved by means of an exact solver. Moreover, the considered sub-instances are dynamically changing due to adding new solution components at each iteration, and removing existing solution components on the basis of indicators about their usefulness. The results of applying this algorithm to the repetition-free longest common subsequence problem show that the algorithm generally outperforms competing approaches from the literature. Moreover, they show that the algorithm is competitive with CPLEX for small and medium size problem instances, whereas it outperforms CPLEX for larger problem instances.On the stability of generalized second price auctions with budgets
http://hdl.handle.net/2117/101923
On the stability of generalized second price auctions with budgets
Díaz Cort, Josep; Giotis, Ioannis; Kirousis, Lefteris; Markakis, Evangelos; Serna Iglesias, María José
The Generalized Second Price (GSP) auction used typically to model sponsored search auctions does not include the notion of budget constraints, which is present in practice. Motivated by this, we introduce the different variants of GSP auctions that take budgets into account in natural ways. We examine their stability by focusing on the existence of Nash equilibria and envy-free assignments. We highlight the differences between these mechanisms and find that only some of them exhibit both notions of stability. This shows the importance of carefully picking the right mechanism to ensure stable outcomes in the presence of budgets.
2017-03-04T12:43:43ZDíaz Cort, JosepGiotis, IoannisKirousis, LefterisMarkakis, EvangelosSerna Iglesias, María JoséThe Generalized Second Price (GSP) auction used typically to model sponsored search auctions does not include the notion of budget constraints, which is present in practice. Motivated by this, we introduce the different variants of GSP auctions that take budgets into account in natural ways. We examine their stability by focusing on the existence of Nash equilibria and envy-free assignments. We highlight the differences between these mechanisms and find that only some of them exhibit both notions of stability. This shows the importance of carefully picking the right mechanism to ensure stable outcomes in the presence of budgets.Amalgamation of domain specific languages with behaviour
http://hdl.handle.net/2117/100652
Amalgamation of domain specific languages with behaviour
Duran, Francisco; Moreno Delgado, Antonio; Orejas Valdés, Fernando; Zschaler, Steffen
Domain-specific languages (DSLs) become more useful the more specific they are to a particular domain. The resulting need for developing a substantial number of DSLs can only be satisfied if DSL development can be made as efficient as possible. One way in which to address this challenge is by enabling the reuse of (partial) DSLs in the construction of new DSLs. Reuse of DSLs builds on two foundations: a notion of DSL composition and theoretical results ensuring the safeness of composing DSLs with respect to the semantics of the component DSLs.
Given a graph-grammar formalisation of DSLs, in this paper, we build on graph transformation system morphisms to define parameterised DSLs and their instantiation by an amalgamation construction. Results on the protection of the behaviour along the induced morphisms allow us to safely reuse and combine definitions of DSLs to build more complex ones. We illustrate our proposal in e-Motions for a DSL for production-line systems and three independent DSLs for describing non-functional properties, namely response time, throughput, and failure rate.
2017-02-08T08:20:14ZDuran, FranciscoMoreno Delgado, AntonioOrejas Valdés, FernandoZschaler, SteffenDomain-specific languages (DSLs) become more useful the more specific they are to a particular domain. The resulting need for developing a substantial number of DSLs can only be satisfied if DSL development can be made as efficient as possible. One way in which to address this challenge is by enabling the reuse of (partial) DSLs in the construction of new DSLs. Reuse of DSLs builds on two foundations: a notion of DSL composition and theoretical results ensuring the safeness of composing DSLs with respect to the semantics of the component DSLs.
Given a graph-grammar formalisation of DSLs, in this paper, we build on graph transformation system morphisms to define parameterised DSLs and their instantiation by an amalgamation construction. Results on the protection of the behaviour along the induced morphisms allow us to safely reuse and combine definitions of DSLs to build more complex ones. We illustrate our proposal in e-Motions for a DSL for production-line systems and three independent DSLs for describing non-functional properties, namely response time, throughput, and failure rate.Comparing MapReduce and pipeline implementations for counting triangles
http://hdl.handle.net/2117/100102
Comparing MapReduce and pipeline implementations for counting triangles
Pasarella Sánchez, Ana Edelmira; Vidal, Maria-Esther; Zoltan Torres, Ana Cristina
A common method to define a parallel solution for a computational problem consists in finding a way to use the Divide and Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implement the Divide and Conquer paradigm, named dynamic pipeline. The main features of dynamic pipelines are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting either when the input graph does not fit in memory or is dynamically generated. To evaluate the properties of pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes. An empirical evaluation is conducted on graphs of different topologies, sizes, and densities. Observed results suggest that dynamic pipelines allows for an efficient implementation of the problem of counting triangles in a graph, particularly, in dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.
2017-01-26T11:13:06ZPasarella Sánchez, Ana EdelmiraVidal, Maria-EstherZoltan Torres, Ana CristinaA common method to define a parallel solution for a computational problem consists in finding a way to use the Divide and Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implement the Divide and Conquer paradigm, named dynamic pipeline. The main features of dynamic pipelines are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting either when the input graph does not fit in memory or is dynamically generated. To evaluate the properties of pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes. An empirical evaluation is conducted on graphs of different topologies, sizes, and densities. Observed results suggest that dynamic pipelines allows for an efficient implementation of the problem of counting triangles in a graph, particularly, in dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.Narrow proofs may be maximally long
http://hdl.handle.net/2117/99737
Narrow proofs may be maximally long
Atserias, Albert; Lauria, Massimo; Nordström, Jakob
We prove that there are 3-CNF formulas over n variables that can be refuted in resolution in width w but require resolution proofs of size n(Omega(w)). This shows that the simple counting argument that any formula refutable in width w must have a proof in size n(O(w)) is essentially tight. Moreover, our lower bound generalizes to polynomial calculus resolution and Sherali-Adams, implying that the corresponding size upper bounds in terms of degree and rank are tight as well. The lower bound does not extend all the way to Lasserre, however, since we show that there the formulas we study have proofs of constant rank and size polynomial in both n and w.
2017-01-20T08:41:23ZAtserias, AlbertLauria, MassimoNordström, JakobWe prove that there are 3-CNF formulas over n variables that can be refuted in resolution in width w but require resolution proofs of size n(Omega(w)). This shows that the simple counting argument that any formula refutable in width w must have a proof in size n(O(w)) is essentially tight. Moreover, our lower bound generalizes to polynomial calculus resolution and Sherali-Adams, implying that the corresponding size upper bounds in terms of degree and rank are tight as well. The lower bound does not extend all the way to Lasserre, however, since we show that there the formulas we study have proofs of constant rank and size polynomial in both n and w.