ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals
2016-09-30T05:10:30Z

Security-sensitive tackling of obstructed workflow executions
Holderer, Julius; Carmona Vargas, Josep; Müller, Günter
2016-09-28T08:32:00Z
Holderer, Julius
Carmona Vargas, Josep
Müller, Günter
Complexity and dynamics of the winemaking bacterial communities in berries, musts, and wines from Apulian grape cultivars through time and space
Marzano, Marinella; Fosso, Bruno; Manzari, Caterina; Grieco, Francesco; Intranuovo, Marianna; Cozzi, Giuseppe; Mulè, Giuseppina; Scioscia, Gaetano; Valiente Feruglio, Gabriel Alejandro; Tullo, Apollonia; Sbisa, Elisabetta; Pesole, Graziano; Santamaria, Monica
Currently, there is very little information available regarding the microbiome associated with the wine production chain. Here, we used an amplicon sequencing approach based on high-throughput sequencing (HTS) to obtain a comprehensive assessment of the bacterial community associated with the production of three Apulian red wines, from grape to final product. The relationships among grape variety, the microbial community, and fermentation were investigated. Moreover, the winery microbiota was evaluated in comparison with the autochthonous species in vineyards that persist until the end of the winemaking process. The analysis highlighted the remarkable dynamics within the microbial communities during fermentation. A common microbial core shared among the examined wine varieties was observed, and the unique taxonomic signature of each wine appellation was revealed. New species belonging to the genus Halomonas were also reported. This study demonstrates the potential of this metagenomic approach, supported by optimized protocols, for identifying the biodiversity of the wine supply chain. The developed experimental pipeline offers new prospects for other research fields in which a comprehensive view of microbial community complexity and dynamics is desirable.
2016-09-23T10:40:20Z
Marzano, Marinella
Fosso, Bruno
Manzari, Caterina
Grieco, Francesco
Intranuovo, Marianna
Cozzi, Giuseppe
Mulè, Giuseppina
Scioscia, Gaetano
Valiente Feruglio, Gabriel Alejandro
Tullo, Apollonia
Sbisa, Elisabetta
Pesole, Graziano
Santamaria, Monica
Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme
Nebel, Markus E.; Wild, Sebastian; Martínez Parra, Conrado
The new dual-pivot Quicksort by Vladimir Yaroslavskiy, used in Oracle's Java runtime library since version 7, features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
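As an illustration of the partitioning scheme discussed in this abstract, the following is a minimal Python sketch of a basic dual-pivot Quicksort in the style of Yaroslavskiy's partitioning. It is an assumption-laden teaching version: the pivots are simply the first and last elements (no pivot sampling), there is no insertion-sort cutoff, and the function name `dual_pivot_quicksort` is hypothetical. It is not the code of the Java runtime library.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort list `a` in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Choose the two outermost elements as pivots, with p <= q.
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    l = lo + 1  # a[lo+1 .. l-1] holds elements smaller than p
    g = hi - 1  # a[g+1 .. hi-1] holds elements larger than q
    k = l       # a[l .. k-1] holds elements between p and q
    while k <= g:
        if a[k] < p:
            a[k], a[l] = a[l], a[k]
            l += 1
        elif a[k] >= q:
            # Scan from the right for an element that is not larger than q.
            while a[g] > q and k < g:
                g -= 1
            a[k], a[g] = a[g], a[k]
            g -= 1
            # The element swapped in may still belong to the leftmost part.
            if a[k] < p:
                a[k], a[l] = a[l], a[k]
                l += 1
        k += 1
    # Move the pivots to their final positions.
    l -= 1
    g += 1
    a[lo], a[l] = a[l], a[lo]
    a[hi], a[g] = a[g], a[hi]
    # Recurse on the three parts.
    dual_pivot_quicksort(a, lo, l - 1)
    dual_pivot_quicksort(a, l + 1, g - 1)
    dual_pivot_quicksort(a, g + 1, hi)


if __name__ == "__main__":
    data = [5, 3, 9, 1, 5, 8, 2, 7]
    dual_pivot_quicksort(data)
    print(data)
```

The asymmetry mentioned in the abstract is visible in the loop: each element is compared with p first and with q only afterwards, while elements reached by the right-to-left scan are compared with q first.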
2016-09-14T07:49:06Z
Nebel, Markus E.
Wild, Sebastian
Martínez Parra, Conrado
On the cost of fixed partial match queries in K-d trees
Duch Brown, Amalia; Lau Laynes-Lozada, Gustavo Salvador; Martínez Parra, Conrado
Partial match queries constitute the most basic type of associative queries in multidimensional data structures such as K-d trees or quadtrees. Given a query $q=(q_0,\ldots,q_{K-1})$ where $s$ of the coordinates are specified and $K-s$ are left unspecified ($q_i=*$), a partial match search returns the subset of data points $x=(x_0,\ldots,x_{K-1})$ in the data structure that match the given query, that is, the data points such that $x_i=q_i$ whenever $q_i\neq *$. There exists a wealth of results about the cost of partial match searches in many different multidimensional data structures, but most of these results deal with random queries. Only recently have a few papers begun to investigate the cost of partial match queries with a fixed query $q$. This paper represents a new contribution in this direction, giving a detailed asymptotic estimate of the expected cost $P_{n,q}$ for a given fixed query $q$. From previous results on the cost of partial matches with a fixed query and the ones presented here, a deeper understanding is emerging, uncovering the following functional shape for $P_{n,q}$:
$$P_{n,q} = \nu \cdot \Bigl(\prod_{i:\, q_i \text{ is specified}} q_i(1-q_i)\Bigr)^{\alpha/2} \cdot n^{\alpha} + \text{l.o.t.}$$
(l.o.t. denotes lower order terms, throughout this work) in many multidimensional data structures, which differ only in the exponent $\alpha$ and the constant $\nu$, both dependent on $s$ and $K$, and, for some data structures, on the whole pattern of specified and unspecified coordinates in $q$ as well. Although it is tempting to conjecture that this functional shape is “universal”, we have shown experimentally that it seems not to be true for a variant of K-d trees called squarish K-d trees.
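To make the definition of a partial match search concrete, here is a small Python sketch on a standard K-d tree whose discriminant cycles through the coordinates by depth. The names `Node`, `insert` and `partial_match` are hypothetical, `None` plays the role of the unspecified value `*`, and the number of visited nodes is used as the cost, which is an assumption about how the cost is measured.

```python
import random


class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert `point` into the K-d tree rooted at `root`; return the root."""
    if root is None:
        return Node(point)
    d = depth % len(point)  # discriminant coordinate at this depth
    if point[d] < root.point[d]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def partial_match(root, query, depth=0):
    """Return (matching points, number of visited nodes) for `query`.

    `query` is a tuple whose entries are either a value (specified
    coordinate) or None (unspecified coordinate).
    """
    if root is None:
        return [], 0
    matches = []
    if all(q is None or q == x for q, x in zip(query, root.point)):
        matches.append(root.point)
    visited = 1
    d = depth % len(query)
    if query[d] is None:
        # Unspecified discriminant: both subtrees must be explored.
        children = [root.left, root.right]
    elif query[d] < root.point[d]:
        children = [root.left]
    else:
        children = [root.right]
    for child in children:
        found, seen = partial_match(child, query, depth + 1)
        matches += found
        visited += seen
    return matches, visited


if __name__ == "__main__":
    rng = random.Random(42)
    root = None
    for _ in range(1000):
        root = insert(root, (rng.random(), rng.random()))
    # Fixed query with the first coordinate specified, the second left free.
    _, cost = partial_match(root, (0.5, None))
    print("visited nodes:", cost)
```

With real-valued random points the set of matches is almost always empty, so the interesting quantity is the number of visited nodes, which is what an estimate such as $P_{n,q}$ describes.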
2016-09-13T10:21:11Z
Duch Brown, Amalia
Lau Laynes-Lozada, Gustavo Salvador
Martínez Parra, Conrado
Self-synchronized duty-cycling for sensor networks with energy harvesting capabilities: Implementation in Wiselib
Hernández, H.; Baumgartner, Tobias; Blum, Christian; Blesa Aguilera, Maria Josep; Fekete, Sandor P.; Kröller, Alexander
In this work we present a protocol for a self-synchronized duty-cycling mechanism in wireless sensor networks with energy harvesting capabilities. The protocol is implemented in Wiselib, a library of generic algorithms for sensor networks. Simulations are conducted with the sensor network simulator Shawn. They are based on the specifications of real hardware known as iSense sensor nodes. The experimental results show that the proposed mechanism is able to adapt to changing energy availabilities. Moreover, it is shown that the system is very robust against packet loss.
2016-09-13T07:13:53Z
Hernández, H.
Baumgartner, Tobias
Blum, Christian
Blesa Aguilera, Maria Josep
Fekete, Sandor P.
Kröller, Alexander

De Menos a Distinto: Estudio de la Implantación de R en las asignaturas del grado de estadística
Baixeries i Juvillà, Jaume; Fairén González, Marta; Gabarró Vallès, Joaquim; Pasarella Sánchez, Ana Edelmira
Teaching computer science in degrees that are not computer science related presents an important challenge: motivating the students and achieving good average grades. The students’ complaint is always rooted in a lack of motivation ("What is this subject useful for?"), and this is especially relevant when the subject is not easy for the students to learn. In this paper we present the case of the computer courses in the Statistics degree taught at the Universitat de Barcelona (UB) and the Universitat Politècnica de Catalunya (UPC), two Catalan universities. We initially tried to reduce the complexity of the course contents in order to obtain better average grades, but this did not work out as expected. Therefore, instead of making the contents easier (less complex), we changed the tools that were used to teach and tried to adapt them to the students’ interests. In this particular case, we decided to use the R programming language, a language widely used by statisticians, to explain the basics of programming. In short, we changed our strategy from less (simpler contents) to different (more elaborate and nontrivial contents adapted to meet the students’ expectations).
2016-09-05T15:19:46Z
Baixeries i Juvillà, Jaume
Fairén González, Marta
Gabarró Vallès, Joaquim
Pasarella Sánchez, Ana Edelmira

Absorption time of the Moran process
Díaz Cort, Josep; Goldberg, Leslie Ann; Richerby, David; Serna Iglesias, María José
© 2016 Wiley Periodicals, Inc.
The Moran process models the spread of mutations in populations on graphs. We investigate the absorption time of the process, which is the time taken for a mutation introduced at a randomly chosen vertex to either spread to the whole population, or to become extinct. It is known that the expected absorption time for an advantageous mutation is $O(n^4)$ on an $n$-vertex undirected graph, which allows the behaviour of the process on undirected graphs to be analysed using the Markov chain Monte Carlo method. We show that this does not extend to directed graphs by exhibiting an infinite family of directed graphs for which the expected absorption time is exponential in the number of vertices. However, for regular directed graphs, we show that the expected absorption time is $\Omega(n \log n)$ and $O(n^2)$. We exhibit families of graphs matching these bounds and give improved bounds for other families of graphs, based on isoperimetric number. Our results are obtained via stochastic dominations which we demonstrate by establishing a coupling in a related continuous-time model. The coupling also implies several natural domination results regarding the fixation probability of the original (discrete-time) process, resolving a conjecture of Shakarian, Roos and Johnson.
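The absorption time defined in this abstract can be explored empirically with a short simulation. The Python sketch below assumes the usual discrete-time formulation of the Moran process on a graph: mutants have fitness r and non-mutants fitness 1, at each step a vertex is chosen with probability proportional to its fitness, and its state is copied onto a uniformly random out-neighbour. The names `moran_absorption` and `cycle_graph` are hypothetical, and the sketch is only an illustration, not the analysis technique of the paper.

```python
import random


def cycle_graph(n):
    """Adjacency lists of an undirected cycle on n vertices."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def moran_absorption(adj, r, rng):
    """Run one Moran process on the graph `adj` with mutant fitness `r`.

    `adj[u]` lists the out-neighbours of vertex u. Returns a pair
    (steps, fixed): the number of steps until absorption and whether
    the mutation spread to the whole population.
    """
    n = len(adj)
    mutant = [False] * n
    mutant[rng.randrange(n)] = True  # single mutant at a random vertex
    count = 1
    steps = 0
    while 0 < count < n:
        # Pick the reproducing vertex proportionally to fitness.
        weights = [r if m else 1.0 for m in mutant]
        u = rng.choices(range(n), weights=weights)[0]
        # Its offspring replaces a uniformly random out-neighbour.
        v = rng.choice(adj[u])
        if mutant[v] != mutant[u]:
            count += 1 if mutant[u] else -1
            mutant[v] = mutant[u]
        steps += 1
    return steps, count == n


if __name__ == "__main__":
    rng = random.Random(7)
    runs = [moran_absorption(cycle_graph(20), 2.0, rng) for _ in range(200)]
    mean_steps = sum(s for s, _ in runs) / len(runs)
    fixation_rate = sum(f for _, f in runs) / len(runs)
    print("mean absorption time:", mean_steps)
    print("empirical fixation probability:", fixation_rate)
```

Averaging the step counts over many runs gives a rough empirical estimate of the expected absorption time for a particular graph, which can then be compared against the asymptotic bounds for growing n.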
2016-07-29T13:05:23Z
Díaz Cort, Josep
Goldberg, Leslie Ann
Richerby, David
Serna Iglesias, María José

Process mining meets abstract interpretation
Carmona Vargas, Josep; Cortadella Fortuny, Jordi
The discovery of process models from system traces is an interesting problem that has received significant attention in recent years. In this work, a theory for the derivation of a Petri net from a set of traces is presented. The method is based on the theory of abstract interpretation, which has been applied successfully in other areas. The principal application of the theory presented is Process Mining, an area that tries to incorporate the use of formal models both in the design and use of information systems.
2016-06-16T08:50:11Z
Carmona Vargas, Josep
Cortadella Fortuny, Jordi

Process mining from a basis of state regions
Solé, Marc; Carmona Vargas, Josep
A central problem in the area of Process Mining is to obtain a formal model that represents selected behavior of a system. The theory of regions has been applied to address this problem, enabling the derivation of a Petri net whose language includes a set of traces. However, when dealing with real-life systems, the available tool support for performing such a task is unsatisfactory, due to the complex algorithms that are required. In this paper, the theory of regions is revisited to devise a novel technique that explores the space of regions by combining the elements of a region basis. Due to its light space requirements, the approach can represent an important step for bridging the gap between the theory of regions and its industrial application. Experimental results improve on state-of-the-art tools for the same task by orders of magnitude.
2016-06-15T07:59:02Z
Solé, Marc
Carmona Vargas, Josep

Multikey Quickselect
Frias Moya, Leonor; Roura Ferret, Salvador
In this paper we introduce Multikey Quickselect: an efficient, in-place, and easy-to-implement algorithm for the selection problem for strings. We present several variants of our basic algorithm, which apply to two different flavors of the selection problem. Also, we analyze the cost of the main variants, measured as the expected number of character comparisons and element swaps. Some of the enhancements presented in this paper apply to Multikey Quicksort as well.
2016-06-15T07:21:39Z
Frias Moya, Leonor
Roura Ferret, Salvador