ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals
http://hdl.handle.net/2117/3092
2016-10-27T01:49:47Z
http://hdl.handle.net/2117/91090
Self-tracking reloaded: Applying process mining to personalized health care from labeled sensor data
Sztyler, Timo; Carmona Vargas, Josep; Völker, Johanna; Stuckenschmidt, Heiner
Currently, there is a trend to promote personalized health care in order to prevent diseases or to lead a healthier life. Using current devices such as smartphones and smartwatches, an individual can easily record detailed data from her daily life. So far, however, this data has mainly been used for plain self-tracking rather than to enable personalized health care. In this paper, we provide ideas on how process mining can be used as a fine-grained evolution of traditional self-tracking. We have applied the ideas of the paper to recorded data from a set of individuals, and present conclusions and challenges.
2016-10-26T09:21:42Z
http://hdl.handle.net/2117/91088
Mining conditional partial order graphs from event logs
Mokhov, Andrey; Carmona Vargas, Josep; Beaumont, Jonathan
Process mining techniques rely on event logs: the extraction of a process model (discovery) takes an event log as the input, the adequacy of a process model (conformance) is checked against an event log, and the enhancement of a process model is performed by using available data in the log. Several notations and formalisms for event log representation have been proposed in recent years to enable efficient algorithms for the aforementioned process mining problems. In this paper we show how Conditional Partial Order Graphs (CPOGs), a recently introduced formalism for the compact representation of families of partial orders, can be used in the process mining field, in particular for addressing the problem of compact and easy-to-comprehend representation of event logs with data. We present algorithms for extracting both the control flow and the relevant data parameters from a given event log, and show how CPOGs can be used for efficient and effective visualisation of the obtained results. We demonstrate that the resulting representation can be used to reveal the hidden interplay between the control and data flows of a process, thereby opening the way for new process mining techniques capable of exploiting this interplay. Finally, we present open-source software support and discuss current limitations of the proposed approach.
2016-10-26T09:05:38Z
http://hdl.handle.net/2117/90714
A fast and retargetable framework for logic-IP-internal electromigration assessment comprehending advanced waveform effects
Jain, Palkesh; Cortadella Fortuny, Jordi; Sapatnekar, Sachin S.
A new methodology for system-on-chip-level logic-IP-internal electromigration verification is presented in this paper, which significantly improves accuracy by comprehending the impact of the parasitic RC loading and voltage-dependent pin capacitance in the library model. It additionally provides an on-the-fly retargeting capability for reliability constraints by allowing arbitrary specifications of lifetimes, temperatures, voltages, and failure rates, as well as interoperability of the IPs across foundries. The characterization part of the methodology is expedited through intelligent IP-response modeling. The ultimate benefit of the proposed approach is demonstrated on a 28-nm design by providing an on-the-fly specification of retargeted reliability constraints. The results show a high correlation with SPICE and were obtained with an order-of-magnitude reduction in the verification runtime.
2016-10-13T07:40:06Z
http://hdl.handle.net/2117/90688
MapReduce vs. pipelining counting triangles
Pasarella Sánchez, Ana Edelmira; Vidal Serodio, Maria Esther; Zoltan, Cristina
In this paper we follow an alternative approach, named pipelining, to obtain a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting when the input graph either does not fit in memory or is dynamically generated. Concretely, we implement a dynamic pipeline of processes and an ad-hoc MapReduce version in the Go language, exploiting Go's ability to deal with channels and spawned processes. An empirical evaluation is conducted on graphs of different size and density. The observed results suggest that the pipeline approach yields an efficient solution to the problem of counting triangles in a graph, particularly for large and dense graphs, drastically reducing the execution time with respect to the MapReduce implementation.
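To make the problem concrete, the following Go sketch streams an edge list through a channel into a counting stage, in the spirit of the dynamic pipeline of processes described above. The two-stage structure, the names, and the edge representation are illustrative assumptions, not the authors' implementation.

```go
package main

import "fmt"

// edgeSource streams an edge list over a channel, as a dynamically
// generated graph would arrive (hypothetical helper).
func edgeSource(edges [][2]int) <-chan [2]int {
	out := make(chan [2]int)
	go func() {
		defer close(out)
		for _, e := range edges {
			out <- e
		}
	}()
	return out
}

// countTriangles consumes edges from the channel, builds adjacency
// sets, and then counts each triangle exactly once.
func countTriangles(in <-chan [2]int) int {
	adj := map[int]map[int]bool{}
	var edges [][2]int
	for e := range in {
		u, v := e[0], e[1]
		if adj[u] == nil {
			adj[u] = map[int]bool{}
		}
		if adj[v] == nil {
			adj[v] = map[int]bool{}
		}
		adj[u][v], adj[v][u] = true, true
		edges = append(edges, e)
	}
	count := 0
	for _, e := range edges {
		u, v := e[0], e[1]
		if u > v {
			u, v = v, u // orient each undirected edge as (min, max)
		}
		for w := range adj[u] {
			if w > v && adj[v][w] { // w > v: count each triangle once
				count++
			}
		}
	}
	return count
}

func main() {
	// The complete graph K4 contains exactly 4 triangles.
	k4 := [][2]int{{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}}
	fmt.Println(countTriangles(edgeSource(k4)))
}
```

Further pipeline stages (e.g. one counter per vertex range) could be spawned as additional goroutines reading from fan-out channels, which is the kind of decomposition the paper evaluates.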
2016-10-11T11:41:38Z
http://hdl.handle.net/2117/90247
Security-sensitive tackling of obstructed workflow executions
Holderer, Julius; Carmona Vargas, Josep; Müller, Günter
Imposing access control onto workflows considerably reduces the set of users authorized to execute the workflow tasks. Further constraints (e.g., Separation of Duties) as well as the unexpected unavailability of users may ultimately obstruct the successful workflow execution. To still complete the execution of an obstructed workflow, we envisage a hybrid approach. If a log is provided, we partition its traces into “successful” and “obstructed” ones by analysing the given workflow and its authorizations. An obstruction is then solved by finding its nearest match in the list of successful traces. If no log is provided, we flatten the workflow and its authorizations into a Petri net and encode the obstruction with a corresponding “obstruction marking”. The structural theory of Petri nets shall then be tweaked to provide a minimized Parikh vector that may violate the given firing rules but nevertheless reaches a complete marking and, by that, completes the workflow.
2016-09-28T08:32:00Z
http://hdl.handle.net/2117/90157
Complexity and dynamics of the winemaking bacterial communities in berries, musts, and wines from apulian grape cultivars through time and space
Marzano, Marinella; Fosso, Bruno; Manzari, Caterina; Grieco, Francesco; Intranuovo, Marianna; Cozzi, Giuseppe; Mulè, Giuseppina; Scioscia, Gaetano; Valiente Feruglio, Gabriel Alejandro; Tullo, Apollonia; Sbisa, Elisabetta; Pesole, Graziano; Santamaria, Monica
Currently, there is very little information available regarding the microbiome associated with the wine production chain. Here, we used an amplicon sequencing approach based on high-throughput sequencing (HTS) to obtain a comprehensive assessment of the bacterial community associated with the production of three Apulian red wines, from grape to final product. The relationships among grape variety, the microbial community, and fermentation were investigated. Moreover, the winery microbiota was compared with the autochthonous species in the vineyards that persist until the end of the winemaking process. The analysis highlighted the remarkable dynamics within the microbial communities during fermentation. A common microbial core shared among the examined wine varieties was observed, and the unique taxonomic signature of each wine appellation was revealed. New species belonging to the genus Halomonas were also reported. This study demonstrates the potential of this metagenomic approach, supported by optimized protocols, for identifying the biodiversity of the wine supply chain. The developed experimental pipeline offers new prospects for other research fields in which a comprehensive view of microbial community complexity and dynamics is desirable.
2016-09-23T10:40:20Z
http://hdl.handle.net/2117/89895
Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme
Nebel, Markus E.; Wild, Sebastian; Martínez Parra, Conrado
The new dual-pivot Quicksort by Vladimir Yaroslavskiy, used in Oracle's Java runtime library since version 7, features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running-time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java bytecode instructions, and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0041-7
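For readers unfamiliar with the partitioning scheme analysed above, the following sketch shows Yaroslavskiy-style dual-pivot partitioning with the simplest pivot choice (first and last element, no sampling). It is a minimal Go illustration of the three-way partitioning idea, not the tuned Java library code the paper studies; the function name is an assumption.

```go
package main

import "fmt"

// dualPivotSort sorts a in place using two pivots p <= q, partitioning
// the remaining elements into three regions: < p, between p and q, > q.
func dualPivotSort(a []int) {
	if len(a) < 2 {
		return
	}
	// Simplest pivot choice: first and last element, ordered so p <= q.
	// (The paper analyses choosing the pivots from a larger sample.)
	if a[0] > a[len(a)-1] {
		a[0], a[len(a)-1] = a[len(a)-1], a[0]
	}
	p, q := a[0], a[len(a)-1]
	lt, i, gt := 1, 1, len(a)-2
	for i <= gt {
		switch {
		case a[i] < p: // grow the "< p" region on the left
			a[i], a[lt] = a[lt], a[i]
			lt++
			i++
		case a[i] > q: // grow the "> q" region on the right;
			a[i], a[gt] = a[gt], a[i]
			gt-- // the swapped-in element is examined next iteration
		default: // p <= a[i] <= q: leave it in the middle region
			i++
		}
	}
	// Move the pivots into their final positions.
	lt--
	gt++
	a[0], a[lt] = a[lt], a[0]
	a[len(a)-1], a[gt] = a[gt], a[len(a)-1]
	// Recurse on the three regions between and around the pivots.
	dualPivotSort(a[:lt])
	dualPivotSort(a[lt+1 : gt])
	dualPivotSort(a[gt+1:])
}

func main() {
	a := []int{5, 3, 8, 1, 9, 2, 7, 4, 6}
	dualPivotSort(a)
	fmt.Println(a)
}
```

The asymmetry the paper exploits is visible here: elements smaller than p are compared against p first, elements larger than q only advance the right boundary, so the costs of the three branches differ, and skewing the pivots changes how often each branch is taken.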
2016-09-14T07:49:06Z
http://hdl.handle.net/2117/89860
On the cost of fixed partial match queries in K-d trees
Duch Brown, Amalia; Lau Laynes-Lozada, Gustavo Salvador; Martínez Parra, Conrado
Partial match queries constitute the most basic type of associative queries in multidimensional data structures such as K-d trees or quadtrees. Given a query q = (q_0, …, q_{K−1}) where s of the coordinates are specified and K−s are left unspecified (q_i = *), a partial match search returns the subset of data points x = (x_0, …, x_{K−1}) in the data structure that match the given query, that is, the data points such that x_i = q_i whenever q_i ≠ *. There exists a wealth of results about the cost of partial match searches in many different multidimensional data structures, but most of these results deal with random queries. Only recently have a few papers begun to investigate the cost of partial match queries with a fixed query q. This paper represents a new contribution in this direction, giving a detailed asymptotic estimate of the expected cost P_{n,q} for a given fixed query q. From previous results on the cost of partial matches with a fixed query and the ones presented here, a deeper understanding is emerging, uncovering the following functional shape for P_{n,q}:

P_{n,q} = ν · (∏_{i : q_i is specified} q_i (1 − q_i))^{α/2} · n^α + l.o.t.

(l.o.t.: lower-order terms, throughout this work) in many multidimensional data structures, which differ only in the exponent α and the constant ν, both dependent on s and K, and, for some data structures, on the whole pattern of specified and unspecified coordinates in q as well. Although it is tempting to conjecture that this functional shape is “universal”, we have shown experimentally that it seems not to be true for a variant of K-d trees called squarish K-d trees.
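The recursive structure behind these costs is easy to state: at each node, a specified coordinate prunes one subtree, while an unspecified one forces the search into both subtrees. A minimal Go sketch of partial match search in a standard K-d tree follows, with nil marking the unspecified (*) coordinates; all names and representations are illustrative assumptions.

```go
package main

import "fmt"

// node is one K-d tree node; the discriminating coordinate cycles
// with the depth of the node.
type node struct {
	point       []float64
	left, right *node
}

// insert adds a point, comparing on coordinate depth mod K.
func insert(t *node, p []float64, depth int) *node {
	if t == nil {
		return &node{point: p}
	}
	d := depth % len(p)
	if p[d] < t.point[d] {
		t.left = insert(t.left, p, depth+1)
	} else {
		t.right = insert(t.right, p, depth+1)
	}
	return t
}

// matches reports whether p agrees with q on every specified coordinate.
func matches(p []float64, q []*float64) bool {
	for i, qi := range q {
		if qi != nil && p[i] != *qi {
			return false
		}
	}
	return true
}

// partialMatch returns all points matching q (nil entries mean *).
func partialMatch(t *node, q []*float64, depth int) [][]float64 {
	if t == nil {
		return nil
	}
	var out [][]float64
	if matches(t.point, q) {
		out = append(out, t.point)
	}
	d := depth % len(q)
	switch {
	case q[d] == nil: // unspecified: both subtrees may hold matches
		out = append(out, partialMatch(t.left, q, depth+1)...)
		out = append(out, partialMatch(t.right, q, depth+1)...)
	case *q[d] < t.point[d]: // specified: prune one subtree
		out = append(out, partialMatch(t.left, q, depth+1)...)
	default:
		out = append(out, partialMatch(t.right, q, depth+1)...)
	}
	return out
}

func main() {
	var root *node
	for _, p := range [][]float64{{2, 3}, {1, 5}, {1, 7}, {3, 9}} {
		root = insert(root, p, 0)
	}
	x := 1.0
	// Query (1, *): the first coordinate is specified, the second is not.
	fmt.Println(partialMatch(root, []*float64{&x, nil}, 0))
}
```

The cost analysed in the paper is the number of nodes this recursion visits: the branching at unspecified coordinates is what produces the n^α growth, with α and the constant depending on s, K, and the query pattern.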
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0097-4
2016-09-13T10:21:11Z
http://hdl.handle.net/2117/89839
Self-synchronized duty-cycling for sensor networks with energy harvesting capabilities: Implementation in Wiselib
Hernández, H.; Baumgartner, Tobias; Blum, Christian; Blesa Aguilera, Maria Josep; Fekete, Sandor P.; Kröller, Alexander
In this work we present a protocol for a self-synchronized duty-cycling mechanism in wireless sensor networks with energy harvesting capabilities. The protocol is implemented in Wiselib, a library of generic algorithms for sensor networks. Simulations are conducted with the sensor network simulator Shawn. They are based on the specifications of real hardware known as iSense sensor nodes. The experimental results show that the proposed mechanism is able to adapt to changing energy availabilities. Moreover, it is shown that the system is very robust against packet loss.
2016-09-13T07:13:53Z
http://hdl.handle.net/2117/89582
De Menos a Distinto: Estudio de la Implantación de R en las asignaturas del grado de estadística
Baixeries i Juvillà, Jaume; Fairén González, Marta; Gabarró Vallès, Joaquim; Pasarella Sánchez, Ana Edelmira
Teaching computer science in degrees that are not computer science related presents an important challenge: to motivate the students and to achieve good average grades. The students' complaints are typically rooted in a lack of motivation ("What is this subject useful for?"), and this is especially relevant when the subject is not easy to learn. In this paper we present the case of the computer science courses in the Statistics degree taught at the Universitat de Barcelona (UB) and the Universitat Politècnica de Catalunya (UPC), two Catalan universities. We initially tried to reduce the complexity of the contents in order to obtain better average grades. Yet, this did not work out as expected. Therefore, we changed our strategy: instead of making the contents easier (less complex), we changed the tools used to teach them and tried to adapt them to the students' interests. In this particular case, we decided to use the R programming language, a language widely used by statisticians, to explain the basics of programming. Hence, we moved from less (simpler contents) to different (more elaborate and nontrivial contents adapted to meet the students' expectations).
2016-09-05T15:19:46Z