Journal articles
http://hdl.handle.net/2117/184693
2024-03-29T05:55:38Z
-
A unified formal framework for factorial and probabilistic topic modelling
http://hdl.handle.net/2117/396668
Gibert, Karina; Hernández Potiomkin, Yaroslav
Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses various method families, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying shared elements and representing them using a homogeneous notation. The paper presents 12 different methods within this framework, enabling easy comparative analysis to assess the flexibility of each approach and how realistic its assumptions are. This establishes the initial stage of a broader analysis aimed at relating all method families to this common framework, comprehensively understanding their strengths and weaknesses, and establishing general application guidelines. In addition, an experimental setup reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion of the presented methods and outlines future research directions.
2023-11-20T09:34:52Z
-
SurvLIMEpy: a Python package implementing SurvLIME
http://hdl.handle.net/2117/395350
Pachón García, Cristian; Hernández Pérez, Carlos; Delicado Useros, Pedro Francisco; Vilaplana Besler, Verónica
In this paper we present SurvLIMEpy, an open-source Python package that implements the SurvLIME algorithm. This method computes local feature importance for machine learning algorithms designed for modelling Survival Analysis data. The presented implementation uses a matrix-wise formulation, which speeds up execution. Additionally, SurvLIMEpy provides visualisation tools that help the user better understand the result of the algorithm. The package supports a wide variety of survival models, from the Cox Proportional Hazards Model to deep learning models such as DeepHit or DeepSurv. Two types of experiments are presented in this paper. First, by means of simulated data, we study the ability of the algorithm to capture the importance of the features. Second, we use three open-source survival datasets together with a set of survival algorithms in order to demonstrate how SurvLIMEpy behaves when applied to different models.
© 2024 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
2023-10-25T11:21:16Z
-
Conditional Poisson regression with random effects for the analysis of multi-site time series studies
http://hdl.handle.net/2117/394763
Barrera Gómez, Jose Antonio; Puig Oriol, Xavier; Ginebra Molins, Josep; Basagaña Flores, Xavier
The analysis of time series studies linking daily counts of a health indicator with environmental variables (e.g., mortality or hospital admissions with air pollution concentrations or temperature; or motor vehicle crashes with temperature) is usually conducted with Poisson regression models controlling for long-term and seasonal trends using temporal strata. When the study includes multiple zones, analysts usually apply a two-stage approach: first, each zone is analyzed separately, and the resulting zone-specific estimates are then combined using meta-analysis. This approach allows zone-specific control for trends. A one-stage approach uses spatio-temporal strata and could be seen as a particular case of the case–time series framework recently proposed. However, the number of strata can escalate very rapidly in a long time series with many zones. A computationally efficient alternative is to fit a conditional Poisson regression model, avoiding the estimation of the nuisance strata. To allow for zone-specific effects, we propose a conditional Poisson regression model with a random slope, although available frequentist software does not implement this model. Here, we implement our approach in the Bayesian paradigm, which also facilitates the inclusion of spatial patterns in the effect of interest. We also provide a possible extension to deal with overdispersed data. We first introduce the equations of the framework and then illustrate their application to data from a previously published study on the effects of temperature on the risk of motor vehicle crashes. We provide R code and a semi-synthetic dataset to reproduce all analyses presented.
2023-10-10T07:09:07Z
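The abstract above notes that a conditional Poisson model avoids estimating the nuisance strata. A minimal Python sketch of why this works, assuming the standard result that independent Poisson counts conditioned on their stratum total follow a multinomial distribution, is given below. The function and variable names are hypothetical, and the sketch covers only the fixed-effect conditional likelihood; it does not include the random slope, the Bayesian implementation, or the overdispersion extension proposed in the paper, and it is not the authors' R code.

```python
import numpy as np

def conditional_poisson_loglik(beta, X, y, strata):
    """Conditional Poisson log-likelihood (up to an additive constant).

    Within each stratum the counts, given their total, are multinomial with
    probabilities exp(x_i @ beta) / sum_j exp(x_j @ beta), so any
    stratum-specific intercept cancels and never has to be estimated.
    """
    eta = X @ beta  # linear predictor without stratum intercepts
    loglik = 0.0
    for s in np.unique(strata):
        idx = strata == s
        eta_s, y_s = eta[idx], y[idx]
        # log-sum-exp of the stratum's linear predictors, computed stably
        m = eta_s.max()
        log_norm = m + np.log(np.exp(eta_s - m).sum())
        loglik += float(y_s @ eta_s - y_s.sum() * log_norm)
    return loglik

# Hypothetical toy data: two strata (e.g. zone-by-month cells), one exposure.
X = np.array([[0.1], [0.5], [0.9], [0.2], [0.4], [0.8]])
y = np.array([3, 5, 9, 1, 2, 4])
strata = np.array([0, 0, 0, 1, 1, 1])
base = conditional_poisson_loglik(np.array([0.7]), X, y, strata)

# Adding a stratum indicator with an arbitrary coefficient changes nothing,
# which is the sense in which the strata are nuisance parameters.
X_aug = np.hstack([X, (strata == 0).astype(float)[:, None]])
with_offset = conditional_poisson_loglik(np.array([0.7, 5.0]), X_aug, y, strata)
```

In this toy setup `base` and `with_offset` agree up to floating-point error, illustrating that the stratum effects drop out of the likelihood.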
-
Scalability evaluation of forecasting methods applied to bicycle sharing systems
http://hdl.handle.net/2117/394762
Cortez Ordóñez, Alexandra Piedad; Vázquez Alcocer, Pere Pau; Sánchez Espigares, Josep Anton
Public Bicycle Sharing Systems have spread to many cities over the last decade. The need for analysis tools to predict their behavior or estimate balancing needs has fostered a wide set of approaches that consider many variables. Often, these approaches use a single scenario to evaluate their algorithms, and little is known about the applicability of such algorithms in cities of different sizes. In this paper, we evaluate the performance of widely known prediction algorithms in three scenarios of different sizes: a small system, with around 20 docking stations; a medium-sized one, with 400+ docking stations; and a large one, with more than 1500 stations. The results show that Prophet and Random Forest are the prediction algorithms with the most consistent results, and that small systems often do not have enough data for the algorithms to perform reliably.
2023-10-10T06:53:20Z
-
Modeling SARS-CoV-2 true infections in Catalonia through a digital twin
http://hdl.handle.net/2117/391910
Fonseca Casas, Pau; Garcia Subirana, Joan; García Carrasco, Víctor
A dynamic view of the evolution of SARS-CoV-2 infections in Catalonia is presented, using a Digital Twin approach that forecasts the true infection curve. The forecast model incorporates the vaccination process, the confinement, and the detection rate, and allows virtually any nonpharmaceutical intervention (NPI) to be considered, making it possible to understand their effects on the disease’s containment while forecasting the trend of the pandemic. A continuous validation process of the model is performed using real data and an optimization model that automatically provides information regarding the effects of the containment actions on the population. To simplify this validation process, a formal graphical language is used that eases interaction with the different specialists and modification of the model parameters. The Digital Twin of the pandemic in Catalonia provides a forecast of the future trend of the SARS-CoV-2 spread and information regarding the true cases and the effectiveness of the NPIs in controlling the spread over the population. This approach can be applied easily to other regions and can become an excellent tool for decision-making.
This is the peer reviewed version of the following article: Fonseca, P., Garcia, J. and Garcia, V., 2023. Modeling SARS-CoV-2 true infections in Catalonia through a digital twin. Advanced theory and simulations, (2200917), which has been published in final form at https://onlinelibrary.wiley.com/doi/10.1002/adts.202200917. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited.
2023-07-21T08:55:52Z
-
The landscape of expression and alternative splicing variation across human traits
http://hdl.handle.net/2117/391783
García Pérez, Raquel; Ramírez Cardeñosa, José Miguel; Ripoll Cladellas, Aida; Chazarra Gil, Rubén; Oliveros Díez, Winona; Soldatkina, Oleksandra; Bosio, Mattia; Rognon, Paul Joris Denis; Capella Gutiérrez, Salvador; Calvo Llorca, Miguel; Reverter Comes, Ferran; Guigo Serra, Roderic; Aguet, François; Ferreira, Pedro G.; Ardlie, Kristin G.; Mele Messeguer, Marta
Understanding the consequences of individual transcriptome variation is fundamental to deciphering human biology and disease. We implement a statistical framework to quantify the contributions of 21 individual traits as drivers of gene expression and alternative splicing variation across 46 human tissues and 781 individuals from the Genotype-Tissue Expression project. We demonstrate that ancestry, sex, age, and BMI make additive and tissue-specific contributions to expression variability, whereas interactions are rare. Variation in splicing is dominated by ancestry and is under genetic control in most tissues, with ribosomal proteins showing a strong enrichment of tissue-shared splicing events. Our analyses reveal a systemic contribution of types 1 and 2 diabetes to tissue transcriptome variation with the strongest signal in the nerve, where histopathology image analysis identifies novel genes related to diabetic neuropathy. Our multi-tissue and multi-trait approach provides an extensive characterization of the main drivers of human transcriptome variation in health and disease.
2023-07-20T08:26:05Z
-
An approach based on simulation and optimisation for the intermodal dispatching of public transport and ride-pooling services
http://hdl.handle.net/2117/387471
Lorente García, Ester; Codina Sancho, Esteve; Barceló Bugeda, Jaime; Nökel, Klaus
This paper provides a simulation and optimisation-based system to combine public transport (PT) with ride-pooling services (RP). According to the International Transport Forum (ITF), RP could be established as a feeder of PT and included as the first or last leg of the journey, with the option of transferring to/from PT in between. The system contains a dispatching core that uses an optimisation model with heuristic parameters to quickly analyse the potential permutations for each request. In the literature, this topic is frequently based on simplistic modelling, and it has not been extensively tested in major urban regions. The whole metropolitan region of Barcelona is employed in this study, with a large realistic simulation model encompassing a 20 × 15 km area with a PT network of about 3000 stations and 300 route lines and nearly 114,000 traffic links. This enables a more accurate evaluation of system performance and trip quality computation.
2023-05-16T08:39:34Z
-
A mixture model application in monitoring error message rates for a distributed industrial fleet
http://hdl.handle.net/2117/380600
Plandolit López, Bernat; Puig de Dou, Ignacio; Costigan, Gráinne; Puig Oriol, Xavier; Rodero de Lamo, Lourdes; Martínez Martínez, José Miguel
Remotely monitoring industrial printers for an unexpected increase in warning and error messages reduces equipment downtime and increases customer satisfaction. Directly tracking raw error message rates during a given observation period poses some issues. Firstly, when a printer has not been used much during the observation period, its actual printing time is low. In this situation, even a small number of error messages can become an unexpectedly large rate of messages per printing hour. Secondly, classifying printers into error message groups based on their rate (for instance, low, medium and high) and studying group changes over time is useful in identifying potential problems. To overcome these issues, a nonparametric estimation method is used which simultaneously obtains empirical Bayes estimates of error message rates and the number of error message groups. This approach has been used in epidemiology, mainly in disease mapping research, but not in an industrial reliability context. The objective of our work is to show the application of the mixture model to real-time monitoring of printers’ error message rates in a way that addresses the two issues mentioned above.
This is an Accepted Manuscript of an article published by Taylor & Francis Group in Quality engineering in 2023, available online at: https://www.tandfonline.com/doi/full/10.1080/08982112.2022.2132866
2023-01-17T12:58:24Z
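The first issue raised in the abstract above (a few messages over very little printing time producing an inflated raw rate) can be illustrated with a short Python sketch. Note that this is a deliberately simpler, parametric stand-in: a Gamma-Poisson shrinkage toward the pooled fleet rate, not the nonparametric mixture model used in the paper. All names and numbers are hypothetical, and `prior_hours` (the weight of the prior, expressed in pseudo printing hours) is an assumed tuning value rather than something estimated from data.

```python
import numpy as np

def shrunk_rates(counts, hours, prior_hours=10.0):
    """Shrink per-printer error rates toward the pooled fleet rate.

    With a Gamma(shape=a, rate=b) prior on each printer's rate and Poisson
    counts, the posterior mean is (a + count) / (b + hours). Here b is the
    assumed `prior_hours` and a is chosen so the prior mean equals the
    pooled rate, i.e. a = pooled_rate * b.
    """
    counts = np.asarray(counts, dtype=float)
    hours = np.asarray(hours, dtype=float)
    pooled_rate = counts.sum() / hours.sum()
    a, b = pooled_rate * prior_hours, prior_hours
    return (a + counts) / (b + hours), pooled_rate

# Hypothetical fleet: the first printer was barely used in the period.
counts = np.array([2, 10, 12, 8])
hours = np.array([0.5, 100.0, 120.0, 90.0])

raw = counts / hours                     # raw messages per printing hour
shrunk, pooled = shrunk_rates(counts, hours)
```

For the barely used printer the shrunk rate lies between the pooled rate and its raw rate, while printers with many printing hours stay close to their raw rates. Grouping printers into rate classes, the second issue in the abstract, is what the mixture components provide in the paper and is not covered by this sketch.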
-
Development of a platform for the assessment of demand-side flexibility in a microgrid laboratory
http://hdl.handle.net/2117/380368
Etxandi Santolaya, Maite; Colet Subirachs, Alba; Barbero, Mattia; Corchero García, Cristina
Demand-side flexibility has gained attention as a powerful tool to increase the flexibility of the electricity system and counteract the uncertainties caused by the increase of Renewable Energy Sources. To date, few markets allow the participation of Demand Aggregators, which are key to making use of the flexibility of small consumers. Therefore, research surrounding demand aggregation is in many cases limited to simulations or resource-consuming pilot programs. This project integrates a microgrid laboratory with a commercial aggregation platform in order to set up and configure the necessary tools to operate the laboratory as a platform to test flexibility. The flexibility platform defined in this work offers a customizable and controllable environment for Demand Response and aggregation testing, while providing a realistic assessment due to the consideration of a commercial aggregator and the use of real and emulated devices. As a first application of the platform, two customer types have been defined and tested: a residential one with a Heating, Ventilation and Air Conditioning Unit and a prosumer owning a second-life Electric Vehicle battery in a solar Photovoltaic self-consumption system. The scenarios have shown how, for defined users, the interaction with the aggregator can be beneficial for all sides, as long as proper activations and incentives are defined for the customers. Through future applications of the platform, new use cases can be covered and used to gather valuable information for the aggregator or any interested stakeholder.
2023-01-13T08:08:06Z
-
Comparing the impacts of sustainability narratives on American and European energy shareholders: a multi-event study analysing reactions to news before and during COVID-19
http://hdl.handle.net/2117/379663
Barroso del Toro, Alberto; Vivas Crisol, Laura; Tort-Martorell Llabrés, Xavier
This study analysed how positive, neutral, and negative sustainability news impacted the share prices of American and European energy companies, focusing on short-term market reactions. Our goal was to understand whether or not the sustainability narrative had similar effects on shareholder behaviour in both markets, and whether the COVID-19 pandemic changed the way shareholders invested as they faced uncertainty. We used the event study methodology to analyse the cumulative average abnormal returns (CAAR). We gathered 2134 event studies according to the type of energy source (renewable, fossil fuel or nuclear) and news sentiment. We analysed all global and digital news on sustainability from 2017 to 2020 using the GDELT news database as a source of information, which contains 295,093 viral news stories (high-volume news). The results showed notable differences between the American and European market reactions. The American market was much more optimistic, particularly during the pandemic. At the same time, the European market was more negative, showing declines in prices even in the face of positive news about nuclear and renewable energy. Nevertheless, both markets agreed that nuclear power was still on investors’ agenda. Finally, fossil fuels were less penalised by investors following negative or neutral news than other types of energy and were equally or more rewarded following positive news. It could therefore be concluded that fossil fuel investors were less impacted by negative news about the energy market before and during COVID-19. These results could be relevant for policy makers in the context of changing the current shareholders’ narratives and incentives towards an effective sustainable energy transition through the use of new incentives or legislation.
2023-01-10T12:43:10Z
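The cumulative average abnormal return (CAAR) named in the abstract above can be sketched in a few lines of Python. The abstract does not state how expected returns were modelled, so the market model used here (an ordinary least squares fit of stock returns on market returns over an estimation window) is an assumption, and the function names and window slices are hypothetical.

```python
import numpy as np

def abnormal_returns(stock, market, est, evt):
    """Abnormal returns over the event window under an assumed market model.

    `est` and `evt` are slices selecting the estimation and event windows.
    """
    # np.polyfit with degree 1 returns [slope, intercept].
    beta, alpha = np.polyfit(market[est], stock[est], 1)
    expected = alpha + beta * market[evt]
    return stock[evt] - expected

def caar(ar_matrix):
    """CAAR from an array of shape (n_events, window_length).

    Abnormal returns are first averaged across events for each day of the
    event window, then accumulated over the window.
    """
    average_ar = np.asarray(ar_matrix, dtype=float).mean(axis=0)
    return np.cumsum(average_ar)

# Hypothetical example: a stock that follows the market exactly during the
# estimation window and earns an extra 1% per day in the event window.
market = np.linspace(-0.02, 0.02, 30)
stock = 0.001 + 1.5 * market
stock[25:] += 0.01
ar = abnormal_returns(stock, market, slice(0, 25), slice(25, 30))
```

Stacking one such `ar` row per news event and passing the resulting matrix to `caar` gives the curve whose sign and size an event study of this kind interprets as the average market reaction.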