Reports de recerca

http://hdl.handle.net/2117/3943
2024-04-17T09:39:38Z

The role of life course and gender in mobility patterns: a spatiotemporal sequence analysis in Barcelona
http://hdl.handle.net/2117/394624
Montero Mercadé, Lídia; Mejía-Dorantes, Lucía; Barceló Bugeda, Jaime
Citizens participate in various activities to fulfill their needs, advance their socio-economic status, and enhance their well-being through social and health-related engagements. However, activity participation is influenced by many factors in the built environment, such as the spatial and temporal distribution of activities, which makes travel necessary to overcome physical distances. Moreover, individual attributes such as gender, daily schedules, and other socio-economic characteristics also influence mobility patterns. In this paper, we aim to investigate these factors in the specific context of the Barcelona Metropolitan Area using three different samples of residents from annual mobility surveys conducted between 2018 and 2020. To this end, we employ a sequence analysis method that examines the entire trajectory of an individual’s daily activities and travel, considering the number, order, and duration of activities. In this way, we analyse in detail how various individual characteristics and the built environment influence the fragmentation of activities. Our study yields multiple results. First, we find that even in a transport-oriented city, the fragmentation of activities is shaped by gender, especially after age 30, when major changes occur in an individual’s life course, in particular caring responsibilities and family status. Second, we observe that educational level and year of the sample also play a central role in shaping mobility patterns.
Finally, our paper makes a methodological contribution by defining sequence distances after projecting the original space onto the factorial space defined by the Multiple Correspondence Analysis. This study shows that mobility policies should not focus solely on transport aspects, but also consider the built environment, dwelling location, gender, equity, and individual lifetime characteristics in an integrated manner.
2023-10-04T14:17:54Z

Interpreting machine learning models for survival analysis: a study of cutaneous melanoma using the SEER database
http://hdl.handle.net/2117/390006
Hernández Pérez, Carlos; Pachón García, Cristian; Delicado Useros, Pedro Francisco; Vilaplana Besler, Verónica
In this study, we train and compare three types of machine learning algorithms for Survival Analysis: Random Survival Forest, DeepSurv and DeepHit, using the SEER database to model cutaneous malignant melanoma. Additionally, we employ the SurvLIMEpy library, a Python package designed to provide explainability for survival machine learning models, to analyse feature importance. The results demonstrate that machine learning algorithms outperform the Cox Proportional Hazards Model. Our work underscores the importance of explainability methods for interpreting black-box models and provides insights into important features related to melanoma prognosis.
2023-06-30T11:28:18Z

Analysing gender equality in Barcelona through (spatiotemporal) segmentation
http://hdl.handle.net/2117/371407
Montero Mercadé, Lídia; Mejía-Dorantes, Lucía; Barceló Bugeda, Jaime
Citizens take part in different activities to satisfy their needs, to invest in their socio-economic progress, and to participate in social and health activities that improve their well-being. However, activity participation is influenced not only by many factors in the built environment but also by individual attributes. Herein we analyze activity participation and travel through sequence analysis. This method explores sequences of daily activity and travel employing techniques from the sequencing of events in the life course of individuals. Studying sequences of daily episodes (each activity and each trip) considers the entire trajectory of a person’s activity during a day while jointly considering the number of activities, their order within the day, and their durations. We applied this method to a sample of residents in the Metropolitan Area of Barcelona (RMB) in the 2018, 2019 and 2020 EMEF Travel Surveys. The EMEF2020 deserves a particular analysis since activity patterns are expected to differ from those before the spread of COVID-19. We have focused on above-average fragmentation in activity participation among persons in specific gender, age, activity and transportation mode groups.
2022-07-28T06:48:02Z

Alimentación de un modelo de simulación mediante una conexión entre un sistema de información, R y SDLPS [Feeding a simulation model through a connection between an information system, R and SDLPS]
http://hdl.handle.net/2117/371281
Leiva Olmos, Jorge Rodrigo; Fonseca Casas, Pau; Ocaña Rebull, Jordi
This work develops a methodology for feeding a simulation model in a more automated way. To this end, code was written in the R language that: 1) connects to a database; 2) validates the data; and 3) feeds a simulation model described in SDL and implemented in the SDLPS simulation software, which reads the information generated in R.
This development was applied to a process in a Chilean hospital. The main benefits are: it is applicable to different areas and processes; it increases the timeliness of the operational verification and validation of the model; it makes it easier to monitor the system over shorter periods; and it allows earlier experimentation with different scenarios to evaluate and plan solutions to potential problems.
Paper write-up: feeding a simulation model through a connection between an information system, R and SDLPS
2022-07-27T10:05:43Z

5È. Audit clínic de l'ictus. Catalunya 2018/19 [5th clinical audit of stroke. Catalonia 2018/19]
http://hdl.handle.net/2117/364727
Salvat Plana, Mercè; Pérez de la Ossa, Natalia; Cortés Martínez, Jordi; Ayesta, Mercè; Gallofré, Guillem
A total of 4,008 cases admitted for acute stroke between 2018 and 2019 were audited. The study period was extended to 6 months, in periods of one and a half months spread over 12 months, similar for all centres. Prospective data collection was carried out mostly by nurses at each hospital.
The median time between symptom onset and arrival at the emergency department was 2.1 hours. 72% of cases arrived at the hospital within the first four and a half hours. Compared with the 4th Audit:
* Admissions to acute stroke units increased (44.2% to 61.3%), as did Stroke Code (Codi Ictus) activations (42.9 to 61.4; those made by the SEM from 43.4 to 67.8%) and reperfusion treatments (16% to 30% of ischaemic strokes).
* The number of patients diagnosed during admission with previously unknown atrial fibrillation increased (7% to 18.8%).
* There was a slight increase in pneumonia (6% to 8%) and in-hospital mortality fell (12% to 9%).
* Six quality indicators improved significantly, three remained unchanged and three worsened.
There was a notable improvement in some relevant quality indicators, such as performance of the dysphagia test, assessment of the lipid profile, health education for patients and relatives, recording of stroke aetiology, and use of neurological scales. Improvement actions are needed for the following indicators: antithrombotic regimen within 48 hours, early mobilisation, and mood assessment. Mood is assessed in a low percentage of cases, and a wide variety of measurement tools is used. Continuous improvement and maintenance of the quality of care for patients with acute stroke requires periodic evaluation of clinical practice. The Stroke Audits are the evaluation instrument of the PDMVC. Improving their results is intended to ensure better patient outcomes.
2022-03-22T11:43:41Z

On solving large-scale multistage stochastic problems with a new specialized interior-point approach
http://hdl.handle.net/2117/359313
Castro Pérez, Jordi; Escudero Bueno, Laureano F.; Monge Ivars, Juan Francisco
A novel approach based on a specialized interior-point method (IPM) is presented for solving large-scale stochastic multistage continuous optimization problems, which represent the uncertainty in strategic multistage and operational two-stage scenario trees, the latter being rooted at the strategic nodes. This new solution approach considers a split-variable formulation of the strategic and operational structures, for which copies are made of the strategic nodes and the structures are rooted in the form of nested strategic-operational two-stage trees. The specialized IPM solves the normal equations of the problem’s Newton system by combining Cholesky factorizations with preconditioned conjugate gradients, doing so for, respectively, the constraints of the stochastic formulation and those that equate the split-variables. We show that, for multistage stochastic problems, the preconditioner (i) is a block-diagonal matrix composed of as many shifted tridiagonal matrices as the number of nested strategic-operational two-stage trees, thus allowing the efficient solution of systems of equations; (ii) its complexity in a multistage stochastic problem is equivalent to that of a very large-scale two-stage problem. A broad computational experience is reported for large multistage stochastic supply network design (SND) and revenue management (RM) problems; the mathematical structures vary greatly for those two application types.
Some of the most difficult instances of SND had 5 stages, 839 million variables, 13 million quadratic variables, 21 million constraints, and 3750 scenario tree nodes; while those of RM had 8 stages, 278 million variables, 100 million constraints, and 100,000 scenario tree nodes. For those problems, the proposed approach obtained the solution in 2.3 days using 167 gigabytes of memory for SND, and in 1.7 days using 83 gigabytes for RM; while the state-of-the-art solver CPLEX v20.1 required more than 24 days and 526 gigabytes for SND, and more than 19 days and 410 gigabytes for RM.
2022-01-11T14:25:38Z

New interior-point approach for one- and two-class linear support vector machines using multiple variable splitting
http://hdl.handle.net/2117/359311
Castro Pérez, Jordi
Multiple variable splitting is a general technique for decomposing problems by using copies of variables and additional linking constraints that equate their values. The resulting large optimization problem can be solved with a specialized interior-point method that exploits the problem structure and computes the Newton direction with a combination of direct and iterative solvers (i.e., Cholesky factorizations and preconditioned conjugate gradients for linear systems related to, respectively, subproblems and new linking constraints). The present work applies this method to solving real-world binary classification and novelty (or outlier) detection problems by means of, respectively, two-class and one-class linear support vector machines (SVMs).
Unlike previous interior-point approaches for SVMs, which were practical only with low-dimensional points, the new proposal can also deal with high-dimensional data. The new method is compared with state-of-the-art solvers for SVMs that are based on either interior-point algorithms (such as SVM-OOPS) or specific algorithms developed by the machine learning community (such as LIBSVM and LIBLINEAR). The computational results show that, for two-class SVMs, the new proposal is competitive not only against previous interior-point methods (and much more efficient than they are with high-dimensional data) but also against LIBSVM, whereas LIBLINEAR generally outperformed the proposal. For one-class SVMs, the new method consistently outperformed all other approaches, in terms of either solution time or solution quality.
2022-01-11T14:20:31Z

Transport analytics approaches to the dynamic origin-destination estimation problem
http://hdl.handle.net/2117/344273
Ros Roca, Xavier; Montero Mercadé, Lídia; Barceló Bugeda, Jaime; Noëkel, Klaus; Gentile, Guido
Dynamic traffic models require dynamic inputs, and one of the main inputs is the set of dynamic origin-destination (OD) matrices describing the variability over time of the trip patterns across the network. The Dynamic OD Matrix Estimation (DODME) is a hard problem since no direct full observations are available, and therefore one should resort to indirect estimation approaches. Among the most efficient approaches, the one that formulates the problem in terms of a bilevel optimization problem has been widely used. This formulation solves at the upper level a nonlinear optimization that minimizes some distance measures between observed and estimated link flow counts at certain counting stations located in a subset of links in the network, and at the lower level a traffic assignment that estimates these link flow counts by assigning the current estimated matrix. The variants of this formulation differ in the analytical approaches that estimate the link flows in terms of the assignment and their time dependencies.
Since these estimations are based on a traffic assignment at the lower level, these analytical approaches, although numerically efficient, imply a high computational cost. The advent of ICT applications has made available new sets of traffic-related measurements enabling new approaches; under certain conditions, the data collected on used paths could be interpreted as a de facto observed estimate of the assignment. This allows the same information that an assignment provides to the analytical approaches to be extracted empirically. This research report explores how to extract such information from the recorded data.
2021-04-23T11:23:37Z

An algorithm for the microaggregation problem using column generation
http://hdl.handle.net/2117/335179
Gentile, Claudio; Spagnolo Arrizabalaga, Enric; Castro Pérez, Jordi
The field of Statistical Disclosure Control aims at reducing the risk of re-identification of an individual when disseminating data, and it is one of the main concerns of national statistical agencies.
Operations Research (OR) techniques were widely used in the past for the protection of tabular data, but not for microdata (i.e., files of individuals and attributes). This work presents (as far as we know, for the first time) an application of OR techniques to the microaggregation problem, which is considered one of the best methods for microdata protection and is known to be NP-hard. The new heuristic approach is based on a column generation scheme and, unlike previous (primal) heuristics for microaggregation, it also provides a lower bound on the optimal microaggregation. Computational results on real data typically used in the literature show that solutions with small gaps are often achieved and that dramatic improvements are obtained with respect to the most popular heuristics in the literature.
2021-01-12T13:51:32Z

KLASS: estudi d'un sistema d'ajuda al tractament estadístic de grans bases de dades (Master Thesis) [KLASS: study of a support system for the statistical treatment of large databases]
http://hdl.handle.net/2117/329557
Gibert, Karina
2020-09-30T12:13:22Z