Journal articles (http://hdl.handle.net/2117/3687)

LONG-REMI: An AI-based technological application to promote healthy mental longevity grounded in reminiscence therapy (http://hdl.handle.net/2117/368218)
Nebot Castells, M. Àngela; Domenech Pou, Sara; Albino-Pires, Natália; Múgica Álvarez, Francisco; Benali, Anass; Porta, Xènia; Nebot Mugica, Oriol; Santos, Pedro M.
Reminiscence therapy (RT) consists of thinking about one’s own experiences through the presentation of memory-facilitating stimuli, with the activation of emotions as its fundamental axis. An innovative way of offering RT involves technology-assisted applications, which must also satisfy the needs of the user. This study aimed to develop an AI-based computer application that recreates RT in a personalized way, reproducing the characteristics of RT guided by a therapist or a caregiver. The material guiding RT focuses on intangible cultural heritage. The application incorporates facial expression analysis and reinforcement learning techniques to identify the user’s emotions and use them to guide the system that emulates RT dynamically and in real time. A pilot study was carried out at five senior centers in Barcelona and Portugal. The results obtained are very positive, showing high user satisfaction. Moreover, the results indicate that the already high frequency of positive emotions had increased by the end of the intervention, while the low frequency of negative emotions was maintained.
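As a minimal sketch of the idea of letting detected emotions steer the choice of reminiscence material (an illustrative toy, not the authors' system), the following Python snippet runs an epsilon-greedy bandit over hypothetical stimulus categories, with a stubbed emotion detector whose valence score acts as the reward. The names STIMULI and detect_valence are assumptions for illustration only.

import random

# Hypothetical stimulus categories drawn from intangible cultural heritage themes.
STIMULI = ["songs", "festivities", "crafts", "recipes", "old photographs"]

def detect_valence(stimulus):
    """Stub for a facial-expression analyser: returns a valence score in [-1, 1].
    In a real application this would come from an emotion-recognition model."""
    return random.uniform(-1.0, 1.0)

def run_session(n_steps=50, epsilon=0.2):
    """Epsilon-greedy bandit: prefer stimuli that have elicited positive emotions."""
    counts = {s: 0 for s in STIMULI}
    values = {s: 0.0 for s in STIMULI}   # running mean reward per stimulus
    for _ in range(n_steps):
        if random.random() < epsilon:
            choice = random.choice(STIMULI)       # explore
        else:
            choice = max(values, key=values.get)  # exploit best-known stimulus
        reward = detect_valence(choice)           # emotion valence as reward
        counts[choice] += 1
        values[choice] += (reward - values[choice]) / counts[choice]
    return values

if __name__ == "__main__":
    print(run_session())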

Tracking a well diversified portfolio with maximum entropy in the mean (http://hdl.handle.net/2117/363505)
Arratia Quesada, Argimiro Alejandro; Gzyl, Henryk; Mayoral Blaya, Silvia
In this work we address the following problem: given a well diversified portfolio, how to improve on its return while maintaining its diversification. To achieve this boost in return, we construct a neighborhood of the well diversified portfolio and find the portfolio that maximizes the return within that neighborhood. For that we use the method of maximum entropy in the mean, which can produce a portfolio yielding any attainable return up to the maximum return within the neighborhood. An implicit bonus of the method is that if the benchmark portfolio has acceptable risk and diversification, the portfolio of maximum return in that neighborhood will also have acceptable risk and diversification.
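As an illustrative sketch of the neighborhood idea only (a simple constrained optimization, not the maximum-entropy-in-the-mean procedure used in the paper), the snippet below maximizes expected return over portfolios constrained to lie within a small ball around a benchmark's weights; the returns vector and radius are made-up inputs.

import numpy as np
from scipy.optimize import minimize

# Made-up expected returns and a well diversified benchmark portfolio.
mu = np.array([0.05, 0.07, 0.04, 0.06])       # expected asset returns
w_bench = np.array([0.25, 0.25, 0.25, 0.25])  # benchmark weights
radius = 0.10                                 # size of the neighborhood

def neg_return(w):
    return -mu @ w

constraints = [
    {"type": "eq",   "fun": lambda w: np.sum(w) - 1.0},                       # fully invested
    {"type": "ineq", "fun": lambda w: radius - np.linalg.norm(w - w_bench)},  # stay near benchmark
]
bounds = [(0.0, 1.0)] * len(mu)  # long-only

res = minimize(neg_return, w_bench, bounds=bounds, constraints=constraints, method="SLSQP")
print("weights:", np.round(res.x, 3), "expected return:", round(-res.fun, 4))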

Misreported longitudinal data in epidemiology: review of mixture-based advances and current challenges (http://hdl.handle.net/2117/363432)
Moriña, David; Fernandez Fontelo, Amanda; Cabaña Nigro, Ana Alejandra; Arratia Quesada, Argimiro Alejandro; Puig Casado, Pere
The problem of dealing with misreported data is very common in a wide range of contexts and for different reasons. This has been and still is an important issue for data analysts and statisticians, as not accounting for it can lead to biased estimates and conclusions, and in many cases that has implications for subsequent decision-making, as we have all seen in the current worldwide Covid-19 pandemic. In the last few years, many approaches have been proposed in the literature to accommodate data presenting this issue, especially in the fields of epidemiology and public health but also in other areas such as the social sciences. In this work, a comprehensive review of the recently proposed methods based on mixture models for longitudinal data (correlated and uncorrelated) is presented, and several examples of application are discussed, including several approaches to the burden of Covid-19 infection cases in Spain and different approaches to deal with underreported registries of human papillomavirus infections and genital warts in Catalunya.

Forest fire forecasting using fuzzy logic models (http://hdl.handle.net/2117/354853)
Nebot Castells, M. Àngela; Múgica Álvarez, Francisco
In this study, we explored hybrid fuzzy logic modelling techniques to predict the burned area of forest fires. Fast detection is crucial for successful firefighting, and a model with accurate predictive ability is extremely useful for optimizing fire management. Fuzzy Inductive Reasoning (FIR) and the Adaptive Neuro-Fuzzy Inference System (ANFIS), two powerful fuzzy techniques, were used to model the burned area of forests in Portugal. The results obtained from them were compared with those of other artificial intelligence techniques applied to the same datasets found in the literature.
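As a toy illustration of fuzzy-rule reasoning (a zero-order Sugeno system written from scratch, not the FIR or ANFIS models used in the study), the snippet below maps temperature and relative humidity to a rough "fire risk" score with triangular membership functions; the variable ranges and rules are invented for illustration only.

def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fire_risk(temp_c, humidity_pct):
    # Fuzzify inputs (ranges are illustrative assumptions).
    hot   = tri(temp_c, 20, 35, 50)
    mild  = tri(temp_c, 5, 20, 35)
    dry   = tri(humidity_pct, 0, 15, 40)
    humid = tri(humidity_pct, 30, 70, 100)

    # Zero-order Sugeno rules: firing strength (min of antecedents) -> crisp output.
    rules = [
        (min(hot, dry),    0.9),   # hot and dry   -> high risk
        (min(hot, humid),  0.5),   # hot and humid -> medium risk
        (min(mild, dry),   0.4),   # mild and dry  -> medium-low risk
        (min(mild, humid), 0.1),   # mild and humid -> low risk
    ]
    num = sum(w * z for w, z in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0

print(fire_risk(38, 10))  # hot, dry day -> value close to 0.9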

Cumulated burden of Covid-19 in Spain from a Bayesian perspective (http://hdl.handle.net/2117/349412)
Moriña, David; Fernandez Fontelo, Amanda; Cabaña Nigro, Ana Alejandra; Arratia Quesada, Argimiro Alejandro; Ávalos Villaseñor, Gustavo Eduardo; Puig, Pedro
Background
The main goal of this work is to estimate the actual number of cases of Covid-19 in Spain in the period from January 31, 2020 to June 1, 2020, by Autonomous Community. Based on these estimates, this work allows us to re-estimate more accurately the lethality of the disease in Spain, taking unreported cases into account.
Methods
A hierarchical Bayesian model recently proposed in the literature has been adapted to model the actual number of Covid-19 cases in Spain.
Results
The results of this work show that the real burden of Covid-19 in Spain in the period considered is well above the figures registered by the public health system. Specifically, the model estimates that, cumulatively until June 1st, 2020, there were 2 425 930 cases of Covid-19 in Spain with characteristics similar to those reported (95% credibility interval: 2 148 261 to 2 813 864), of which only 518 664 were actually registered.
Conclusions
Compared with the results of the second wave of the Spanish seroprevalence study, which estimates 2 350 324 cases of Covid-19 in Spain over the period considered, the estimates provided by the model are quite accurate. This work clearly shows the key importance of having good-quality data to optimize decision-making in the critical context of dealing with a pandemic.
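The figures quoted in this abstract already allow a back-of-the-envelope reporting-rate check; the short snippet below simply restates the abstract's own numbers.

estimated = 2_425_930        # model estimate of cumulative cases to June 1st, 2020
registered = 518_664         # officially registered cases
seroprevalence = 2_350_324   # second wave of the Spanish seroprevalence study

print(f"reporting rate: {registered / estimated:.1%}")             # ~21.4%
print(f"under-reporting factor: {estimated / registered:.1f}x")    # ~4.7x
print(f"gap to seroprevalence: {(estimated - seroprevalence) / seroprevalence:.1%}")  # ~3.2%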

kernInt: A kernel framework for integrating supervised and unsupervised analyses in spatio-temporal metagenomic datasets (http://hdl.handle.net/2117/349397)
Ramon Gurrea, Elies; Belanche Muñoz, Luis Antonio; Molist Gasa, Francesc; Quintanilla Aguado, Raquel; Pérez Enciso, Miguel; Ramayo Caldas, Yuliaxis
The advent of next-generation sequencing technologies has allowed relative quantification of microbiome communities and their spatial and temporal variation. In recent years, supervised learning (i.e., prediction of a phenotype of interest) from taxonomic abundances has become increasingly common in the microbiome field. However, a gap exists between supervised and classical unsupervised analyses, the latter based on computing ecological dissimilarities for visualization or clustering. Nevertheless, both approaches face common challenges, like the compositional nature of next-generation sequencing data or the integration of the spatial and temporal dimensions. Here we propose a kernel framework to place on a common ground the unsupervised and supervised microbiome analyses, including the retrieval of microbial signatures (taxa importances). We define two compositional kernels (Aitchison-RBF and compositional linear) and discuss how to transform non-compositional beta-dissimilarity measures into kernels. Spatial data is integrated with multiple kernel learning, while longitudinal data is evaluated by specific kernels. We illustrate our framework through a single-point soil dataset, a human dataset with a spatial component, and a previously unpublished longitudinal dataset concerning pig production. The proposed framework and the case studies are freely available in the kernInt package at https://github.com/elies-ramon/kernInt.
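As a minimal sketch of one ingredient mentioned here, the snippet below computes an RBF kernel on centred log-ratio (clr) coordinates, which is one natural reading of an "Aitchison-RBF" kernel for compositional abundance data. The pseudocount and gamma values are assumptions, and this is not necessarily the exact definition implemented in the kernInt package.

import numpy as np

def clr(X, pseudocount=1e-6):
    """Centred log-ratio transform of a matrix of compositions (rows = samples)."""
    X = np.asarray(X, dtype=float) + pseudocount   # avoid log(0) for absent taxa
    X = X / X.sum(axis=1, keepdims=True)           # close each row to proportions
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def aitchison_rbf(X, gamma=1.0):
    """RBF kernel on clr coordinates: K_ij = exp(-gamma * ||clr(x_i) - clr(x_j)||^2)."""
    Z = clr(X)
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T  # squared Euclidean distances in clr space
    return np.exp(-gamma * np.clip(d2, 0.0, None))

# Tiny made-up taxa count table (3 samples x 4 taxa).
counts = np.array([[10, 5, 0, 85], [12, 4, 1, 83], [60, 20, 10, 10]])
print(np.round(aitchison_rbf(counts, gamma=0.1), 3))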

Clustering assessment in weighted networks (http://hdl.handle.net/2117/349277)
Arratia Quesada, Argimiro Alejandro; Renedo Mirambell, Martí
We provide a systematic approach to validate the results of clustering methods on weighted networks, in particular for the cases where the existence of a community structure is unknown. Our validation of clustering comprises a set of criteria for assessing their significance and stability. To test for cluster significance, we introduce a set of community scoring functions adapted to weighted networks and systematically compare their values to those of a suitable null model. For this we propose a switching model that produces randomized graphs with weighted edges while keeping the degree distribution constant. To test for cluster stability, we introduce a non-parametric bootstrap method combined with similarity metrics derived from information theory and combinatorics. To assess the effectiveness of our clustering quality evaluation methods, we test them on synthetically generated weighted networks with a ground-truth community structure of varying strength based on the stochastic block model construction. When applied to the clusters of these synthetic ground-truth networks, as well as to other weighted networks with known community structure, the proposed methods correctly identify the best-performing algorithms, which suggests their adequacy for cases where the clustering structure is not known. We test our clustering validation methods on a varied collection of well-known clustering algorithms applied to the synthetically generated networks and to several real-world weighted networks. All our clustering validation methods are implemented in R and will be released in the upcoming package clustAnalytics.
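A minimal sketch of the degree-preserving rewiring idea (a simplified reading, not the paper's exact switching model): repeatedly pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b), carrying the weights along, which leaves every node's degree unchanged. The snippet uses networkx for concreteness; the swap count and example graph are arbitrary.

import random
import networkx as nx

def weighted_switch(G, n_swaps=1000, seed=0):
    """Randomize a weighted graph by edge switching while preserving all node degrees."""
    rng = random.Random(seed)
    H = G.copy()
    for _ in range(n_swaps):
        (a, b), (c, d) = rng.sample(list(H.edges()), 2)
        # Skip swaps that would create self-loops or multi-edges.
        if len({a, b, c, d}) < 4 or H.has_edge(a, d) or H.has_edge(c, b):
            continue
        w_ab = H[a][b]["weight"]
        w_cd = H[c][d]["weight"]
        H.remove_edge(a, b)
        H.remove_edge(c, d)
        H.add_edge(a, d, weight=w_ab)   # weights follow the rewired edges
        H.add_edge(c, b, weight=w_cd)
    return H

G = nx.karate_club_graph()
nx.set_edge_attributes(G, 1.0, "weight")      # make it (trivially) weighted
H = weighted_switch(G, n_swaps=500)
print(dict(G.degree()) == dict(H.degree()))   # True: degrees are preserved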

Una aplicación para despertar recuerdos y cuidar la salud mental de los mayores (http://hdl.handle.net/2117/344723)
Nebot Castells, M. Àngela; Benali, Anass; Múgica Álvarez, Francisco; Albino Pires, Natália; Domenech Pou, Sara
The increase in population longevity, driven by the continued decline in birth rates and the rise in life expectancy, is transforming the shape of the European Union's age pyramid. This trend is leading us towards a much older population structure.
The proportion of older people relative to the total population will increase considerably over the coming decades, as a large part of the baby-boom generation reaches retirement age. This, in turn, will place a greater burden on people of working age, who will not only have to bear the social expenditure demanded by an ageing population but also care for the older members of their families.
Population longevity brings another major challenge: a large proportion of older people suffer from cognitive impairment or a significant decline in memory, which alarmingly reduces their quality of life.
With this outlook on the table, it is important to focus attention on this group, with the aim of helping them lead an active, rewarding and fulfilling life at this stage. In this sense, any action aimed at strengthening their cognitive capacities, such as attention, memory or concentration, is especially relevant.
A person with unimpaired or only mildly impaired cognitive capacity generally leads a more stimulating life, performs better, enjoys a healthier existence and is more predisposed to being an active member of society. The goal, then, is optimal ageing.

Estimating the real burden of disease under a pandemic situation: the SARS-CoV2 case (http://hdl.handle.net/2117/341850)
Fernandez Fontelo, Amanda; Moriña, David; Cabaña Nigro, Ana Alejandra; Arratia Quesada, Argimiro Alejandro; Puig, Pedro
The present paper introduces a new model used to study and analyse the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) epidemic reported data from Spain. This is a hidden Markov model whose hidden layer is a regeneration process with Poisson immigration, Po-INAR(1), together with a mechanism that allows the estimation of under-reporting in non-stationary count time series. A novelty of the model is that the expectation of the unobserved process’s innovations is a time-dependent function defined in such a way that information about the spread of an epidemic, as modelled through a Susceptible-Infectious-Removed dynamical system, is incorporated into the model. In addition, the parameter controlling the intensity of the under-reporting is also made to vary with time to adjust to possible seasonality or trend in the data. Maximum likelihood methods are used to estimate the parameters of the model.
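As a small simulation sketch of the kind of hidden process described here (a Poisson-INAR(1) count series observed through an under-reporting mechanism; a simplified toy with constant parameters, not the paper's full hidden Markov model with SIR-informed, time-varying innovations), the snippet below uses binomial thinning for the autoregression and thins each observation with probability omega.

import numpy as np

rng = np.random.default_rng(1)

def simulate_underreported_inar1(n=200, alpha=0.6, lam=5.0, omega=0.4, q=0.5):
    """X_t = alpha o X_{t-1} + Poisson(lam); with prob. omega, Y_t = Binomial(X_t, q), else Y_t = X_t."""
    X = np.zeros(n, dtype=int)
    Y = np.zeros(n, dtype=int)
    X[0] = rng.poisson(lam)
    for t in range(1, n):
        survivors = rng.binomial(X[t - 1], alpha)   # binomial thinning (alpha o X_{t-1})
        X[t] = survivors + rng.poisson(lam)         # Poisson innovations (immigration)
    for t in range(n):
        if rng.random() < omega:
            Y[t] = rng.binomial(X[t], q)            # under-reported observation
        else:
            Y[t] = X[t]                             # faithfully reported observation
    return X, Y

X, Y = simulate_underreported_inar1()
print("mean latent count:", X.mean(), "mean reported count:", Y.mean())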

Leveraging data science for a personalized haemodialysis (http://hdl.handle.net/2117/340805)
Hueso, Miguel; Haro Martín, Luis de; Calabria, Jordi; Dal-Re, R; Tebe, C; Gibert, Karina; Cruzado, Josep M; Vellido Alcacena, Alfredo
The 2019 Science for Dialysis Meeting at Bellvitge University Hospital was devoted to the challenges and opportunities posed by the use of data science to facilitate precision and personalized medicine in nephrology, and to describing new approaches and technologies. The meeting included separate sections on issues in data collection and data analysis. As part of data collection, we presented the institutional ARGOS e-health project, which provides a common model for the standardization of clinical practice. We also paid specific attention to the way in which randomized controlled trials offer data that may be critical to decision-making in the real world. The opportunities of open source software (OSS) for data science in clinical practice were also discussed.
Summary: Precision medicine aims to provide the right treatment for the right patients at the right time and is deeply connected to data science. Dialysis patients are highly dependent on technology to live, and their treatment generates a huge volume of data that has to be analysed. Data science has emerged as a tool to provide an integrated approach to data collection, storage, cleaning, processing, analysis, and interpretation from potentially large volumes of information. This is meant to be a perspective article about data science based on the experience of the experts invited to the Science for Dialysis Meeting, and it provides an up-to-date perspective on the potential of data science in kidney disease and dialysis.
Key messages: Healthcare is quickly becoming data-dependent, and data science is a discipline that holds the promise of contributing to the development of personalized medicine, although nephrology still lags behind in this process. The key idea is to ensure that data will guide medical decisions based on individual patient characteristics rather than on averages over a whole population, usually based on randomized controlled trials that excluded kidney disease patients. Furthermore, there is increasing interest in obtaining data about the effectiveness of available treatments in current patient care based on pragmatic clinical trials. The use of data science in this context is becoming increasingly feasible, in part thanks to the swift developments in OSS.