dc.contributor.author: Vilardell, Mireia
dc.contributor.author: Buxó, Maria
dc.contributor.author: Clèries, Ramon
dc.contributor.author: Martínez Martínez, José Miguel
dc.contributor.author: Ameijide, Alberto
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
dc.identifier.citation: Vilardell, M. [et al.]. Missing data imputation and synthetic data simulation through modeling graphical probabilistic dependencies between variables (ModGraProDep): An application to breast cancer survival. "Artificial intelligence in medicine", 1 July 2020, vol. 107, p. 101875/1-101875/11.
dc.description.abstract: Background: Two common issues may arise in certain population-based breast cancer (BC) survival studies: (i) missing values in a predictive variable for survival, such as “Stage” at diagnosis, and (ii) small sample size due to the “class imbalance problem” in certain subsets of patients, which calls for data modeling/simulation methods. Methods: We present a procedure, ModGraProDep, based on graphical modeling (GM) of a dataset to overcome these two issues. The performance of the models derived from ModGraProDep is compared with a set of frequently used classification and machine learning algorithms (Missing Data Problem) and with oversampling algorithms (Synthetic Data Simulation). For the Missing Data Problem we assessed two scenarios: missing completely at random (MCAR) and missing not at random (MNAR). Two validated BC datasets provided by the cancer registries of Girona and Tarragona (northeastern Spain) were used. Results: In both the MCAR and MNAR scenarios, all models showed poorer prediction performance than three GM models: the saturated one (GM.SAT) and two with penalty factors on the partial likelihood (GM.K1 and GM.TEST). However, GM.SAT predictions could lead to unreliable conclusions in BC survival analysis. Simulating a “synthetic” dataset derived from GM.SAT could be the worst strategy, but using the remaining GM models could be better than oversampling. Conclusion: Our results suggest using the GM procedure presented here for one-variable imputation/prediction of missing data and for simulating “synthetic” BC survival datasets. The “synthetic” datasets derived from GMs could also be used in clinical applications of cancer survival data, such as predictive risk analysis.
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
dc.subject.lcsh: Artificial intelligence
dc.subject.other: Breast cancer
dc.subject.other: Graphical models
dc.subject.other: Missing data
dc.title: Missing data imputation and synthetic data simulation through modeling graphical probabilistic dependencies between variables (ModGraProDep): An application to breast cancer survival
dc.subject.lemac: Mama -- Càncer
dc.subject.lemac: Intel·ligència artificial
dc.contributor.group: Universitat Politècnica de Catalunya. ADBD - Anàlisi de Dades Complexes per a les Decisions Empresarials
dc.rights.access: Restricted access - publisher's policy
dc.description.version: Postprint (published version)
local.citation.author: Vilardell, M.; Buxó, M.; Clèries, R.; Martinez, JM.; Ameijide, A.
local.citation.publicationName: Artificial intelligence in medicine

Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain