A benchmark of synthetic transcriptomic cancer data reconstruction

dc.contributor.author: Prol Castelo, Guillermo
dc.contributor.author: Cirillo, Davide
dc.contributor.author: Valencia, Alfonso
dc.date.accessioned: 2025-06-27T14:39:58Z
dc.date.available: 2025-06-27T14:39:58Z
dc.date.issued: 2024-05
dc.description.abstract: Cancer is the second most common cause of death worldwide, and its incidence is increasing [1]. Several methodologies have been developed to study it. For instance, PAM50, a collection of 50 genes important for cancer characterization, has helped categorize cancer subtypes [2]. However, the rapid growth of sequenced biological data, or omics, has made it possible to measure far more genes per sample. Still, the number of samples available in studies tends to be low. This combination of small sample size and high dimensionality, known as the curse of dimensionality, makes meaningful data analysis less efficient. Hence, deep learning faces limitations on omics data in general and on cancer data in particular [3]. For omics data in general, the curse of dimensionality has hindered the application of deep learning, given its data-hungry nature. For cancer data in particular, our current understanding of how molecular mechanisms affect cancer progression limits how well we can interpret deep learning algorithms applied to omics information [4]. To circumvent both issues, we aim to learn a low-dimensional representation of the real data, use this representation to augment the original data with improved reconstruction fidelity, and obtain meaningful insights into cancer progression along the way. The autoencoder (AE) [5] is a deep learning technique that reduces data dimensionality. In this study, we define and use three types of autoencoder: the vanilla autoencoder [5], the variational autoencoder (VAE) [6], and the conditional variational autoencoder (CVAE) [7]. We discuss how we can learn from real cancer data, such as that provided by The Cancer Genome Atlas (TCGA), reconstruct the original data, and generate new data in silico, i.e., synthetic data.
dc.format.extent: 2 p.
dc.identifier.citation: Prol Castelo, G.; Cirillo, D.; Valencia, A. A benchmark of synthetic transcriptomic cancer data reconstruction. In: 11th BSC Doctoral Symposium, 7th - 8th May, 2024. Barcelona: Barcelona Supercomputing Center, 2024,
dc.identifier.uri: https://hdl.handle.net/2117/433061
dc.language: en
dc.language.iso: eng
dc.publisher: Barcelona Supercomputing Center
dc.rights.access: Open Access
dc.rights.licensename: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: High performance computing
dc.subject.lemac: Càlcul intensiu (Informàtica)
dc.subject.other: Cancer
dc.subject.other: Transcriptomics
dc.subject.other: Synthetic Patients
dc.subject.other: Deep Learning
dc.subject.other: Autoencoders
dc.title: A benchmark of synthetic transcriptomic cancer data reconstruction
dc.type: Conference report
dspace.entity.type: Publication
local.citation.contributor: 11th BSC Doctoral Symposium, 7th - 8th May, 2024
local.citation.pubplace: Barcelona

Files

Original bundle

Name: SODS11-2024-35.pdf
Size: 2.43 MB
Format: Adobe Portable Document Format