A benchmark of synthetic transcriptomic cancer data reconstruction
| dc.contributor.author | Prol Castelo, Guillermo |
| dc.contributor.author | Cirillo, Davide |
| dc.contributor.author | Valencia, Alfonso |
| dc.date.accessioned | 2025-06-27T14:39:58Z |
| dc.date.available | 2025-06-27T14:39:58Z |
| dc.date.issued | 2024-05 |
| dc.description.abstract | Cancer is the second most common cause of death worldwide, and its incidence is increasing [1]. Several methodologies have been developed to study it. For instance, PAM50, a collection of 50 genes important for cancer characterization, has helped categorize cancer subtypes [2]. The rapid growth of sequenced biological data, or omics, has made it possible to measure far more genes, yet the number of samples available in studies tends to remain low. This combination of small sample size and high dimensionality, known as the curse of dimensionality, makes data analysis less efficient. As a result, deep learning faces limitations on omics data in general and on cancer data in particular [3]. For omics data in general, the curse of dimensionality has hindered the application of deep learning, given its data-hungry nature. For cancer data in particular, our current understanding of how molecular mechanisms affect cancer progression makes it difficult to interpret the results of deep learning algorithms applied to omics information [4]. To circumvent both issues, we aim to learn a low-dimensional representation of the real data, use this representation to augment the original data with improved reconstruction fidelity, and obtain meaningful insights into cancer progression along the way. The autoencoder (AE) [5] is a deep learning technique that reduces data dimensionality. In this study, we define and use three types of autoencoder: the vanilla autoencoder [5], the Variational Autoencoder (VAE) [6], and the Conditional Variational Autoencoder (CVAE) [7]. We discuss how we can learn from real cancer data, such as that provided by The Cancer Genome Atlas (TCGA), reconstruct the original data, and generate new data in silico, i.e., synthetic data. |
| dc.format.extent | 2 p. |
| dc.identifier.citation | Prol Castelo, G.; Cirillo, D.; Valencia, A. A benchmark of synthetic transcriptomic cancer data reconstruction. In: 11th BSC Doctoral Symposium, 7th - 8th May, 2024. Barcelona: Barcelona Supercomputing Center, 2024, |
| dc.identifier.uri | https://hdl.handle.net/2117/433061 |
| dc.language | en |
| dc.language.iso | eng |
| dc.publisher | Barcelona Supercomputing Center |
| dc.rights.access | Open Access |
| dc.rights.licensename | Attribution-NonCommercial-NoDerivatives 4.0 International |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
| dc.subject | UPC subject areas::Computer science::Computer architecture |
| dc.subject.lcsh | High performance computing |
| dc.subject.lemac | High performance computing (Computer science) |
| dc.subject.other | Cancer |
| dc.subject.other | Transcriptomics |
| dc.subject.other | Synthetic Patients |
| dc.subject.other | Deep Learning |
| dc.subject.other | Autoencoders |
| dc.title | A benchmark of synthetic transcriptomic cancer data reconstruction |
| dc.type | Conference report |
| dspace.entity.type | Publication |
| local.citation.contributor | 11th BSC Doctoral Symposium, 7th - 8th May, 2024 |
| local.citation.pubplace | Barcelona |
Files
Original bundle
- Name: SODS11-2024-35.pdf
- Size: 2.43 MB
- Format: Adobe Portable Document Format



