A benchmark of synthetic transcriptomic cancer data reconstruction

Document type

Conference proceedings paper

Publisher

Barcelona Supercomputing Center

Access conditions

Open access

License

Creative Commons
This work is protected by the corresponding intellectual and industrial property rights. Unless otherwise stated, its contents are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Cancer is the second most common cause of death worldwide, and its incidence is increasing [1]. Several methodologies have been developed to study cancer. For instance, PAM50, a panel of 50 genes relevant to cancer characterization, has helped categorize cancer subtypes [2]. However, the rapid growth of sequenced biological data, or omics, has made it possible to measure far larger numbers of genes. Still, the number of samples available in studies tends to be low. This combination of small sample size and high dimensionality, known as the curse of dimensionality, makes meaningful data analyses less effective. Hence, deep learning implementations face limitations on omics data in general and on cancer data in particular [3]. In the former case, the curse of dimensionality has hindered the application of deep learning, given its data-hungry nature. In the latter, the current understanding of how molecular mechanisms affect cancer progression complicates the interpretation of deep learning algorithms applied to omics data [4]. To circumvent both issues, we aim to learn a low-dimensional representation of the real data, use this representation to augment the original data with improved reconstruction fidelity, and gain meaningful insights into cancer progression along the way. The autoencoder (AE) [5] is a deep learning technique that reduces data dimensionality. In this study, we define and use three types of autoencoders: the vanilla autoencoder [5], the variational autoencoder (VAE) [6], and the conditional variational autoencoder (CVAE) [7]. We discuss how to learn from real cancer data, such as that provided by The Cancer Genome Atlas (TCGA), reconstruct the original data, and generate new data in silico, i.e., synthetic data.
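The abstract does not specify architectures or training settings, so the following is only a minimal NumPy sketch of the ideas it relies on, under stated assumptions. It uses a hypothetical toy expression matrix (not TCGA data) and a linear autoencoder trained by gradient descent to compress 50 "genes" into 3 latent codes and reconstruct them; a deep autoencoder would add nonlinear hidden layers. It then shows, as standalone functions rather than a trained model, the two ingredients that make a VAE generative (the reparameterization trick and the KL term toward N(0, I)) and the CVAE idea of appending a one-hot condition to the inputs. All function names (`train_linear_autoencoder`, `reparameterize`, `kl_to_standard_normal`, `with_condition`) and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy expression matrix (NOT real TCGA data): 200 samples x 50 genes
# driven by 3 hidden factors plus a little noise, then standardized per gene.
n_samples, n_genes, n_latent = 200, 50, 3
factors = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_genes))
X = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_genes))
X = (X - X.mean(axis=0)) / X.std(axis=0)


def train_linear_autoencoder(X, k, lr=0.1, epochs=1000, seed=1):
    """Linear autoencoder (assumed simplification): encode genes -> k codes, decode
    back, and minimize mean squared reconstruction error by full-batch gradient descent."""
    r = np.random.default_rng(seed)
    p = X.shape[1]
    W_enc = 0.1 * r.normal(size=(p, k))
    W_dec = 0.1 * r.normal(size=(k, p))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                    # low-dimensional representation
        err = Z @ W_dec - X              # reconstruction minus original
        losses.append(float(np.mean(err ** 2)))
        G = 2.0 * err / err.size         # d(MSE) / d(reconstruction)
        grad_dec = Z.T @ G               # both gradients use the pre-update weights
        grad_enc = X.T @ (G @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses


def reparameterize(mu, log_var, r):
    """VAE sampling step: z = mu + sigma * eps with eps ~ N(0, I), which keeps z
    differentiable with respect to mu and log_var."""
    return mu + np.exp(0.5 * log_var) * r.standard_normal(mu.shape)


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) per sample: the VAE loss term
    that keeps latent codes close to the prior later sampled to generate new data."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1)


def with_condition(x, labels, n_classes):
    """CVAE-style input: append a one-hot condition (e.g. a hypothetical subtype label)."""
    return np.concatenate([x, np.eye(n_classes)[labels]], axis=1)


W_enc, W_dec, losses = train_linear_autoencoder(X, k=n_latent)
print(f"MSE first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

In a real VAE or CVAE, the encoder would output `mu` and `log_var`, the loss would add `kl_to_standard_normal` to the reconstruction error, and synthetic samples would be produced by decoding draws from N(0, I) (concatenated with the desired condition in the CVAE case).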

Citation

Prol Castelo, G.; Cirillo, D.; Valencia, A. A benchmark of synthetic transcriptomic cancer data reconstruction. In: 11th BSC Doctoral Symposium, 7th - 8th May, 2024. Barcelona: Barcelona Supercomputing Center, 2024,

References