Multidimensional framework for analysing next-generation sequencing data in a clinical diagnostic environment

Cite as:
hdl:2117/384368
Document type: Master thesis
Date: 2023-01-24
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Next-generation sequencing (NGS), also called massively parallel sequencing, is a high-throughput technology that determines the nucleotide sequence of the entire genome or of specific regions of it. Applied in a clinical environment, this technology enables personalized diagnostics for patients, for instance by identifying variants that might cause a disease. Clinical diagnostic laboratories are therefore responsible for providing a robust and appropriate workflow that yields genomic information ready to be interpreted by a clinician. The Molecular Biology CORE Laboratory in the Hospital Clinic de Barcelona performs hundreds of analyses each year, providing service to several diagnostic laboratories. Moreover, with the increasing number of NGS applications in clinical diagnostics, the number of analyses is expected to keep growing in the coming years. Each of these NGS analyses generates quality data from different sources, including laboratory procedures, DNA sequencing, and bioinformatics analyses. These quality data must be carefully evaluated and validated to ensure the reliability of the results. In addition, the quality data accumulated across analyses can be used to assess the performance of the laboratory and to identify potential sources of technical artefacts that might lower the quality of the experiments. Hence, a database is needed to store and manage quality data so that they can be easily accessed and analysed over time. In this thesis, we aim to develop a data warehouse to analyse and monitor NGS quality data coming from different data sources. To do that, we will perform the following steps: 1) design a multidimensional data model to ensure that data are stored efficiently; 2) extract data from the different sources; 3) load the data into the database; 4) design a visualization tool to enable descriptive analyses of the quality data.
The designed tool will allow the historical exploration of quality parameters, as well as the comparison of one experiment's quality metrics against those of all other experiments. The tool thereby enables the identification of areas for improvement by revealing sources of variation that might affect the quality of clinical NGS data.
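To make the four steps in the abstract more concrete, the following is a minimal sketch of a multidimensional (star-schema) model, a loading routine and one descriptive query. It is an illustration only, not the thesis's actual design: the table names (`dim_run`, `dim_sample`, `dim_metric`, `fact_quality`), the metric names, the helper functions and the demo values are all hypothetical, and SQLite is assumed purely for convenience.

```python
import sqlite3

# Hypothetical star schema: one fact table of quality measurements
# surrounded by dimension tables. All names are illustrative assumptions.
SCHEMA = """
CREATE TABLE dim_run (run_id INTEGER PRIMARY KEY, run_name TEXT, run_date TEXT);
CREATE TABLE dim_sample (sample_id INTEGER PRIMARY KEY, sample_name TEXT);
CREATE TABLE dim_metric (metric_id INTEGER PRIMARY KEY, metric_name TEXT, source TEXT);
CREATE TABLE fact_quality (
    run_id INTEGER REFERENCES dim_run(run_id),
    sample_id INTEGER REFERENCES dim_sample(sample_id),
    metric_id INTEGER REFERENCES dim_metric(metric_id),
    value REAL
);
"""

# Synthetic records standing in for the output of the extraction step:
# (run_name, run_date, sample_name, metric_name, source, value)
DEMO_RECORDS = [
    ("run_A", "2022-01-10", "s1", "mean_coverage", "bioinformatics", 100.0),
    ("run_A", "2022-01-10", "s2", "mean_coverage", "bioinformatics", 120.0),
    ("run_B", "2022-02-14", "s3", "mean_coverage", "bioinformatics", 80.0),
    ("run_B", "2022-02-14", "s3", "q30_fraction", "sequencing", 0.9),
]


def _dim_id(conn, cache, key, sql, params):
    """Insert a dimension row the first time its key is seen; return its id."""
    if key not in cache:
        cache[key] = conn.execute(sql, params).lastrowid
    return cache[key]


def build_warehouse(records):
    """Create an in-memory database and load the records into the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    runs, samples, metrics = {}, {}, {}
    for run_name, run_date, sample_name, metric_name, source, value in records:
        run_id = _dim_id(
            conn, runs, run_name,
            "INSERT INTO dim_run (run_name, run_date) VALUES (?, ?)",
            (run_name, run_date),
        )
        sample_id = _dim_id(
            conn, samples, sample_name,
            "INSERT INTO dim_sample (sample_name) VALUES (?)",
            (sample_name,),
        )
        metric_id = _dim_id(
            conn, metrics, metric_name,
            "INSERT INTO dim_metric (metric_name, source) VALUES (?, ?)",
            (metric_name, source),
        )
        conn.execute(
            "INSERT INTO fact_quality VALUES (?, ?, ?, ?)",
            (run_id, sample_id, metric_id, value),
        )
    conn.commit()
    return conn


def compare_run(conn, run_name, metric_name):
    """Return (mean for the given run, mean for all other runs) for one metric."""
    return conn.execute(
        """
        SELECT AVG(CASE WHEN r.run_name = ? THEN f.value END),
               AVG(CASE WHEN r.run_name <> ? THEN f.value END)
        FROM fact_quality AS f
        JOIN dim_run AS r ON r.run_id = f.run_id
        JOIN dim_metric AS m ON m.metric_id = f.metric_id
        WHERE m.metric_name = ?
        """,
        (run_name, run_name, metric_name),
    ).fetchone()


if __name__ == "__main__":
    warehouse = build_warehouse(DEMO_RECORDS)
    print(compare_run(warehouse, "run_A", "mean_coverage"))
```

Keeping measurements in a single narrow fact table means a new quality metric only needs a new row in the metric dimension rather than a schema change, which is one plausible reason to choose a multidimensional model for data arriving from heterogeneous sources.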
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
Files | Description | Size | Format | View |
---|---|---|---|---|
175932.pdf | | 4,071Mb | PDF | View/Open |