Hardware/software co-design for data-intensive genomics workloads

Cadenelli, Luca

doi:10.5821/dissertation-2117-175258

dc.contributor	Carrera Pérez, David
dc.contributor	Polo Bardés, Jordà
dc.contributor.author	Cadenelli, Luca
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2020-01-20T01:00:50Z
dc.date.available	2020-01-20T01:00:50Z
dc.date.issued	2019-12-19
dc.identifier.citation	Cadenelli, L. Hardware/software co-design for data-intensive genomics workloads. Doctoral thesis, UPC, Departament d'Arquitectura de Computadors, 2019. DOI 10.5821/dissertation-2117-175258.
dc.identifier.uri	http://hdl.handle.net/2117/175258
dc.description.abstract	Over the last decade, the main components of computer systems have evolved and diversified to overcome their physical limits and to minimize their energy footprint. Hardware specialization and heterogeneity have become key to designing more efficient systems and to tackling increasingly important problems with ever larger volumes of data. However, to take full advantage of the new hardware, a tighter integration between hardware and software, called hardware/software co-design, is also needed. Hardware/software co-design is a time-consuming process that poses its own challenges, such as code and performance portability. Despite these challenges and its considerable costs, it is crucial for data-intensive applications that run at scale. Such applications span many fields, including engineering, chemistry, life sciences, astronomy, high energy physics, and earth sciences. Another scientific field where hardware/software co-design is fundamental is genomics. Here, modern DNA sequencing technologies have reduced sequencing time and made sequencing orders of magnitude cheaper than it was just a few years ago. This breakthrough, together with novel genomics methods, will eventually enable the long-awaited personalized medicine. Personalized medicine selects appropriate and optimal therapies based on the context of a patient's genome, and it has the potential to change medical treatments as we know them today. However, the broad adoption of genomics methods is limited by their capital and operational costs. In fact, genomics pipelines consist of complex algorithms with execution times of many hours per patient and vast intermediate data structures that must be kept in main memory for good performance. To satisfy this main memory requirement, genomics applications are usually scaled out to multiple compute nodes. 
Therefore, these workloads require infrastructures of enterprise-class servers, with entry and running costs that most labs, clinics, and hospitals cannot afford. For these reasons, co-designing genomics workloads to lower their total cost of ownership is essential and worth investigating. This thesis demonstrates that hardware/software co-design allows data-intensive genomics applications to be migrated to inexpensive desktop-class machines, reducing the total cost of ownership compared to traditional cluster deployments. First, the thesis examines algorithmic improvements that ease co-design and reduce the workload footprint, using NVMs as a memory extension so that the workload can run on a single node. Second, it investigates how data-intensive algorithms can offload computation to programmable accelerators (i.e., GPUs and FPGAs) to reduce execution time and energy-to-solution. Third, it explores and proposes techniques that substantially reduce the memory footprint through the adoption of flash memory, to the point that genomics methods can run on a single affordable desktop-class machine. Results on SMUFIN, a state-of-the-art real-world genomics method, prove that hardware/software co-design allows significant reductions in the total cost of ownership of data-intensive genomics methods, easing their adoption on large repositories of genomes and also in the field.
dc.format.extent	115 p.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source	TDX (Tesis Doctorals en Xarxa)
dc.subject	UPC subject areas::Computer science
dc.title	Hardware/software co-design for data-intensive genomics workloads
dc.type	Doctoral thesis
dc.identifier.doi	10.5821/dissertation-2117-175258
dc.rights.access	Open Access
dc.description.version	Postprint (published version)
dc.identifier.tdx	http://hdl.handle.net/10803/668250

Files in this item

Name: TNC1de1.pdf
Size: 12.44 MB
Format: PDF

This item appears in the following collections

Departament d'Arquitectura de Computadors [361]
All theses [5,459]


UPCommons. Open knowledge portal of the UPC
