Hardware/software co-design for data-intensive genomics workloads

Carregant...
Miniatura
El pots comprar en digital a:
El pots comprar en paper a:

Projectes de recerca

Unitats organitzatives

Número de la revista

Títol de la revista

ISSN de la revista

Títol del volum

Director

Codirector

Tutor

Tribunal avaluador

Realitzat a/amb

Tipus de document

Tesi

Data de defensa

Editor

Universitat Politècnica de Catalunya

Condicions d'accés

Accés obert

item.page.rightslicense

Creative Commons
Aquesta obra està protegida pels drets de propietat intel·lectual i industrial corresponents. Llevat que s'hi indiqui el contrari, els seus continguts estan subjectes a la llicència de Creative Commons: Reconeixement-NoComercial-CompartirIgual 4.0 Internacional

Assignatures relacionades

Assignatures relacionades

Publicacions relacionades

Datasets relacionats

Datasets relacionats

Projecte CCD

Abstract

Since the last decade, the main components of computer systems have been evolving, diversifying, to overcome their physical limits and to minimize their energy footprint. Hardware specialization and heterogeneity have become key to design more efficient systems and tackle ever-important problems with ever-larger volumes of data. However, to fully take advantage of the new hardware, a tighter integration between hardware and software, called hardware/software co-design, is also needed. Hardware/software co-design is a time-consuming process that poses its challenges, such as code and performance portability. Despite its challenges and considerable costs, it is an effort that is crucial for data-intensive applications that run at scale. Such applications span across different fields, such as engineering, chemistry, life sciences, astronomy, high energy physics, earth sciences, et cetera. Another scientific field where hardware/software co-design is fundamental is genomics. Here, modern DNA sequencing technologies reduced the sequencing time and made its cost orders of magnitude cheaper than it was just a few years ago. This breakthrough, together with novel genomics methods, will eventually enable the long-awaited personalized medicine. Personalized medicine selects appropriate and optimal therapies based on the context of a patient’s genome, and it has the potential to change medical treatments as we know them today. However, the broad adoption of genomics methods is limited by their capital and operational costs. In fact, genomics pipelines consist of complex algorithms with execution times of many hours per each patient and vast intermediate data structures stored in main memory for good performance. To satisfy the main memory requirement genomics applications are usually scaled-out to multiple compute nodes. Therefore, these workloads require infrastructures of enterprise-class servers, with entry and running costs that that most labs, clinics, and hospitals cannot afford. Due to these reasons, co-designing genomics workloads to lower their total cost of ownership is essential and worth investigating. This thesis demonstrates that hardware/software co-design allows migrating data-intensive genomics applications to inexpensive desktop-class machines to reduce the total cost of ownership when compared to traditional cluster deployments. Firstly, the thesis examines algorithmic improvements to ease co-design and to reduce workload footprint, using NVMs as a memory extension, and so to be able to run in one single node. Secondly, it investigates how data-intensive algorithms can offload computation to programmable accelerators (i.e., GPUs and FPGAs) to reduce the execution time and the energy-to-solution. Thirdly, it explores and proposes techniques to substantially reduce the memory footprint through the adoption of flash memory to the point that genomics methods can run on one affordable desktop-class machine. Results on SMUFIN, a state-of-the-art real-world genomics method prove that hardware/software co-design allows significant reductions in the total cost of ownership of data-intensive genomics methods, easing their adoption on large repositories of genomes and also on the field.

Descripció

Programa de doctorat

DOCTORAT EN ARQUITECTURA DE COMPUTADORS (Pla 2012)

Document relacionat

Citació

Cadenelli, L. Hardware/software co-design for data-intensive genomics workloads. Tesi doctoral, UPC, Departament d'Arquitectura de Computadors, 2019. DOI 10.5821/dissertation-2117-175258 .

Ajut

Forma part

Dipòsit legal

ISBN

ISSN

Versió de l'editor

Altres identificadors

Referències