Evaluation of genome alignment workflows on HPC processors
Document type: Master thesis
Rights access: Open Access
Precision medicine holds promise for improving healthcare by leveraging genomic information. Due to the steep decrease in genome sequencing costs in recent years, the amount of data to be processed is increasing dramatically, creating a significant computation and storage challenge. High-throughput sequencing systems produce a large number of reads that need to be post-processed. These reads are typically used in several genomic studies based on sequence alignment pipelines, and most of the software packages involved require several CPU hours to perform each study. The seed-and-extend algorithm is widely used in sequence alignment. It looks for exact matches of a portion of the sequence in a reference genome (seeding) and then extends the alignment around each match, allowing differences. The FM-Index is typically used during the seeding process, and the extend process is done with dynamic programming based on Smith-Waterman algorithms. Algorithms based on the FM-Index show an irregular memory access pattern, which makes the problem memory bound. We analyze a recent implementation of the FM-Index and highlight existing throughput-memory trade-offs, showing that memory requirements limit the implementation of large k-steps. We propose a compressed FM-Index for large k-steps that enables a 15-step FM-Index requiring less than 16 GB for a human genome reference of 3 giga base pairs (Gbp). An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and a Skylake-based system (SKX). We achieve average speed-ups of 1.46X and 1.39X, respectively, with respect to a state-of-the-art FM-Index implementation that is already well optimized. BWA-MEM2 is a well-known sequence alignment application. As a second contribution, we port BWA-MEM2 from x86_64 to Armv8 with SVE. The porting effort focuses on the Smith-Waterman (SW) algorithm, as it is the part with vectorization potential. We translate SW to SVE, the new vector extension of Arm.
SVE is still under development, and only a few processors have implemented it. Arm is making a significant push to develop the ecosystem needed to introduce SVE into the high-performance computing market. The first CPU that implements SVE is Fujitsu's A64FX, a server-oriented CPU for high-performance computing. The A64FX is used in Fugaku, the top-ranked supercomputer in the TOP500 lists of June and November 2020. We highlight the key points of the A64FX in comparison with other server CPUs. Finally, we evaluate BWA-MEM2 on an A64FX and an x86_64 system to compare their performance.
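The seed-and-extend idea described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the thesis: the FM-Index below is an uncompressed, single-step index built naively from a sorted suffix list, and the extend step is a plain Smith-Waterman score with a linear gap penalty. The function names (`build_fm_index`, `backward_search`, `smith_waterman_score`), the toy reference string, and the scoring values are all assumptions chosen for illustration.

```python
def build_fm_index(text):
    """Build a naive, uncompressed FM-Index for a short text (illustrative only)."""
    text = text + "$"  # sentinel; assumed to sort before every other symbol
    # Suffix array by direct sorting: fine for a toy string, not for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the symbol preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of symbols in the text strictly smaller than c.
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Seeding: return sorted positions of exact matches, one symbol per step."""
    lo, hi = 0, len(sa)  # half-open interval of suffix array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        # Each step performs data-dependent lookups into occ, which is the
        # source of the irregular memory access pattern on large references.
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Extending: best local alignment score (linear gap penalty, assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


reference = "ACGTACGTTACG"  # toy reference, not real genomic data
sa, c_table, occ = build_fm_index(reference)
seed_hits = backward_search("ACG", sa, c_table, occ)
```

In this sketch, `backward_search` consumes one symbol of the pattern per iteration. A k-step FM-Index processes k symbols per iteration instead, which reduces the number of lookups but enlarges the occurrence table, and that is the throughput-memory trade-off the abstract refers to.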
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.