Adaptively Banded Smith-Waterman Algorithm for Long Reads and its Hardware Accelerator, Liao et al.
- Sai Gautham Ravipati EE19B053
- Shashank Nag EE19B118
- Vishnu Varma V EE19B059
The genome sequence is the sequence of base pairs that constitute the DNA strands. Sequencers read the DNA strands and output read sequences, which must subsequently be aligned to a reference genome. Several algorithms perform this task of aligning two sequences, and the Smith-Waterman algorithm is one of the most common ones. A regular Smith-Waterman alignment fills the full (m+1) x (n+1) score matrix and therefore has a time complexity of O(mn), where m and n are the sequence lengths. A variant of this is the Banded Smith-Waterman algorithm, which computes only the cells that lie within a band of width w around the diagonal of the score matrix. For sequences of similar length n, this reduces the work to O(wn) and also limits the region that has to be searched during traceback. The algorithm aligns read subsequences to target subsequences using a seed-and-extend approach, i.e., small subsequences are aligned one at a time.
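To make the difference between the full and banded recurrences concrete, here is a minimal Python sketch of the score-matrix fill. It assumes a linear gap penalty and illustrative scores (match = 2, mismatch = -1, gap = -2), and it uses a fixed band |i - j| <= w rather than the adaptive band of Liao et al. The function name `smith_waterman` and its `band` parameter are our own, not taken from the paper or from Baseline_codes.

```python
from typing import Optional


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2, band: Optional[int] = None) -> int:
    """Best local-alignment score of a and b (linear gap penalty).

    band=None fills every cell of the (m+1) x (n+1) matrix: O(mn) work.
    band=w fills only cells with |i - j| <= w: O(w * n) work for similar lengths.
    Assumption: a fixed band, not the adaptive band of Liao et al.
    """
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0 because a local alignment may start anywhere.
    # Out-of-band neighbours also stay 0; with a negative gap penalty,
    # 0 + gap < 0 loses to the max(0, ...) floor, so they behave as uncomputed.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        if band is None:
            lo, hi = 1, n
        else:
            lo, hi = max(1, i - band), min(n, i + band)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match or mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best


print(smith_waterman("AAAATTTT", "TTTT"))          # 8 (full matrix)
print(smith_waterman("AAAATTTT", "TTTT", band=0))  # 0 (main diagonal only)
print(smith_waterman("AAAATTTT", "TTTT", band=4))  # 8
```

With `band=0` the matching `TTTT` is missed because it lies four diagonals away from the main diagonal. This is the kind of off-diagonal drift that a band which adapts as the alignment proceeds is meant to follow.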
In this project, we review the paper by Liao et al., which claims to be one of the first to propose a hardware design for the Banded Smith-Waterman algorithm and to implement the traceback phase in hardware. We implement the algorithm given in the paper in Python and analyze it to identify the bottlenecks. We then build the accelerator in Verilog, following the architecture proposed in the paper, to compute the score matrices in parallel and to implement the traceback phase. Finally, we integrate this accelerator with the picorv32 RISC-V processor to analyze the access dependencies and the cycle counts involved.
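The parallelism an accelerator can exploit comes from the data dependencies of the recurrence: cell (i, j) needs only (i-1, j-1), (i-1, j) and (i, j-1), so all cells on one anti-diagonal i + j = d can be computed at the same time. The sketch below is a hypothetical software model of that schedule, not the processing-element dataflow from the paper. It stores a direction pointer per cell during the fill, since any design that performs traceback has to keep some form of direction information, and it assumes the same illustrative linear-gap scores (match = 2, mismatch = -1, gap = -2). The names `sw_wavefront`, `DIAG`, `UP`, `LEFT` and `STOP` are our own. It fills the whole matrix; in the banded case each anti-diagonal would simply be clipped to the band, which bounds the number of cells per step by the band width.

```python
from typing import List, Tuple

# Hypothetical pointer codes stored alongside each score.
STOP, DIAG, UP, LEFT = 0, 1, 2, 3


def sw_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    """Smith-Waterman fill in anti-diagonal (wavefront) order, then traceback.

    Software model only; not the PE-array dataflow of Liao et al.
    """
    m, n = len(a), len(b)
    H: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    P: List[List[int]] = [[STOP] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for d in range(2, m + n + 1):
        # Cells on anti-diagonal d read only anti-diagonals d-1 and d-2, so the
        # iterations of this inner loop are independent of each other and could
        # each be assigned to its own processing element.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Candidates in tie-break order: diagonal, up, left, then restart at 0.
            cands = [(H[i - 1][j - 1] + s, DIAG),
                     (H[i - 1][j] + gap, UP),
                     (H[i][j - 1] + gap, LEFT),
                     (0, STOP)]
            H[i][j], P[i][j] = max(cands, key=lambda c: c[0])
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Traceback from the best cell until a zero score ends the local alignment.
    i, j = bi, bj
    out_a: List[str] = []
    out_b: List[str] = []
    while H[i][j] > 0:
        if P[i][j] == DIAG:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif P[i][j] == UP:    # a[i-1] aligned against a gap in b
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:                  # LEFT: b[j-1] aligned against a gap in a
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(sw_wavefront("AAGTT", "AATT"))  # (6, 'AAGTT', 'AA-TT')
```

Visiting cells by anti-diagonal instead of row by row does not change any score, because every cell's three inputs are always computed on earlier anti-diagonals.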
A complete software baseline was developed in Python and profiled on the Google Colab platform. We used it primarily to identify the bottlenecks, not as a point of comparison with the hardware implementation.
More details and the corresponding code can be found in the folder Baseline_codes.
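The profiling step can be reproduced with Python's built-in `cProfile` and `pstats` modules. The sketch below is hypothetical: it splits a plain (unbanded) Smith-Waterman into a matrix-fill function and a best-cell search so that the profile reports them separately, and it aligns random ACGT sequences. The function names, scores and sequence lengths are assumptions and do not mirror the actual code in Baseline_codes.

```python
import cProfile
import io
import pstats
import random


def fill_matrix(a, b, match=2, mismatch=-1, gap=-2):
    """O(mn) Smith-Waterman score-matrix fill (linear gap penalty)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def best_cell(H):
    """Locate the maximum-scoring cell, where traceback would start."""
    return max((v, i, j) for i, row in enumerate(H) for j, v in enumerate(row))


def align(a, b):
    # Hypothetical split of the baseline into phases so each shows up in the profile.
    H = fill_matrix(a, b)
    return best_cell(H)[0]


def profile_alignment(length=200, seed=0, top=5):
    """Profile align() on two random sequences and print the top entries."""
    rng = random.Random(seed)
    a = "".join(rng.choice("ACGT") for _ in range(length))
    b = "".join(rng.choice("ACGT") for _ in range(length))
    profiler = cProfile.Profile()
    profiler.enable()
    score = align(a, b)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    print(buf.getvalue())
    return score


if __name__ == "__main__":
    profile_alignment()
```

Sorting by cumulative time shows which phase dominates. The fill, which evaluates the recurrence at every cell, is expected to dominate, which is why it is the natural target for parallel hardware.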
A hardware accelerator was developed based on the architecture proposed in the paper. The standalone accelerator and its testbench can be found in the folder Hardware_Accelerator.
The developed hardware accelerator is integrated with the picorv32 processor to profile and analyze the dependencies involved. More details on this can be found in the folder Accelerator_Integrated.
- Aligning two sequences within a specified diagonal band, Chao et al.
- Adaptively Banded Smith-Waterman algorithm for long reads and its hardware accelerator, Liao et al.
- Darwin: A hardware-acceleration framework for genomic sequence alignment, Turakhia et al.
- PipeBSW: A Two-Stage Pipeline Structure for Banded Smith-Waterman Algorithm on FPGA, Li et al.
- A part of the baseline Python code for BandedSW was adapted from here
- BLAST: Basic Local Alignment Search Tool, Altschul et al.