Releases: NVIDIA-Genomics-Research/GenomeWorks
v2021.02.2
Updated tqdm requirements for pygenomeworks.
v2021.02.1
Updated numpy and cython requirements for pygenomeworks.
v2021.02.0
- CUDA Mapper
- Created a new interface to work with Indices. Allow grouping Indices together through public API.
- Added support for Index from descriptor objects and use that throughout.
- Added API support for outputting sequences/alignments in SAM/BAM format
- Determining positions of reads' minimizers on device instead of on host
- Correctly skipping reads that are too short to fit in at least one window
- #557 and #562 correct errors in the evaluate_paf script that could result in incorrect counts of matched starts / ends of records or precision and recall values that were greater than 1.
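As a rough illustration of the minimizer and short-read items above, here is a minimal host-side sketch in plain Python. It is not the cudamapper implementation or API: the function name `minimizers` and the parameters `k` and `w` are hypothetical, and it assumes a window covers `w` consecutive k-mers with the lexicographically smallest k-mer chosen as the minimizer. Real mappers typically hash k-mers and consider both strands.

```python
def minimizers(read, k, w):
    """Return sorted (position, kmer) minimizers of `read`.

    Assumption: a window spans w consecutive k-mers, i.e. w + k - 1 bases.
    Reads shorter than one window yield no minimizers and are skipped.
    """
    window_len = w + k - 1
    if len(read) < window_len:
        return []  # too short to fit in at least one window

    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Leftmost lexicographically smallest k-mer in this window.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)
```

With `k=3` and `w=2`, a three-base read is skipped because one window needs four bases, while `"GATTACA"` yields one minimizer per distinct window minimum.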
- CUDA Aligner
- Fixed a bug in banded Myers which could lead to an out-of-bounds access for non-optimal alignments. If the backtrace in the Needleman-Wunsch matrix touched the border of the band at a specific point it may lead to an out-of-bounds access.
- Added support for the extended CIGAR format, which distinguishes between matches `=` and mismatches `X`.
- Improved performance for batches with very varying alignment lengths
- Added a `FixedBandAligner` base class (as specialization of `Aligner`) for aligners that operate on a diagonal band. These aligners now provide an `aligner->reset_bandwidth(new_bandwidth)` function.
- Fixed the memory requirements of Hirschberg-Myers aligner. It can now process significantly larger batches at once.
- The default aligners returned by `create_aligner()` are the banded Myers aligner (a `FixedBandAligner`) for the `create_aligner()` function that does specify a bandwidth and the Hirschberg-Myers aligner (an `Aligner`) for the `create_aligner()` call that does not specify a bandwidth. The API for the latter case is deprecated and will be replaced by a different `create_aligner()` function.
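To make the extended CIGAR item above concrete, the following sketch builds an extended CIGAR string from two gapped alignment rows and collapses it back to the basic form. This is plain Python for illustration only, not the cudaaligner or pygenomeworks API; the helper names `extended_cigar` and `to_basic_cigar` are hypothetical, and `-` is assumed to mark a gap.

```python
import re
from itertools import groupby


def extended_cigar(query_aln, target_aln):
    """Build an extended CIGAR string from two gapped, equal-length rows."""
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned rows must have equal length")
    ops = []
    for q, t in zip(query_aln, target_aln):
        if q == "-" and t == "-":
            raise ValueError("column with gaps in both rows")
        if t == "-":
            ops.append("I")  # base present only in the query
        elif q == "-":
            ops.append("D")  # base present only in the target
        elif q == t:
            ops.append("=")  # match
        else:
            ops.append("X")  # mismatch
    # Run-length encode consecutive identical operations.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def to_basic_cigar(cigar):
    """Collapse '=' and 'X' into 'M', merging adjacent runs."""
    merged = []
    for length, op in re.findall(r"(\d+)([=XIDM])", cigar):
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + int(length), op)
        else:
            merged.append((int(length), op))
    return "".join(f"{n}{op}" for n, op in merged)
```

For the rows `ACGT-A` and `ACCTGA`, the extended form is `2=1X1=1D1=`, which collapses to `4M1D1M`. The basic form no longer shows where the mismatch is.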
- CUDA Partial Order Aligner (CUDA POA)
- #551 Adds GFA output of the alignment graph generated by cudaPOA.
- Applied various changes to optimize performance of kernels for banded alignments.
- Added option `-s` for CUDA POA API to allow managing allocated memory for adaptive score matrix
- Introduced `static_band_traceback` as a new alignment mode in CUDA POA. This mode can potentially improve performance for processing long-read batches.
- Introduced `adaptive_band_traceback` for long-read batches. Different banded versions of Needleman-Wunsch kernels were unified.
- Added new CI tests to validate results of static/adaptive-band and static/adaptive-band with traceback against full-band Needleman-Wunsch kernel.
- Added description and hints to CUDA POA error codes.
- Added support for caching device allocations to reduce time spent allocating device memory.
- CUDA Extender
- Added a new C++ module for a CUDA-accelerated ungapped seed-extension algorithm that uses seed positions in encoded input strands to extend and compute the alignment between the strands, adapted from SegAlign's Ungapped Extender - S. Goenka, Y. Turakhia, B. Paten and M. Horowitz, "SegAlign: A Scalable GPU-Based Whole Genome Aligner," in 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Atlanta, GA, US, 2020 pp. 540-552.
- Pygenomeworks
- CIGAR strings for pairwise alignments can now be visualized.
- Other
- System requirements are updated.
- GenomeWorks semantic versioning is changed to calendar versioning.
- `DevicePreallocatedAllocator` allocates exactly the amount of memory requested
- Fixed silent execution errors, which could occur at GPU kernel launch under certain conditions.
v0.5.3
v0.5.2
v0.5.1
GenomeWorks v0.5.0
Release v0.5.0 brings major performance and functionality updates to all modules with a focus on improved handling of long read sequences.
- Clara Genomics Analysis -> GenomeWorks
- As of this release, the repository name and associated project and package names have been updated to `GenomeWorks`. The Python bindings package is now available on PyPI as `genomeworks`.
- CUDA Partial Order Aligner (CUDA POA)
- Novel adaptive banding implementation for partial order alignment achieves measurably better accuracy than default static band parameters with marginal drop in performance.
- Support for consensus and MSA of long read sequences in all accuracy modes (full matrix, adaptive band and static band).
- Several bug fixes and general stability improvements.
- Backwards incompatible API changes.
- CUDA Aligner
- New banded Myers algorithm with compact memory footprint and adjustable band size accelerates `cudaaligner` performance of long read global alignment (10-15kb sequences) by ~3x over previous implementations with comparable accuracy. Narrow bands may lead to non-optimal alignments.
- Updated `Alignment` object provides edit distance for each alignment and a new flag to signal optimal vs non-optimal alignments.
- Bug fixes and improved test coverage.
- Backward incompatible API changes.
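The note above that narrow bands may lead to non-optimal alignments can be illustrated with a small banded edit-distance sketch. This is a plain dynamic-programming version written for clarity, not the bit-parallel banded Myers algorithm used by `cudaaligner`; the function name `banded_edit_distance` and the convention that `band` is the maximum allowed `|i - j|` are assumptions made for this example.

```python
def banded_edit_distance(a, b, band):
    """Edit distance restricted to DP cells with |i - j| <= band.

    Returns None when the end cell itself lies outside the band.
    The result is an upper bound on the true edit distance; it is
    only guaranteed optimal if the band is wide enough.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None

    # Row 0: only cells inside the band are reachable.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # delete the first i characters of a
                continue
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]
```

For `"XXABCD"` against `"ABCDXX"`, the optimal alignment shifts by two positions and costs 4. A band of 1 cannot reach that path and reports 6, while a band of 2 or wider recovers 4. A flag like the one on the updated `Alignment` object is what lets a caller tell these two cases apart.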
- CUDA Mapper
- New conditions for fusing overlaps and changes to default parameters improve accuracy on small genomes without genomic repeats (NG50, mismatches, and indel accuracy matching minimap2 for E. coli and S. aureus).
- Saving copies of indices in host memory and transferring them to device memory on demand avoids additional index generation and leaves sufficient device memory for the matcher and overlapper.
- Using a CUB-based search algorithm gives better performance than Thrust because it leverages additional information about the data that is to be sorted.
- New sample showcasing the use of `cudamapper` APIs to build a GPU-accelerated, minimizer based mapper.
ClaraGenomicsAnalysis Release 0.4.4
- Bug fix: Updated `cmake` default installation path mechanism
ClaraGenomicsAnalysis 0.5.0 - Release Candidate 1
Release candidate for v0.5.0 focuses primarily on improving the performance of the GPU accelerated mapper, `cudamapper`. Here are some highlights of this release so far:
- CUDA Mapper
- Enabled filtering of high frequency sketch elements in index (`-F` option. More details highlighted in help message)
- Host and device caching for improved indexer performance (`-C` and `-c`)
) - Smart cached memory allocator for improved device memory initialization time
- Re-architected matcher/overlapper components to improve end to end performance
- Algorithmic improvements to overlapper lead to higher accuracy of overlaps
Application | dataset | GPU | v0.4.3 | v0.5.0-rc1 | Acceleration |
---|---|---|---|---|---|
cudamapper all-vs-all | ONT E. Coli 150x | 1x GV100 32GB | 72.3s | 23.2s | 3.11x |
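
The filtering of high frequency sketch elements listed in the highlights can be pictured with the sketch below. It is an illustration under stated assumptions, not cudamapper code: the function `filter_frequent`, the tuple layout, and the `max_fraction` threshold are hypothetical, and the exact meaning of the `-F` value is documented in the tool's help message and may differ from what is assumed here.

```python
from collections import Counter


def filter_frequent(sketch_elements, max_fraction):
    """Drop sketch elements whose representation is too common.

    sketch_elements: list of (representation, read_id, position) tuples.
    max_fraction: assumed threshold; a representation is removed when its
        occurrence count exceeds this fraction of all sketch elements.
    """
    counts = Counter(rep for rep, _, _ in sketch_elements)
    limit = max_fraction * len(sketch_elements)
    return [e for e in sketch_elements if counts[e[0]] <= limit]
```

Representations that occur very often, for example from repeats, produce many candidate anchors while adding little information, so removing them shrinks the index that the matcher has to scan.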
ClaraGenomicsAnalysis Release 0.4.3
Release 0.4.3 is a hotfix for wheel package generation in Pyclaragenomics.
- Python 3.5 was added to PyPI classifiers.
- Fixed a `liblogging.so` linkage error that occurred in Python 3.5.