Releases: ComparativeGenomicsToolkit/cactus
Cactus 2.9.3 2024-11-18
Cactus 2.9.3 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.3`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.3-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.9.3.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.9.3.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.9.3.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
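To choose between the standard and legacy binaries, it can help to check whether the CPU advertises AVX2. The following is a minimal sketch, assuming a Linux host that lists CPU flags in `/proc/cpuinfo`; the helper names `has_avx2` and `pick_tarball` are hypothetical and not part of Cactus.

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def pick_tarball(version: str, avx2: bool) -> str:
    """Build the tarball name from the release list (hypothetical helper)."""
    prefix = "cactus-bin" if avx2 else "cactus-bin-legacy"
    return f"{prefix}-v{version}.tar.gz"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # Fall back to the legacy build when the flags cannot be read.
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(pick_tarball("2.9.3", has_avx2(text)))
```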
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus releases.
Release Notes
This release adds some new options to the pangenome pipeline, and hopefully improves robustness overall.
- Faster path normalization (`vg paths -n`) for pangenomes via vg upgrade to v1.61.0
- Sanity checks added to better detect corrupted intermediate FASTA files
- Switch off abPOA's progressive mode unless input sequences have same length (otherwise sort by length)
- `--lastTrain` / `--scoresFile` options added to learn and/or use custom scoring models for multiple alignment using `last-train`.
- Update to latest `vcflib`. Also add `vcflib` installation command as option to `BIN-INSTALL` instructions
- Make `--maxLen` default value consistent between `cactus-align --pangenome` and `cactus-pangenome`. Previously it was 100X bigger in the former, which made it very easy to have wildly different performance between the all-at-once and step-by-step versions of the pipeline
- Fix bug where `--binariesMode singularity` could potentially attempt to write temporary files outside specified workDir
- Tighten disk usage estimate for `tile_alignments` job
- Patch `mafTools` to fix a bug where `taffy` normalization in `cactus-hal2maf` would crash if 1-character genome names were present in the input
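
As an illustration of the kind of FASTA sanity check mentioned in these notes, here is a minimal sketch. The specific checks (at least one record, a header before any sequence, no empty records, ASCII letters only in sequence lines) are assumptions chosen for illustration; the notes do not say what Cactus actually checks, and `check_fasta` is a hypothetical name.

```python
from typing import Iterable, List


def check_fasta(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in FASTA text (an empty list means none)."""
    problems = []
    current = None  # name of the record being read; None before the first header
    seq_len = 0     # number of sequence characters seen for the current record
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if current is not None and seq_len == 0:
                problems.append(f"record {current!r} has no sequence")
            fields = line[1:].split()
            current = fields[0] if fields else ""
            if not current:
                problems.append(f"line {lineno}: header has no name")
            seq_len = 0
        elif current is None:
            problems.append(f"line {lineno}: sequence data before first header")
        elif not (line.isascii() and line.isalpha()):
            # Assumption: genome sequence lines contain only ASCII letters.
            problems.append(f"line {lineno}: unexpected characters in sequence")
        else:
            seq_len += len(line)
    if current is None:
        problems.append("no FASTA records found")
    elif seq_len == 0:
        problems.append(f"record {current!r} has no sequence")
    return problems
```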
Cactus 2.9.2 2024-10-14
Cactus 2.9.2 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.2`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.2-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.9.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.9.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.9.2.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release patches a couple bugs
- fix broken `--collapse` option
- give Toil exact disk requirement for merge_aligments job, fixing a potential over-estimate
- update abpoa to latest release (v1.5.3)
Cactus 2.9.1 2024-09-25
Cactus 2.9.1 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.1`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.1-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.9.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.9.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.9.1.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release updates the pangenome pipeline, and adds `KegAlign` to progressive cactus.
- GPU lastz implementation changed from `SegAlign` to `KegAlign`, and should be more robust and better supported as a result.
- Path normalization added to pangenome pipeline to make sure no two equivalent alleles through any site have different paths. `AT` fields in VCF should now always be consistent with the graph as a result.
- Always chop nodes to 1024bp by default in pangenome pipeline. This ensures that all outputs (gfa, gbz, vcf etc) have compatible node ids. Before, only GBZ and downstream graphs were chopped which was too confusing. Old logic can be re-activated using the config XML though.
- Fix recent bug where using the `--mgMemory` option would crash `cactus-pangenome`
- (Experimental) `--collapse` option added to pangenome pipeline to incorporate nearby self-alignments, including on the reference path.
- Left shifting VCFs (`bcftools norm -f`) no longer run by default (except on `vcfwave`-normalized outputs), since it can cause conflicts with PanGenie by writing multiple variants at the same location.
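
The node-chopping rule in these notes can be pictured as follows. This is a minimal sketch rather than the pipeline's own code: the functions `chop_node` and `chop_graph`, and the sequential renumbering of node ids, are illustrative assumptions.

```python
from typing import Iterator, List, Tuple

MAX_NODE_LEN = 1024  # default maximum node length named in the release notes


def chop_node(sequence: str, max_len: int = MAX_NODE_LEN) -> List[str]:
    """Split one node sequence into consecutive pieces of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]


def chop_graph(
    nodes: List[Tuple[int, str]], max_len: int = MAX_NODE_LEN
) -> Iterator[Tuple[int, str]]:
    """Yield (new_id, piece) pairs, renumbering nodes sequentially after chopping."""
    next_id = 1
    for _old_id, seq in nodes:
        for piece in chop_node(seq, max_len):
            yield next_id, piece
            next_id += 1
```

In this sketch, chopping changes node ids, so applying it to only some outputs would leave them with different id spaces. Chopping every output the same way avoids that mismatch.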
Cactus 2.9.0 2024-07-29
Cactus 2.9.0 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.0`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.9.0-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.9.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.9.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.9.0.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release addresses two important scaling issues in the pangenome pipeline.
- The haplotype sampling index (`--haplo`) can now be built without giraffe indexes (`--giraffe`). This significantly reduces peak memory consumption when using `--haplo`, especially for big diverse pangenomes.
- Previously, you could not align more than roughly 500 samples with Minigraph-Cactus, no matter how small the input genomes were. This bottleneck has been removed: you can now align as many genomes as your system resources allow. For very small genomes, this could be well into the tens of thousands.
- Two bugs were recently found in `vcfwave`, which can be run with the `--vcfwave` option since v2.8.2. First the `AT` field is wrong in the output. Second, and more seriously, genotypes can be incorrect. The latter seems specific to multiallelic sites (but I'm not sure). This release now strips `AT` fields (they are not relevant after re-alignment anyway). It also splits multiallelic sites before running `vcfwave` in an attempt to work around the genotyping bug.
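
The two workarounds described here (dropping `AT` and splitting multiallelic sites) can be sketched on plain VCF fields. This is an illustration only: the notes do not say which commands Cactus runs for these steps, the function names are hypothetical, `AT` is assumed to be carried in the INFO column, and the genotype recoding (other ALT alleles become missing, `.`) is one possible convention that real tools may not share.

```python
from typing import List, Tuple


def strip_info_key(info: str, key: str = "AT") -> str:
    """Remove one key from a VCF INFO string such as 'AC=1;AT=>1>2>4;NS=3'."""
    kept = [field for field in info.split(";") if field.split("=", 1)[0] != key]
    return ";".join(kept) if kept else "."


def split_multiallelic(alt: str, genotype: str) -> List[Tuple[str, str]]:
    """Split a comma-separated ALT field into one (alt, genotype) pair per allele.

    In each output genotype the allele's own index becomes 1, reference (0) and
    missing (.) calls are kept, and any other ALT allele is written as missing.
    That recoding is an assumption made for illustration.
    """
    sep = "|" if "|" in genotype else "/"
    calls = genotype.split(sep)
    out = []
    for idx, allele in enumerate(alt.split(","), start=1):
        recoded = []
        for call in calls:
            if call in ("0", "."):
                recoded.append(call)
            elif call == str(idx):
                recoded.append("1")
            else:
                recoded.append(".")
        out.append((allele, sep.join(recoded)))
    return out
```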
Cactus 2.8.4 2024-06-21
Cactus 2.8.4 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.4`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.4-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.4.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.4.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.4.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release updates `vcfbub` in order to fix a longstanding issue where this tool can produce invalid VCFs.
- `vcfbub` updated to `v0.1.1` which resolves a bug where records could be missing columns in the presence of `.` genotypes
- run `bcftools view` as sanity check on generated VCFs to prevent various normalization steps from ever silently producing invalid output.
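
The failure mode fixed here, records with missing columns, is straightforward to detect. Below is a minimal sketch, assuming an uncompressed VCF read as text lines. The release itself runs `bcftools view` for this purpose; `find_short_records` is a hypothetical stand-in that only compares column counts.

```python
from typing import Iterable, List, Tuple


def find_short_records(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (line_number, found, expected) for every record whose column
    count differs from that of the #CHROM header line."""
    expected = None
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # blank line or meta-information line
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        if expected is None:
            raise ValueError(f"line {lineno}: record found before #CHROM header")
        found = len(line.split("\t"))
        if found != expected:
            bad.append((lineno, found, expected))
    return bad
```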
Cactus 2.8.3 2024-06-12
Cactus 2.8.3 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.3`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.3-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.3.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.3.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.3.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Note: The gpu docker image was built using this patch that bumps the Ubuntu version from 20.04 to 22.04.
Important note about installing on Python3.8: You may need to run python3 -m pip install backports.zoneinfo
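
For context, `zoneinfo` joined the standard library in Python 3.9, and `backports.zoneinfo` provides the same interface on Python 3.8. A common import pattern is sketched below; whether Cactus or its dependencies use exactly this pattern is an assumption.

```python
try:
    import zoneinfo  # standard library on Python 3.9 and later
except ImportError:
    # Python 3.8: available after `python3 -m pip install backports.zoneinfo`
    from backports import zoneinfo

# Either import exposes the same ZoneInfo class.
print(zoneinfo.ZoneInfo.__name__)
```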
Release Notes
This release fixes some bugs and updates to the latest Toil.
- Fix broken `--restart` option in `cactus-graphmap`
- Raise Toil job memory requirement for `filter-paf-deletions`
- Update to `vg` v1.57.0
- Update `Toil` to v7.0
- Fix bug where trim-outgroups job could request way too little memory when there are no outgroups
- Fix typo that broke `cactus-maf2bigmaf` on uncompressed inputs
- More robust implementation of `vcfwave`
- Fix bug where RED preprocessing crashed when `awk` returned a number in scientific notation
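
The last fix in this list concerns a common parsing pitfall: a tool such as `awk` may print a large number as `1.5e+06`, which a plain integer parse rejects. Here is a minimal sketch of a tolerant parser, assuming the value is meant to be a whole number; `parse_count` is a hypothetical helper, not the code used in Cactus.

```python
from decimal import Decimal, InvalidOperation


def parse_count(text: str) -> int:
    """Parse an integer that may be written in scientific notation."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {text!r}") from exc
    # Reject NaN, infinities, and values with a fractional part.
    if not value.is_finite() or value != value.to_integral_value():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)
```

Using `Decimal` instead of `float` keeps the conversion exact for large values.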
Cactus 2.8.2 2024-05-09
Cactus 2.8.2 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.2`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.2-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.2.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release fixes some bugs and adds a (docker-only) `vcfwave` normalization option for pangenomes.
- Use correct `bigChain.as` that allows chain scores to be huge (instead of capping them)
- Update `odgi`, `vg`, `abPOA` and `taffy` to their latest releases
- Fix `cactus-hal2maf` and `cactus-pangenome --odgi` to work with `--binariesMode docker`
- `bcftools norm -f` now run by default on all non-raw VCF outputs (toggle off in the config)
- `vcfwave` normalization option added for pangenomes (to mimic what was done for release HPRC graphs). Note that `vcfwave` is not included in the binary release -- you need to use the Cactus docker or build it yourself.
- Minigraph fasta file renamed from `.gfa.fa` to `.sv.gfa.fa` to be less confusing
- Gap and empty MAF block filtering moved from `cactus-hal2maf` to `cactus-maf2bigmaf`. So MAF output will now have a reference base for every position.
- Fix `cactus-preprocess` to do only RED masking by default (there was previously a bug where it ran RED then lastz after). The `--maskMode` option is also fixed to work properly.
- Update to Toil 6.1.0
Cactus 2.8.1 2024-04-04
Cactus 2.8.1 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.1`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.1-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.1.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release patches some recent bugs, including a major bug introduced in `cactus-hal2maf` in v2.7.2 that could produce negative-stranded (and out of order) reference rows.
- Do not apply RED masker to contigs that are likely to crash it (tiny contigs and extremely low information contigs)
- Add `--coverage` option to `cactus-hal2maf` to include table of coverage statistics in the output
- Fix bug where `:start-end` contig suffixes caused the pangenome pipeline to crash. They are now correctly handled as subranges
- Turn off `abPOA` seeding by default, after finding (what must be a fairly rare) case where it doesn't work.
- Improve `cactus-hal2chains` interface
- Add range support to `cactus-hal2maf` via `--start/--length` or `--bedRanges`
- Deprecate `cactus-maf2bigmaf --chromSizes` (use `--halFile` instead, as it handles "."s in genome names properly)
- Fix bug where reference row could be lost in `cactus-hal2maf` MAF due to sorting error.
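
Handling `:start-end` suffixes as subranges amounts to separating the base contig name from its coordinates. Below is a minimal sketch, assuming the suffix is always the final `:` followed by two non-negative integers joined by `-`. The name `split_subrange` is hypothetical, and no coordinate convention (0-based or 1-based) is implied.

```python
import re
from typing import Optional, Tuple

# Matches a trailing ':start-end' where both coordinates are non-negative integers.
_SUBRANGE = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def split_subrange(contig: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (base_name, start, end); start and end are None without a suffix."""
    match = _SUBRANGE.match(contig)
    if match is None:
        return contig, None, None
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start exceeds end in {contig!r}")
    return match["name"], start, end
```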
Cactus 2.8.0 2024-03-13
Cactus 2.8.0 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.0`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.8.0-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.0.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release significantly changes the preprocessor step of Progressive Cactus in order to be more robust and efficient in the presence of unmasked repeats, something that seems more prevalent with newer, T2T assemblies.
- Replace lastz repeatmasking with REpeat Detector (RED) in the Progressive Cactus preprocessor. RED is more sensitive and orders of magnitude faster than the old lastz masking pipeline. Crucially, it is able to mask regions that would slip by RepeatMasker/WindowMasker/lastz in new T2T ape genomes that would otherwise break Cactus downstream. Tests so far show this change to make Cactus much faster and more robust. The old lastz pipeline can still be toggled back on in the config.
- Delete many unneeded files that previously collected in the jobstore directory until the end of execution. This was a particular issue in large `cactus-pangenome` runs where the jobstore would creep up to several terabytes for HPRC-sized inputs.
- No longer require manually editing the blast chunksize in the config when running on Slurm (to reduce the number of jobs). It's now scaled up automatically on Slurm environments (by a factor controlled in the config).
- Fix bug introduced in last release where Cactus would not work on AWS/MESOS clusters unless `--defaultMemory` and `--maxMemory` options were specified (and in bytes).
- Update to the latest `taffy` and `vg`
Cactus 2.7.2 2024-02-23
2024/03/11 NOTE: this version does not work on AWS clusters -- use the previous or next release if specifying --batchSystem mesos --provisioner aws
Cactus 2.7.2 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.7.2`
  GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.7.2-gpu`
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.7.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.7.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.7.2.tar.gz
  Install instructions in README.md

WARNING: do not use the github automatically generated source files (Source code (zip) or Source code (tar.gz)), these are not correct.
Release Notes
This release improves MAF output, along with some other fixes
- `--maxMemory` option given more teeth. It is now used to clamp most large Toil jobs. On single-machine it defaults to system memory. This should prevent errors where Toil requests more memory than available, halting the pipeline in an un-resumable state.
- Update to latest `taffy` and use newer MAF normalization. This should result in larger blocks and fewer gaps. MAF rows will now be sorted phylogenetically rather than alphabetically
- Better handle `.` characters in genome names during MAF processing. Previously neither duplicate filtering nor bigmaf summary creation could handle dots, but that should be fixed now.
- Duplicate filtering now done automatically in `cactus-maf2bigmaf`.
- Disable support for multifurcations (aka polytomies or internal nodes with more than 2 children) in Progressive Cactus. I'm doing this because I got spooked by a drop in coverage I noticed recently in a 4-child alignment. This regression appears to be linked to the new PAF chaining logic that's been added over the past several months. Until that's resolved, Cactus will exit with an error if it sees degree > 2 in the tree. This behaviour can, however, be overridden in the XML configuration file.
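
The degree check described in the last item can be illustrated on a Newick string. This is a minimal sketch, assuming an unquoted Newick tree (no parentheses or commas inside labels); `max_children` and `check_binary` are hypothetical names, and this is not Cactus's own validation code.

```python
def max_children(newick: str) -> int:
    """Return the largest number of children of any internal node."""
    stack = []   # one child counter per currently open '('
    largest = 0
    for ch in newick:
        if ch == "(":
            stack.append(1)       # an open group has at least one child
        elif ch == ",":
            if not stack:
                raise ValueError("comma outside parentheses")
            stack[-1] += 1        # each comma adds one sibling
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            largest = max(largest, stack.pop())
    if stack:
        raise ValueError("unbalanced '('")
    return largest


def check_binary(newick: str) -> None:
    """Raise ValueError if any node has more than 2 children (a multifurcation)."""
    degree = max_children(newick)
    if degree > 2:
        raise ValueError(f"tree has a node with {degree} children")
```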