Releases: ComparativeGenomicsToolkit/cactus

Cactus 2.9.3 2024-11-18

18 Nov 23:00
20488ae

Cactus 2.9.3 is available in the following forms:

WARNING: do not use the source files that GitHub generates automatically (Source code (zip) or Source code (tar.gz)); they are not correct.

The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
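
To decide between the standard and "older CPU" builds, it can help to check whether the machine advertises AVX2. The following is a minimal sketch for Linux only, assuming /proc/cpuinfo is readable; the function name is illustrative and not part of Cactus.

```python
def cpuinfo_has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the 'flags' field lists CPU feature names (x86 Linux).
        if sep and key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            supported = cpuinfo_has_avx2(handle.read())
    except OSError:
        supported = False  # not Linux, or /proc is unavailable
    print("AVX2 detected" if supported else "AVX2 not detected")
```

If AVX2 is not detected, the "Pre-compiled Binaries For Older CPU Architectures" are the ones to try.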

Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.

Release Notes

This release adds some new options to the pangenome pipeline, and hopefully improves robustness overall.

  • Faster path normalization (vg paths -n) for pangenomes via vg upgrade to v1.61.0
  • Sanity checks added to better detect corrupted intermediate FASTA files
  • Switch off abPOA's progressive mode unless input sequences have same length (otherwise sort by length)
  • --lastTrain / --scoresFile options added to learn and/or use custom scoring models for multiple alignment using last-train.
  • Update to latest vcflib. Also add vcflib installation command as option to BIN-INSTALL instructions
  • Make --maxLen default value consistent between cactus-align --pangenome and cactus-pangenome. Previously it was 100X bigger in the former, which made it very easy to have wildly different performance between the all-at-once and step-by-step versions of the pipeline
  • Fix bug where --binariesMode singularity could potentially attempt to write temporary files outside specified workDir
  • Tighten disk usage estimate for tile_alignments job
  • Patch mafTools to fix a bug where taffy normalization in cactus-hal2maf would crash if 1-character genome names were present in the input
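
The abPOA change in the list above can be pictured with a small sketch. It only illustrates the stated rule (progressive mode when all inputs have the same length, otherwise sort by length). The function name is hypothetical, and the sort direction (longest first) is an assumption that the release note does not specify.

```python
from typing import List, Tuple


def plan_poa_inputs(seqs: List[str]) -> Tuple[bool, List[str]]:
    """Return (use_progressive_mode, sequences_in_alignment_order)."""
    lengths = {len(s) for s in seqs}
    if len(lengths) <= 1:
        # All inputs share one length: keep progressive mode and input order.
        return True, list(seqs)
    # Mixed lengths: progressive mode off, inputs sorted by length.
    # ASSUMPTION: longest first; the release note only says "sort by length".
    return False, sorted(seqs, key=len, reverse=True)
```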

Cactus 2.9.2 2024-10-14

14 Oct 23:51
8b4e8a9

Cactus 2.9.2 is available in the following forms:

Release Notes

This release patches a couple of bugs.

  • fix broken --collapse option
  • give Toil exact disk requirement for merge_aligments job, fixing a potential over-estimate
  • update abpoa to latest release (v1.5.3)

Cactus 2.9.1 2024-09-25

25 Sep 13:13
feff5d6

Cactus 2.9.1 is available in the following forms:

Release Notes

This release updates the pangenome pipeline, and adds KegAlign to progressive cactus.

  • GPU lastz implementation changed from SegAlign to KegAlign, and should be more robust and better supported as a result.
  • Path normalization added to pangenome pipeline to make sure no two equivalent alleles through any site have different paths. AT fields in VCF should now always be consistent with the graph as a result.
  • Always chop nodes to 1024bp by default in pangenome pipeline. This ensures that all outputs (gfa, gbz, vcf etc) have compatible node ids. Previously, only GBZ and downstream graphs were chopped, which was confusing. The old logic can still be re-activated using the config XML.
  • Fix recent bug where using the --mgMemory option would crash cactus-pangenome
  • (Experimental) --collapse option added to pangenome pipeline to incorporate nearby self-alignments, including on the reference path.
  • Left shifting of VCFs (bcftools norm -f) is no longer run by default (except on vcfwave-normalized outputs), since it can cause conflicts with PanGenie by writing multiple variants at the same location.
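
To illustrate what chopping to 1024bp means for a single node, here is a minimal sketch that splits one node sequence into consecutive pieces of at most 1024 bases. It is not how the pipeline performs the chop; the function name is hypothetical, and node id assignment is left out entirely.

```python
from typing import List


def chop_sequence(sequence: str, max_len: int = 1024) -> List[str]:
    """Split a node sequence into consecutive pieces of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]
```

A 2500bp node would become three pieces, and concatenating the pieces gives back the original sequence. Since every output is derived from the same chopped graph, node ids agree across gfa, gbz and vcf.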

Cactus 2.9.0 2024-07-29

29 Jul 17:39
2d706d9

Cactus 2.9.0 is available in the following forms:

Release Notes

This release addresses two important scaling issues in the pangenome pipeline.

  • The haplotype sampling index (--haplo) can now be built without giraffe indexes (--giraffe). This significantly reduces peak memory consumption when using --haplo, especially for big diverse pangenomes.
  • Previously, you could not align more than roughly 500 samples with Minigraph-Cactus, no matter how small the input genomes were. This bottleneck has been removed: you can now align as many genomes as your system resources allow. For very small genomes, this could be well into the tens of thousands.
  • Two bugs were recently found in vcfwave, which has been available via the --vcfwave option since v2.8.2. First, the AT field is wrong in the output. Second, and more seriously, genotypes can be incorrect. The latter seems specific to multiallelic sites (but I'm not sure). This release now strips AT fields (they are not relevant after re-alignment anyway). It also splits multiallelic sites before running vcfwave in an attempt to work around the genotyping bug.

Cactus 2.8.4 2024-06-21

21 Jun 19:14
d62e175

Cactus 2.8.4 is available in the following forms:

Release Notes

This release updates vcfbub in order to fix a longstanding issue where this tool can produce invalid VCFs.

  • vcfbub updated to v0.1.1, which resolves a bug where records could be missing columns in the presence of . genotypes
  • run bcftools view as a sanity check on generated VCFs to prevent various normalization steps from ever silently producing invalid output.

Cactus 2.8.3 2024-06-12

12 Jun 23:06
d37fce4

Cactus 2.8.3 is available in the following forms:

Note: The gpu docker image was built using this patch that bumps the Ubuntu version from 20.04 to 22.04.

Important note about installing on Python3.8: You may need to run python3 -m pip install backports.zoneinfo

Release Notes

This release fixes some bugs and updates to the latest Toil.

  • Fix broken --restart option in cactus-graphmap
  • Raise Toil job memory requirement for filter-paf-deletions
  • Update to vg v1.57.0
  • Update Toil to v7.0
  • Fix bug where trim-outgroups job could request way too little memory when there are no outgroups
  • Fix typo that broke cactus-maf2bigmaf on uncompressed inputs
  • More robust implementation of vcfwave
  • Fix bug where RED preprocessing crashed when awk returned a number in scientific notation
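
The scientific notation issue in the last item is a general pitfall when parsing awk output: Python's int() rejects strings such as 1.5e+06. Below is a minimal sketch of a tolerant parser. The helper name is hypothetical, and the release note does not say which awk output was involved or how the fix was actually implemented.

```python
from decimal import Decimal


def parse_count(text: str) -> int:
    """Parse an integer that may be written in scientific notation."""
    value = Decimal(text.strip())
    # Refuse values with a fractional part (for example "1.5").
    if value != value.to_integral_value():
        raise ValueError("not an integer: " + repr(text))
    return int(value)
```

Going through Decimal rather than float avoids rounding for counts too large to be represented exactly as a double.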

Cactus 2.8.2 2024-05-09

09 May 13:29
2126683

Cactus 2.8.2 is available in the following forms:

Release Notes

This release fixes some bugs and adds a (docker-only) vcfwave normalization option for pangenomes.

  • Use correct bigChain.as that allows chain scores to be huge (instead of capping them)
  • Update odgi, vg, abPOA and taffy to their latest releases
  • Fix cactus-hal2maf and cactus-pangenome --odgi to work with --binariesMode docker
  • bcftools norm -f now run by default on all non-raw VCF outputs (toggle off in the config)
  • vcfwave normalization option added for pangenomes (to mimic what was done for release HPRC graphs). Note that vcfwave is not included in the binary release -- you need to use the Cactus docker or build it yourself.
  • Minigraph fasta file renamed from .gfa.fa to .sv.gfa.fa to be less confusing
  • Gap and empty MAF block filtering moved from cactus-hal2maf to cactus-maf2bigmaf. So MAF output will now have a reference base for every position.
  • Fix cactus-preprocess to do only RED masking by default (there was previously a bug where it ran RED then lastz after). The --maskMode option is also fixed to work properly.
  • Update to Toil 6.1.0

Cactus 2.8.1 2024-04-04

04 Apr 20:47
28a6e2c

Cactus 2.8.1 is available in the following forms:

Release Notes

This release patches some recent bugs, including a major bug introduced in cactus-hal2maf in v2.7.2 that could produce negative-stranded (and out of order) reference rows.

  • Do not apply RED masker to contigs that are likely to crash it (tiny contigs and extremely low information contigs)
  • Add --coverage option to cactus-hal2maf to include table of coverage statistics in the output
  • Fix bug where :start-end contig suffixes caused the pangenome pipeline to crash. They are now correctly handled as subranges
  • Turn off abPOA seeding by default, after finding (what must be a fairly rare) case where it doesn't work.
  • Improve cactus-hal2chains interface
  • Add range support to cactus-hal2maf via --start/--length or --bedRanges
  • Deprecate cactus-maf2bigmaf --chromSizes (use --halFile instead, as it handles "."s in genome names properly)
  • Fix bug where reference row could be lost in cactus-hal2maf MAF due to sorting error.
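
The :start-end suffix handling mentioned in the list above can be sketched as a small parser that separates a contig name from an optional subrange. This is an illustration only: the function name and regular expression are assumptions, and the coordinates are returned as written, without assuming 0-based or 1-based conventions.

```python
import re
from typing import Optional, Tuple

# A trailing ":<digits>-<digits>" is treated as a subrange suffix.
_SUBRANGE = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def split_subrange(contig: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (base_name, start, end); start and end are None without a suffix."""
    match = _SUBRANGE.match(contig)
    if match is None:
        return contig, None, None
    return match.group("name"), int(match.group("start")), int(match.group("end"))
```

For example, chr1:100-200 yields the base name chr1 with the range 100 to 200, while plain chr1 is returned unchanged.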

Cactus 2.8.0 2024-03-13

13 Mar 17:16
7286b49

Cactus 2.8.0 is available in the following forms:

Release Notes

This release significantly changes the preprocessor step of Progressive Cactus in order to be more robust and efficient in the presence of unmasked repeats, something that seems more prevalent with newer, T2T assemblies.

  • Replace lastz repeatmasking with REpeat Detector (RED) in the Progressive Cactus preprocessor. RED is more sensitive and orders of magnitude faster than the old lastz masking pipeline. Crucially, it is able to mask regions that would slip by RepeatMasker/WindowMasker/lastz in new T2T ape genomes that would otherwise break Cactus downstream. Tests so far show this change to make Cactus much faster and more robust. The old lastz pipeline can still be toggled back on in the config.
  • Delete many unneeded files that previously collected in the jobstore directory until the end of execution. This was a particular issue in large cactus-pangenome runs where the jobstore would creep up to several terabytes for HPRC-sized inputs.
  • No longer require manually editing the blast chunksize in the config when running on Slurm (to reduce the number of jobs). It's now scaled up automatically in Slurm environments (by a factor controlled in the config).
  • Fix bug introduced in last release where Cactus would not work on AWS/MESOS clusters unless --defaultMemory and --maxMemory options were specified (and in bytes).
  • Update to the latest taffy and vg

Cactus 2.7.2 2024-02-23

23 Feb 18:08
41f4a3d

2024/03/11 NOTE: this version does not work on AWS clusters -- use the previous or next release if specifying --batchSystem mesos --provisioner aws

Cactus 2.7.2 is available in the following forms:

Release Notes

This release improves MAF output, along with some other fixes

  • --maxMemory option given more teeth. It is now used to clamp most large Toil jobs. On single-machine it defaults to system memory. This should prevent errors where Toil requests more memory than available, halting the pipeline in an un-resumable state.
  • Update to latest taffy and use newer MAF normalization. This should result in larger blocks and fewer gaps. MAF rows will now be sorted phylogenetically rather than alphabetically.
  • Better handle . characters in genome names during MAF processing. Previously neither duplicate filtering nor bigmaf summary creation could handle dots, but that should be fixed now.
  • Duplicate filtering now done automatically in cactus-maf2bigmaf.
  • Disable support for multifurcations (aka polytomies or internal nodes with more than 2 children) in Progressive Cactus. I'm doing this because I got spooked by a drop in coverage I noticed recently in a 4-child alignment. This regression appears to be linked to the new PAF chaining logic that's been added over the past several months. Until that's resolved, Cactus will exit with an error if it sees degree > 2 in the tree. This behaviour can, however, be overridden in the XML configuration file.
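
To check an input tree for multifurcations before running Progressive Cactus, a quick scan of the Newick string is enough. The sketch below counts the children of each internal node. It assumes plain Newick without quoted labels or bracketed comments, and the function name is illustrative rather than anything Cactus provides.

```python
def max_children(newick: str) -> int:
    """Return the largest number of children of any internal node."""
    open_groups = []  # one child counter per currently open '('
    largest = 0
    for char in newick:
        if char == "(":
            open_groups.append(1)  # an open group has at least one child
        elif char == ",":
            if not open_groups:
                raise ValueError("comma outside parentheses")
            open_groups[-1] += 1  # a comma separates sibling subtrees
        elif char == ")":
            if not open_groups:
                raise ValueError("unbalanced parentheses")
            largest = max(largest, open_groups.pop())
    if open_groups:
        raise ValueError("unbalanced parentheses")
    return largest
```

A result greater than 2 means the tree contains a polytomy, which this release rejects unless the behaviour is overridden in the XML configuration file.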