Releases: alekseyzimin/masurca
MaSuRCA v4.1.2
This release contains bug fixes and improvements, primarily to the chromosome scaffolder, the component that performs reference-based scaffolding of an assembly.
Please install MaSuRCA from the attached archive MaSuRCA-4.1.2.tar.gz. Do not use the Source files.
MaSuRCA v4.1.1
This release contains bug fixes and improvements. There is a new option for running mega-reads on the grid: GRID_ENGINE=MANUAL. This option produces a script for running mega-reads correction jobs manually on multiple servers, and provides instructions on how to execute the jobs and restart the assembly.
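As a rough sketch, the option could be switched on in an existing configuration file as shown here. The helper name set_manual_grid is hypothetical and not part of MaSuRCA. The sketch assumes that the configuration file already contains USE_GRID= and GRID_ENGINE= lines, and that USE_GRID=1 is still needed in manual mode; check the sample configuration shipped with your version before relying on either assumption.

```shell
# Hypothetical helper, not part of MaSuRCA: rewrite the grid settings in a config file.
# Assumption: USE_GRID=1 is required for GRID_ENGINE=MANUAL to take effect.
set_manual_grid() {
  config="$1"
  # Replace the two settings in place via a temporary file (portable across sed versions).
  sed -e 's/^USE_GRID=.*/USE_GRID=1/' \
      -e 's/^GRID_ENGINE=.*/GRID_ENGINE=MANUAL/' \
      "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}

# Usage (the config file name is a placeholder):
#   set_manual_grid my_config.txt
```

All other lines of the configuration file are left unchanged.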
Please install MaSuRCA from the attached archive MaSuRCA-4.1.1.tar.gz. Do not use the Source files.
MaSuRCA 4.1.0
This release introduces multiple improvements and compatibility fixes:
- The Eugene annotation pipeline (eugene.sh), based on the Maker software, was improved significantly.
- The SAMBA scaffolder's performance and accuracy were improved.
- Compatibility fixes were added to the MaSuRCA assembler code for issues that prevented it from running on some systems that do not support numactl.
- close_scaffold_gaps.sh, a wrapper for the SAMBA scaffolder aimed at closing gaps in existing scaffolds, was improved.
MaSuRCA 4.0.9
This release has major improvements to SAMBA scaffolder, and minor improvements to POLCA polisher and reference-based chromosome scaffolder.
Detection of misassemblies in SAMBA is improved, along with the accuracy of gap-filling consensus sequences and the structural quality of the output contigs. If scaffolds with gaps are given to SAMBA, it no longer treats gaps as misassemblies and avoids splitting at or near gaps. SAMBA runs automatically as the last step in the MaSuRCA assembler, resulting in more contiguous and correct assemblies.
The POLCA polisher now outputs the QV value. POLCA can also be used as an integrated variant calling/assembly evaluation pipeline. With the "-n" switch it does not make any changes to the assembly; instead it produces a vcf file with all variant calls in the reads against the assembly, and outputs an evaluation of consensus quality.
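A minimal sketch of what an evaluation-mode command line might look like. The helper polca_eval_cmd is hypothetical and only prints the command instead of running it. The -a (assembly), -r (quoted list of Illumina read files) and -t (threads) flags are assumed from the polca.sh usage message, and all file names are placeholders.

```shell
# Prints (does not run) a POLCA evaluation-mode command line.
# Assumptions: the -a, -r and -t flags follow the polca.sh usage message;
# all file names below are placeholders.
polca_eval_cmd() {
  assembly="$1"; reads="$2"; threads="$3"
  # -n asks POLCA to report variants and quality without changing the assembly.
  printf "polca.sh -n -a %s -r '%s' -t %s\n" "$assembly" "$reads" "$threads"
}

polca_eval_cmd assembly.fasta "reads_R1.fastq.gz reads_R2.fastq.gz" 16
```

Dropping -n from the printed command would give the corresponding polishing run.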
The close_scaffold_gaps.sh wrapper to SAMBA has been improved as well, and this script can be used to effectively close gaps in scaffolds with another assembly (or a reference genome for closely related species) or additional long-read data. Usage: close_scaffold_gaps.sh -h.
Performance, stability and accuracy of the chromosome scaffolder tool (chromosome_scaffolder.sh) have been improved.
MINOR UPDATE 04/29/2022: removed deprecated sys/sysctl.h header from CA8. The header was deprecated in glibc 2.32, and its presence prevented compilation on newer systems.
MaSuRCA 4.0.8
This release fixes a bug in SAMBA that caused the nucmer alignment step to fail on some data sets.
SAMBA can now use a gzipped fasta file for the scaffolding sequences. The sequences to be scaffolded must still be in fasta format, not gzipped.
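Because only the scaffolding sequences may be compressed, a pre-flight check along the following lines could catch a gzipped target file early. Both helper functions are hypothetical and not part of MaSuRCA; the check relies only on the fact that gzip files begin with the bytes 0x1f 0x8b.

```shell
# Hypothetical pre-flight check, not part of MaSuRCA.
# gzip streams start with the two magic bytes 0x1f 0x8b.
is_gzipped() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Refuse a compressed file for the sequences to be scaffolded;
# a gzipped file of scaffolding sequences is fine and needs no check.
check_scaffold_target() {
  if is_gzipped "$1"; then
    echo "error: $1 is gzipped; decompress it before scaffolding" >&2
    return 1
  fi
}

# Usage (file name is a placeholder):
#   check_scaffold_target contigs.fasta && echo "ok to scaffold"
```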
This release also improves usage messages.
MaSuRCA 4.0.7
This release has significant improvements to the SAMBA scaffolder in error rates, output contiguity, and consensus quality. Since SAMBA is now part of the default MaSuRCA assembly pipeline, the quality and contiguity of MaSuRCA assemblies improve as well.
I also added assembly QV computation to POLCA. The QV for the assembly is now reported in the .report file, along with the other metrics. Note that the POLCA polisher has a -n option that runs it in "evaluation" mode, where it outputs the number of errors it detects in the assembly but does not make any corrections. After that, one can rerun the pipeline without the -n switch to make the corrections. The -n option is also useful for efficiently producing a VCF file containing the variant calls made by freebayes.
MaSuRCA 4.0.6
The 4.0.6 release introduces code cleanup and performance improvements in the MaSuRCA assembly pipeline, the POLCA error correction/assembly evaluation tool and the SAMBA scaffolder.
In response to several issues raised by users of POLCA and the chromosome scaffolder, I recommend that users install MaSuRCA in a separate folder with the provided install.sh script, as opposed to installing it globally into /usr/local/bin. MaSuRCA is self-contained and does not require root privileges to compile, install and run. Many components of MaSuRCA depend on appropriate versions of binaries such as samtools, mummer and jellyfish, which are provided with MaSuRCA, and may produce errors if the system attempts to use different versions of these tools available on the $PATH. For these tools MaSuRCA will always look first for the binaries installed under /path-to/MaSuRCA-x.x.x/bin/.
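When debugging version conflicts of this kind, it can help to see which copy of a tool an interactive shell would pick up. The function below is a hypothetical diagnostic, not part of MaSuRCA. The tool names are examples (nucmer is one of the mummer binaries), and the install path is the placeholder from the text above.

```shell
# Hypothetical diagnostic, not part of MaSuRCA: report whether a tool found on
# $PATH lives inside a given MaSuRCA bin directory.
resolves_in_masurca() {
  tool="$1"; masurca_bin="${2%/}"   # strip one trailing slash, if any
  found=$(command -v "$tool" 2>/dev/null) || return 1
  case "$found" in
    "$masurca_bin"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder path, kept as in the text above; replace with your install location.
MASURCA_BIN="/path-to/MaSuRCA-x.x.x/bin"
for tool in samtools nucmer jellyfish; do
  if resolves_in_masurca "$tool" "$MASURCA_BIN"; then
    echo "$tool: resolves inside $MASURCA_BIN"
  else
    echo "$tool: not found in $MASURCA_BIN via \$PATH"
  fi
done
```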
MaSuRCA 4.0.5
This is a maintenance release that improves the stability of the MaSuRCA scaffolder (soon to be published as the SAMBA tool), and improves the speed and consensus quality of hybrid assemblies.
Major changes:
- upgraded swig headers to version 4.0.2
- fixed an occasional division-by-zero bug in the masurca scaffolder
- removed the k-mer size reduction step for the super-reads; it is not needed with the improvements that have been made recently
- renamed the masurca_scaffolder.sh tool to samba.sh (manuscript in preparation)
- the SAMBA tool can be used to close intrascaffold gaps when invoked through the close_scaffold_gaps.sh script
MaSuRCA 4.0.4
This release adds the MaSuRCA scaffolder to the code. The MaSuRCA scaffolder scaffolds and gap-fills existing assembled contigs or scaffolds with long reads from PacBio or Oxford Nanopore technologies, or with contigs or scaffolds from another assembly of the same genome. The gap-fill sequence is computed from the PacBio or Nanopore consensus. The scaffolder also runs automatically as a post-processor within the assembly pipeline, resulting in a significant improvement in assembly contiguity. The scaffolder script is called masurca_scaffold.sh and is invoked as follows:
masurca_scaffold.sh -r -q -t -o <maximum overhang, default 1000> -m <minimum matching length, default 5000>
Also I updated the code of several submodules to make it compatible with C++10.
MaSuRCA 4.0.3
This release adds a new quick-run option for small projects that lets you skip editing a configuration file and specify the data on the command line.
If your project uses data from a single Illumina run that produced either one file of single-end reads or two files of paired-end reads, and optionally a single file containing long Nanopore or PacBio reads, you can skip creating a configuration file and use a simple command-line interface to run MaSuRCA. The options are described in the usage message, which is displayed with the -h or --help switch. There are three command-line switches: -i, -t and -r. -t specifies the number of threads to use, -i specifies the names and paths of the Illumina paired-end read files, and -r specifies the name and path of the long reads file. For example:
/path_to_MaSuRCA/bin/masurca -t 32 -i /path_to/pe_R1.fa,/path_to/pe_R2.fa
will run an assembly with only Illumina paired-end reads from the files /path_to/pe_R1.fa (forward) and /path_to/pe_R2.fa (reverse). An example of a hybrid assembly:
/path_to_MaSuRCA/bin/masurca -t 32 -i /path_to/pe_R1.fa,/path_to/pe_R2.fa -r /path_to/nanopore.fastq.gz
This command will run a hybrid assembly, correcting the Nanopore reads with the Illumina data first. Illumina paired-end read files must be fastq and can be gzipped. Nanopore/PacBio data files for the -r option can be fasta or fastq and can be gzipped.