Releases: etal/cnvkit

Version 0.7.1

30 Sep 18:25

This is primarily a bugfix release. Many more unit test cases were added to the automated test suite. Code coverage is now monitored at Codecov (thanks @stevepeak).

export nexus-basic:

  • New optional argument -v/--vcf extracts SNV b-allele frequencies from the given VCF file, matches them to the bins in the .cnr file, and prints an additional "baf" column in the output table. These allele frequencies can then be viewed in Nexus Copy Number, similar to a SNP array.
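The matching step can be pictured roughly as follows. This is a minimal sketch, not CNVkit's implementation: the function name `baf_per_bin`, the tuple layouts, the mirroring of allele frequencies around 0.5, and the use of a per-bin mean are all assumptions made for illustration.

```python
from statistics import mean

def baf_per_bin(bins, snvs):
    """Return one b-allele frequency (or None) per bin.

    bins: list of (chrom, start, end) half-open intervals, as in a .cnr file.
    snvs: list of (chrom, pos, alt_freq); pos is assumed to be 0-based already.
    """
    result = []
    for chrom, start, end in bins:
        freqs = [
            # Assumption: mirror the allele frequency around 0.5 to get a BAF
            0.5 + abs(freq - 0.5)
            for snv_chrom, pos, freq in snvs
            if snv_chrom == chrom and start <= pos < end
        ]
        # Bins without any SNV get no value
        result.append(mean(freqs) if freqs else None)
    return result

bins = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
snvs = [("chr1", 10, 0.25), ("chr1", 50, 0.75), ("chr2", 5, 0.5)]
bafs = baf_per_bin(bins, snvs)
```

Here the first bin holds two SNVs, the second holds none, and the third holds one heterozygous SNV at exactly 0.5.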

call:

  • Fixed a bug in the threshold method where the copy number of haploid chromosomes was twice what it should be. The clonal method already handled these chromosomes properly. (#49)
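To see why the expected number of reference copies matters, consider this small illustration. It is not the threshold method itself; the helper `absolute_copies` and its `ref_copies` parameter are hypothetical names used only to show where a factor of two can creep in.

```python
def absolute_copies(log2_ratio, ref_copies=2):
    """Convert a log2 ratio to a rounded absolute copy number.

    ref_copies is the copy number expected in a normal sample:
    2 for autosomes, 1 for a haploid chromosome.
    """
    return int(round(ref_copies * 2 ** log2_ratio))

# A neutral log2 ratio (0.0) on a haploid chromosome
wrong = absolute_copies(0.0)                # treats the chromosome as diploid
right = absolute_copies(0.0, ref_copies=1)  # uses the haploid expectation
```

With the diploid default, a neutral haploid chromosome is reported with twice as many copies as it should have.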

reference:

  • Handle blank/empty antitarget BED and coverage (.cnn) files. This was a regression in v0.7.0 relative to earlier releases. (#51)
  • When calculating GC and RepeatMasker values, catch invalid BED ranges that extend beyond the length of the chromosome and raise an informative error. Such ranges already caused an error before (in ngfrills.faidx), but the message was baffling.
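A range check of the kind described for invalid BED ranges might look like the sketch below. The function name `check_bed_ranges`, the dictionary of chromosome lengths, and the error wording are illustrative assumptions, not CNVkit's actual code or messages.

```python
def check_bed_ranges(regions, chrom_lengths):
    """Raise ValueError for any region that falls outside its chromosome.

    regions: iterable of (chrom, start, end), 0-based half-open as in BED.
    chrom_lengths: dict mapping chromosome name to sequence length.
    """
    for chrom, start, end in regions:
        if chrom not in chrom_lengths:
            raise ValueError(f"Unknown chromosome {chrom!r}")
        length = chrom_lengths[chrom]
        if not 0 <= start < end <= length:
            raise ValueError(
                f"Region {chrom}:{start}-{end} is outside the chromosome "
                f"(length {length})"
            )

lengths = {"chr1": 1000}
check_bed_ranges([("chr1", 0, 1000)], lengths)  # valid, returns silently

try:
    check_bed_ranges([("chr1", 900, 1100)], lengths)
    message = None
except ValueError as exc:
    message = str(exc)
```

Naming the offending region and the chromosome length in the message is what makes the error actionable.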

fix:

  • Catch duplicated target ranges, e.g. the exact same bait labeled with two different gene names, and report those ranges in the error message. The target command's --split option should usually fix these, but sometimes it's not used.
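Detecting such duplicates amounts to grouping targets by their coordinates. A minimal sketch, assuming targets are available as (chrom, start, end, gene) tuples; the function name `find_duplicate_ranges` is hypothetical:

```python
from collections import defaultdict

def find_duplicate_ranges(targets):
    """Map each (chrom, start, end) seen more than once to its gene labels."""
    labels = defaultdict(list)
    for chrom, start, end, gene in targets:
        labels[(chrom, start, end)].append(gene)
    return {coords: genes for coords, genes in labels.items() if len(genes) > 1}

targets = [
    ("chr1", 100, 200, "GENE_A"),
    ("chr1", 100, 200, "GENE_B"),  # same bait, different label
    ("chr1", 300, 400, "GENE_C"),
]
duplicates = find_duplicate_ranges(targets)
```

The returned mapping contains exactly the ranges that would be listed in the error message.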

Version 0.7.0

09 Sep 05:34

CNVkit now depends on pandas, SciPy, and PyVCF. The internals were largely rewritten, so please report any bugs or other regressions you find.

Documentation is much improved.

export:

  • VCF format is supported (#5, #41). The generated VCFs are compatible with many third-party tools, including development versions of MetaSV. (Thanks @chapmanb)
  • Removed the "freebayes" sub-command; use "export bed" instead.

segment:

  • The names of genes (or other targeted loci) covered by each segment are now included in the output .cns file.
  • The p-value or q-value threshold (depending on the method) can now be specified with -t/--threshold.
  • The "haar" method works properly now (#6). This segmentation algorithm is implemented in Python and does not require R to run. It is a bit faster than CBS, but not as accurate.

loh:

  • Plot variant allele frequencies (VAFs) as their actual values, 0 to 1, instead of the mirrored b-allele frequency (0.5 to 1). Draw segment mean allele frequencies separately above and below 0.5. This matches how the equivalent SNP array data are typically viewed.

antitarget:

  • Generate off-target bins for all chromosomes present in the "access" BED file, not just those where targeted regions occur. (#37)

coverage:

  • A minimum read mapping quality (MAPQ) value can now be specified with -q/--min-mapq. The default value is 0, i.e. reads are no longer excluded for low MAPQ or ambiguous mapping location. This should generally improve calling accuracy and avoid some spurious deletion calls.

Version 0.6.1

17 Jul 22:09

Small fixes in segmentation, affecting the output of segment and preventing crashes in segmetrics:

  • Exclude fewer low-coverage bins from segmentation (using a lower minimum coverage threshold).
  • In case the first or last bins on a chromosome were excluded from segmentation, adjust the first and last segments on each chromosome so that their endpoints match the first and last bins.
  • If no bins on a chromosome passed the coverage filter, instead of omitting the chromosome from segmentation output, generate a single segment covering the full chromosome, with segment log2 ratio 0.0. (So, all chromosomes in the .cnr file will be present in the .cns file, too.)
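The last two adjustments can be sketched as below. This is only an illustration under simplifying assumptions: bins and segments are held in per-chromosome lists sorted by position, every chromosome has at least one bin, and the function name `pad_segments` is invented for this example.

```python
def pad_segments(bins, segments):
    """Make segments span the full range of bins on each chromosome.

    bins: {chrom: [(start, end), ...]}
    segments: {chrom: [(start, end, log2), ...]}, possibly missing chromosomes.
    """
    padded = {}
    for chrom, chrom_bins in bins.items():
        first_start = chrom_bins[0][0]
        last_end = chrom_bins[-1][1]
        segs = list(segments.get(chrom, []))
        if not segs:
            # No bins passed the filter: one neutral segment for the chromosome
            segs = [(first_start, last_end, 0.0)]
        else:
            # Stretch the outermost segments to the outermost bins
            _, end, log2 = segs[0]
            segs[0] = (first_start, end, log2)
            start, _, log2 = segs[-1]
            segs[-1] = (start, last_end, log2)
        padded[chrom] = segs
    return padded

bins = {"chr1": [(0, 100), (100, 200), (200, 300)], "chr2": [(0, 50)]}
segments = {"chr1": [(100, 200, 0.5)]}
padded = pad_segments(bins, segments)
```

In this example chr1's single segment is stretched to cover all three bins, and chr2 gets a placeholder segment with log2 ratio 0.0.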

Version 0.6.0

11 Jul 00:36

Added two new commands, call and segmetrics, and a new export format, BED.

segmetrics:

  • Calculates summary statistics of the residual bin-level log2 ratio estimates from the segment means, similar to the existing metrics command, but for each segment individually. Results are output in the same format as the CNVkit segmentation file (.cns), with the stat names and calculated values printed in the "gene" column.
  • Supported stats:
    • standard deviation, median absolute deviation, inter-quartile range, Tukey's biweight midvariance (as in metrics);
    • confidence interval, estimated by bootstrap;
    • prediction interval, estimated by the range between the 2.5th and 97.5th percentiles of bin-level log2 ratio values within the segment.
  • Thanks to @mjafin for suggesting this feature (#28).
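Several of these statistics are simple to state in code. The following is a rough sketch using only the Python standard library; it is not the segmetrics implementation. The function names, the linear-interpolation percentile, the number of bootstrap resamples, and the fixed random seed are assumptions, and Tukey's biweight midvariance is omitted for brevity.

```python
import random
from statistics import mean, median, stdev

def percentile(values, pct):
    """Linear-interpolation percentile of values, with pct in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)

def segment_metrics(bin_log2, n_boot=1000, seed=0):
    """Summarize the bin-level log2 ratios within one segment."""
    seg_mean = mean(bin_log2)
    residuals = [x - seg_mean for x in bin_log2]
    res_median = median(residuals)
    # Bootstrap: resample bins with replacement and recompute the mean
    rng = random.Random(seed)
    boot_means = [
        mean(rng.choices(bin_log2, k=len(bin_log2))) for _ in range(n_boot)
    ]
    return {
        "mean": seg_mean,
        "stdev": stdev(residuals),
        "mad": median(abs(r - res_median) for r in residuals),
        "iqr": percentile(residuals, 75) - percentile(residuals, 25),
        "ci": (percentile(boot_means, 2.5), percentile(boot_means, 97.5)),
        "pi": (percentile(bin_log2, 2.5), percentile(bin_log2, 97.5)),
    }

metrics = segment_metrics([-0.2, -0.1, 0.0, 0.1, 0.2])
```

The confidence interval describes uncertainty in the segment mean, while the prediction interval describes the spread of individual bins, so the former is usually much narrower.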

call:

  • Given segmented log2 ratio estimates (.cns file), round the copy ratio estimates to integer values using either:
    • A list of threshold log2 values for each copy number state, or
    • Some algebra, given known tumor cell fraction and normal ploidy. (This was previously available through the export freebayes command, see below.)
  • The output is another .cns file, where the values in the log2 column are still log2-transformed, but represent integers in log2 scale -- e.g. a neutral diploid state is represented as "0.0", not the integer 2. These output files are still compatible with the other CNVkit commands that accept .cns files, and can be plotted the same way.
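Both approaches, and the conversion back to log2 scale, can be sketched as follows. Everything here is illustrative: the function names, the example threshold values, and the floor used for zero copies are assumptions rather than CNVkit's actual defaults or code.

```python
import math
from bisect import bisect_right

def call_by_threshold(log2_ratio, thresholds=(-1.1, -0.4, 0.3, 0.7)):
    """Copy number = how many (sorted) thresholds the log2 ratio reaches."""
    return bisect_right(thresholds, log2_ratio)

def call_by_purity(log2_ratio, purity, ploidy=2):
    """Solve observed = purity * tumor + (1 - purity) * ploidy for tumor."""
    observed = ploidy * 2 ** log2_ratio
    tumor = (observed - (1 - purity) * ploidy) / purity
    return max(0, round(tumor))

def copies_to_log2(copies, ploidy=2, floor=-5.0):
    """Express an integer copy number in log2 scale relative to ploidy."""
    if copies <= 0:
        return floor  # log2(0) is undefined; this floor is an arbitrary choice
    return math.log2(copies / ploidy)

# A single-copy loss in a sample that is 50% tumor: the observed ratio is
# (0.5 * 1 + 0.5 * 2) / 2 = 0.75
observed_log2 = math.log2(0.75)
tumor_copies = call_by_purity(observed_log2, purity=0.5)
called_log2 = copies_to_log2(tumor_copies)
```

Without the purity adjustment, the same observed value of about -0.415 sits right at the edge of the example thresholds, which is why knowing the tumor cell fraction helps.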

export bed:

  • New bed format supporting the same features as export freebayes that were not moved into the call command (see above). The output BED file is still compatible with the FreeBayes --cnv-map option. In addition, export bed has the new option --show-neutral to also output neutral-CN segments/regions, in addition to the CNV regions output by default.
  • The export freebayes sub-command is deprecated but still available in this release; it will be removed in the next release. This command supported the tumor-purity adjustment now implemented in the call command. The recommended approach is to instead run call first on each .cns file, and then export bed on all the adjusted .cns files to get an equivalent BED file compatible with FreeBayes --cnv-map option.

Smaller changes:

  • gainloss: Reduced the default log2 ratio threshold from .5 to .2
  • import-picard: Use the un-normalized mean coverage instead of the normalized coverage of each target as the log2 coverage values in the output .cnn file. This matches the output of the coverage command; CNVkit normalizes coverages later in the pipeline.
  • Some internal refactoring. Please report any bugs, real or perceived, on our GitHub issue tracker.

Version 0.5.1

19 Jun 19:06

Bug fixes for two edge cases in whole genome analyses (thanks @chapmanb):

  • reference: Merging target and antitarget .cnn files where antitargets are empty
  • diagram: Avoid trying to plot segments over the start or end of chromosomes

Version 0.5.0

24 May 19:02

This release includes a variety of improvements to CNVkit's calling accuracy and robustness. All CNVkit files built with previous versions will continue to work with this version, but for best results, I recommend rebuilding your reference.cnn file(s) from the targetcoverage.cnn and antitargetcoverage.cnn files.

coverage:

  • Output target/antitarget coverage (.cnn) files are no longer median-centered. Read depths in each bin are still log2-scaled, but the observed read depth can now be easily recovered from .cnn files.
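The practical effect can be shown with a few made-up depths. This is only an arithmetic illustration of log2 scaling and median-centering, not code from the coverage command.

```python
import math
from statistics import median

depths = [50.0, 100.0, 200.0]  # made-up read depths for three bins

# Log2 scaling alone is reversible: 2 ** value gives the depth back
log2_depths = [math.log2(d) for d in depths]
recovered = [2 ** v for v in log2_depths]

# Median-centering subtracts a sample-wide constant, so the absolute depth
# can no longer be recovered unless that constant is stored somewhere
center = median(log2_depths)
centered = [v - center for v in log2_depths]
```

After centering, the three bins read roughly -1, 0 and 1 regardless of whether the sample was sequenced at 100x or 1000x.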

reference, fix:

  • Include a "flat pseudocount" in addition to the given normals, making paired tumor-normal calling much more robust and accurate.
  • Perform bias corrections on the input normal samples before calculating the average and spread of log2 values.

fix:

  • Do bias corrections before subtracting the reference, instead of after, because the reference already includes bias corrections now.
  • In addition to weighting bins by spread (which can only be observed with a pooled reference), also weight by bin size and deviation of reference log2 values in each bin from the global median. So, useful bin weights are now derived from "flat" and single-normal-sample references, too.

segment:

  • Recalculate CBS segment means using bin weights (in the R library this is simply the mean, arguably a bug).
  • Set CBS segment start/end positions to match the underlying bin start/end positions.
  • Improved centromere detection -- only exclude one "large gap", if any, from each chromosome.
  • Tuned CBS calling parameters to improve accuracy (see benchmarks in the repo etal/cnvkit-examples).

diagram:

  • Label genes using the same criteria as the gainloss command: if segments are given, use the segment value at each gene, otherwise calculate the weighted average of bin-level log2 values within each gene.
  • New option -m/--min-probes to match gainloss.
  • Guess gender from chrX more reliably, so that the same gender is called from the bin-level (.cnr) and segmented (.cns) values given.

scatter, loh:

  • When plotting allele frequencies from a VCF, if segments are given (.cns), also apply those segments to allele frequencies to show LOH regions that match CNVs.
  • Skip somatic variants identified in a VCF, and try to retain only germline variants, when plotting LOH. (This is not very well standardized across callers, so please watch for bad behavior from callers other than FreeBayes and MuTect, and let me know about it!)
  • scatter only: Added options --y-min, --y-max to set y-axis limits on the plot.
  • Removed the deprecated -r option. Use -c instead.

The long-deprecated cbs command has been removed. Use segment instead.

Bugs in parsing and writing empty and 1-line VCF, BED and CNVkit files, and other VCF quirks, have now been fixed (Thanks @chapmanb!)

Version 0.4.1

02 May 00:17

New features:

  • scatter command:
    Option -c can now take coordinate ranges like -r, so -r is deprecated and will be removed in the next release.
  • genome2access.py script:
    New -x option to exclude additional regions. Added a new file "data/access-5k-mappable.hg19.bed" which used this option to exclude the ENCODE "Duke" and "DAC" low-mappability regions.

Also:

  • Improved the help/usage messages for several commands. Added a "version" command that prints the current CNVkit version. (Thanks @HenrikBengtsson)
  • Tuned CBS calling parameters to improve segmentation accuracy according to some benchmarks.
  • Sped up a few slow functions identified by profiling. In particular, metrics is much faster now.
  • Fixed bugs/incompatibilities in plotting commands and cleaned up the source code (Thanks @chapmanb and @roryk)

CNVkit can now be obtained and run as a Docker container:
https://registry.hub.docker.com/u/etal/cnvkit/

Version 0.4.0

09 Apr 19:03

New features:

  • Plotting (scatter and loh commands):
    • Support VCFs from more callers, including MuTect, VarScan and FreeBayes. Support multi-sample VCFs; the sample in the VCF can be selected by name with the -i option, and will also be shown as the plot title. Thanks to Brad Chapman (@chapmanb) for this contribution. (#11)
    • Enable highlighting of selected regions other than genes using the -r and -w options. The plot title (sample ID) can also be specified with -i/--sample-id. Thanks to Brad Chapman (@chapmanb) for this contribution. (#9)
    • New -l/--range-list option to plot a BED file of regions, each in its own plot, and combine the generated plots into a single multi-page PDF file. Thanks to Rory Kirchner (@roryk) for this contribution. (#21)
  • FreeBayes export format can now handle multiple samples (.cns files).

Changes:

  • Renamed --male-normal option to --male-reference (but kept -y alias) in all commands that had it.
  • export options: Specify sample name with -i/--sample-id option instead of -n.
  • scatter plotting command: added --min-variant-depth option to match loh. (#10)
  • The loh plot command does not attempt significance testing anymore; we're working on a better solution. (#10, #18)

Bug fixes:

  • Handle empty BED/region/interval_list files, so that an empty "antitarget" file can be used when analyzing WGS or targeted amplicon capture datasets. (#19)
  • Ignore "." labels for genes, the same way we already ignore "-" labels, for better interoperability with BEDtools. Thanks to Brad Chapman (@chapmanb) for this contribution. (#12)
  • Accept "sample.bai" as index for "sample.bam". (#8)
  • SEG import: The option --from-log10 now works to convert log10 ratio values to log2 scale.
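The log10-to-log2 conversion is a change of logarithm base, log2(x) = log10(x) / log10(2). A one-function sketch, with a hypothetical function name:

```python
import math

def log10_to_log2(value):
    """Convert a log10 ratio to log2 scale by changing the base."""
    return value / math.log10(2)

# A two-fold gain is log10(2), about 0.301, in log10 scale and 1.0 in log2 scale
gain = log10_to_log2(math.log10(2))
```

The conversion is a constant rescaling by about 3.32, so neutral values stay at 0.0.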

Documentation has also improved substantially, including the installation instructions. The built-in help text for each command now shows default values for each option, where applicable.

v0.3.3

25 Feb 06:26
  • Aesthetic improvements to plots
  • Fixed an edge case where a very small BAM file could mistakenly appear to be
    unsorted.

Release v0.3.0:

19 Dec 06:32
  • Enable batch to be run without specifying tumor samples, in order to only
    create a reference.
  • Copy ratios are now re-centered at the smoothed "mode" (peak density) rather
    than median, for better behavior on samples with many large-scale losses.
  • Minor fixes and improvements to several safety checks in response to feedback
    from users.
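
The difference between centering at the median and at the smoothed mode can be illustrated with a small kernel density estimate. This is a sketch under stated assumptions, not the code used in this release: the function name `smoothed_mode`, the Gaussian kernel, the bandwidth, and the grid search are all choices made for this example.

```python
import math
from statistics import median

def smoothed_mode(values, bandwidth=0.1, grid_size=200):
    """Return the grid point where a Gaussian kernel density peaks."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    best_x, best_density = lo, -1.0
    for i in range(grid_size + 1):
        x = lo + (hi - lo) * i / grid_size
        # Unnormalized density: sum of Gaussian bumps centered on each value
        density = sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )
        if density > best_density:
            best_x, best_density = x, density
    return best_x

# Made-up log2 ratios: a neutral cluster plus two clusters of losses that
# together outnumber it
ratios = [0.0, 0.01, -0.01, 0.02, -0.5, -0.51, -0.49, -1.0, -1.01, -0.99]
center_median = median(ratios)
center_mode = smoothed_mode(ratios)
```

In this made-up sample the median lands inside a loss cluster, while the density peak stays at the largest single cluster near zero.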