Merqury

Merqury was run three times during evaluation, using Illumina, HiFi, and hybrid 21-mer databases. We recommend the hybrid k-mer database for evaluating CHM13 assemblies.

Dependencies

Pre-built CHM13 databases are downloadable from here:

Extract the databases with tar -xzf, then download Merqury and Meryl. Merqury requires no installation, and a binary release is available for Meryl.

Quick start

$tools/merqury/merqury.sh $read.meryl asm.fasta out-prefix

The asm_only.bed file contains the k-mers in the assembly that are not found in the given $read.meryl.

Merge these intervals with bedtools to obtain regions that can be used to identify low coverage associated with consensus base errors.
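
As a minimal sketch of the merge step, assuming the BED file carries the usual three leading columns (sequence, start, end): the helper below sorts the intervals and passes them to bedtools merge, falling back to an equivalent awk pass when bedtools is not on the PATH. The name merge_bed and the file names in the usage comment are hypothetical placeholders, not part of Merqury.

```shell
# Merge overlapping or adjacent intervals in a BED file (chrom, start, end).
# merge_bed is a hypothetical helper name, not part of Merqury or bedtools.
merge_bed() {
  if command -v bedtools >/dev/null 2>&1; then
    # bedtools merge expects input sorted by chromosome, then start.
    sort -k1,1 -k2,2n "$1" | bedtools merge -i stdin
  else
    # Fallback: the same merge in awk, for systems without bedtools.
    sort -k1,1 -k2,2n "$1" | awk 'BEGIN { OFS = "\t" }
      NR == 1                { c = $1; s = $2; e = $3 + 0; next }
      $1 == c && $2 + 0 <= e { if ($3 + 0 > e) e = $3 + 0; next }
                             { print c, s, e; c = $1; s = $2; e = $3 + 0 }
      END                    { if (NR > 0) print c, s, e }'
  fi
}

# Usage (placeholder file names):
#   merge_bed asm_only.bed > asm_only.merged.bed
```

Because each missing k-mer spans 21 bases, a single base error typically produces a run of overlapping intervals, so merging collapses that run into one region.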

Generating k-mer databases

1. Illumina and/or HiFi

In general, a k-mer database can be built from any read set with:

meryl count k=21 reads.fastq output reads.meryl
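
When the reads are split across several files, one possible approach is to count each file separately and then combine the per-file databases with union-sum. The sketch below uses only the meryl syntax shown in this README. The function name count_reads, the MERYL variable, and the .k21.meryl naming are assumptions made for illustration.

```shell
# Path to the meryl binary; override to point at a specific release.
MERYL=${MERYL:-meryl}

# count_reads OUT.meryl READS1.fastq [READS2.fastq ...]
# Counts 21-mers per input file, then sums the counts into one database.
# Note: file names containing spaces are not handled in this sketch.
count_reads() {
  out=$1; shift
  dbs=""
  for fq in "$@"; do
    db="${fq%.fastq}.k21.meryl"
    "$MERYL" count k=21 "$fq" output "$db"
    dbs="$dbs $db"
  done
  # Word splitting of $dbs is intentional: one argument per database.
  "$MERYL" union-sum $dbs output "$out"
}

# Usage (placeholder file names):
#   count_reads reads.meryl run1.fastq run2.fastq
```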

2. Hybrid

While evaluating T2T-CHM13v0.9, we noticed sequencing biases affecting k-mers when estimating base accuracy (QV). Therefore, we built a hybrid database that excludes low-frequency erroneous k-mers and includes reliable k-mers found in either Illumina or HiFi. For more details, refer to McCartney et al., 2021.
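
For context, the k-mer based QV is derived from the share of assembly k-mers that are missing from the read database, which is why biases in that database translate directly into QV differences. The sketch below follows our understanding of the calculation described for Merqury: the per-base error rate is estimated as 1 - (1 - asm_only / total)^(1/k), and QV is -10 * log10 of that rate. The function name kmer_qv and the counts in the usage line are made up for illustration.

```shell
# kmer_qv ASM_ONLY TOTAL K
#   ASM_ONLY: assembly k-mers not found in the read database
#   TOTAL:    all k-mers in the assembly
#   K:        k-mer size
# Assumes ASM_ONLY > 0; with zero missing k-mers the QV is unbounded.
kmer_qv() {
  awk -v a="$1" -v t="$2" -v k="$3" 'BEGIN {
    err = 1 - (1 - a / t) ^ (1 / k)          # estimated per-base error rate
    printf "%.2f\n", -10 * log(err) / log(10)  # Phred-scaled quality
  }'
}

# Usage with made-up counts:
kmer_qv 1000 1000000000 21
```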

The hybrid k-mer database was generated by first excluding k-mers occurring only once in each database:

meryl greater-than 1 IlluminaPCRfree.k21.meryl output illm.gt1.meryl
meryl greater-than 1 hifi20k.k21.meryl output hifi.gt1.meryl

Then, take the union of the two databases, summing the counts of k-mers found in both:

meryl union-sum illm.gt1.meryl hifi.gt1.meryl output hybrid.meryl

In one command line:

meryl union-sum [ greater-than 1 IlluminaPCRfree.k21.meryl ] [ greater-than 1 hifi20k.k21.meryl ] output hybrid.meryl
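
To make the effect of these two steps concrete, here is a toy model in awk. It does not call meryl: the tab-separated tables, the 3-mers, and the helper names gt1 and union_sum are stand-ins invented for illustration.

```shell
# Toy count tables (k-mer<TAB>count) standing in for the two meryl databases.
illm=$(printf 'AAA\t1\nAAC\t30\nACG\t28')
hifi=$(printf 'AAA\t1\nAAC\t12\nCGT\t2')

# Mimics "greater-than 1": keep k-mers seen more than once.
gt1() { awk -F'\t' '$2 > 1'; }

# Mimics "union-sum": keep every k-mer, adding the counts of shared ones.
union_sum() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    { c[$1] += $2 }
    END { for (k in c) print k, c[k] }' | sort
}

# Filter each table first, then take the summed union.
hybrid=$( { printf '%s\n' "$illm" | gt1; printf '%s\n' "$hifi" | gt1; } | union_sum )
printf '%s\n' "$hybrid"
```

In this toy data, AAA is dropped even though it appears in both tables, because it occurs only once in each and the filter runs before the union. AAC ends with a count of 30 + 12 = 42, while ACG and CGT are kept from a single technology.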

Several adjustments were originally applied to compensate for the coverage difference between the two databases. This is no longer recommended.
The steps below are kept for the record, as they were used in the McCartney et al. paper.

After counting, the k-mer counts were adjusted so that the diploid (2-copy) peak in each database fell at 35x:

meryl divide-round 3 illm.gt1.meryl output illm.gt1.div3.meryl
meryl increase 4 hifi.gt1.meryl output hifi.gt1.add4.meryl

Then, the union of the two databases was taken, setting each k-mer's count to the maximum observed:

meryl union-max illm.gt1.div3.meryl hifi.gt1.add4.meryl output hybrid.meryl
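
The arithmetic of the legacy adjustment can be sketched with the same kind of toy tables. This is an awk model, not meryl itself. The counts are hypothetical (105 and 31 are chosen only because they both map to 35), the helper names are invented, and divide_round assumes plain round-to-nearest.

```shell
# Toy tables (k-mer<TAB>count) standing in for the filtered databases.
illm_gt1=$(printf 'AAC\t105\nACG\t28')
hifi_gt1=$(printf 'AAC\t31\nCGT\t2')

# Mimics "divide-round N": divide each count, rounding to the nearest integer.
divide_round() {
  awk -F'\t' -v n="$1" 'BEGIN { OFS = "\t" } { print $1, int($2 / n + 0.5) }'
}

# Mimics "increase N": add a constant to each count.
increase() {
  awk -F'\t' -v n="$1" 'BEGIN { OFS = "\t" } { print $1, $2 + n }'
}

# Mimics "union-max": keep every k-mer with the larger of its counts.
union_max() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    { v = $2 + 0; if (!($1 in c) || v > c[$1]) c[$1] = v }
    END { for (k in c) print k, c[k] }' | sort
}

legacy=$( { printf '%s\n' "$illm_gt1" | divide_round 3
            printf '%s\n' "$hifi_gt1" | increase 4; } | union_max )
printf '%s\n' "$legacy"
```

Here AAC becomes 35 in both tables (105 / 3 and 31 + 4), so the maximum is 35. ACG is scaled from 28 to 9, and CGT is shifted from 2 to 6.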