Merqury

Merqury was run three times during evaluation, using Illumina, HiFi, and hybrid 21-mer databases. We recommend the hybrid k-mer database for evaluating CHM13 assemblies.

Dependencies

Pre-built CHM13 databases are downloadable from here:

Extract the archive with tar -xzf, then download Merqury and Meryl. Merqury runs without installation, and a binary release is available for Meryl.

Quick start

$tools/merqury/merqury.sh $read.meryl asm.fasta out-prefix

The asm_only.bed file contains k-mers that are present in the assembly but not found in the given $read.meryl.

Merge its intervals with bedtools to obtain regions that can be checked for the low coverage associated with consensus base errors.
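The merge step can be sketched in plain shell for cases where bedtools is not at hand. This is a minimal stand-in, not the bedtools implementation: the function name merge_bed is hypothetical, and it assumes a three-column BED file (chrom, start, end). Like the default behaviour of bedtools merge, it joins overlapping and book-ended intervals.

```shell
# merge_bed FILE: print merged intervals from a 3-column BED file.
# Hypothetical helper standing in for `bedtools merge`.
merge_bed() {
  sort -k1,1 -k2,2n "$1" | awk 'BEGIN { OFS = "\t" }
    # Same chromosome and start inside (or touching) the open interval: extend it.
    $1 == chrom && $2 + 0 <= hi { if ($3 + 0 > hi) hi = $3 + 0; next }
    # Otherwise flush the open interval and start a new one.
    { if (chrom != "") print chrom, lo, hi
      chrom = $1; lo = $2 + 0; hi = $3 + 0 }
    END { if (chrom != "") print chrom, lo, hi }'
}
```

With bedtools installed, running bedtools merge -i on the position-sorted asm_only.bed does the same job and is the route the text above describes.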

Generating k-mer databases

1. Illumina and/or HiFi

In general, a k-mer database can be built from any read set with:

meryl count k=21 reads.fastq output reads.meryl
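When the reads are split across several files, one possible approach is to count each file separately and then merge the per-file databases with union-sum. The sketch below assumes that workflow; the function name count_reads, the MERYL variable, and the .k21.meryl naming are illustrative choices, not part of Merqury or Meryl.

```shell
MERYL=${MERYL:-meryl}   # set MERYL=echo to preview the commands without running them

# count_reads OUT.meryl FILE...: one database per read file, then merge them.
# Hypothetical helper; per-file databases are written to the current directory.
count_reads() {
  out=$1; shift
  n=$#
  for reads in "$@"; do
    db=$(basename "$reads")
    db="${db%%.*}.k21.meryl"
    "$MERYL" count k=21 "$reads" output "$db" || return 1
    set -- "$@" "$db"     # append the new database to the argument list
  done
  shift "$n"              # drop the read files, keeping only the databases
  # union-sum adds up the counts of k-mers seen in more than one input.
  "$MERYL" union-sum "$@" output "$out"
}
```

For example, count_reads reads.meryl run1.fastq.gz run2.fastq.gz would produce run1.k21.meryl and run2.k21.meryl, then merge them into reads.meryl.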

2. Hybrid

While evaluating T2T-CHM13v0.9, we noticed sequencing biases affecting k-mers when estimating base accuracy (QV). We therefore built a hybrid database that excludes low-frequency, erroneous k-mers and includes reliable k-mers found in either Illumina or HiFi reads. For more details, refer to McCartney et al., 2021.

The hybrid k-mer database was generated by first excluding k-mers occurring only once in each database:

meryl greater-than 1 IlluminaPCRfree.k21.meryl output illm.gt1.meryl
meryl greater-than 1 hifi20k.k21.meryl output hifi.gt1.meryl

Then, take the union of the two databases, summing the counts:

meryl union-sum illm.gt1.meryl hifi.gt1.meryl output hybrid.meryl

In one command line:

meryl union-sum [ greater-than 1 IlluminaPCRfree.k21.meryl ] [ greater-than 1 hifi20k.k21.meryl ] output hybrid.meryl
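To make the effect of these two steps concrete, here is a toy re-enactment on tab-separated text tables (k-mer, count) rather than real Meryl databases. It only mimics the semantics described above; the function name hybrid and the table format are assumptions made for illustration.

```shell
# hybrid ILLUMINA.tsv HIFI.tsv: toy version of the hybrid database logic.
# Each input line is "kmer<TAB>count". Not a replacement for meryl.
hybrid() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $2 > 1 { n[$1] += $2 }                # greater-than 1, then union-sum
    END { for (k in n) print k, n[k] }' "$1" "$2" | sort
}
```

A k-mer seen once in one table but several times in the other is kept with only the reliable count; a k-mer seen once in a single table is dropped entirely.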

Several adjustments were originally applied to compensate for the coverage difference between the two databases; these are no longer recommended.
The steps below are kept only as a record of what was done for the McCartney et al. paper.

After counting, count frequencies were adjusted to match the diploid (2-copy) peak to 35x:

meryl divide-round 3 illm.gt1.meryl output illm.gt1.div3.meryl
meryl increase 4 hifi.gt1.meryl output hifi.gt1.add4.meryl

Then, the union of the two databases was taken, setting each k-mer's frequency to the maximum observed:

meryl union-max illm.gt1.div3.meryl hifi.gt1.add4.meryl output hybrid.meryl
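The arithmetic of the legacy procedure can be re-enacted on toy text tables (k-mer, count) as well. This sketch assumes divide-round rounds to the nearest integer; the function name legacy_hybrid, the table format, and the counts in the usage note are made up for illustration and are not taken from the paper.

```shell
# legacy_hybrid ILLUMINA.tsv HIFI.tsv: toy version of the legacy adjustment.
# Each input line is "kmer<TAB>count". Not a replacement for meryl.
legacy_hybrid() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $2 <= 1 { next }                                  # greater-than 1
    FILENAME == ARGV[1] { c = int($2 / 3 + 0.5) }     # Illumina: divide-round 3
    FILENAME != ARGV[1] { c = $2 + 4 }                # HiFi: increase 4
    { if (c > m[$1]) m[$1] = c }                      # union-max
    END { for (k in m) print k, m[k] }' "$1" "$2" | sort
}
```

With an Illumina count of 105 and a HiFi count of 31 for the same k-mer, both adjusted values come out at 35, so the maximum is 35.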