Merqury was run three times during evaluation, using the Illumina, HiFi, and hybrid 21-mer databases. We recommend using the hybrid k-mer database for evaluating CHM13 assemblies.
Pre-built CHM13 k-mer databases are available for download:
- IlluminaPCRfree.k21.meryl: 21-mers from Illumina PCR-free library
- hifi20k.k21.meryl: 21-mers from HiFi 20 kb library
- hybrid.k21.meryl: 21-mers from hybrid Illumina and HiFi data
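Extract each downloaded archive with tar -xzf <archive>.
Next, download Merqury and Meryl. No installation is required to run Merqury, and a binary release is available for Meryl. Run Merqury with a read k-mer database, an assembly, and an output prefix: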
$tools/merqury/merqury.sh $read.meryl asm.fasta out-prefix
The asm_only.bed file contains the positions of k-mers in the assembly that are not found in the given $read.meryl.
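To make the contents of this file concrete, here is a toy Python model of how assembly-only k-mer positions could be reported. It is a sketch under stated assumptions, not Merqury's code: the read database is modeled as a plain set of canonical k-mers, and the helper names canonical and asm_only_intervals are hypothetical.

```python
# Toy model: report assembly k-mers absent from a read k-mer set as BED-style
# intervals. Hypothetical helpers; not Merqury's implementation.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def asm_only_intervals(asm: dict[str, str], read_kmers: set[str], k: int):
    """Yield (seq_name, start, end) for each assembly k-mer missing from read_kmers.

    Coordinates are 0-based and half-open, as in BED. read_kmers is assumed to
    hold canonical k-mers only.
    """
    for name, seq in asm.items():
        for start in range(len(seq) - k + 1):
            if canonical(seq[start:start + k]) not in read_kmers:
                yield (name, start, start + k)


# GTA and TAC share the canonical form GTA, which is absent from the reads.
print(list(asm_only_intervals({"chr1": "ACGTAC"}, {"ACG"}, 3)))
# [('chr1', 2, 5), ('chr1', 3, 6)]
```

Consecutive missing k-mers produce overlapping intervals, one per k-mer start position.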
Merge these intervals with bedtools merge to obtain low-coverage regions, which are associated with consensus base errors.
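By default, bedtools merge combines intervals that overlap or are book-ended (one ends exactly where the next starts), and it expects input sorted by chromosome and start. Because a single base error can make up to k overlapping k-mers absent from the reads, merging collapses such a run into one region. The Python sketch below models only that default behaviour; merge_intervals is a hypothetical name and this is not bedtools' code.

```python
# Toy model of interval merging with bedtools merge's default settings
# (overlapping and book-ended intervals are merged). Not bedtools itself.


def merge_intervals(intervals):
    """Merge (chrom, start, end) tuples that overlap or touch on the same chrom."""
    merged = []
    for chrom, start, end in sorted(intervals):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or book-ended: extend the previous interval.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


print(merge_intervals([("chr1", 3, 6), ("chr1", 2, 5), ("chr1", 6, 9), ("chr1", 20, 41)]))
# [('chr1', 2, 9), ('chr1', 20, 41)]
```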
In general, a k-mer database can be built from any read set with:
meryl count k=21 reads.fastq output reads.meryl
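meryl count stores canonical k-mers, so a k-mer and its reverse complement share one entry. A minimal Python sketch of that idea, assuming reads are given as plain strings; count_kmers is a hypothetical name, and skipping k-mers that contain non-ACGT characters is an assumption of this sketch rather than documented meryl behaviour.

```python
# Toy model of canonical k-mer counting, loosely analogous to `meryl count`.
# Assumption: k-mers containing non-ACGT characters are skipped.
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k: int = 21) -> Counter:
    """Count canonical k-mers over an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


# ACG and its reverse complement CGT collapse into one canonical k-mer.
print(count_kmers(["ACGT"], k=3))  # Counter({'ACG': 2})
```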
While evaluating T2T-CHM13v0.9, we noticed that sequencing biases affected k-mer counts when estimating base accuracy (QV). We therefore built a hybrid database that excludes low-frequency erroneous k-mers and includes reliable k-mers found in either the Illumina or the HiFi data. For more details, refer to McCartney et al., 2021.
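Merqury's QV depends only on how many assembly k-mers are absent from the read database, so correct k-mers that one platform under-samples because of sequencing bias are counted as errors. The Merqury paper estimates the per-base accuracy as (1 - K_asm/K_total)^(1/k), where K_asm is the number of assembly-only k-mers and K_total the total number of assembly k-mers, and reports QV = -10 log10 of the resulting error rate. The sketch below applies that formula to counts you supply; merqury_qv is a hypothetical name, and the example counts are invented.

```python
# QV estimate from assembly-only k-mers, following the formula in the Merqury
# paper. Illustrative only; Merqury computes these counts itself.
import math


def merqury_qv(asm_only_kmers: int, total_kmers: int, k: int = 21) -> float:
    """Return QV = -10 * log10(E), where E is the estimated per-base error rate.

    The fraction of assembly k-mers found in the reads approximates the chance
    that all k bases are correct, so per-base accuracy is its 1/k-th power.
    """
    if total_kmers <= 0 or not 0 <= asm_only_kmers <= total_kmers:
        raise ValueError("need 0 <= asm_only_kmers <= total_kmers and total_kmers > 0")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: estimated error rate is zero
    per_base_accuracy = (1 - asm_only_kmers / total_kmers) ** (1 / k)
    error_rate = 1 - per_base_accuracy
    return -10 * math.log10(error_rate)


# Hypothetical counts: 1,000 unsupported 21-mers out of 3 billion.
print(round(merqury_qv(1_000, 3_000_000_000), 1))  # 78.0
```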
The hybrid k-mer database was generated by first excluding k-mers occurring only once in each database:
meryl greater-than 1 IlluminaPCRfree.k21.meryl output illm.gt1.meryl
meryl greater-than 1 hifi20k.k21.meryl output hifi.gt1.meryl
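meryl greater-than keeps only k-mers whose count is strictly above the threshold, so greater-than 1 drops singletons, which are mostly sequencing errors. A toy Python model, treating a k-mer database as a dict from k-mer to count; greater_than is a hypothetical name and the counts are invented.

```python
# Toy model of `meryl greater-than 1`: k-mer databases as dicts of counts.


def greater_than(db: dict[str, int], threshold: int) -> dict[str, int]:
    """Keep only k-mers whose count is strictly greater than threshold."""
    return {kmer: count for kmer, count in db.items() if count > threshold}


# Hypothetical counts; AAC occurs once, most likely a sequencing error.
illumina = {"AAC": 1, "ACG": 40, "CCT": 2}
print(greater_than(illumina, 1))  # {'ACG': 40, 'CCT': 2}
```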
Then, take the union of the two databases, summing the counts:
meryl union-sum illm.gt1.meryl hifi.gt1.meryl output hybrid.meryl
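meryl union-sum keeps every k-mer present in any input and sets its count to the sum of the input counts. Because singletons are removed from each database before the union, a k-mer seen once in Illumina and once in HiFi is still excluded, even though its summed count would be 2. A toy Python model with invented counts; union_sum is a hypothetical name.

```python
# Toy model of `meryl union-sum`: k-mer databases as dicts of counts.


def union_sum(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Return every k-mer present in a or b, with its counts summed."""
    out = dict(a)
    for kmer, count in b.items():
        out[kmer] = out.get(kmer, 0) + count
    return out


# Hypothetical filtered databases (singletons already removed).
illm_gt1 = {"ACG": 40, "CCT": 2}
hifi_gt1 = {"ACG": 30, "GGA": 5}
print(union_sum(illm_gt1, hifi_gt1))  # {'ACG': 70, 'CCT': 2, 'GGA': 5}
```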
Or, equivalently, in a single command:
meryl union-sum [ greater-than 1 IlluminaPCRfree.k21.meryl ] [ greater-than 1 hifi20k.k21.meryl ] output hybrid.meryl
For the McCartney et al. paper, several additional tweaks were applied to compensate for the coverage difference between the two databases. This is no longer recommended; the steps below are kept only as a legacy record of the approach taken in that paper.
After counting, k-mer frequencies were adjusted so that the diploid (2-copy) peak of each database fell at 35x:
meryl divide-round 3 illm.gt1.meryl output illm.gt1.div3.meryl
meryl increase 4 hifi.gt1.meryl output hifi.gt1.add4.meryl
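The divisor and offset above imply 2-copy peaks near 105x for Illumina and 31x for HiFi; those values are inferred from the arithmetic, not reported here. The toy Python model below shows the two rescaling operations on invented counts. divide_round and increase are hypothetical names, and the rounding rule (round half up, never below 1) is an assumption about meryl's divide-round; check meryl's own usage text for its exact behaviour.

```python
# Toy model of the legacy rescaling step. divide_round assumes round-half-up
# with a floor of 1; this is an assumption, not meryl's documented rule.


def divide_round(db: dict[str, int], x: int) -> dict[str, int]:
    """Divide each count by x, rounding to nearest (half up), keeping at least 1."""
    return {kmer: max(1, (count + x // 2) // x) for kmer, count in db.items()}


def increase(db: dict[str, int], x: int) -> dict[str, int]:
    """Add x to every count."""
    return {kmer: count + x for kmer, count in db.items()}


# Hypothetical 2-copy k-mers at ~105x (Illumina) and ~31x (HiFi) both map to 35x.
illm = {"ACG": 105, "CCT": 2}
hifi = {"ACG": 31, "GGA": 2}
print(divide_round(illm, 3))  # {'ACG': 35, 'CCT': 1}
print(increase(hifi, 4))      # {'ACG': 35, 'GGA': 6}
```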
Then, the union of the two databases was taken, keeping the maximum count observed for each k-mer:
meryl union-max illm.gt1.div3.meryl hifi.gt1.add4.meryl output hybrid.meryl
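meryl union-max keeps every k-mer present in any input and sets its count to the largest input count. A toy Python model on invented, already-rescaled counts; union_max is a hypothetical name.

```python
# Toy model of `meryl union-max`: k-mer databases as dicts of counts.


def union_max(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Return every k-mer present in a or b, with the larger of its counts."""
    out = dict(a)
    for kmer, count in b.items():
        out[kmer] = max(out.get(kmer, 0), count)
    return out


# Hypothetical rescaled databases, both with the 2-copy peak at 35x.
illm_rescaled = {"ACG": 35, "CCT": 1}
hifi_rescaled = {"ACG": 35, "GGA": 6}
print(union_max(illm_rescaled, hifi_rescaled))  # {'ACG': 35, 'CCT': 1, 'GGA': 6}
```

Taking the maximum rather than the sum keeps a k-mer found in both rescaled databases near 35x instead of doubling it to about 70x, so the combined database keeps a single 2-copy peak.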