Question about Depth Coverage parameter #128

nbargues · 2022-08-03T13:31:35Z

Hello , I use clair3 on my Nanopore data.

I have a question about the Depth.
I determine the overall depth of my bam file with MosDepth (https://github.com/brentp/mosdepth).
And when I look up to the Depth of a specific variant (parameter DP on the VCF file) on the clair3 VCF output, I see that the Depth is significantly lower that the overall depth ( for example a sample with a overall Depth of 900X will have only 100X on certain variant ).

My question are, how to justify this fluctuation of Depth ? what are the criteria of filtering used by clair3 ?
and I see also if I compare duplicat from two different run that I found also a big fluctuation for the depth. Why ?

Thanks in advance

aquaskyline · 2022-08-03T15:26:07Z

Clair3 downsamples input to either 144 or 89-fold coverage if the input coverage is higher than 144.

nbargues · 2022-08-03T15:33:48Z

Thanks for the quick answer. In this image you can see that some variant of around 900X and some around 100X :

So how can I explain that difference ? Why some variant are downsamples and other don't ?

Sorry if I understant incorrectly your response

aquaskyline · 2022-08-04T06:33:27Z

Even though it is not totally impossible, this level of fluctuation is not normal. Would you be able to show us an IGV screenshot of these sites, or simply send me a bam of this small region so I could look into the details?

nbargues · 2022-08-04T07:22:44Z

Here is a first snp with a DP in the VCF of 184 :

And here a second snp with a DP in the VCF of 185 :

I have also another information : I also use at the same time LongShot caller (https://github.com/pjedge/longshot) and for the same variant, it founds around 900X.

aquaskyline · 2022-08-06T04:34:40Z

It could be because of mapping quality. Clair3 doesn't consider alignments with MQ<5 by default.

nbargues · 2022-08-08T13:43:53Z

After some QC, it is not a quality problem ( all alignment have a MQ > 5 ). Do you have another lead that can explain my issue ? Thanks

aquaskyline · 2022-08-09T03:15:40Z

We need a bam of a few concerned positions to tell.

nbargues · 2022-08-09T07:41:29Z

bc01_minimap2_ont.bam.gz

Here is the BAM that correspond with the table of SNP shown above. Thanks

aquaskyline · 2022-08-11T07:53:50Z

Thanks for your patience. I confirmed that it's a defect of reporting DP in the r11 version only. Before r11, in which we switched to using C implementation, both pileup-calling and full-alignment-calling were reporting DP as the coverage "after subsampling". But in r11, pileup-calling still reports "after subsampling" while full-alignment-calling reports "raw coverage". We are fixing the problem and preparing to release r12 in the next week to resume reporting "coverage after subsampling" for both pileup-calling and full-alignment-calling in the new release. Before the new version, we can confirm that in r11, despite the reported coverages having discrepancies, the other characteristics of the called variants should be correct.

nbargues · 2022-08-11T12:24:13Z

Thanks @aquaskyline for your feedback. I will wait for the new release. When the new release are up, the link of the docker image (docker://hkubal/clair3:latest) will be updated ?

aquaskyline · 2022-08-11T12:45:51Z

Correct, all installation options will be updated in sync.

jftaly · 2022-08-17T11:31:54Z

Hello @aquaskyline,

I am working with Nicolas on this issue.
Sorry if I ask naive questions as I don't have a proper knowledge of your tool.
I would like to be sure I understood correctly what would be the VCF output after the bug correction.

Is it correct that in the exemple Nicolas gave, the variants showing a DP ~ 1000X were called with a full-aligment model and therefore they should show a DP ~ 180X, as the other variants (called with pileup), in version r12?

Therfore, whatever the amount of reads with give to Clair3, the DP values in the VCF file should be in the range of 100X-200X?

Many thanks for your help

aquaskyline · 2022-08-17T13:30:56Z

@jftaly thanks for asking. To facilitate my following explanation, I use your case as an example, and I define 100x~200x as "subsampled coverage", and ~1000x as "raw coverage". In my previous reply to Nicolas, I said I propose to report "subsampled coverage" for both pileup- and full-alignment-based variant calls in r12. But after discussion with my team, and some experiments, we found that reporting "raw coverage" for both pileup- and full-alignment-based is doable without having any changes to the variant calling results (it is important for us to maintain the result consistency between versions except for sufficiently-benchmarked performance upgrades). So our new plan is to output "raw coverage" for all variant calls. We believe the new plan matches more users' expectations of the DP tag.

In short, in your example, using the upcoming r12, the DP values in the VCF file should be in the range of ~1000x.

nbargues · 2022-08-17T14:38:30Z

Thanks @aquaskyline for your response.

I have another question regarding the fact taht Clair3 downsamples input to either 144 or 89-fold coverage if the input coverage is higher than 144.

As you can see on the table in the beginning of this topic, the depth is around 100x~200x but you state that it's maximum 144. So why we have certain value such as 185 ?

Thanks in advance for the clarification

aquaskyline · 2022-08-17T14:48:31Z

In pileup calling, subsampling is for lowering the coverage to a value that has ample representatives during model training. The subsampling in pileup calling sets the target average coverage at 144. But because of the fluctuation of coverage along the sequence, both above and below 144 are possible.

In full-alignment calling, the height of the input is limited to 89, thus after subsampling, the coverage can only be 89 or below.

aquaskyline · 2022-08-20T03:20:52Z

v0.1-r12 is now released.

nbargues · 2022-08-25T08:41:08Z

Thanks for the release.

I have a question regarding your method of subsampling ; is it a random subsampling or do you have a specific method for the downsampling ?

And when you said that you have a fluctuation in the coverage ( around 144 ), do you have a max/min coverage where the coverage can't be above/below ?

aquaskyline · 2022-08-25T08:45:14Z

For subsampling, we used samtools view -s.

No. using samtools view -s, the subsampling is rescaled by a factor f, the original min/max will become min*f/max*f.

nbargues · 2022-08-25T11:45:51Z

Ok. And when you use the parameter -s, do you use a random seed or a harcoded seed ?

Because it seems that when I do a repetability analysis on the same BAM file with clair3, I obtain exactly the same VCF.

aquaskyline · 2022-08-25T13:04:31Z

We used a hardcoded seed, thus you should expect the same VCF output. But I need to add one more thing, samtools view -s was used in r11 and the versions before. In r12, in order to retain the DP information, we implemented the "rescaled by a factor f" ourselves, thus not relying on samtools view -s anymore. Nonetheless, the subsampling logic remains the same.

aquaskyline added the enhancement New feature or request label Aug 19, 2022

aquaskyline closed this as completed Sep 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about Depth Coverage parameter #128

Question about Depth Coverage parameter #128

nbargues commented Aug 3, 2022

aquaskyline commented Aug 3, 2022

nbargues commented Aug 3, 2022 •

edited

Loading

aquaskyline commented Aug 4, 2022

nbargues commented Aug 4, 2022 •

edited

Loading

aquaskyline commented Aug 6, 2022

nbargues commented Aug 8, 2022

aquaskyline commented Aug 9, 2022

nbargues commented Aug 9, 2022

aquaskyline commented Aug 11, 2022 •

edited

Loading

nbargues commented Aug 11, 2022

aquaskyline commented Aug 11, 2022

jftaly commented Aug 17, 2022

aquaskyline commented Aug 17, 2022 •

edited

Loading

nbargues commented Aug 17, 2022

aquaskyline commented Aug 17, 2022

aquaskyline commented Aug 20, 2022

nbargues commented Aug 25, 2022

aquaskyline commented Aug 25, 2022

nbargues commented Aug 25, 2022

aquaskyline commented Aug 25, 2022

Question about Depth Coverage parameter #128

Question about Depth Coverage parameter #128

Comments

nbargues commented Aug 3, 2022

aquaskyline commented Aug 3, 2022

nbargues commented Aug 3, 2022 • edited Loading

aquaskyline commented Aug 4, 2022

nbargues commented Aug 4, 2022 • edited Loading

aquaskyline commented Aug 6, 2022

nbargues commented Aug 8, 2022

aquaskyline commented Aug 9, 2022

nbargues commented Aug 9, 2022

aquaskyline commented Aug 11, 2022 • edited Loading

nbargues commented Aug 11, 2022

aquaskyline commented Aug 11, 2022

jftaly commented Aug 17, 2022

aquaskyline commented Aug 17, 2022 • edited Loading

nbargues commented Aug 17, 2022

aquaskyline commented Aug 17, 2022

aquaskyline commented Aug 20, 2022

nbargues commented Aug 25, 2022

aquaskyline commented Aug 25, 2022

nbargues commented Aug 25, 2022

aquaskyline commented Aug 25, 2022

nbargues commented Aug 3, 2022 •

edited

Loading

nbargues commented Aug 4, 2022 •

edited

Loading

aquaskyline commented Aug 11, 2022 •

edited

Loading

aquaskyline commented Aug 17, 2022 •

edited

Loading