-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question about Depth Coverage parameter #128
Comments
Clair3 downsamples input to either 144 or 89-fold coverage if the input coverage is higher than 144. |
Even though it is not totally impossible, this level of fluctuation is not normal. Would you be able to show us an IGV screenshot of these sites, or simply send me a bam of this small region so I could look into the details? |
Here is a first snp with a DP in the VCF of 184 : I have also another information : I also use at the same time LongShot caller (https://github.com/pjedge/longshot) and for the same variant, it founds around 900X. |
It could be because of mapping quality. Clair3 doesn't consider alignments with MQ<5 by default. |
After some QC, it is not a quality problem ( all alignment have a MQ > 5 ). Do you have another lead that can explain my issue ? Thanks |
We need a bam of a few concerned positions to tell. |
Here is the BAM that correspond with the table of SNP shown above. Thanks |
Thanks for your patience. I confirmed that it's a defect of reporting DP in the r11 version only. Before r11, in which we switched to using C implementation, both pileup-calling and full-alignment-calling were reporting DP as the coverage "after subsampling". But in r11, pileup-calling still reports "after subsampling" while full-alignment-calling reports "raw coverage". We are fixing the problem and preparing to release r12 in the next week to resume reporting "coverage after subsampling" for both pileup-calling and full-alignment-calling in the new release. Before the new version, we can confirm that in r11, despite the reported coverages having discrepancies, the other characteristics of the called variants should be correct. |
Thanks @aquaskyline for your feedback. I will wait for the new release. When the new release are up, the link of the docker image (docker://hkubal/clair3:latest) will be updated ? |
Correct, all installation options will be updated in sync. |
Hello @aquaskyline, I am working with Nicolas on this issue. Is it correct that in the exemple Nicolas gave, the variants showing a DP ~ 1000X were called with a full-aligment model and therefore they should show a DP ~ 180X, as the other variants (called with pileup), in version r12? Therfore, whatever the amount of reads with give to Clair3, the DP values in the VCF file should be in the range of 100X-200X? Many thanks for your help |
@jftaly thanks for asking. To facilitate my following explanation, I use your case as an example, and I define 100x~200x as "subsampled coverage", and ~1000x as "raw coverage". In my previous reply to Nicolas, I said I propose to report "subsampled coverage" for both pileup- and full-alignment-based variant calls in r12. But after discussion with my team, and some experiments, we found that reporting "raw coverage" for both pileup- and full-alignment-based is doable without having any changes to the variant calling results (it is important for us to maintain the result consistency between versions except for sufficiently-benchmarked performance upgrades). So our new plan is to output "raw coverage" for all variant calls. We believe the new plan matches more users' expectations of the DP tag. In short, in your example, using the upcoming r12, the DP values in the VCF file should be in the range of ~1000x. |
Thanks @aquaskyline for your response. I have another question regarding the fact taht Clair3 downsamples input to either 144 or 89-fold coverage if the input coverage is higher than 144. As you can see on the table in the beginning of this topic, the depth is around 100x~200x but you state that it's maximum 144. So why we have certain value such as 185 ? Thanks in advance for the clarification |
In pileup calling, subsampling is for lowering the coverage to a value that has ample representatives during model training. The subsampling in pileup calling sets the target average coverage at 144. But because of the fluctuation of coverage along the sequence, both above and below 144 are possible. In full-alignment calling, the height of the input is limited to 89, thus after subsampling, the coverage can only be 89 or below. |
v0.1-r12 is now released. |
Thanks for the release. I have a question regarding your method of subsampling ; is it a random subsampling or do you have a specific method for the downsampling ? And when you said that you have a fluctuation in the coverage ( around 144 ), do you have a max/min coverage where the coverage can't be above/below ? |
For subsampling, we used No. using |
Ok. And when you use the parameter Because it seems that when I do a repetability analysis on the same BAM file with clair3, I obtain exactly the same VCF. |
We used a hardcoded seed, thus you should expect the same VCF output. But I need to add one more thing, |
Hello , I use clair3 on my Nanopore data.
I have a question about the Depth.
I determine the overall depth of my bam file with MosDepth (https://github.com/brentp/mosdepth).
And when I look up to the Depth of a specific variant (parameter DP on the VCF file) on the clair3 VCF output, I see that the Depth is significantly lower that the overall depth ( for example a sample with a overall Depth of 900X will have only 100X on certain variant ).
My question are, how to justify this fluctuation of Depth ? what are the criteria of filtering used by clair3 ?
and I see also if I compare duplicat from two different run that I found also a big fluctuation for the depth. Why ?
Thanks in advance
The text was updated successfully, but these errors were encountered: