Use QUAL for filtering in addition to GQ #117

Closed
dnil opened this issue May 10, 2023 · 6 comments

@dnil
Collaborator

dnil commented May 10, 2023

Some callers, e.g. TNscope, do not output GQ FORMAT tags by default. It would be convenient to be able to use the QUAL field to quality-filter their variants before upload.
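As a rough illustration of the request, a filter could take either the per-sample GQ or the record-level QUAL as its quality value. The sketch below is not loqusdb code: the function name `passes_quality`, its parameters, and the choice to reject variants whose selected value is missing are all assumptions made for the example.

```python
from typing import Optional


def passes_quality(
    gq: Optional[float],
    qual: Optional[float],
    threshold: float,
    use_qual: bool = False,
) -> bool:
    """Return True if the selected quality value meets the threshold.

    Hypothetical helper: `use_qual` picks the record-level QUAL instead of
    the per-sample GQ. A missing value is treated as failing the filter,
    which is an assumption of this sketch.
    """
    value = qual if use_qual else gq
    if value is None:
        return False
    return value >= threshold


# A caller that reports QUAL but no GQ only passes when QUAL is selected.
print(passes_quality(gq=None, qual=100.5, threshold=20, use_qual=True))
print(passes_quality(gq=None, qual=100.5, threshold=20))
```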

@karlnyr
Contributor

karlnyr commented Mar 19, 2024

We recently had a case where the GQ seemed to be missing for a somatic whole-genome case. This also led to the deletion of the case in loqusdb, perhaps reasonably so, but is it related to this issue? I am adding the trace below and tagging @ivadym, who is going to look into whether this originates in CG as well. To me it seems like this is not an issue with the SNVs but with the SVs?

Call ['loqusdb', '--config', 'loqusdb-somatic.yaml', 'load', '--case-id', 'selectbison', '--variant-file', 'SNV.somatic.selectbison.tnscope.clinical.filtered.pass.vcf.gz', '--sv-variants', 'SV.somatic.selectbison.svdb.clinical.filtered.pass.vcf.gz', '--qual-gq'] exit with a non zero exit code
2024-03-19 07:32:54 hasta.scilifelab.se loqusdb.commands.cli[36938] INFO Running loqusdb version 2.7.1
2024-03-19 07:32:54 hasta.scilifelab.se mongo_adapter.client[36938] INFO Connecting to uri:mongodb://loqusdb_writer:******@cg-mongo1-prod.scilifelab.se:XXXXX,cg-mongo2-prod.scilifelab.se:XXXXX,cg-mongo3-prod.scilifelab.se:27019
2024-03-19 07:32:54 hasta.scilifelab.se mongo_adapter.client[36938] INFO Connection established
2024-03-19 07:32:54 hasta.scilifelab.se mongo_adapter.adapter[36938] INFO Use database loqusdb-somatic
2024-03-19 07:32:54 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Check if vcf is on correct format...
2024-03-19 07:32:54 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Vcf file SNV.somatic.selectbison.tnscope.clinical.filtered.pass.vcf.gz looks fine
2024-03-19 07:32:54 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Nr of variants in vcf: 2080
2024-03-19 07:32:54 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Type of variants in vcf: snv
2024-03-19 07:32:54 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Check if vcf is on correct format...
[W::hts_idx_load3] The index file is older than the data file: SV.somatic.selectbison.svdb.clinical.filtered.pass.vcf.gz.tbi
2024-03-19 07:33:01 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Vcf file SV.somatic.selectbison.svdb.clinical.filtered.pass.vcf.gz looks fine
2024-03-19 07:33:01 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Nr of variants in vcf: 2202
2024-03-19 07:33:01 hasta.scilifelab.se loqusdb.utils.vcf[36938] INFO Type of variants in vcf: sv
[W::hts_idx_load3] The index file is older than the data file: SV.somatic.selectbison.svdb.clinical.filtered.pass.vcf.gz.tbi
2024-03-19 07:33:01 hasta.scilifelab.se loqusdb.utils.load[36938] WARNING int() argument must be a string, a bytes-like object or a number, not 'NoneType'

@karlnyr karlnyr reopened this Mar 19, 2024
@ivadym

ivadym commented Mar 19, 2024

So it fails when parsing missing QUAL scores in the VCF. I'm not sure why it only fails for the upload of this case, since we have NoneType values for this field in previously uploaded cases...

/loqusdb/build_models/variant.py", line 193, in build_variant
    gq = int(variant.QUAL)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
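The failure can be reproduced without loqusdb. The sketch below assumes that a VCF parser represents a missing QUAL (`.` in the sixth column) as `None`, which matches the `NoneType` in the traceback; `parse_qual` is a hypothetical stand-in for the parser, not its actual API.

```python
from typing import Optional


def parse_qual(field: str) -> Optional[float]:
    """Hypothetical stand-in for a VCF parser: '.' means QUAL is missing."""
    return None if field == "." else float(field)


for raw in (".", "100", "100.5"):
    qual = parse_qual(raw)
    try:
        # Same conversion as in the traceback: int() on the parsed QUAL.
        print(raw, "->", int(qual))
    except TypeError as err:
        print(raw, "-> TypeError:", err)
```

The exact wording of the `TypeError` message differs slightly between Python versions.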

@dnil
Collaborator Author

dnil commented Mar 19, 2024

Feel free to ping with a case: was it missing both GQ and QUAL?

@ivadym

ivadym commented Mar 19, 2024

selectbison

But we don't have this QUAL/GQ threshold logic for SVs, right? And all the variants in that VCF are SVs

    if sv:
        found_variant = True
    else:
        found_variant = False
        for ind_obj in case_obj["individuals"]:
            ind_id = ind_obj["ind_id"]
            # Get the index position for the individual in the VCF
            ind_pos = ind_obj["ind_index"]
            if gq_qual:
                gq = int(variant.QUAL)

@dnil
Collaborator Author

dnil commented Mar 20, 2024

Thanks! So based on what @ivadym wrote, it should not really be the same issue. But the NoneType issue is similar. I can have a look if you haven't already: it sure looks like this repo could use a little attention. 😂

@dnil
Collaborator Author

dnil commented Mar 20, 2024

It is related, but it is an additional complication. This case has a variant with neither type of quality, for some arcane reason. VCF is a rather open format, I guess, and not all caller authors agree on the classic conventions, I suppose. 😄 Will fix.

 [daniel.nilsson@hasta:/home/proj/production/housekeeper-bundles/selectbison/2024-03-13] [S_loqusdb] 3s $ zcat SNV.somatic.selectbison.tnscope.clinical.filtered.pass.vcf.gz |grep -v \# |cut -f 6 |sort |head -5
.
100
100.5
100.6
100.8
[daniel.nilsson@hasta:/home/proj/production/housekeeper-bundles/selectbison/2024-03-13] [S_loqusdb] 3s $ zgrep '\.[[:space:]]PASS' SNV.somatic.selectbison.tnscope.clinical.filtered.pass.vcf.gz
21	10793484	.	C	A	.	PASS	ECNT=1;FS=6.371;HCNT=4;MAX_ED=.;MIN_ED=.;ML_PROB=1;NLOD=26.6;NLODF=8.38;PV=0;PV2=0;SOR=1.284;TLOD=77.28;CADD=7.033;CSQ=A|intergenic_variant|MODIFIER|||||||||||||||||||SNV||||||||||||||||||||||||||21:g.10793484C>A|||||||||||||||||||||||||||||||||||	GT:AD:AF:AFDP:ALTHC:ALT_F1R2:ALT_F2R1:BaseQRankSumPS:ClippingRankSumPS:DPHC:FOXOG:MQRankSumPS:NBQPS:QSS:REF_F1R2:REF_F2R1:ReadPosEndDistPS:ReadPosRankSumPS	0/1:151,36:0.187:182:36:21:15:-0.816:0:188:0.417:0:29.173:4415,1051:86:65:38.257:-0.49	0/0:94,1:0.011:92:0:1:0:-1.731:0:89:0:0:24.091:2744,9:64:30:35.179:-0.403

To me it looks like a bug in Sentieon TNscope, but that in turn looks like closed source corporate stuff? I don't quite approve, but I guess it is fast...
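The shell check above can also be done in Python, which may be handy for scanning many files. This is a minimal sketch using only the standard library; the function name `records_missing_qual` is made up for the example. Unlike the `zgrep` pattern, it reports every record whose QUAL column is `.`, whatever the FILTER value.

```python
import gzip
from typing import Iterator, Tuple


def records_missing_qual(vcf_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (CHROM, POS) for records whose QUAL column is '.'."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            # QUAL is the sixth tab-separated column of a VCF record.
            if len(fields) > 5 and fields[5] == ".":
                yield fields[0], fields[1]
```

Calling `list(records_missing_qual(path))` on a gzip-compressed VCF returns the positions that would trip an unguarded `int(variant.QUAL)`.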

@dnil dnil closed this as completed in 4d137b7 Mar 20, 2024
dnil added a commit that referenced this issue Mar 20, 2024
Fix #117 corollary - if QUAL is . set to 0
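
The commit message describes treating a missing QUAL as 0. The actual change lives in commit 4d137b7; the sketch below only illustrates the idea, with a hypothetical helper name `qual_as_int` and an assumed `default` parameter.

```python
from typing import Optional


def qual_as_int(qual: Optional[float], default: int = 0) -> int:
    """Truncate QUAL to an int, substituting `default` when QUAL is missing."""
    if qual is None:
        return default
    return int(qual)


print(qual_as_int(None))   # missing QUAL falls back to the default
print(qual_as_int(100.8))  # int() truncates toward zero
```

With a default of 0, a variant without QUAL fails any positive quality threshold instead of aborting the whole load.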