Fix #117 - add a flag to allow QUAL filter instead of GQ #118

Merged 4 commits on May 26, 2023

@dnil (Collaborator) commented May 16, 2023

This PR adds:

  • Flag to allow QUAL instead of GQ FORMAT tag filtering.

How to prepare for test:

  • ssh to hasta
  • Install on stage:
    bash servers/resources/SERVER.scilifelab.se/update-[THIS_TOOL]-stage.sh [THIS-BRANCH-NAME]

How to test:

  • Load a TNscope somatic SNV VCF (which has no GQ values).
  • Notice that only SVs are loaded.
  • Apply the patch, then repeat with the --qual-gq flag enabled.

Expected outcome:

  • Both SNVs and SVs are loaded when the --qual-gq flag is enabled.
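
The behaviour under test can be sketched roughly as below. This is a minimal illustration and not the code in loqusdb/build_models/variant.py: the `Call` dataclass, the `passes_quality` function and the threshold value of 20 are all hypothetical. It assumes that a quality threshold is normally compared against the per-sample GQ FORMAT value, and that with the flag enabled the record-level QUAL column is compared instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """Hypothetical stand-in for one VCF record restricted to one sample."""

    qual: Optional[float]  # record-level QUAL column, None if missing (".")
    gq: Optional[int]      # per-sample GQ FORMAT value, None if the tag is absent


def passes_quality(call: Call, threshold: float, qual_gq: bool = False) -> bool:
    """Return True if the call meets the threshold.

    With qual_gq=False the GQ FORMAT value is compared; with qual_gq=True
    the QUAL column is compared instead. A missing value never passes.
    """
    value = call.qual if qual_gq else call.gq
    return value is not None and value >= threshold


# A TNscope-like call: QUAL is present but there is no GQ tag.
tnscope_like = Call(qual=55.0, gq=None)
print(passes_quality(tnscope_like, threshold=20))                # False
print(passes_quality(tnscope_like, threshold=20, qual_gq=True))  # True
```

Under these assumptions, a caller that writes no GQ values has every SNV rejected by the default comparison, which would match the "only SVs loading" symptom described above.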

Review:

  • Code approved by
  • Tests executed by DN
  • "Merge and deploy" approved by

This version is a:

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

@dnil (Collaborator, Author) commented May 25, 2023

❌ Before:

[daniel.nilsson@hasta:/home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22] [S_base] 29s $ loqusdb-somatic load --case-id fleetjay --variant-file /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz --sv-variants /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz
2023-05-25 16:27:31 hasta.scilifelab.se loqusdb.commands.cli[95472] INFO Running loqusdb version 2.6.10
2023-05-25 16:27:31 hasta.scilifelab.se mongo_adapter.client[95472] INFO Connecting to uri:mongodb://loqusdb-stage:******@cg-mongo-stage.scilifelab.se:27030
2023-05-25 16:27:31 hasta.scilifelab.se mongo_adapter.client[95472] INFO Connection established
2023-05-25 16:27:31 hasta.scilifelab.se mongo_adapter.adapter[95472] INFO Use database loqusdb-somatic-stage
2023-05-25 16:27:31 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Check if vcf is on correct format...
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
2023-05-25 16:27:33 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Vcf file /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz looks fine
2023-05-25 16:27:33 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Nr of variants in vcf: 179449
2023-05-25 16:27:33 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Type of variants in vcf: snv
2023-05-25 16:27:33 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Check if vcf is on correct format...
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz.tbi
[W::vcf_parse] FILTER 'GERMLINE' is not defined in the header
2023-05-25 16:27:34 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Vcf file /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz looks fine
2023-05-25 16:27:34 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Nr of variants in vcf: 13335
2023-05-25 16:27:34 hasta.scilifelab.se loqusdb.utils.vcf[95472] INFO Type of variants in vcf: sv
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
Inserting variants  [####################################]  100%          2023-05-25 16:27:42 hasta.scilifelab.se loqusdb.utils.load[95472] INFO Inserted 0 variants of type snv
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz.tbi
[W::vcf_parse] FILTER 'GERMLINE' is not defined in the header
Inserting variants  [####################################]  100%          2023-05-25 16:28:21 hasta.scilifelab.se loqusdb.utils.load[95472] INFO Inserted 13335 variants of type sv
2023-05-25 16:28:21 hasta.scilifelab.se loqusdb.commands.load[95472] INFO Nr variants inserted: 13335
2023-05-25 16:28:21 hasta.scilifelab.se loqusdb.commands.load[95472] INFO Time to insert variants: 0:00:50.497540
2023-05-25 16:28:21 hasta.scilifelab.se loqusdb.plugins.mongo.adapter[95472] INFO All indexes exists

Prepare to repeat by deleting the case:

[daniel.nilsson@hasta:/home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22] [S_base] 29s $ loqusdb-somatic delete -c fleetjay
2023-05-25 16:30:28 hasta.scilifelab.se loqusdb.commands.cli[193428] INFO Running loqusdb version 2.6.10
2023-05-25 16:30:28 hasta.scilifelab.se mongo_adapter.client[193428] INFO Connecting to uri:mongodb://loqusdb-stage:******@cg-mongo-stage.scilifelab.se:27030
2023-05-25 16:30:28 hasta.scilifelab.se mongo_adapter.client[193428] INFO Connection established
2023-05-25 16:30:28 hasta.scilifelab.se mongo_adapter.adapter[193428] INFO Use database loqusdb-somatic-stage
2023-05-25 16:30:28 hasta.scilifelab.se loqusdb.plugins.mongo.case[193428] INFO Removing case fleetjay from database
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
2023-05-25 16:30:29 hasta.scilifelab.se loqusdb.utils.delete[193428] INFO deleting variants
...

Deploy this branch:

[daniel.nilsson@hasta:/home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22] [S_base] 29s $ bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_loqusdb -t loqusdb -b qual_gq_filter
Collecting git+https://github.com/Clinical-Genomics/loqusdb@qual_gq_filter
  Running command git clone -q https://github.com/Clinical-Genomics/loqusdb /home/daniel.nilsson/tmp/pip-req-build-_17u3f_r
  Cloning https://github.com/Clinical-Genomics/loqusdb (to revision qual_gq_filter) to /home/daniel.nilsson/tmp/pip-req-build-_17u3f_r
  Running command git checkout -b qual_gq_filter --track origin/qual_gq_filter
  Resolved https://github.com/Clinical-Genomics/loqusdb to commit 4a755eac6ecfb4020b82764adf0d99c65c83a0a7
  Switched to a new branch 'qual_gq_filter'
  Preparing metadata (setup.py): started
  Branch qual_gq_filter set up to track remote branch qual_gq_filter from origin.
  Preparing metadata (setup.py): finished with status 'done'
WARNING: You are using pip version 21.3.1; however, version 23.1.2 is available.
Building wheels for collected packages: loqusdb
You should consider upgrading via the '/home/proj/stage/bin/miniconda3/envs/S_loqusdb/bin/python3.7 -m pip install --upgrade pip' command.
  Building wheel for loqusdb (setup.py): started
  Building wheel for loqusdb (setup.py): finished with status 'done'
  Created wheel for loqusdb: filename=loqusdb-2.6.10-py3-none-any.whl size=50215 sha256=02579aa023265220fa2d20a4c4ace41ee23aeb6e5a973a952e94c31d08c88b80
  Stored in directory: /home/daniel.nilsson/tmp/pip-ephem-wheel-cache-llwx33_q/wheels/71/99/49/1c5c3a7c59dc66d94a0e5696a5bc3c16c0d4be6742126890b8
Successfully built loqusdb
Installing collected packages: loqusdb
  Attempting uninstall: loqusdb
    Found existing installation: loqusdb 2.6.10
    Uninstalling loqusdb-2.6.10:
      Successfully uninstalled loqusdb-2.6.10
Successfully installed loqusdb-2.6.10
Collecting git+https://github.com/Clinical-Genomics/loqusdb@qual_gq_filter
  Running command git clone -q https://github.com/Clinical-Genomics/loqusdb /home/daniel.nilsson/tmp/pip-req-build-epio9vrx
  Cloning https://github.com/Clinical-Genomics/loqusdb (to revision qual_gq_filter) to /home/daniel.nilsson/tmp/pip-req-build-epio9vrx
  Running command git checkout -b qual_gq_filter --track origin/qual_gq_filter
  Resolved https://github.com/Clinical-Genomics/loqusdb to commit 4a755eac6ecfb4020b82764adf0d99c65c83a0a7
  Switched to a new branch 'qual_gq_filter'
  Preparing metadata (setup.py): started
  Branch qual_gq_filter set up to track remote branch qual_gq_filter from origin.
  Preparing metadata (setup.py): finished with status 'done'
WARNING: You are using pip version 21.3.1; however, version 23.1.2 is available.
Requirement already satisfied: pytest==5.4.3 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (5.4.3)
You should consider upgrading via the '/home/proj/stage/bin/miniconda3/envs/S_loqusdb/bin/python3.7 -m pip install --upgrade pip' command.
Requirement already satisfied: cyvcf2==0.30.12 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (0.30.12)
Requirement already satisfied: mongomock==3.18.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (3.18.0)
Requirement already satisfied: click==7.1.2 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (7.1.2)
Requirement already satisfied: pymongo==3.7.1 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (3.7.1)
Requirement already satisfied: numpy==1.21.4 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (1.21.4)
Requirement already satisfied: coloredlogs==14.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (14.0)
Requirement already satisfied: pyyaml==5.4.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (5.4)
Requirement already satisfied: vcftoolbox==1.5 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (1.5)
Requirement already satisfied: setuptools==59.2.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (59.2.0)
Requirement already satisfied: mongo_adapter>=0.3.3 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (0.3.3)
Requirement already satisfied: ped_parser in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from loqusdb==2.6.10) (1.6.6)
Requirement already satisfied: humanfriendly>=7.1 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from coloredlogs==14.0->loqusdb==2.6.10) (10.0)
Requirement already satisfied: six in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from mongomock==3.18.0->loqusdb==2.6.10) (1.16.0)
Requirement already satisfied: sentinels in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from mongomock==3.18.0->loqusdb==2.6.10) (1.0.0)
Requirement already satisfied: attrs>=17.4.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (22.1.0)
Requirement already satisfied: more-itertools>=4.0.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (8.14.0)
Requirement already satisfied: py>=1.5.0 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (1.11.0)
Requirement already satisfied: packaging in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (21.3)
Requirement already satisfied: pluggy<1.0,>=0.12 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (0.13.1)
Requirement already satisfied: wcwidth in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (0.2.5)
Requirement already satisfied: importlib-metadata>=0.12 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from pytest==5.4.3->loqusdb==2.6.10) (4.12.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from importlib-metadata>=0.12->pytest==5.4.3->loqusdb==2.6.10) (4.3.0)
Requirement already satisfied: zipp>=0.5 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from importlib-metadata>=0.12->pytest==5.4.3->loqusdb==2.6.10) (3.8.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/proj/stage/bin/miniconda3/envs/S_loqusdb/lib/python3.7/site-packages (from packaging->pytest==5.4.3->loqusdb==2.6.10) (3.0.9)
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22
done.
Log deploy... done.
loqusdb, version 2.6.10

✅ After:

[daniel.nilsson@hasta:/home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22] [S_base] 29s $ loqusdb-somatic load --qual-gq --case-id fleetjay --variant-file /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz --sv-variants /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz
2023-05-25 17:00:03 hasta.scilifelab.se loqusdb.commands.cli[179315] INFO Running loqusdb version 2.6.10
2023-05-25 17:00:03 hasta.scilifelab.se mongo_adapter.client[179315] INFO Connecting to uri:mongodb://loqusdb-stage:******@cg-mongo-stage.scilifelab.se:27030
2023-05-25 17:00:03 hasta.scilifelab.se mongo_adapter.client[179315] INFO Connection established
2023-05-25 17:00:03 hasta.scilifelab.se mongo_adapter.adapter[179315] INFO Use database loqusdb-somatic-stage
2023-05-25 17:00:03 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Check if vcf is on correct format...
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
2023-05-25 17:00:05 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Vcf file /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz looks fine
2023-05-25 17:00:05 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Nr of variants in vcf: 179449
2023-05-25 17:00:05 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Type of variants in vcf: snv
2023-05-25 17:00:05 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Check if vcf is on correct format...
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz.tbi
[W::vcf_parse] FILTER 'GERMLINE' is not defined in the header
2023-05-25 17:00:06 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Vcf file /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz looks fine
2023-05-25 17:00:06 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Nr of variants in vcf: 13335
2023-05-25 17:00:06 hasta.scilifelab.se loqusdb.utils.vcf[179315] INFO Type of variants in vcf: sv
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SNV.somatic.fleetjay.tnscope.vcf.gz.tbi
Inserting variants  [####################################]  100%          2023-05-25 17:00:13 hasta.scilifelab.se loqusdb.utils.load[179315] INFO Inserted 12437 variants of type snv
[W::hts_idx_load3] The index file is older than the data file: /home/proj/stage/housekeeper-bundles/fleetjay/2023-05-22/SV.somatic.fleetjay.svdb.vcf.gz.tbi
[W::vcf_parse] FILTER 'GERMLINE' is not defined in the header
Inserting variants  [####################################]  100%          2023-05-25 17:00:51 hasta.scilifelab.se loqusdb.utils.load[179315] INFO Inserted 13335 variants of type sv
2023-05-25 17:00:51 hasta.scilifelab.se loqusdb.commands.load[179315] INFO Nr variants inserted: 25772
2023-05-25 17:00:51 hasta.scilifelab.se loqusdb.commands.load[179315] INFO Time to insert variants: 0:00:48.274554
2023-05-25 17:00:51 hasta.scilifelab.se loqusdb.plugins.mongo.adapter[179315] INFO All indexes exists
Screenshot 2023-05-25 at 17 04 38
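
For comparing the two runs, the per-type counts can be pulled out of the log text with a small helper. This is only a sketch: the function name `inserted_counts` is made up for illustration, and the regular expression assumes the "Inserted N variants of type T" wording seen in the logs above. The sample strings are shortened copies of those log lines.

```python
import re

# Matches the per-type summary lines written by loqusdb.utils.load above.
INSERTED = re.compile(r"Inserted (\d+) variants of type (\w+)")


def inserted_counts(log_text: str) -> dict:
    """Map variant type (e.g. 'snv', 'sv') to the number inserted."""
    return {kind: int(count) for count, kind in INSERTED.findall(log_text)}


before = """INFO Inserted 0 variants of type snv
INFO Inserted 13335 variants of type sv"""

after = """INFO Inserted 12437 variants of type snv
INFO Inserted 13335 variants of type sv"""

print(inserted_counts(before), sum(inserted_counts(before).values()))
print(inserted_counts(after), sum(inserted_counts(after).values()))
```

The sums agree with the "Nr variants inserted" lines in the two runs: 13335 before, and 12437 + 13335 = 25772 after.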

@dnil requested review from karlnyr and northwestwitch May 26, 2023 06:49

@northwestwitch (Member) left a comment

Nice one! 👍🏻

Resolved review threads:

  • .github/workflows/build_and_publish.yml
  • loqusdb/build_models/variant.py
  • setup.py

@dnil merged commit ca0fe92 into master May 26, 2023
@dnil deleted the qual_gq_filter branch May 26, 2023 08:01