feat: SVdb merge SV and CNV #886

Merged
merged 13 commits into from
Mar 15, 2022

Conversation

khurrammaqbool
Collaborator

This PR:

Added: svdb merge SV and CNV

Review and tests:

  • Tests pass
  • Code review
  • New code is executed and covered by tests, and the tests are approved

@khurrammaqbool khurrammaqbool self-assigned this Mar 14, 2022
@khurrammaqbool khurrammaqbool linked an issue Mar 14, 2022 that may be closed by this pull request

codecov bot commented Mar 14, 2022

Codecov Report

Merging #886 (880e860) into develop (f286961) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop     #886   +/-   ##
========================================
  Coverage    99.60%   99.60%           
========================================
  Files           29       29           
  Lines         1762     1765    +3     
========================================
+ Hits          1755     1758    +3     
  Misses           7        7           
Flag        Coverage Δ
unittests   99.60% <100.00%> (+<0.01%) ⬆️

Impacted Files                          Coverage Δ
BALSAMIC/constants/workflow_params.py   100.00% <ø> (ø)
BALSAMIC/utils/models.py                100.00% <100.00%> (ø)
BALSAMIC/utils/rule.py                  98.29% <100.00%> (ø)
BALSAMIC/constants/common.py            100.00% <0.00%> (ø)
BALSAMIC/constants/workflow_rules.py    100.00% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@hassanfa hassanfa left a comment

Brilliant solution. See my suggestions below.

@@ -222,16 +226,18 @@ rule svdb_merge_tumor_normal:
tumor = get_sample_type(config["samples"], "tumor"),
normal = get_sample_type(config["samples"], "normal"),
case_name = config["analysis"]["case_id"],
vcf= lambda wildcards, input:[input[index] + ":" + sv_callers[index] for index in range(0,len(input))],
svdb_priority= ",".join(sv_callers)
Contributor

I'd suggest giving this variable a more descriptive name, e.g. svdb_sv_caller_prio or something similar.

@@ -222,16 +226,18 @@ rule svdb_merge_tumor_normal:
tumor = get_sample_type(config["samples"], "tumor"),
normal = get_sample_type(config["samples"], "normal"),
case_name = config["analysis"]["case_id"],
vcf= lambda wildcards, input:[input[index] + ":" + sv_callers[index] for index in range(0,len(input))],
Contributor

It would probably be more readable to create a function for this: one that takes sv_callers and the input VCFs and returns the strings you want. That way, you'd avoid potential bugs if the order of input/sv_callers changes.

Collaborator Author

The solution above is currently the only one working without any errors!

Comment on lines 125 to 126
vcf= lambda wildcards, input:[input[index] + ":" + sv_callers[index] for index in range(0,len(input))],
svdb_priority= ",".join(sv_callers)
Contributor

Same comments as above.
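
For illustration, the helper suggested in the comments above might look roughly like the following. This is only a sketch under stated assumptions, not the code that was merged: the function name `pair_vcfs_with_callers`, the caller names, and the file paths are hypothetical. It builds the same "path:caller" strings as the lambda in the diff and the same comma-joined priority string, and adds a length check.

```python
from typing import List, Sequence


def pair_vcfs_with_callers(vcfs: Sequence[str], callers: Sequence[str]) -> List[str]:
    """Return "path:caller" strings, one per VCF, in the given order."""
    # A length mismatch is the one inconsistency that can be detected here.
    if len(vcfs) != len(callers):
        raise ValueError(
            f"got {len(vcfs)} VCF files but {len(callers)} caller names"
        )
    # zip pairs items by position, like the index-based lambda in the rule.
    return [f"{vcf}:{caller}" for vcf, caller in zip(vcfs, callers)]


# Hypothetical caller names and paths, for illustration only.
callers = ["manta", "delly", "cnvkit"]
vcfs = ["manta.vcf.gz", "delly.vcf.gz", "cnvkit.vcf.gz"]

vcf_args = pair_vcfs_with_callers(vcfs, callers)
priority = ",".join(callers)
```

Note that pairing by position still relies on both sequences being in the same order; the check only catches a difference in length, not a reordering.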

@@ -75,7 +75,7 @@ def get_variant_callers(
WorkflowRunError if values are not valid
"""

- valid_variant_callers = set()
+ valid_variant_callers = list()
Contributor

Changing from set to list might cause redundant entries :-) Make sure you remove them and order it properly now.

Collaborator Author

@khurrammaqbool khurrammaqbool Mar 14, 2022

I looked into this and it is not generating redundant entries. The workflow does not show any changes. Using set() produces an arbitrary order, and this has a significant impact on the results.

Contributor

This can be ambiguous if/when the order changes. Just something to keep in mind.
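
As a side note on the set-versus-list discussion, one common way to get both properties at once (no duplicates, stable order) is to deduplicate through a dict, which preserves insertion order in Python 3.7+. The sketch below is illustrative only; `unique_in_order` and the caller names are hypothetical and not part of BALSAMIC.

```python
from typing import Iterable, List


def unique_in_order(names: Iterable[str]) -> List[str]:
    """Drop repeated names while keeping the first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(names))


# Hypothetical caller names with one repeated entry.
callers = ["manta", "delly", "manta", "cnvkit"]
deduplicated = unique_in_order(callers)
```

With this approach the order comes from the order in which names are first encountered, so it stays the same across runs as long as the source of the names is itself ordered.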

BALSAMIC/workflows/balsamic.smk (outdated, resolved)
Contributor

@ashwini06 ashwini06 left a comment

Looks good 👍🏼

@@ -222,16 +227,19 @@ rule svdb_merge_tumor_normal:
tumor = get_sample_type(config["samples"], "tumor"),
normal = get_sample_type(config["samples"], "normal"),
case_name = config["analysis"]["case_id"],
vcf= lambda wildcards, input:[input[index] + ":" + svdb_sv_callers_to_merge_prio[index] for index in range(0,len(input))],
Contributor

svdb_sv_callers_to_merge_prio - maybe this name can be simplified.

Decoupling this into a separate function could improve readability, but it may be difficult to pass the input VCFs into the function without using wildcards. We can leave this as it is for now, since it works fine.

@@ -118,16 +122,17 @@ rule svdb_merge_tumor_only:
params:
tumor = get_sample_type(config["samples"], "tumor"),
case_name = config["analysis"]["case_id"],
vcf= lambda wildcards, input:[input[index] + ":" + svdb_sv_callers_to_merge_prio[index] for index in range(0,len(input))],
Contributor

same as above

@ashwini06 ashwini06 self-requested a review March 14, 2022 17:07
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

Contributor

@hassanfa hassanfa left a comment

In the function you added, you forgot to add wildcards as an input, no?

Otherwise, nicely done.

@@ -222,16 +226,18 @@ rule svdb_merge_tumor_normal:
tumor = get_sample_type(config["samples"], "tumor"),
normal = get_sample_type(config["samples"], "normal"),
case_name = config["analysis"]["case_id"],
vcf= lambda wildcards, input:[input[index] + ":" + svdb_callers_prio[index] for index in range(0,len(input))],
Contributor

Shouldn't this work?

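def construct_vcf(w, i, sv):
  # w: wildcards (unused), i: input VCF paths, sv: caller names in the same order
  outp = list()
  for index in range(0, len(i)):
    # braces are required so the f-string interpolates the values
    outp.append(f"{i[index]}:{sv[index]}")
  return outp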

Contributor

@hassanfa hassanfa Mar 15, 2022

or maybe not. wildcard is the limiting factor, and can cause an issue. Ignore this :-)

Collaborator Author

> or maybe not. wildcard is the limiting factor, and can cause an issue. Ignore this :-)

Also, input is not a list and will not iterate in a for loop.
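
One possible way around the wildcards question raised in this thread is a small factory that captures the caller list and returns a function with the `(wildcards, input)` signature that Snakemake documents for callables in `params`. The sketch below is plain Python and has not been run inside Snakemake here; `make_vcf_param` is a hypothetical name, and it assumes the `input` object can be converted with `list()`, which the comment above calls into question.

```python
from typing import Callable, List, Sequence


def make_vcf_param(callers: Sequence[str]) -> Callable[..., List[str]]:
    """Return a (wildcards, input) function that tags each VCF with its caller."""

    def vcf_param(wildcards, input) -> List[str]:
        # Assumption: `input` is iterable; wildcards is accepted but unused.
        paths = list(input)
        if len(paths) != len(callers):
            raise ValueError(
                f"got {len(paths)} input files but {len(callers)} caller names"
            )
        return [f"{path}:{caller}" for path, caller in zip(paths, callers)]

    return vcf_param


# Stand-in call with a plain list instead of a real Snakemake input object.
vcf_param = make_vcf_param(["manta", "delly"])
result = vcf_param(None, ["a.vcf", "b.vcf"])
```

In a rule, this would hypothetically be used as `vcf = make_vcf_param(svdb_callers_prio)` in place of the inline lambda, which keeps the pairing logic in one place for both the tumor-only and tumor-normal rules.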

@khurrammaqbool khurrammaqbool merged commit c749a3c into develop Mar 15, 2022
@khurrammaqbool khurrammaqbool deleted the feat/merge_sv_cnv branch March 15, 2022 08:30
@khurrammaqbool khurrammaqbool changed the title feat: svdb merge sv cnv feat: SVdb merge SV and CNV Apr 26, 2022
@khurrammaqbool khurrammaqbool mentioned this pull request Apr 26, 2022
Development

Successfully merging this pull request may close these issues.

add SVDB to merge SV and CNV
3 participants