Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGNOR 3.0 parser. Updated to map SIGNOR phenotypes to GO terms when … #251

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

beasleyjonm
Copy link
Contributor

Updated to map SIGNOR phenotypes to GO terms when available from their mappings. A similar mapping process can be repeated for the complexes, protein families, and stimuli terms if SIGNOR ever updates their mappings.

@@ -78,7 +79,8 @@
PLANT_GOA: ("parsers.GOA.src.loadGOA", "PlantGOALoader"),
REACTOME: ("parsers.Reactome.src.loadReactome", "ReactomeLoader"),
SCENT: ("parsers.scent.src.loadScent", "ScentLoader"),
SGD: ("parsers.SGD.src.loadSGD", "SGDLoader"),
SGD: ("parsers.SIGNOR.src.loadSIGNOR", "SIGNORLoader"),
SIGNOR: ("parsers.SGD.src.loadSGD", "SGDLoader"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry just lurking but noticed the constants might be reversed (signor values in sgd and the other way around)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad!! Thanks for catching this.

@beasleyjonm beasleyjonm requested a review from matentzn August 10, 2024 19:24
@matentzn
Copy link

Sorry I don't know enough about your project to review, I am a colleague of @DnlRKorn from an entirely different KG land.. 😂 good luck with your work!

@eKathleenCarter
Copy link
Contributor

@beasleyjonm

Currently, it cannot be run as written.

  1. SIGNOR-ST.csv, SIGNOR-PH.csv files are not included.
  2. self.data_path not set.

Please add files and define variable

@EvanDietzMorris
Copy link
Contributor

self.data_path is actually set in the init of the SourceDataLoader interface, which all parsers inherit from, and should call with super().init so that is not an issue. Maybe not the easiest to read code design but that's how it works for all them.

if source_data_dir:
self.data_path = os.path.join(source_data_dir, "source")
if not os.path.exists(self.data_path):
os.mkdir(self.data_path)
else:
self.data_path = os.environ.get("ORION_STORAGE")

Fixed issue where SIGNOR parsing fails due to filename not being included in self.data_files list.
Deleted one unnecessary line.
@eKathleenCarter eKathleenCarter linked an issue Oct 7, 2024 that may be closed by this pull request
eKathleenCarter and others added 4 commits October 22, 2024 15:17
get_data is a function that needs to be called, not a property, so this doesn't work
the typo on the website is "Mechansims" it just happened to work anyway because the mechanism download was the default
@beasleyjonm beasleyjonm closed this Dec 3, 2024
@beasleyjonm beasleyjonm reopened this Dec 3, 2024
Copy link
Contributor

@eKathleenCarter eKathleenCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can really say "affects" here. This default predicate would be used for 3 edges in the data where the effect and mechanism are unknown (the values of which define all the predicates in the data). Therefore, we don't have any information except that these two entities are related. We could imply that they affect each other, but I was trying to avoid implications.

@beasleyjonm
Copy link
Contributor Author

I'm not sure if we can really say "affects" here. This default predicate would be used for 3 edges in the data where the effect and mechanism are unknown (the values of which define all the predicates in the data). Therefore, we don't have any information except that these two entities are related. We could imply that they affect each other, but I was trying to avoid implications.

It's puzzling because I'm finding 979 edges that contain the "related_to" edge. I just picked one at random:
{"subject":"UniProtKB:Q8WXG6","predicate":"biolink:related_to","object":"UniProtKB:P20336","primary_knowledge_source":"infores:signor","knowledge_level":"knowledge_assertion","agent_type":"manual_agent","publications":["PMID:11809763"],"description":["Rab3A, a member of the Rab3 small G protein family, regulates Ca(2+)-dependent exocytosis of neurotransmitter. The cyclical activation and inactivation of Rab3A are essential for the Rab3A action in exocytosis. GDP-Rab3A is activated to GTP-Rab3A by Rab3 GDP/GTP exchange protein (Rab3 GEP), and GTP-Rab3A is inactivated to GDP-Rab3A by Rab3 GTPase-activating protein (Rab3 GAP)."],"species_context_qualifier":["NCBITaxon:10116"],"anatomical_context_qualifier":["BTO:0000142"],"subject_part_qualifier":null,"object_part_qualifier":null}

However, on the SIGNOR website, this interaction has the effect and mechanism listed:
image

Ultimately, I still prefer the "affects" predicate because it preserves the directionality of effect of the subject on the object that the original source specifies. The symmetric predicate "related_to" loses the information from directionality. This is mostly a concern for rule mining and path finding methods.

@eKathleenCarter
Copy link
Contributor

I think you stumbled on a bigger problem. I completely agree with you on this example. But take a look at this:

{"subject": "UniProtKB:Q8WXG6", "predicate": "biolink:related_to", "object": "UniProtKB:P20336", "primary_knowledge_source": "infores:signor", "knowledge_level": "knowledge_assertion", "agent_type": "manual_agent", "publications": ["PMID:11809763"], "description": ["Rab3A, a member of the Rab3 small G protein family, regulates Ca(2+)-dependent exocytosis of neurotransmitter. The cyclical activation and inactivation of Rab3A are essential for the Rab3A action in exocytosis. GDP-Rab3A is activated to GTP-Rab3A by Rab3 GDP/GTP exchange protein (Rab3 GEP), and GTP-Rab3A is inactivated to GDP-Rab3A by Rab3 GTPase-activating protein (Rab3 GAP)."], "species_context_qualifier": ["NCBITaxon:10116"], "anatomical_context_qualifier": ["BTO:0000142"], "subject_part_qualifier": null, "object_part_qualifier": null}
{"subject": "UniProtKB:Q8WXG6", "predicate": "RO:0002448", "object": "UniProtKB:P20336", "primary_knowledge_source": "infores:signor", "object_direction_qualifier": "upregulates", "object_aspect_qualifier": "activity", "knowledge_level": "knowledge_assertion", "agent_type": "manual_agent", "publications": ["PMID:11809763"], "description": ["Rab3A, a member of the Rab3 small G protein family, regulates Ca(2+)-dependent exocytosis of neurotransmitter. The cyclical activation and inactivation of Rab3A are essential for the Rab3A action in exocytosis. GDP-Rab3A is activated to GTP-Rab3A by Rab3 GDP/GTP exchange protein (Rab3 GEP), and GTP-Rab3A is inactivated to GDP-Rab3A by Rab3 GTPase-activating protein (Rab3 GAP)."], "species_context_qualifier": ["NCBITaxon:10116"], "anatomical_context_qualifier": ["BTO:0000142"], "subject_part_qualifier": null, "object_part_qualifier": null}

Some how this row of data made two edges. Something else is going on here. The second edge is the one I would expect.

Comment on lines 464 to 474
if effect in effect_mapping.keys():
# Handle edge from mechanism
if mechanism:
self.create_and_parse_edge(row, extractor, mechanism=mechanism)

for predicate in effect_mapping[effect].keys():
edge_properties = effect_mapping[effect][predicate]

# Final edge creation
self.create_and_parse_edge(row, extractor, predicate=predicate,
edge_properties=edge_properties)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it has the potential to create multiple edges from the same row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just submitted a commit that reverses the "related_to" -> "affects" change and also handles this section so that edges are not duplicated. Now when I parse the source it only creates 3 "biolink:related_to" edges and the duplicate case we found only creates 1 edge, as it should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this source should be good to go for inclusion in the graph now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SIGNOR
5 participants