Generate graphs using cell sets as unifying concept #24

Merged
ubyndr merged 32 commits into main from 13-generate-graphs-using-cell-sets-as-unifying-concept on Aug 8, 2023

Conversation

Collaborator

@ubyndr ubyndr commented Jul 14, 2023

Resolves #26

TODO:

  • Currently, we use enriched_df to add cell type terms to the graph. However, if a cell type has no subClassOf relation with any other cell type, its term is missing from the graph. This way of adding cell terms is the root cause of the missing cell terms in the neo4j UI. To fix it, the cell type terms should come from the co_annotation report instead: I use the obs attribute of the anndata object to build a dictionary of cell type IDs and labels, and use that dictionary to add the cell type terms to the graph. enriched_df is then used only to add the subClassOf relations between those terms. I will update the enrich_rdf_graph method accordingly.
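The split described in this TODO can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this PR: plain tuples stand in for rdflib triples, a list of dicts stands in for the anndata obs table, and the column names `cell_type_ontology_term_id` and `cell_type` as well as both helper functions are hypothetical. Only the `s_label` and `o_label` keys are taken from the diff under review.

```python
LABEL = "rdfs:label"
SUBCLASS_OF = "rdfs:subClassOf"


def build_cell_type_dict(obs_rows, id_key="cell_type_ontology_term_id", label_key="cell_type"):
    """Collect unique cell type ID -> label pairs from obs-like rows (assumed column names)."""
    return {row[id_key]: row[label_key] for row in obs_rows}


def build_triples(cell_type_dict, enriched_rows):
    """Add every annotated term first, then only subClassOf edges from the enrichment rows."""
    triples = set()
    # Step 1: every annotated cell type gets a label triple, even if it has no relations.
    for curie, label in cell_type_dict.items():
        triples.add((curie, LABEL, label))
    # Step 2: enrichment rows contribute only subClassOf edges between known terms.
    label_to_id = {label: curie for curie, label in cell_type_dict.items()}
    for row in enriched_rows:
        s = label_to_id.get(row["s_label"])
        o = label_to_id.get(row["o_label"])
        if s is not None and o is not None:
            triples.add((s, SUBCLASS_OF, o))
    return triples


# Toy data: "neuron" has no subClassOf relation to the other annotated terms.
obs_rows = [
    {"cell_type_ontology_term_id": "CL:0000084", "cell_type": "T cell"},
    {"cell_type_ontology_term_id": "CL:0000624", "cell_type": "CD4-positive, alpha-beta T cell"},
    {"cell_type_ontology_term_id": "CL:0000540", "cell_type": "neuron"},
]
enriched_rows = [{"s_label": "CD4-positive, alpha-beta T cell", "o_label": "T cell"}]

cell_type_dict = build_cell_type_dict(obs_rows)
triples = build_triples(cell_type_dict, enriched_rows)
```

With this ordering the isolated term still ends up in the graph, because adding terms no longer depends on a row being present in the enrichment table.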

@ubyndr ubyndr requested review from dosumis and hkir-dev July 14, 2023 13:07
Comment on lines 160 to 167
def visualize_rdf_graph(self):
    nx_graph = rdflib_to_networkx_multidigraph(self.graph)
    # Plot Networkx instance of RDF Graph
    pos = nx.spring_layout(nx_graph, scale=2, k=2)
    edge_labels = nx.get_edge_attributes(nx_graph, "r")
    nx.draw_networkx_edge_labels(nx_graph, pos, edge_labels=edge_labels)
    # Reuse the same layout so nodes and edge labels line up
    nx.draw(nx_graph, pos, with_labels=True)
    plt.show()
Collaborator Author

This is just a placeholder; I have used OBASK to visualize graphs and examine them for validation purposes.

Contributor

This is a place where oaklib could really help. Worth talking to @anitacaron about how she uses it for visualising validation.

Collaborator Author

I had a conversation with Anita yesterday in the office. From what I gathered, it seems that visualising the relations and neighbours of a set of terms is necessary. However, I am unsure if this is something we actually need.

Contributor

I didn't know the context. I can have a closer look into oaklib.

Contributor

There's the OboGraph Interface

Comment on lines 122 to 132
cl_namespace = Namespace("http://purl.obolibrary.org/obo/CL_")
for curie, label in self.cell_type_dict.items():
    resource = cl_namespace[curie.split(":")[-1]]
    self.graph.add((resource, RDFS.label, Literal(label)))
    for s, _, _ in self.graph.triples((None, self.ns["cell_type"], Literal(label))):
        self.graph.add((s, self.ns["consists_of"], resource))
# add subClassOf between terms in CL enrichment
for _, row in self.enriched_df.iterrows():
    for s, _, _ in self.graph.triples((None, RDFS.label, Literal(row["s_label"]))):
        for o, _, _ in self.graph.triples((None, RDFS.label, Literal(row["o_label"]))):
            self.graph.add((s, RDFS.subClassOf, o))
Collaborator Author

I'm not entirely certain if any methods, other than simple_enrichment, contribute additional information to the graph. This is because those methods may involve CL terms from a subset and context that are not utilized in annotations.
We should talk about this.

Contributor

@dosumis dosumis Jul 17, 2023

By definition all enrichment methods link to terms related to those used in annotation.

Looking at this again, I think it is clear that I have underspecified. I think the challenge is how we deal with flattening. In the pipelines Anita has worked on we use ROBOT or Souffle to strip redundancy from the flattened graph. If we're sticking with pure Python we will need something similar to the Souffle redundancy stripping algo here - which will require some thought.

For this PR I'd suggest an MVP of building a graph based on co-annotation first. We can then move folding in the enrichment graph to a second ticket/PR.
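For the follow-up ticket, stripping redundancy from a flattened subClassOf graph amounts to a transitive reduction: drop every edge that is already implied by a longer path. The following is a minimal pure-Python sketch of that idea, assuming the hierarchy is acyclic. It is not the Souffle algorithm and not necessarily what this PR ends up implementing; the function name and the (child, parent) edge representation are illustrative.

```python
def transitive_reduction(edges):
    """Return the (child, parent) edges that are not implied by a longer path.

    Assumes the input graph is acyclic, as a subClassOf hierarchy normally is.
    """
    edges = set(edges)
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    def reachable(start):
        """All nodes reachable from start by following one or more edges."""
        seen, stack = set(), list(parents.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return seen

    reduced = set()
    for child, parent in edges:
        # An edge is redundant if another direct parent of child already reaches parent.
        others = parents[child] - {parent}
        if not any(parent in reachable(other) for other in others):
            reduced.add((child, parent))
    return reduced


# A -> C is implied by A -> B -> C, so it is dropped.
flattened = {("A", "B"), ("B", "C"), ("A", "C")}
reduced = transitive_reduction(flattened)
```

This recomputes reachability for every edge, which is fine for small graphs but would need caching for a large ontology. If a networkx dependency is acceptable, `networkx.transitive_reduction` covers the same case for directed acyclic graphs.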

@ubyndr ubyndr force-pushed the 13-generate-graphs-using-cell-sets-as-unifying-concept branch from a221a81 to bb04f1e Compare July 25, 2023 11:44
Ismail Ugur Bayindir and others added 10 commits July 25, 2023 12:44
* Merged from main

* Updated anndata_analyzer.py

* Removed state and state.l2 from free-text annotations

* Refactored visualize_rdf_graph method

* Refactored save_rdf_graph, visualize_rdf_graph method and added transitive_reduction method

* Format changes in co_annotation_report

* Added state and state.l2 to free-text annotations
@ubyndr ubyndr merged commit c356303 into main Aug 8, 2023
ubyndr pushed a commit that referenced this pull request Aug 8, 2023
@ubyndr ubyndr linked an issue Aug 10, 2023 that may be closed by this pull request
@ubyndr ubyndr deleted the 13-generate-graphs-using-cell-sets-as-unifying-concept branch September 29, 2023 14:22
3 participants