Support co annotation analysis #9
pandasaurus_cxg/anndata_analyzer.py
lambda row: Predicate.CLUSTER_MATCHES.value
if row[text] in predicate_dict.get(row["cell_type"], [])
and len(predicate_dict.get(row["cell_type"], [])) == 1
else (
A bit opaque. I think this works, but it looks like it only tests co-annotation against the cell_type field. It should be looking at all fields.
All fields as in tissue, disease, organism, etc., or just all the other free-text cell type fields?
Just cell type fields
All pairwise combinations of free-text and ontology cell type fields.
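To make "all pairwise combinations" concrete, here is a minimal sketch that builds one predicate_dict-style mapping per unordered pair of cell type fields. It is an illustration, not the PR's code: the field names other than cell_type (author_cell_type, subclass_label) and the helpers build_predicate_dict and all_pair_dicts are hypothetical, and rows are plain dicts rather than a DataFrame so the example stays dependency-free.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical field names: only "cell_type" appears in this PR.
CELL_TYPE_FIELDS = ["cell_type", "author_cell_type", "subclass_label"]


def build_predicate_dict(rows, left, right):
    """Map each value of `left` to the sorted distinct `right` values seen with it."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row[left]].add(row[right])
    return {key: sorted(values) for key, values in mapping.items()}


def all_pair_dicts(rows, fields):
    """Build one mapping per unordered pair of fields (n choose 2 pairs)."""
    return {
        (left, right): build_predicate_dict(rows, left, right)
        for left, right in combinations(fields, 2)
    }


# Toy rows: the podocyte labels come from the thread; "subclass_label" values are invented.
rows = [
    {"cell_type": "podocyte", "author_cell_type": "Podocyte", "subclass_label": "POD"},
    {"cell_type": "podocyte", "author_cell_type": "Degenerative Podocyte", "subclass_label": "dPOD"},
]
pair_dicts = all_pair_dicts(rows, CELL_TYPE_FIELDS)
for pair, mapping in pair_dicts.items():
    print(pair, mapping)
```

combinations() yields each pair in one direction only; judging the direction of a relationship would also need the reverse mapping for each pair.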
pandasaurus_cxg/anndata_analyzer.py
and len(predicate_dict.get(row["cell_type"], [])) == 1
else (
Predicate.SUBCLUSTER_OF.value
if row[text] in predicate_dict.get(row["cell_type"], [])
Based on the if/else logic, this looks like it adds SUBCLUSTER_OF if there are multiple rows with the same cell_type but different text values. But that could be a subcluster_of relationship in either direction, or an overlap.
Let me try to explain my reasoning behind this.
Let's say we have the following structure in predicate_dict:
{
'endothelial cell': ['Descending Vasa Recta Endothelial Cell', 'Ascending Vasa Recta Endothelial Cell', 'Afferent / Efferent Arteriole Endothelial Cell', 'Peritubular Capilary Endothelial Cell ', 'Glomerular Capillary Endothelial Cell', 'Degenerative Peritubular Capilary Endothelial Cell', 'Cycling Endothelial Cell', 'Lymphatic Endothelial Cell', 'Degenerative Endothelial Cell'],
'podocyte': ['Podocyte', 'Degenerative Podocyte'],
'leukocyte': ['Natural Killer Cell / Natural Killer T Cell', 'M2 Macrophage', 'Neutrophil', 'Monocyte-derived Cell', 'T Cell', 'Plasma Cell', 'Cycling Mononuclear Phagocyte', 'Non-classical Monocyte', 'Classical Dendritic Cell', 'Mast Cell', 'B Cell', 'Plasmacytoid Dendritic Cell', 'Cycling Natural Killer Cell / Natural Killer T Cell']
}
I iterate through the df, and let's say the first row[text] is 'Descending Vasa Recta Endothelial Cell' and its cell_type field is 'endothelial cell'. I check whether 'Descending Vasa Recta Endothelial Cell' is in predicate_dict['endothelial cell'], the nine-item list above. It is, and since the length of that list is not 1, I infer that 'Descending Vasa Recta Endothelial Cell' is subcluster_of 'endothelial cell'.
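The row-wise rule described above can be restated as a plain function, which makes its branches easier to inspect. This is a sketch, not the PR's code: the Predicate enum below is a stand-in whose string values are assumed, assign_predicate is a hypothetical name, and the "mast cell" entry is an invented single-label example.

```python
from enum import Enum


class Predicate(Enum):
    # Stand-in for the PR's Predicate enum; these string values are assumed.
    CLUSTER_MATCHES = "cluster_matches"
    SUBCLUSTER_OF = "subcluster_of"
    CLUSTER_OVERLAPS = "cluster_overlaps"


def assign_predicate(text_value, cell_type, predicate_dict):
    """Restate the lambda from the diff as an ordinary function."""
    candidates = predicate_dict.get(cell_type, [])
    if text_value in candidates and len(candidates) == 1:
        # The only free-text label seen with this cell_type.
        return Predicate.CLUSTER_MATCHES.value
    if text_value in candidates:
        # One of several free-text labels seen with this cell_type.
        return Predicate.SUBCLUSTER_OF.value
    # Any label not listed under this cell_type falls through here.
    return Predicate.CLUSTER_OVERLAPS.value


predicate_dict = {
    "podocyte": ["Podocyte", "Degenerative Podocyte"],
    "mast cell": ["Mast Cell"],  # hypothetical single-label entry
}
print(assign_predicate("Degenerative Podocyte", "podocyte", predicate_dict))
print(assign_predicate("Mast Cell", "mast cell", predicate_dict))
```

If predicate_dict is built from the same rows it is applied to, every row's text value is in its own cell_type list, so the final cluster_overlaps branch would never be reached. The rule also only looks from cell_type towards text, never in the reverse direction.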
Is there any way to determine the direction of this relationship from the tabular data?
I assumed that everything other than cluster_matches and subcluster_of should be cluster_overlaps, but I'm not 100% sure.
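One possible way to recover direction from tabular data is to count co-occurrences in both directions: if a free-text label only ever appears with one ontology term, but that term also appears with other labels, the label's cluster sits inside the term's cluster, and vice versa. If both sides are unique it is a match, and if neither is, the clusters only overlap. This is a sketch of that idea under the assumption that each row pairs one free-text label with one ontology label. It is not what the PR implements, and classify_pairs and the "Immune Cell" / "mast cell" pairs are hypothetical.

```python
from collections import defaultdict


def classify_pairs(pairs):
    """Assign a directed predicate to each (free_text, ontology) label pair.

    `pairs` holds one (free_text_label, ontology_label) tuple per row.
    Returns sorted (subject, predicate, object) triples.
    """
    pairs = list(pairs)  # accept generators; we iterate more than once
    text_to_onto = defaultdict(set)
    onto_to_text = defaultdict(set)
    for text, onto in pairs:
        text_to_onto[text].add(onto)
        onto_to_text[onto].add(text)

    triples = set()
    for text, onto in set(pairs):
        text_is_pure = len(text_to_onto[text]) == 1  # every `text` row has `onto`
        onto_is_pure = len(onto_to_text[onto]) == 1  # every `onto` row has `text`
        if text_is_pure and onto_is_pure:
            triples.add((text, "cluster_matches", onto))
        elif text_is_pure:
            triples.add((text, "subcluster_of", onto))
        elif onto_is_pure:
            triples.add((onto, "subcluster_of", text))
        else:
            triples.add((text, "cluster_overlaps", onto))
    return sorted(triples)


pairs = [
    ("Podocyte", "podocyte"),
    ("Degenerative Podocyte", "podocyte"),
    ("Immune Cell", "T cell"),  # hypothetical coarse free-text label
    ("Immune Cell", "B cell"),
    ("Mast Cell", "mast cell"),  # hypothetical one-to-one pairing
]
for triple in classify_pairs(pairs):
    print(triple)
```

Here "Podocyte" comes out as subcluster_of "podocyte", while "T cell" comes out as subcluster_of "Immune Cell", so the direction follows from which side of the pair is unique.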
pandasaurus_cxg/anndata_analyzer.py
if row[text] in predicate_dict.get(row["cell_type"], [])
else Predicate.CLUSTER_OVERLAPS.value
), # All the other cases should be marked with 'cluster_overlaps', right?
axis=1,
I think this is wrong. I will add a cluster_overlaps example to the ticket.
Hi @dosumis, examples for each predicate group can be seen below:
fixes #7
I'm not clear on how to choose the predicates mentioned in the ticket, so I've added the table without the predicates.