Support co annotation analysis #9

Merged
merged 12 commits into from
Jun 23, 2023

Conversation

ubyndr
Collaborator

@ubyndr ubyndr commented Jun 16, 2023

fixes #7

I'm not clear on how to choose between the predicates mentioned in the ticket:

  • set(X) cluster_overlaps set(Y)
  • set(X) cluster_matches set(Y)
  • set(X) subcluster_of set(Y)

I've added the table without the predicates.

@ubyndr ubyndr requested review from dosumis and hkir-dev June 16, 2023 14:56
@ubyndr ubyndr marked this pull request as ready for review June 20, 2023 14:18
lambda row: Predicate.CLUSTER_MATCHES.value
if row[text] in predicate_dict.get(row["cell_type"], [])
and len(predicate_dict.get(row["cell_type"], [])) == 1
else (
Contributor

A bit opaque. I think this works, but it looks like it only tests co-annotation against the cell_type field. It should be looking at all fields.

Collaborator Author

All fields, as in tissue, disease, organism, etc., or just all the other free-text cell type fields?

Contributor

Just cell type fields

Contributor

All pairwise combinations of free-text and ontology cell type fields.
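
Enumerating those field pairs could look roughly like the sketch below. It is only an illustration, not the code in this PR, and the column names in `cell_type_fields` are hypothetical placeholders for whatever free-text and ontology cell type fields a dataset actually has.

```python
from itertools import combinations

# Hypothetical column names: one ontology field plus two free-text fields.
cell_type_fields = ["cell_type", "author_label_1", "author_label_2"]

# Every unordered pair of fields, so each pair is analysed exactly once.
field_pairs = list(combinations(cell_type_fields, 2))

for field_name_1, field_name_2 in field_pairs:
    print(f"co-annotation analysis: {field_name_1} <-> {field_name_2}")
```

With n fields this yields n * (n - 1) / 2 pairs, so the co-annotation check would run once per pair rather than only against cell_type.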

pandasaurus_cxg/anndata_analyzer.py
and len(predicate_dict.get(row["cell_type"], [])) == 1
else (
Predicate.SUBCLUSTER_OF.value
if row[text] in predicate_dict.get(row["cell_type"], [])
Contributor

Based on the if/else logic, this looks like it adds SUBCLUSTER_OF if there are multiple rows with "cell_type" and text. But that could be a subcluster_of relationship in either direction, or an overlap.

Collaborator Author

Let me try to explain my reasoning behind this.
Let's say we have the following structure in predicate_dict:

{
'endothelial cell': ['Descending Vasa Recta Endothelial Cell', 'Ascending Vasa Recta Endothelial Cell', 'Afferent / Efferent Arteriole Endothelial Cell', 'Peritubular Capilary Endothelial Cell ', 'Glomerular Capillary Endothelial Cell', 'Degenerative Peritubular Capilary Endothelial Cell', 'Cycling Endothelial Cell', 'Lymphatic Endothelial Cell', 'Degenerative Endothelial Cell'], 
'podocyte': ['Podocyte', 'Degenerative Podocyte'], 
'leukocyte': ['Natural Killer Cell / Natural Killer T Cell', 'M2 Macrophage', 'Neutrophil', 'Monocyte-derived Cell', 'T Cell', 'Plasma Cell', 'Cycling Mononuclear Phagocyte', 'Non-classical Monocyte', 'Classical Dendritic Cell', 'Mast Cell', 'B Cell', 'Plasmacytoid Dendritic Cell', 'Cycling Natural Killer Cell / Natural Killer T Cell']
}

I iterate through the df I have. Let's say the first row[text] is 'Descending Vasa Recta Endothelial Cell' and it corresponds to 'endothelial cell' in the cell type field. I check whether 'Descending Vasa Recta Endothelial Cell' is in ['Descending Vasa Recta Endothelial Cell', 'Ascending Vasa Recta Endothelial Cell', 'Afferent / Efferent Arteriole Endothelial Cell', 'Peritubular Capilary Endothelial Cell ', 'Glomerular Capillary Endothelial Cell', 'Degenerative Peritubular Capilary Endothelial Cell', 'Cycling Endothelial Cell', 'Lymphatic Endothelial Cell', 'Degenerative Endothelial Cell']. Since the length of the list is not 1, I infer that 'Descending Vasa Recta Endothelial Cell' is subcluster_of 'endothelial cell'.

Is there any way to determine the direction of this relationship with the tabular data?

I assumed that everything other than cluster_matches and subcluster_of should be cluster_overlaps, but I'm not 100% sure.
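
A structure like predicate_dict can be built in both directions from the same rows, which is one possible way to recover direction from tabular data. The sketch below is a minimal illustration in plain Python rather than pandas; the helper name `co_annotation_dict` and the field name `author_label` are hypothetical, and the rows are toy data using labels quoted above.

```python
from collections import defaultdict

def co_annotation_dict(rows, key_field, value_field):
    """Map each key_field value to the distinct value_field values it co-occurs with."""
    mapping = defaultdict(list)
    for row in rows:
        key, value = row[key_field], row[value_field]
        if value not in mapping[key]:  # keep each co-annotated value once
            mapping[key].append(value)
    return dict(mapping)

# Toy rows; "author_label" is a hypothetical stand-in for the free-text field.
rows = [
    {"cell_type": "endothelial cell", "author_label": "Lymphatic Endothelial Cell"},
    {"cell_type": "endothelial cell", "author_label": "Cycling Endothelial Cell"},
    {"cell_type": "podocyte", "author_label": "Podocyte"},
    {"cell_type": "podocyte", "author_label": "Degenerative Podocyte"},
    {"cell_type": "podocyte", "author_label": "Podocyte"},
]

# cell_type -> free-text labels, and free-text label -> cell types.
forward = co_annotation_dict(rows, "cell_type", "author_label")
reverse = co_annotation_dict(rows, "author_label", "cell_type")
```

Here every free-text label maps back to a single cell type while each cell type maps to two labels, and that asymmetry between the two lookups is what would indicate which side is the subcluster.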

if row[text] in predicate_dict.get(row["cell_type"], [])
else Predicate.CLUSTER_OVERLAPS.value
), # All the other cases should be marked with 'cluster_overlaps', right?
axis=1,
Contributor

I think this is wrong. I will add a cluster_overlaps example to the ticket.

@ubyndr ubyndr requested a review from dosumis June 21, 2023 15:01
@ubyndr
Collaborator Author

ubyndr commented Jun 22, 2023

Hi @dosumis, examples for each predicate group can be seen below:

  • cluster_matches
Value of field_name_1: dPT
Value of field_name_1_dict: ['Degenerative Proximal Tubule Epithelial Cell']
Value of field_name_2: Degenerative Proximal Tubule Epithelial Cell
Value of field_name_2_dict: ['dPT']
  • subcluster_of
Value of field_name_1: aTAL1
Value of field_name_1_dict: ['Adaptive / Maladaptive / Repairing Thick Ascending Limb Cell']
Value of field_name_2: Adaptive / Maladaptive / Repairing Thick Ascending Limb Cell
Value of field_name_2_dict: ['aTAL1', 'aTAL2']

Value of field_name_1: Cortical Collecting Duct Intercalated Cell Type A
Value of field_name_1_dict: ['C-IC-A']
Value of field_name_2: C-IC-A
Value of field_name_2_dict: ['Cortical Collecting Duct Intercalated Cell Type A', 'Connecting Tubule Intercalated Cell Type A']

Value of field_name_1: Connecting Tubule Principal Cell
Value of field_name_1_dict: ['CNT']
Value of field_name_2: CNT
Value of field_name_2_dict: ['Connecting Tubule Principal Cell', 'Connecting Tubule Cell']
  • supercluster_of
Value of field_name_1: stroma cells
Value of field_name_1_dict: ['kidney interstitial fibroblast', 'renal interstitial pericyte']
Value of field_name_2: renal interstitial pericyte
Value of field_name_2_dict: ['stroma cells']

Value of field_name_1: Adaptive / Maladaptive / Repairing Thick Ascending Limb Cell
Value of field_name_1_dict: ['aTAL1', 'aTAL2']
Value of field_name_2: aTAL2
Value of field_name_2_dict: ['Adaptive / Maladaptive / Repairing Thick Ascending Limb Cell']

Value of field_name_1: PT
Value of field_name_1_dict: ['dPT', 'aPT', 'cycPT', 'PT-S1/2', 'PT-S3']
Value of field_name_2: dPT
Value of field_name_2_dict: ['PT']
  • cluster_overlaps
Value of field_name_1: degenerative
Value of field_name_1_dict: ['PT', 'TAL', 'PC', 'FIB', 'EC', 'VSM/P', 'ATL', 'IC', 'CNT', 'POD', 'DTL', 'DCT']
Value of field_name_2: PT
Value of field_name_2_dict: ['degenerative', 'adaptive - epi', 'cycling', 'reference']

Value of field_name_1: reference
Value of field_name_1_dict: ['FIB', 'TAL', 'IMM', 'EC', 'IC', 'DTL', 'POD', 'ATL', 'PC', 'PT', 'CNT', 'DCT', 'VSM/P', 'NEU', 'PEC', 'PapE']
Value of field_name_2: FIB
Value of field_name_2_dict: ['reference', 'degenerative', 'adaptive - str', 'cycling']

Value of field_name_1: cycling
Value of field_name_1_dict: ['PT', 'IMM', 'EC', 'CNT', 'DCT', 'FIB']
Value of field_name_2: PT
Value of field_name_2_dict: ['degenerative', 'adaptive - epi', 'cycling', 'reference']
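
The examples above are consistent with a rule based on the sizes of the two co-annotation lists. The sketch below encodes that rule as inferred from these examples only; it is not necessarily the logic that was merged. The enum string values, the SUPERCLUSTER_OF member, and the function name `infer_predicate` are assumptions made for illustration.

```python
from enum import Enum

class Predicate(Enum):
    # String values are assumed; SUPERCLUSTER_OF is inferred from the examples.
    CLUSTER_MATCHES = "cluster_matches"
    SUBCLUSTER_OF = "subcluster_of"
    SUPERCLUSTER_OF = "supercluster_of"
    CLUSTER_OVERLAPS = "cluster_overlaps"

def infer_predicate(field_name_1_dict, field_name_2_dict):
    """Pick a predicate for 'value 1 <predicate> value 2'.

    field_name_1_dict: values of field 2 co-annotated with value 1.
    field_name_2_dict: values of field 1 co-annotated with value 2.
    """
    if not field_name_1_dict or not field_name_2_dict:
        raise ValueError("both co-annotation lists must be non-empty")
    one_to_one = len(field_name_1_dict) == 1
    other_to_one = len(field_name_2_dict) == 1
    if one_to_one and other_to_one:
        return Predicate.CLUSTER_MATCHES   # each side maps only to the other
    if one_to_one:
        return Predicate.SUBCLUSTER_OF     # value 2 also covers other values
    if other_to_one:
        return Predicate.SUPERCLUSTER_OF   # value 1 also covers other values
    return Predicate.CLUSTER_OVERLAPS      # both sides map to several values

# The aTAL1 example from above: one label on side 1, two on side 2.
print(infer_predicate(
    ["Adaptive / Maladaptive / Repairing Thick Ascending Limb Cell"],
    ["aTAL1", "aTAL2"],
).value)
```

Checking both list lengths, rather than only one as in the original lambda, is what separates the two subcluster directions from a genuine overlap.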

@ubyndr
Collaborator Author

ubyndr commented Jun 22, 2023

@dosumis @hkir-dev
Can I merge this now with the recent changes?

@ubyndr ubyndr merged commit 7911370 into main Jun 23, 2023
@ubyndr ubyndr deleted the 7-support-co-annotation-analysis branch June 23, 2023 08:44
Development

Successfully merging this pull request may close these issues.

Support co-annotation analysis
2 participants