Support co-annotation analysis #7

Closed
dosumis opened this issue Jun 12, 2023 · 4 comments · Fixed by #9

dosumis commented Jun 12, 2023

Multiple fields are tagged as representing cell type.

Use of each value in a cell type field defines a cell set.

We can analyse co-occurrence of these values to infer relative granularity:

set(X) cluster_overlaps set(Y)
set(X) cluster_matches set(Y)
set(X) subcluster_of set(Y)
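
One way to read the three predicates is as relations between sets of cell identifiers. The sketch below assumes that reading; the function name `infer_predicate` and the handling of the superset case are illustrative choices, not something the proposal specifies.

```python
from typing import Optional


def infer_predicate(cells_x: set, cells_y: set) -> Optional[str]:
    """Relate cell set X to cell set Y; return None when they share no cells."""
    if not cells_x & cells_y:
        return None  # disjoint sets: no relation to report
    if cells_x == cells_y:
        return "cluster_matches"
    if cells_x < cells_y:
        # Every cell in X is also in Y, and Y has additional cells.
        return "subcluster_of"
    # Partial intersection, or X strictly contains Y. In the latter case,
    # calling the function with the arguments swapped yields subcluster_of.
    return "cluster_overlaps"


print(infer_predicate({"c1", "c2"}, {"c1", "c2"}))        # equal sets
print(infer_predicate({"c1"}, {"c1", "c2"}))              # X nested in Y
print(infer_predicate({"c1", "c2"}, {"c2", "c3"}))        # partial overlap
```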

field_name1; value1; predicate; field_name2; value2

e.g.

field_name1; value1; predicate; field_name2; value2
author_category; TissueResMemT; cluster_matches; cell_type; memory T cell

To analyse co-occurrence in AnnData files we can just do this:

anndata.obs[['author_cell_type', 'cell_type']].drop_duplicates()

(This example looks at co-occurrence of just two fields.) The resulting dataframe can be analysed for co-occurrence of key:value pairs.
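
A minimal sketch of that analysis step, assuming `obs` is a plain pandas DataFrame standing in for the `.obs` table of an AnnData object. The toy values below are made up for illustration, apart from the `TissueResMemT` / `memory T cell` pair used earlier.

```python
import pandas as pd

# Toy stand-in for an AnnData .obs table: one row per cell (illustrative values).
obs = pd.DataFrame(
    {
        "author_cell_type": ["TissueResMemT", "TissueResMemT", "NaiveT", "B1"],
        "cell_type": ["memory T cell", "memory T cell", "T cell", "B cell"],
    }
)

# Distinct (author_cell_type, cell_type) combinations that occur on any cell.
pairs = obs[["author_cell_type", "cell_type"]].drop_duplicates()

# For each value, count how many distinct values of the other field it co-occurs with.
partners_of_author = pairs.groupby("author_cell_type")["cell_type"].nunique()
partners_of_cell_type = pairs.groupby("cell_type")["author_cell_type"].nunique()

print(pairs)
print(partners_of_author)
print(partners_of_cell_type)
```

A partner count of 1 in both directions suggests a match, while a count above 1 on one side only suggests nesting.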

See #7 (comment) for examples of using co-annotation analysis to infer relationships.

dosumis commented Jun 12, 2023

dosumis commented Jun 18, 2023

print(bl.obs[['cell_type', 'cell_type_ontology_term_id', 'subclass.l1']].drop_duplicates().sort_values(by=['cell_type']).to_csv(sep='\t'))
cell_type cell_type_ontology_term_id subclass.l1
endothelial cell CL:0000115 EC
podocyte CL:0000653 POD
leukocyte CL:0000738 Immune
epithelial cell of proximal tubule CL:0002306 PT
parietal epithelial cell CL:1000452 PEC
kidney interstitial cell CL:1000500 Interstitial
kidney connecting tubule epithelial cell CL:1000768 CNT
kidney distal convoluted tubule epithelial cell CL:1000849 DCT
kidney loop of Henle thick ascending limb epithelial cell CL:1001106 TAL
kidney loop of Henle thin ascending limb epithelial cell CL:1001107 ATL/TAL

All of these are 1:1 so:

field_name1; value1; predicate; field_name2; value2
cell_type; endothelial cell; cluster_matches; subclass.l1; EC
cell_type; podocyte; cluster_matches; subclass.l1; POD
cell_type; leukocyte; cluster_matches; subclass.l1; Immune
cell_type; epithelial cell of proximal tubule; cluster_matches; subclass.l1; PT
cell_type; parietal epithelial cell; cluster_matches; subclass.l1; PEC
cell_type; kidney interstitial cell; cluster_matches; subclass.l1; Interstitial
cell_type; kidney connecting tubule epithelial cell; cluster_matches; subclass.l1; CNT
cell_type; kidney distal convoluted tubule epithelial cell; cluster_matches; subclass.l1; DCT
cell_type; kidney loop of Henle thick ascending limb epithelial cell; cluster_matches; subclass.l1; TAL
cell_type; kidney loop of Henle thin ascending limb epithelial cell; cluster_matches; subclass.l1; ATL/TAL
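
The 1:1 check can be automated. The sketch below is one possible approach, not the project's implementation: the helper name `one_to_one_matches` is hypothetical, and it assumes every cell has exactly one non-null value in each field, so that a pair of values that only ever co-occur with each other define the same cell set.

```python
import pandas as pd

# A few de-duplicated (cell_type, subclass.l1) pairs taken from the output above.
pairs = pd.DataFrame(
    {
        "cell_type": ["endothelial cell", "podocyte", "leukocyte"],
        "subclass.l1": ["EC", "POD", "Immune"],
    }
)


def one_to_one_matches(pairs: pd.DataFrame, field1: str, field2: str) -> pd.DataFrame:
    """Emit a cluster_matches row for each pair of values that map 1:1."""
    # Number of distinct partners each value has in the other field.
    n_partners_1 = pairs.groupby(field1)[field2].transform("nunique")
    n_partners_2 = pairs.groupby(field2)[field1].transform("nunique")
    matched = pairs[(n_partners_1 == 1) & (n_partners_2 == 1)]
    return pd.DataFrame(
        {
            "field_name1": field1,
            "value1": matched[field1].to_numpy(),
            "predicate": "cluster_matches",
            "field_name2": field2,
            "value2": matched[field2].to_numpy(),
        }
    )


result = one_to_one_matches(pairs, "cell_type", "subclass.l1")
print(result.to_csv(sep="\t", index=False))
```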

print(bl.obs[['cell_type', 'cell_type_ontology_term_id', 'subclass.l2']].drop_duplicates().sort_values(by=['cell_type']).to_csv(sep='\t'))
cell_type cell_type_ontology_term_id subclass.l2
endothelial cell CL:0000115 cycEC
endothelial cell CL:0000115 EC-AEA
endothelial cell CL:0000115 EC-GC
endothelial cell CL:0000115 dEC-PTC
endothelial cell CL:0000115 EC-PTC
endothelial cell CL:0000115 EC-LYM
podocyte CL:0000653 POD
leukocyte CL:0000738 PL
leukocyte CL:0000738 ncMON
leukocyte CL:0000738 NK2
leukocyte CL:0000738 T-REG
leukocyte CL:0000738 MON
leukocyte CL:0000738 cDC
leukocyte CL:0000738 NKT
leukocyte CL:0000738 cycMNP
leukocyte CL:0000738 MDC
leukocyte CL:0000738 NK1
leukocyte CL:0000738 B
leukocyte CL:0000738 MAST
leukocyte CL:0000738 MAC-M2
leukocyte CL:0000738 T
leukocyte CL:0000738 T-CYT
leukocyte CL:0000738 cycT
leukocyte CL:0000738 pDC
epithelial cell of proximal tubule CL:0002306 cycEPI
epithelial cell of proximal tubule CL:0002306 PT-S3
epithelial cell of proximal tubule CL:0002306 dPT/DTL
epithelial cell of proximal tubule CL:0002306 dPT
epithelial cell of proximal tubule CL:0002306 aPT
epithelial cell of proximal tubule CL:0002306 PT-S1/S2
parietal epithelial cell CL:1000452 PEC
kidney interstitial cell CL:1000500 MC
kidney interstitial cell CL:1000500 REN
kidney interstitial cell CL:1000500 MyoF
kidney interstitial cell CL:1000500 dVSMC
kidney interstitial cell CL:1000500 FIB
kidney interstitial cell CL:1000500 aFIB
kidney interstitial cell CL:1000500 VSMC/P
kidney connecting tubule epithelial cell CL:1000768 CNT
kidney connecting tubule epithelial cell CL:1000768 dCNT
kidney distal convoluted tubule epithelial cell CL:1000849 dDCT
kidney distal convoluted tubule epithelial cell CL:1000849 DCT1
kidney loop of Henle thick ascending limb epithelial cell CL:1001106 C-TAL
kidney loop of Henle thick ascending limb epithelial cell CL:1001106 M-TAL
kidney loop of Henle thick ascending limb epithelial cell CL:1001106 dC-TAL
kidney loop of Henle thin ascending limb epithelial cell CL:1001107 aTAL2
kidney loop of Henle thin ascending limb epithelial cell CL:1001107 aTAL1
kidney loop of Henle thin descending limb epithelial cell CL:1001111 DTL1
kidney collecting duct principal cell CL:1001431 dCNT-PC
kidney collecting duct principal cell CL:1001431 dPC
kidney collecting duct principal cell CL:1001431 PC
kidney collecting duct principal cell CL:1001431 tPC-IC
kidney collecting duct principal cell CL:1001431 CNT-PC
kidney collecting duct intercalated cell CL:1001432 dIC-A
kidney collecting duct intercalated cell CL:1001432 CNT-IC-A
kidney collecting duct intercalated cell CL:1001432 IC-A
kidney collecting duct intercalated cell CL:1001432 IC-B

Here there are many cases where the cell_type term defines a cluster with many subclusters:

e.g.

field_name1; value1; predicate; field_name2; value2
subclass.l2; dIC-A; subcluster_of; cell_type; kidney collecting duct intercalated cell
subclass.l2; CNT-IC-A; subcluster_of; cell_type; kidney collecting duct intercalated cell
subclass.l2; IC-A; subcluster_of; cell_type; kidney collecting duct intercalated cell
subclass.l2; IC-B; subcluster_of; cell_type; kidney collecting duct intercalated cell
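
Nesting can be detected in a similar way to the 1:1 case. In this sketch the helper name `subclusters` is hypothetical, and the rule it applies (a child value has exactly one parent value, while that parent has several children) is an assumption about how subclusters might be inferred from de-duplicated pairs.

```python
import pandas as pd

# De-duplicated (cell_type, subclass.l2) pairs taken from the output above.
pairs = pd.DataFrame(
    {
        "cell_type": ["kidney collecting duct intercalated cell"] * 4 + ["podocyte"],
        "subclass.l2": ["dIC-A", "CNT-IC-A", "IC-A", "IC-B", "POD"],
    }
)


def subclusters(pairs: pd.DataFrame, parent_field: str, child_field: str) -> pd.DataFrame:
    """Emit a subcluster_of row for each child value nested in a single parent."""
    n_children = pairs.groupby(parent_field)[child_field].transform("nunique")
    n_parents = pairs.groupby(child_field)[parent_field].transform("nunique")
    # Keep children with exactly one parent whose parent has more than one child.
    nested = pairs[(n_parents == 1) & (n_children > 1)]
    return pd.DataFrame(
        {
            "field_name1": child_field,
            "value1": nested[child_field].to_numpy(),
            "predicate": "subcluster_of",
            "field_name2": parent_field,
            "value2": nested[parent_field].to_numpy(),
        }
    )


result = subclusters(pairs, "cell_type", "subclass.l2")
print(result.to_csv(sep="\t", index=False))
```

The `podocyte` / `POD` pair is excluded here because it is 1:1 and would be reported as a match instead.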

potential issue:

Do we enforce a single direction? What does this mean for ease of deriving a graph?

Would it be better to use a different underlying formalism (e.g. a NetworkX graph) to store the relationships, and then treat tables as useful reports?
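
To make the graph option concrete, here is a rough sketch of storing relations in a NetworkX `MultiDiGraph` and regenerating the five-column table from it. The node encoding as `(field_name, value)` tuples, the `predicate` edge attribute, and the `add_relation` helper are all assumptions made for illustration.

```python
import networkx as nx

graph = nx.MultiDiGraph()


def add_relation(graph, field1, value1, predicate, field2, value2):
    """Store one relation as a directed edge between (field, value) nodes."""
    graph.add_edge((field1, value1), (field2, value2), predicate=predicate)


add_relation(
    graph, "subclass.l2", "IC-A", "subcluster_of",
    "cell_type", "kidney collecting duct intercalated cell",
)
add_relation(graph, "cell_type", "podocyte", "cluster_matches", "subclass.l2", "POD")

# Derive the tabular report from the graph rather than storing it directly.
rows = [
    (f1, v1, data["predicate"], f2, v2)
    for (f1, v1), (f2, v2), data in graph.edges(data=True)
]
for row in rows:
    print("\t".join(row))
```

With a directed graph, each relation is stored once in a single direction and the reverse reading can be computed on demand, which is one possible answer to the direction question above.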

dosumis commented Jun 18, 2023

(TODO - add example of cluster overlaps)

dosumis commented Jun 18, 2023

CC @hkir-dev

ubyndr self-assigned this Jun 20, 2023
ubyndr closed this as completed in #9 Jun 23, 2023