You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The purpose of this design document is to provide guidelines and generate ideas for an in-depth implementation of AnnData. It differs to #1073, in where that was a high level overview on why. Here we will discuss how we will transition to this format.
Design Overview / Code Walkthrough
Currently, we have decided to transition AnnData after Notebook 4, as it's a good point to get our feet wet without rewriting a large chunk of Ark.
Let's consider all the data associated with single FOV from the Example Dataset, FOV 0.
FOV 0 is $512px \times 512px$ image, consisting of $669$ individual cells.
We will walk through converting a majority of the data associated with FOV 0 to an AnnData table, as it's easiest to understand the conversion process by walking through a concrete example.
You will need to run the following notebooks beforehand:
obs: Houses region properties, fov, cell_meta_cluster, labels
var_names: The name of the markers (channel names)
obs_names: The cell label IDs
Since we also have transformations ($\tt{arcsinh}$) of the cell-by-markers matrix, we should add those in as an additional layer for X.
We will use DaskDataFrames to build up a computation graph and only execute it when needed.
# Load the size normalized cell table with post-pixie clustersfov0_cell_table_n=dd.read_csv(
base_dir/"segmentation/cell_table/cell_table_size_normalized_cell_labels.csv").query("fov == 'fov0'").compute()
# Load the arcsinh transformed cell table (no cluster information in this one from the example dataset)fov0_cell_table_a=dd.read_csv(base_dir/"segmentation/cell_table/cell_table_arcsinh_transformed.csv").query("fov == 'fov0'").compute()
Subset the cell tables to only extract the channels for var.
obs=fov0_cell_table_n[obs_cols].copy()
obs["label"] =obs["label"].astype(int)
# So Pandas defaults to PyArrow when it assumes it can for strings. # And this causes issues when saving the AnnData as a Zarr Store.# You can convert the PyArrow string columns to str and that'll fix it.obs["cell_meta_cluster"] =obs["cell_meta_cluster"].astype(str)
obs["fov"] =obs["fov"].astype(str)
Make sure to use Categorical Dtypes as they provide very efficient indexing / sub-setting for well, categorical properties.
Here we create the AnnData object with X, obs and the $\tt arcsinh$ transformation of X. You might see the following warning ImplicitModificationWarning, where it converts the current index (usually an integer) to a string.
The layers parameter is a mapping from a string key to a matrix of the same shape as X, representing the same data just transformed by an operation, linear or otherwise. You can have multiple layers as well.
We can view X as a DataFrame by using the to_df() method.
fov0_adata.to_df()
Since we created that layer in the initialization of fov0_adata, we can also view it's DataFrame representation of X.
fov0_adata.to_df(layer="arcsinh")
Additional Components for SquidPy Support
If we want to make use of SquidPy's Spatial Analysis tools, we need to add the following components to our AnnData object:
spatial_key: Several functions in SquidPy make use of this key to identify the spatial coordinates of each cell ($X$, $Y$). This key lives in obsm and is a 2D matrix of shape $n_{obs} \times 2$.
The shape of the distance matrix for FOV 0 is $669 \times 669$, which we can generalize to of n_obs$\times$n_obs.
We can extract the underlying NumPy array from the xarray.DataArray with the .data attribute.
A quick sidenote on .data, it preserves the underlying array type, be it a Dask Array, in memory NumPy array, SciPy sparse array, or a CuPy GPU Array.
obsp is a container to store pairwise annotations of observations, so, the pairwise distances should be stored here. We'll use the key "distance" for now to identify the distance matrix.
We subset on those associated with FOV 0 and extract only the unique cell_meta_cluster labels. This gives us the following:
The shapes would be n_cell_meta_cluster$\times$n_cell_meta_cluster which is square, but doesn't fit in with any of the other AnnData specs for storing pairwise information. These would be stored in uns.
The neighborhood diversity would be ideal for obs since there is a value per cell for each FOV. This would be an additional column. We can perform an outer merge with obs DataFrame and the neighborhood_diversity_cell_meta_cluster_radius50.csv DataFrame per FOV, on the label column.
# Merge the neighborhood diversity data with the `obs` table.fov0_adata.obs=fov0_adata.obs.merge(right=neighborhood_diversity_fov0[["label", "diversity_cell_meta_cluster"]], on= ["label"], how="outer")
fov0_adata.obs_names
neighborhood_diversity_fov0
fov0_adata.obs
uns is a catch-all container (dictionary) for unstructured annotations of the object. It's a good place to store parameters, metadata, and other information that doesn't fit into the other components of AnnData. In this case, we'll store the radius value for the neighborhood analysis.
The cell distances in cell_meta_cluster_avg_dists-nearest_5.csv contains the average distance of the $k$ the closest cells. The shape of this is n_obs$\times$n_cell_meta_cluster. This shape fits with obsm, so we can store it there or in uns.
While in genomic analyses, users generally place embeddings of their X matrix in obsm, we can also place the cell distances here as well, anything goes as long as the shapes fit, we just have to make sure to document it.
I have placed the cell_meta_cluster_avg_dists-nearest_5 in obsm["cell_distances].
The mixing scores are stored in the following files:
{popultation 1}_{population 2}-{mixing type}_mixing_score.csv, and with the example dataset they are CD4_CD8-homogeneous_mixing_score.csv,
Just like the others, we will subset on just FOV 0.
This file has the columns: fov, mixing_score, cell_count, cell_ratio, and a row per FOV.
There are a few ways to add this to the AnnData object, the first is to add the mixing_score, cell_count, the cell_ratio, population 1 and population 2 into the uns such as shown below:
Another approach would be to transform the mixing scores into Matrices, one for each FOV,
For example, here is the mixing score matrix for FOV 0, where (CD4T, CD8T) has been filled.
CD4T
CD8T
CD14_monocyte
Bcell
other
M2_macrophage
immune_other
M1_macrophage
APC
stroma
endothelium
Myofibroblast
tumor_ck17
tumor_ecad
CD4T
CD8T
0.256647
CD14_monocyte
Bcell
other
M2_macrophage
immune_other
M1_macrophage
APC
stroma
endothelium
Myofibroblast
tumor_ck17
tumor_ecad
And fill out all the values for each FOV. This would be stored in uns as well. We would also store matrices like these for cell_count and cell_ratio fields.
For now, I've stored it in uns using the first, nested dictionary approach.
Fiber Segmentation
The fiber segmentation generates two files of interest:
The fiber segmentation notebook does not make use of the cell level segmentations, and instead creates computes the fiber level segmentations from the raw image.
This one I'm not too sure about, as it doesn't look like there is a X equivalent here, so creating a unique AnnData object may not be ideal. If we could, MuData could be useful as it's a way to represent multi-modal AnnData tables.
Perhaps place it in uns? Let's place in uns for this example.
Columns include: the normalized cell table columns in addition the kmeans neighborhood column
neighborhood_marker.csv
Columns include: kmeans_neighborhood, *var_names (all the markers)
neighborhood_cell_type.csv
Columns include: kmeans_neighborhood, *unique_clusters (all the pixie clusters)
cell_table_kmeans_fov0=dd.read_csv(base_dir/"spatial_analysis/neighborhood_analysis/cell_meta_cluster_radius50_counts/cell_table_size_normalized_cell_labels_kmeans_nh.csv").query("fov == 'fov0'").compute()
cell_table_kmeans_fov0[["label", "kmeans_neighborhood"]] =cell_table_kmeans_fov0[["label", "kmeans_neighborhood"]].astype(int)
# Merge the neighborhood marker data with the `obs` table, subset on the label and kmeans_neighborhood columnsfov0_adata.obs=fov0_adata.obs.merge(right=cell_table_kmeans_fov0[["label", "kmeans_neighborhood"]], on= ["label"], how="outer")
The neighborhood_marker.csv could be stored in varm. Here we've transposed it to fit the AnnData spec.
Some other things to go over would be the Spatial LDA notebook, and I'm sure I've missed a few other metrics as well. But I think this gets an idea of the general structure and how we can convert our data to AnnData components. We can discuss the specifics of which parameters get placed where of course.
One AnnData Table per Cohort or One AnnData Table per FOV?
One of the main design decisions we have to make is whether to store all the data in one AnnData object, or to split it up into many AnnData objects.
I am in favor of a single AnnData object per FOV. This is mainly because the majority of ScanPy and SquidPy functions tend to work best on an individual "image" by "image" basis.
If we have a single AnnData object per FOV, the square shaped components might not make sense. For example, the take distance component of obsp which is a 2D array consisting of the distances between cell $i$ and cell $j$ for all cells in the FOV. If we have a single AnnData object, this matrix would take up a much larger footprint, and would require some rather obtuse indexing to get the distances for a single FOV.
This will quickly take up $\mathcal{O} (n^2)$ space which is not feasible. The majority of it will be empty though and can be represented as a Block Diagonal Matrix.
The SPAIN cohort has about 400 FOVs. While taking advantage of sparsity and Dask can make the overhead of loading the data into memory more manageable. This kind of setup would (generously) upper bound the storage to $\mathcal{O} (n\log{n})$. This is completely feasible, but it can make working with the data more difficult.
In addition, SpatialData is converting to a many AnnData table approach, where a table can be associated to 0,1 or many images / coordinate systems. See scverse/spatialdata#298 for the discussion on this.
Does this mean we need to run a for loop every time we want to run a particular function across all FOV-level AnnData tables (such as scanpy.pp.filter_cells)?
No, we can use AnnCollection to do this. We can lazily concatenate the AnnData objects and perform operations on them as if it was one big AnnData object, internally it'll load each AnnData object into memory as needed with Dask (done automatically for us) and map a function to it.
Internally, the majority of functions in Ark load the whole cell table, but end up iterating by FOV anyway, so this can simplify some functions.
As far as I know the benefits with one table per cohort start and end with having a single file stored to disk. I'm sure there are more, but I wasn't able to find them. Let me know if there are any others.
Storage: Saving and Loading
Let's take a quick detour to discuss storage.
For persistent storage we should save the AnnData table as a Zarr store.
Zarr provides a nice interface for storing AnnData objects, as it's a hierarchical key-value store, which is exactly what AnnData is.
Unfortunately, we will have to write to the Zarr store once we are done modifying the AnnData table. SpatialData provides a psuedo-"backed" feature to write modifications to disk immediately.
Where backed is defined as the property where in memory changes are written to disk immediately.
Zarr backed AnnData stores are sort of slowly in the works. See scverse/anndata#219.
#`obs_names` must be a string, or else you won't be able to read it back infov0_adata.obs_names=fov0_adata.obs_names.astype(str)
fov0_adata.write_zarr(base_dir/"tables/fov0.ome.zarr", chunks=(1000, 1000))
We can read Zarr backed AnnData Tables using AnnData.read_zarr().
The actual data, matrices, DataFrames, dictionaries and all are stored in the leaf nodes of the hierarchy as binary data. The rest of the nodes are just metadata, providing column names (if applicable) and other information, like the encoding type and encoding versioning.
Not sure where we can save the AnnData objects, perhaps under a directory called tables in the root of the cohort's dataset?
It's also very much worth considering making use of SquidPy's tool set for single cell spatial analysis. We should read through their code and learn how they interface with AnnData when designing future components / redesigning current ones.
AnnData
Conversion Design Document Part 2Relevant Background
The purpose of this design document is to provide guidelines and generate ideas for an in-depth implementation of
AnnData
. It differs to #1073, in where that was a high level overview on why. Here we will discuss how we will transition to this format.Design Overview / Code Walkthrough
Currently, we have decided to transition
AnnData
after Notebook 4, as it's a good point to get our feet wet without rewriting a large chunk ofArk
.Let's consider all the data associated with single FOV from the Example Dataset, FOV 0.
FOV 0 is$512px \times 512px$ image, consisting of $669$ individual cells.
We will walk through converting a majority of the data associated with FOV 0 to an
AnnData
table, as it's easiest to understand the conversion process by walking through a concrete example.You will need to run the following notebooks beforehand:
Let's get the following packages installed:
AnnData
Dask
Ray
optional, only used here to provide a superior scheduler forDask
Import packages, set up a local
Dask
cluster and get the example data.Cell Table (Post Notebook 4)
The
cell_table_size_normalized_cell_labels.csv
houses the following informationFor each cell we have the following types of metrics:
label
: The unique segmentation label for each cell, (integer
)markers
: The intensity of each marker for each cell, (float
), specifically the following markers:CD68, CD8, CK17, Collagen1, ECAD, Fibronectin, GLUT1,
H3K27me3, H3K9ac, HLADR, IDO, Ki67, PD1, SMA, Vim
region properties
: This consists of physical metadata or measurements of each cell, (int
,float
), specifically the following:cell_size
,area
,eccentricity
,major_axis_length
,minor_axis_length
,perimeter
,convex_area
,equivalent_diameter
,centroid-0
,centroid-1
,major_minor_axis_ratio
,perim_square_over_area
,major_axis_equiv_diam_ratio
,convex_hull_resid
,centroid_dif
,num_concavities
fov
: A unique identifier for the FOV, (str
)"fov0"
.cell_meta_cluster
: A classification for the cell (str
), in the example dataset we have:To convert these to
AnnData
components we have:X
: The values for themarkers
obs
: Housesregion properties
,fov
,cell_meta_cluster
,labels
var_names
: The name of themarkers
(channel names)obs_names
: The celllabel
IDsSince we also have transformations ($\tt{arcsinh}$ ) of the cell-by-markers matrix, we should add those in as an additional
layer
forX
.We will use
Dask
DataFrames
to build up a computation graph and only execute it when needed.Subset the cell tables to only extract the channels for
var
.Subset the cell tables to only extract the properties for
obs
.Make sure to use Categorical
Dtypes
as they provide very efficient indexing / sub-setting for well, categorical properties.Here we create the$\tt arcsinh$ transformation of
AnnData
object withX
,obs
and theX
. You might see the following warningImplicitModificationWarning
, where it converts the current index (usually an integer) to a string.The
layers
parameter is a mapping from a string key to a matrix of the same shape asX
, representing the same data just transformed by an operation, linear or otherwise. You can have multiple layers as well.X
gets saved as a NumPyNDArray
.We can view
X
as aDataFrame
by using theto_df()
method.Since we created that layer in the initialization of
fov0_adata
, we can also view it'sDataFrame
representation ofX
.Additional Components for
SquidPy
SupportIf we want to make use of
SquidPy
's Spatial Analysis tools, we need to add the following components to ourAnnData
object:spatial_key
: Several functions inSquidPy
make use of this key to identify the spatial coordinates of each cell (obsm
and is a 2D matrix of shapegr.spatial_neighbors
We will also rename the
Centroid 0
andCentroid 1
columns to something more meaningful to demonstrate modifying anAnnData
object.Spatial Analysis Components
Here we will discuss how to convert many (not all) of the spatial analysis results to
AnnData
components.Distance Matrices
The distance matrices get saved as a
netcdf3
file, which can be loaded in withxarray
.fov0_distances_xr
The shape of the distance matrix for FOV 0 is$669 \times 669$ , which we can generalize to of $\times$
n_obs
n_obs
.We can extract the underlying NumPy array from the
xarray.DataArray
with the.data
attribute.A quick sidenote on
.data
, it preserves the underlying array type, be it a Dask Array, in memory NumPy array, SciPy sparse array, or a CuPy GPU Array.obsp
is a container to store pairwise annotations of observations, so, the pairwise distances should be stored here. We'll use the key"distance"
for now to identify the distance matrix.We can view the current
AnnData
object with the newly added"distance"
attribute forobsp
.fov0_adata
The neighborhood counts and frequencies are stored in the following files:
neighborhood_counts-cell_meta_cluster_radius50.csv
neighborhood_freqs-cell_meta_cluster_radius50.csv
We subset on those associated with FOV 0 and extract only the unique
cell_meta_cluster
labels. This gives us the following:The shapes would be$\times$
n_cell_meta_cluster
n_cell_meta_cluster
which is square, but doesn't fit in with any of the otherAnnData
specs for storing pairwise information. These would be stored inuns
.fov0_adata
Cell Neighbors Analysis
The neighborhood diversity would be ideal for
obs
since there is a value per cell for each FOV. This would be an additional column. We can perform an outer merge withobs
DataFrame and theneighborhood_diversity_cell_meta_cluster_radius50.csv
DataFrame per FOV, on thelabel
column.Let's just select the diversity for FOV 0.
neighborhood_diversity_fov0
uns
is a catch-all container (dictionary) for unstructured annotations of the object. It's a good place to store parameters, metadata, and other information that doesn't fit into the other components ofAnnData
. In this case, we'll store the radius value for the neighborhood analysis.The cell distances in$k$ the closest cells. The shape of this is $\times$
cell_meta_cluster_avg_dists-nearest_5.csv
contains the average distance of then_obs
n_cell_meta_cluster
. This shape fits withobsm
, so we can store it there or inuns
.While in genomic analyses, users generally place embeddings of their
X
matrix inobsm
, we can also place the cell distances here as well, anything goes as long as the shapes fit, we just have to make sure to document it.I have placed the
cell_meta_cluster_avg_dists-nearest_5
inobsm["cell_distances]
.Mixing Scores
The mixing scores are stored in the following files:
{popultation 1}_{population 2}-{mixing type}_mixing_score.csv
, and with the example dataset they areCD4_CD8-homogeneous_mixing_score.csv
,Just like the others, we will subset on just FOV 0.
This file has the columns:
fov
,mixing_score
,cell_count
,cell_ratio
, and a row per FOV.There are a few ways to add this to the
AnnData
object, the first is to add themixing_score
,cell_count
, thecell_ratio
,population 1
andpopulation 2
into theuns
such as shown below:cd4_cd8_homogeneous_mixing_score_fov0
Another approach would be to transform the mixing scores into Matrices, one for each FOV,
For example, here is the mixing score matrix for FOV 0, where (CD4T, CD8T) has been filled.
And fill out all the values for each FOV. This would be stored in
uns
as well. We would also store matrices like these forcell_count
andcell_ratio
fields.For now, I've stored it in
uns
using the first, nested dictionary approach.Fiber Segmentation
The fiber segmentation generates two files of interest:
fiber_object_table.csv
fov
,label
,centroid-0
,centroid-1
,major_axis_length
,minor_axis_length
,orientation
,area
,eccentricity
,euler_number
,alignment_score
fiber_stats_table.csv
fov
,pixel_density
,fiber_density
,avg_major_axis_length
,avg_minor_axis_length
,avg_orientation
,avg_area
,avg_eccentricity
,avg_euler_number
,avg_alignment_score
The fiber segmentation notebook does not make use of the cell level segmentations, and instead creates computes the fiber level segmentations from the raw image.
This one I'm not too sure about, as it doesn't look like there is a
X
equivalent here, so creating a uniqueAnnData
object may not be ideal. If we could,MuData
could be useful as it's a way to represent multi-modalAnnData
tables.Perhaps place it in
uns
? Let's place inuns
for this example.Neighborhood Analysis
The neighborhood analysis notebook generates three files of interest:
cell_table_size_normalized_cell_labels_kmeans_nh.csv
kmeans
neighborhood columnneighborhood_marker.csv
kmeans_neighborhood
,*var_names
(all the markers)neighborhood_cell_type.csv
kmeans_neighborhood
,*unique_clusters
(all the pixie clusters)The
neighborhood_marker.csv
could be stored invarm
. Here we've transposed it to fit theAnnData
spec.Similar to
X
we can viewvarm
as aDataFrame
by using theto_df()
method.The
neighborhood_cell_type.csv
could be stored inuns
due to it's shape.fov0_adata
Final
AnnData
TableOur final
AnnData
table for FOV 0 now looks like this:Some other things to go over would be the Spatial LDA notebook, and I'm sure I've missed a few other metrics as well. But I think this gets an idea of the general structure and how we can convert our data to
AnnData
components. We can discuss the specifics of which parameters get placed where of course.One
AnnData
Table per Cohort or OneAnnData
Table per FOV?One of the main design decisions we have to make is whether to store all the data in one
AnnData
object, or to split it up into manyAnnData
objects.I am in favor of a single
AnnData
object per FOV. This is mainly because the majority ofScanPy
andSquidPy
functions tend to work best on an individual "image" by "image" basis.If we have a single$i$ and cell $j$ for all cells in the FOV. If we have a single
AnnData
object per FOV, the square shaped components might not make sense. For example, the takedistance
component ofobsp
which is a 2D array consisting of the distances between cellAnnData
object, this matrix would take up a much larger footprint, and would require some rather obtuse indexing to get the distances for a single FOV.This will quickly take up$\mathcal{O} (n^2)$ space which is not feasible. The majority of it will be empty though and can be represented as a Block Diagonal Matrix.
The SPAIN cohort has about 400 FOVs. While taking advantage of sparsity and$\mathcal{O} (n\log{n})$ . This is completely feasible, but it can make working with the data more difficult.
Dask
can make the overhead of loading the data into memory more manageable. This kind of setup would (generously) upper bound the storage toIn addition,
SpatialData
is converting to a manyAnnData
table approach, where a table can be associated to 0,1 or many images / coordinate systems. See scverse/spatialdata#298 for the discussion on this.Does this mean we need to run a
for
loop every time we want to run a particular function across all FOV-levelAnnData
tables (such asscanpy.pp.filter_cells
)?No, we can use
AnnCollection
to do this. We can lazily concatenate theAnnData
objects and perform operations on them as if it was one bigAnnData
object, internally it'll load eachAnnData
object into memory as needed withDask
(done automatically for us) and map a function to it.Internally, the majority of functions in
Ark
load the whole cell table, but end up iterating by FOV anyway, so this can simplify some functions.As far as I know the benefits with one table per cohort start and end with having a single file stored to disk. I'm sure there are more, but I wasn't able to find them. Let me know if there are any others.
Storage: Saving and Loading
Let's take a quick detour to discuss storage.
For persistent storage we should save the
AnnData
table as aZarr
store.Zarr
provides a nice interface for storingAnnData
objects, as it's a hierarchical key-value store, which is exactly whatAnnData
is.Unfortunately, we will have to write to the
Zarr
store once we are done modifying theAnnData
table.SpatialData
provides a psuedo-"backed" feature to write modifications to disk immediately.Where
backed
is defined as the property where in memory changes are written to disk immediately.Zarr
backedAnnData
stores are sort of slowly in the works. See scverse/anndata#219.We can read
Zarr
backedAnnData
Tables usingAnnData.read_zarr()
.Lets' take a look at what the saved
Zarr
file saved at various depths:Depth 1 Hierarchy
. ├── layers ├── obs ├── obsm ├── obsp ├── uns ├── var ├── varm ├── varp └── X
Depth 2 Hierarchy
. ├── layers │ └── arcsinh ├── obs │ ├── _index │ ├── area │ ├── cell_meta_cluster │ ├── cell_size │ ├── Centroid X │ ├── Centroid Y │ ├── centroid_dif │ ├── convex_area │ ├── convex_hull_resid │ ├── diversity_cell_meta_cluster │ ├── eccentricity │ ├── equivalent_diameter │ ├── fov │ ├── kmeans_neighborhood │ ├── label │ ├── major_axis_equiv_diam_ratio │ ├── major_axis_length │ ├── major_minor_axis_ratio │ ├── minor_axis_length │ ├── num_concavities │ ├── perim_square_over_area │ └── perimeter ├── obsm │ ├── cell_distances │ └── coordinates ├── obsp │ └── distance ├── uns │ ├── fiber_object │ ├── fiber_stats │ ├── mixing_scores │ ├── neighborhood_cell_type │ ├── neighborhood_counts │ ├── neighborhood_diversity │ └── neighborhood_freqs ├── var │ └── _index ├── varm │ └── kmeans ├── varp └── X └── 0.0
Depth 3 Hierarchy
. ├── layers │ └── arcsinh │ └── 0.0 ├── obs │ ├── _index │ │ └── 0 │ ├── area │ │ └── 0 │ ├── cell_meta_cluster │ │ ├── categories │ │ └── codes │ ├── cell_size │ │ └── 0 │ ├── Centroid X │ │ └── 0 │ ├── Centroid Y │ │ └── 0 │ ├── centroid_dif │ │ └── 0 │ ├── convex_area │ │ └── 0 │ ├── convex_hull_resid │ │ └── 0 │ ├── diversity_cell_meta_cluster │ │ └── 0 │ ├── eccentricity │ │ └── 0 │ ├── equivalent_diameter │ │ └── 0 │ ├── fov │ │ ├── categories │ │ └── codes │ ├── kmeans_neighborhood │ │ └── 0 │ ├── label │ │ └── 0 │ ├── major_axis_equiv_diam_ratio │ │ └── 0 │ ├── major_axis_length │ │ └── 0 │ ├── major_minor_axis_ratio │ │ └── 0 │ ├── minor_axis_length │ │ └── 0 │ ├── num_concavities │ │ └── 0 │ ├── perim_square_over_area │ │ └── 0 │ └── perimeter │ └── 0 ├── obsm │ ├── cell_distances │ │ ├── _index │ │ ├── APC │ │ ├── Bcell │ │ ├── CD14_monocyte │ │ ├── CD4T │ │ ├── CD8T │ │ ├── endothelium │ │ ├── immune_other │ │ ├── M1_macrophage │ │ ├── M2_macrophage │ │ ├── Myofibroblast │ │ ├── other │ │ ├── stroma │ │ ├── tumor_ck17 │ │ └── tumor_ecad │ └── coordinates │ └── 0.0 ├── obsp │ └── distance │ ├── 0.0 │ ├── 0.1 │ ├── 1.0 │ └── 1.1 ├── uns │ ├── fiber_object │ │ ├── _index │ │ ├── alignment_score │ │ ├── area │ │ ├── centroid-0 │ │ ├── centroid-1 │ │ ├── eccentricity │ │ ├── euler_number │ │ ├── label │ │ ├── major_axis_length │ │ ├── minor_axis_length │ │ └── orientation │ ├── fiber_stats │ │ ├── avg_alignment_score │ │ ├── avg_area │ │ ├── avg_eccentricity │ │ ├── avg_euler_number │ │ ├── avg_major_axis_length │ │ ├── avg_minor_axis_length │ │ ├── avg_orientation │ │ ├── fiber_density │ │ └── pixel_density │ ├── mixing_scores │ │ └── 0 │ ├── neighborhood_cell_type │ │ ├── APC │ │ ├── Bcell │ │ ├── CD14_monocyte │ │ ├── CD4T │ │ ├── CD8T │ │ ├── endothelium │ │ ├── immune_other │ │ ├── kmeans_neighborhood │ │ ├── M1_macrophage │ │ ├── M2_macrophage │ │ ├── Myofibroblast │ │ ├── other │ │ ├── stroma │ │ ├── tumor_ck17 │ │ └── tumor_ecad │ ├── neighborhood_counts │ │ ├── _index │ │ ├── APC │ │ ├── Bcell │ │ ├── CD14_monocyte │ │ ├── CD4T │ │ ├── CD8T │ │ ├── endothelium │ │ ├── immune_other │ │ ├── M1_macrophage │ │ ├── M2_macrophage │ │ ├── Myofibroblast │ │ ├── other │ │ ├── stroma │ │ ├── tumor_ck17 │ │ └── tumor_ecad │ ├── neighborhood_diversity │ │ └── cell_meta_cluster │ └── neighborhood_freqs │ ├── _index │ ├── APC │ ├── Bcell │ ├── CD14_monocyte │ ├── CD4T │ ├── CD8T │ ├── endothelium │ ├── immune_other │ ├── M1_macrophage │ ├── M2_macrophage │ ├── Myofibroblast │ ├── other │ ├── stroma │ ├── tumor_ck17 │ └── tumor_ecad ├── var │ └── _index │ └── 0 ├── varm │ └── kmeans │ └── 0.0 ├── varp └── X └── 0.0
Misc Notes
The actual data, matrices,
DataFrames
, dictionaries and all are stored in the leaf nodes of the hierarchy as binary data. The rest of the nodes are just metadata, providing column names (if applicable) and other information, like the encoding type and encoding versioning.Not sure where we can save the
AnnData
objects, perhaps under a directory calledtables
in the root of the cohort's dataset?It's also very much worth considering making use of
SquidPy
's tool set for single cell spatial analysis. We should read through their code and learn how they interface withAnnData
when designing future components / redesigning current ones.The text was updated successfully, but these errors were encountered: