[BPT-155] Spatial Data Refactor #104
add download badge
Removed all files that have not been refactored with SpatialData
39153f8 to 1d582ac
Added a modified SpatialData query script that works around SpatialData's intrinsic_axes issue.
Added a formatting function that reads a SpatialData object and: sjoins points to shapes, sjoins shapes to shapes, and converts shape indices to strings. Changed the sjoin functions in geometry.
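The points-to-shapes sjoin gives each point the index of the shape that contains it. The PR does this with a spatial join in Bento's geometry module; the stdlib-only sketch below only illustrates the semantics, assuming even-odd ray casting for containment, string shape indices, "first containing shape wins" for overlaps, and an empty string for points outside every shape (matching the later bento.geo.sindex_points change). Both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray cast toward +x."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sjoin_points_to_shapes(points: List[Point], shapes: Dict[object, Polygon]) -> List[str]:
    """For each point, return the string index of the first shape containing it, or "" if none."""
    labels = []
    for pt in points:
        label = ""
        for idx, poly in shapes.items():
            if point_in_polygon(pt, poly):
                label = str(idx)  # shape indices are stored as strings
                break
        labels.append(label)
    return labels
```

With two unit-square-like cells keyed 0 and 1, a point inside cell 0 gets "0", a point inside cell 1 gets "1", and a point outside both gets "". Storing "" rather than None keeps the column a single string dtype.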
Refactor of _shape_features to read and save to SpatialData.
Refactor point features to read and save to SpatialData.
Refactoring lp function and implementing lp_diff for discrete phenotypes
Implemented lp_diff_continuous, caught phenotype error in lp_diff_discrete, and clipped infinite values to max for -log10p and -log10padj.
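Clipping is needed because a p-value that underflows to exactly 0 gives -log10(0) = inf. A minimal sketch of the clipping step, assuming infinities are replaced by the largest finite score in the same column (the PR applies this to both -log10p and -log10padj). The function name is hypothetical, and the PR's own code presumably works on DataFrame columns rather than Python lists.

```python
import math
from typing import List


def neg_log10_clipped(pvalues: List[float]) -> List[float]:
    """Return -log10(p) for each p, replacing infinite scores with the largest finite one."""
    # math.log10(0) raises ValueError, so map p == 0 to +inf explicitly
    # (NumPy's -np.log10(0.0) returns inf with a RuntimeWarning instead).
    scores = [math.inf if p == 0 else -math.log10(p) for p in pvalues]
    finite = [s for s in scores if math.isfinite(s)]
    # If no score is finite there is nothing to clip to, so infinities are kept.
    cap = max(finite, default=math.inf)
    return [cap if math.isinf(s) else s for s in scores]
```

For inputs [0.5, 1e-3, 0.0] the zero p-value is assigned the same score as 1e-3 (about 3.0), so downstream plots and sorts never see inf.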
Added _neighborhoods.py with no changes
Added sync_points function to sync the points Dask DataFrame to the ground-truth sdata.table AnnData, plus minor fixes
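Assuming "sync" means dropping points whose cell is no longer in the ground-truth table (for example after cells are filtered out of sdata.table), the core of sync_points reduces to a membership filter. This sketch uses plain dicts instead of a Dask DataFrame and AnnData, and assumes each point stores its cell id in a column named by instance_key; the signature is illustrative, not Bento's API, and the "cell_boundaries" default is borrowed from the lp_stats diff in this PR.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def sync_points_sketch(
    points: List[Row],
    table_instances: Iterable[str],
    instance_key: str = "cell_boundaries",
) -> List[Row]:
    """Keep only points whose `instance_key` value names an instance present in the table.

    Points outside every cell carry "" and are dropped as well, since "" is never a
    table instance (an assumption about the desired behaviour).
    """
    keep = set(table_instances)  # set lookup keeps the filter O(len(points))
    return [p for p in points if p.get(instance_key, "") in keep]
```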
Refactored flux and fluxmap functions to SpatialData. Minor changes to format data and sindex_points.
Removed Dask functionality since it gives no speedup in the point computations. Changed geometry functions and edited how they were used in io, shape features, point features, and flux.
Refactored _flux_enrichment.py and removed register_points functionality since we are placing cell_raster in a points element that is dynamic instead of leaving it in unstructured.
Refactored colocation.py, which relies on _decomposition.py being refactored. The current comp_diff function in _composition.py does not work correctly; refactored up to the point where we run into the same error.
Refactoring plotting script and all of the scripts it depends on: colors, layers, utils
Refactoring plotting scripts for localization patterns
- change all file permissions back to 755
- clean up commented-out imports and decorators (track)
- add summary list of changes
- unit tests
Moved to PR summary, #104 (comment):
- defer to native SpatialData API for query
- change bento.geo.get_points astype parameter options to lowercase
- change bento.geo.sindex_points unindexed points from "None" to empty string
- in bento.tl.lp, pull out the gene expression array into a variable so it is not queried twice
- clean up unused @track, batch, copy
- in bento.tl.fe, make getting the sparse matrix in CSR format more readable
used instance_key to identify the cell shapes across all point features.
used {shape_key}_index as an indicator of whether a point is inside or outside a specified shape. Changed this because the points element and the pulled-in shape elements have columns with the same name, so columns in the points element now carry the _index suffix to differentiate them.
Since the cell shapes element does not hold all polygons that have been sjoined to cells, I added a join to pull those polygons into points_df. Points and shape elements have similarly named columns, so I am adding the _index suffix to the columns in the points element.
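A minimal sketch of why the suffix matters, using plain dicts in place of points_df and the shape elements. It assumes points reference their containing shape through a `{shape_key}_index` column (empty string when outside) and that the cell element carries per-cell columns such as "nucleus_boundaries"; join_shape_columns and is_inside are hypothetical helpers, and the placeholder polygon string is illustrative data only.

```python
from typing import Dict, List

Row = Dict[str, object]


def is_inside(row: Row, shape_key: str) -> bool:
    """A point is inside `shape_key` iff its `{shape_key}_index` column is non-empty."""
    return row.get(f"{shape_key}_index", "") != ""


def join_shape_columns(points: List[Row], shapes: Dict[str, Row], shape_key: str) -> List[Row]:
    """Left-join per-shape columns onto points through the `{shape_key}_index` column.

    Raises if a shape column would overwrite a point column: the clash the `_index`
    suffix avoids (points' "nucleus_boundaries_index" vs the cell element's
    "nucleus_boundaries").
    """
    index_col = f"{shape_key}_index"
    joined = []
    for point in points:
        row = dict(point)
        # Points outside every shape carry "" and therefore match no shape row.
        attrs = shapes.get(str(row.get(index_col, "")), {})
        clash = set(row) & set(attrs)
        if clash:
            raise ValueError(f"column name collision: {sorted(clash)}")
        row.update(attrs)
        joined.append(row)
    return joined
```

Without the suffix, a point column named "nucleus_boundaries" would collide with the joined cell column of the same name, and one of the two values would be lost.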
wrote test cases for point_features.py
moved reading and formatting sdata to setup. Condensed feature lists
moved test cases to a loop instead of listing them one by one
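A minimal sketch of the loop-based test pattern with the standard library's unittest, assuming expected column names follow the `{shape_key}_{feature}` convention (as in cell_boundaries_x). The shape keys, feature names, and compute_point_features stand-in are hypothetical; the PR's tests read and format a real SpatialData object in setup instead.

```python
import unittest

# Hypothetical shape keys and feature names used only for illustration.
SHAPE_KEYS = ["cell_boundaries", "nucleus_boundaries"]
FEATURES = ["inner_proximity", "outer_proximity"]
EXPECTED_COLUMNS = [f"{shape}_{feat}" for shape in SHAPE_KEYS for feat in FEATURES]


def compute_point_features():
    """Stand-in for the real feature computation: column name -> values."""
    return {col: [0.0, 1.0] for col in EXPECTED_COLUMNS}


class TestPointFeatures(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Expensive shared setup (reading/formatting data) runs once for the class.
        cls.results = compute_point_features()

    def test_expected_columns(self):
        # One loop replaces a hand-written assertion per column; subTest reports
        # every failing column instead of stopping at the first failure.
        for col in EXPECTED_COLUMNS:
            with self.subTest(column=col):
                self.assertIn(col, self.results)
                self.assertEqual(len(self.results[col]), 2)
```

Both pytest and `python -m unittest` discover unittest.TestCase subclasses, so the loop works under either runner.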
Changed PATTERN_FEATURES to match the new naming conventions (cell_boundaries_x and nucleus_boundaries_x)
Incorporated the use of instance_key into lp to be consistent with the rest of the package
lgtm
edit: nvm didn't finish reviewing
Updated set_metadata documentation
changed flux to use instance_key instead of hard coding cell_boundaries
made feature_name the default gene column, consistent with the Xenium format like the rest of the package
test script for flux and fluxmap
changed from overwriting cell_boundaries_raster to adding metadata with set_points_metadata
Changed all hard coded cell_raster to use the instance key
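A minimal sketch of deriving the raster key from instance_key instead of hard-coding "cell_raster", following the review note that the key should be {instance_key}_raster. A plain dict stands in for sdata.points; raster_key and get_raster are hypothetical helpers, the error-message hint is an assumption, and the "cell_boundaries" default mirrors the lp_stats diff in this PR.

```python
from typing import Dict, List


def raster_key(instance_key: str) -> str:
    """Points-element name for the rasterized instance, e.g. "cell_boundaries_raster"."""
    return f"{instance_key}_raster"


def get_raster(points: Dict[str, List[dict]], instance_key: str = "cell_boundaries") -> List[dict]:
    """Look up the raster element derived from `instance_key` rather than a fixed name."""
    key = raster_key(instance_key)
    if key not in points:
        # Hypothetical hint: the raster element is produced by the flux step.
        raise KeyError(f"{key!r} not found; compute flux with instance_key={instance_key!r} first")
    return points[key]
```

Because the name is derived in one place, datasets whose cells are stored under a different instance key work without touching every function that reads the raster.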
Test cases for flux enrichment functions
removed import random from test_flux.py and test_flux_enrichment.py
added instance_key and feature_key to colocation and _colocation_tensor
added instance_key, feature_key, points key to coloc_quotient, _cell_clq, and _clq_statistic
Tests for _colocation.py
Tests for _geometry.py
looks good, need to find new home for datasets that supports zarr streaming
@@ -87,13 +92,15 @@ def lp(sdata: SpatialData, groupby: str = "gene"):
 sdata.table.uns["lp"] = indicator_df.reset_index()
 sdata.table.uns["lpp"] = pattern_prob.reset_index()

-def lp_stats(sdata: SpatialData):
+def lp_stats(sdata: SpatialData, instance_key: str = "cell_boundaries"):
should a default value be set for instance_key?
@@ -106,28 +98,23 @@ def _colocation_tensor(data: AnnData, copy: bool = False):
tensor = s.todense()
print(tensor.shape)
remove print
data: AnnData, | ||
shapes: List[str] = ["cell_shape"], | ||
sdata: SpatialData, | ||
shapes: List[str] = ["cell_boundaries"], |
remove as default, infer from points instance_key
(I believe this is the static key we use for cells?)
nvm we can do this when we update tutorials
@@ -83,13 +84,13 @@ def fe(
 Returns
 -------
 sdata : SpatialData
-.points["cell_raster"]["flux_fe"] : DataFrame
+.points["cell_boundaries_raster"]["flux_fe"] : DataFrame
should be {instance_key}_raster
🥳
Refactoring Bento with SpatialData
- defer to native SpatialData API for query
- change bento.geo.get_points astype parameter options to lowercase
- change bento.geo.sindex_points unindexed points from "None" to empty string
- in bento.tl.lp, pull out the gene expression array into a variable so it is not queried twice
- in bento.tl.fe, make getting the sparse matrix in CSR format more readable
- SpatialData as dependency