Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BPT-155] Spatial Data Refactor #104

Merged
merged 71 commits into from
Apr 5, 2024
Merged

[BPT-155] Spatial Data Refactor #104

merged 71 commits into from
Apr 5, 2024

Conversation

dylanclam12
Copy link
Collaborator

@dylanclam12 dylanclam12 commented Aug 7, 2023

Refactoring Bento with SpatialData

  • change all file permissions back to 755
  • add summary list of changes
  • datasets module refactor
  • signatures module refactor
  • defer to native SpatialData API for query
  • change bento.geo.get_points astype parameter options to lowercase
  • change bento.geo.sindex_points unindexed points from "None" to empty string
  • in bento.tl.lp pull out gene expression array into variable so it is not queried twice
  • cleanup unused @ track, batch, copy
  • in bento.tl.fe make getting sparse matrix in csr format more readable
  • resolve SpatialData as dependency
  • validate cell raster points and fluxmap shapes using spatialdata parse functions.
  • convert flux and flux_embed in cell raster points to individual columns so they can be saved
  • unit tests

@dylanclam12 dylanclam12 self-assigned this Aug 7, 2023
@dylanclam12 dylanclam12 requested a review from ckmah August 7, 2023 21:34
@dylanclam12 dylanclam12 changed the title Spatialdata Refactor [BPT-155] Spatialdata Refactor Aug 7, 2023
@notion-workspace
Copy link

Port spatial features

@dylanclam12 dylanclam12 changed the title [BPT-155] Spatialdata Refactor [BPT-155] Port Spatial Features Aug 7, 2023
@dylanclam12 dylanclam12 changed the title [BPT-155] Port Spatial Features [BPT-155] Spatial Data Refactor Aug 7, 2023
Removed all files that have not been refactored with SpatialData
@dylanclam12 dylanclam12 force-pushed the spatialdata_refactor branch from 39153f8 to 1d582ac Compare August 8, 2023 17:43
Added a modified SpatialData query script that works around their intrinsic_axes issue.
Added a formatting function that reads a Spatial Data object to: sjoin points to shapes, sjoin shapes to shapes, and changes shape object indices to strings. Sjoin functions changed in geometry.
Refactor of _shape_features to read and save to SpatialData.
Refactor point features to read and save to SpatialData.
Refactoring lp function and implementing lp_diff for discrete phenotypes
Implemented lp_diff_continuous, caught phenotype error in lp_diff_discrete, and clipped infinite values to max for -log10p and -log10padj.
Added _neighborhoods.py with no changes
Added sync_points function to sync points dask dataframe to groud truth sdata.table anndata + minor fixes
Refactored flux and fluxmap functions to SpatialData. Minor changes to format data and sindex_points.
Removed dask functionality since there is no speed up in the point computations. Changed geometry functions and edited how they were used in io, shape features, point features, and flux.
Refactored _flux_enrichment.py and removed register_points functionality since we are placing cell_raster in a points element that is dynamic instead of leaving it in unstructured.
Refactored colocation.py which relies on _decomposition.py being refactored. Current _composition.py comp_diff function does not work correctly. Refactored to the point where we run across the same error.
Refactoring plotting script and all of its depending scripts: colors, layers, utils
Refactoring plotting scripts for localization patterns
Copy link
Collaborator

@ckmah ckmah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • change all file permissions back to 755
  • cleanup commented out imports and decorators (track)
  • add summary list of changes
  • unit tests

bento/datasets/_datasets.py Outdated Show resolved Hide resolved
bento/plotting/_signatures.py Outdated Show resolved Hide resolved
bento/query/__init__.py Outdated Show resolved Hide resolved
bento/query/_query.py Outdated Show resolved Hide resolved
bento/_utils.py Outdated Show resolved Hide resolved
bento/geometry/_geometry.py Outdated Show resolved Hide resolved
bento/geometry/_geometry.py Outdated Show resolved Hide resolved
bento/plotting/_lp.py Outdated Show resolved Hide resolved
bento/plotting/_plotting.py Outdated Show resolved Hide resolved
bento/tools/_flux_enrichment.py Outdated Show resolved Hide resolved
@dylanclam12
Copy link
Collaborator Author

dylanclam12 commented Jan 16, 2024

  • change all file permissions back to 755
  • cleanup commented out imports and decorators (track)
  • add summary list of changes
  • unit tests

Moved to PR Summary

#104 (comment)

- defer to native SpatialData API for query
- change bento.geo.get_points astype parameter options to lowercase
- change bento.geo.sindex_points unindexed points from "None" to empty string
- in bento.tl.lp pull out gene expression array into variable so it is not queried twice
- cleanup unused @ track, batch, copy
- in bento.tl.fe make getting sparse matrix in csr format more readable
@dylanclam12 dylanclam12 requested a review from ckmah January 16, 2024 21:47
pyproject.toml Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
used instance_key as an indicator of cell_shapes across all point features.
used {shape_key}_index as an indicator of whether a point is inside or outside a specified shape. Changed this because points element and the pulled in shape elements have columns with the same name. Changing the columns in the points element to have the _index suffix to differentiate.
Since we aren't holding all polygons that have been sjoined to cells in the cell shapes element, I added a join to pull those polygons into points_df. Points and shape elements will have similarly named columns so I am adding the _index suffix to the columns in the points element.
wrote test cases for point_features.py
moved reading and formatting sdata to setup. Condensed feature lists
moved test cases to a loop instead of listing them one by one
Changed PATTERN_FEATURES to match the new naming conventions (cell_boundaries_x and nucleus_boundaries_x)
Incorporated the use of instance_key into lp to be consistent with the rest of the package
Copy link
Collaborator

@ckmah ckmah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

edit: nvm didnt finish reviewing

Updated set_metadata documentation
changed flux to use instance_key instead of hard coding cell_boundaries
made the default value for gene feature_name to be consistent with the Xenium format like the rest of the package
test script for flux and fluxmap
changed overwriting cell_boundaires_raster to adding metadata with set_points_metadata
Changed all hard coded cell_raster to use the instance key
Test cases for flux enrichment functions
removed import random from test_flux.py and test_flux_enrichment.py
added instance_key and feature_key to colocation and _colocation_tensor
added instance_key, feature_key, points key to coloc_quotient, _cell_clq, and _clq_statistic
Tests for _colocation.py
Tests for _geometry.py
@ckmah ckmah marked this pull request as ready for review April 5, 2024 20:44
Copy link
Collaborator

@ckmah ckmah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, need to find new home for datasets that supports zarr streaming

@@ -87,13 +92,15 @@ def lp(sdata: SpatialData, groupby: str = "gene"):
sdata.table.uns["lp"] = indicator_df.reset_index()
sdata.table.uns["lpp"] = pattern_prob.reset_index()

def lp_stats(sdata: SpatialData):
def lp_stats(sdata: SpatialData, instance_key: str = "cell_boundaries"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should default value be set for instance_key?

@@ -106,28 +98,23 @@ def _colocation_tensor(data: AnnData, copy: bool = False):
tensor = s.todense()
print(tensor.shape)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove print

data: AnnData,
shapes: List[str] = ["cell_shape"],
sdata: SpatialData,
shapes: List[str] = ["cell_boundaries"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove as default, infer from points instance_key (I believe this is the static key we use for cells?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm we can do this when we update tutorials

@@ -83,13 +84,13 @@ def fe(
Returns
-------
sdata : SpatialData
.points["cell_raster"]["flux_fe"] : DataFrame
.points["cell_boundaries_raster"]["flux_fe"] : DataFrame
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be {instance_key}_raster

@ckmah ckmah merged commit e5e193c into v2.1 Apr 5, 2024
@LucaMarconato
Copy link

🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants