multi table support #455

Merged
merged 32 commits into from
Mar 14, 2024
32 commits
33d8320
initial tests multi_table design (#405)
melonora Nov 27, 2023
75d66f1
Multi table (#410)
melonora Jan 17, 2024
a5f01b5
Enforce instance key to be dtype int (#444)
melonora Jan 29, 2024
8813e70
Join elements table (#445)
melonora Feb 13, 2024
3e88351
Merge branch 'main' into multi_table
LucaMarconato Feb 19, 2024
055e549
fix tests
LucaMarconato Feb 19, 2024
c6ae76a
fix docs
LucaMarconato Feb 19, 2024
b8ea945
add possibility for custom table name (#459)
melonora Feb 19, 2024
e05ba1a
Update locate values (#460)
melonora Feb 19, 2024
ea0989d
Filter table annotate (#462)
melonora Feb 20, 2024
f03ba37
wip get_centroids
LucaMarconato Feb 22, 2024
0c7293b
implemented get_centroids()
LucaMarconato Feb 22, 2024
5f045db
Merge branch 'main' into multi_table
LucaMarconato Feb 22, 2024
569a528
Merge branch 'feature/get_centroids' into multi_table
LucaMarconato Feb 22, 2024
b0f1710
made _assert_spatialdata_objects_seem_identical() into a util
LucaMarconato Feb 22, 2024
a873a33
fix docs, attemp
LucaMarconato Feb 25, 2024
6fd4d8a
Merge branch 'main' into multi_table
LucaMarconato Feb 25, 2024
0685fb7
allow table to be None in get_values and _locate_values (#466)
melonora Feb 25, 2024
5248d3e
Added `validate_table_annotation_target()` (#468)
LucaMarconato Feb 26, 2024
5cc1347
silence warning
melonora Mar 6, 2024
a09ea49
fix gettin dtype from multiscale
melonora Mar 12, 2024
b3571ef
add else dtype back
melonora Mar 12, 2024
5392ea1
silence scipy.misc.face deprecation
melonora Mar 13, 2024
b3ca213
Merge branch 'main' into multi_table
LucaMarconato Mar 13, 2024
3386b3d
Operation `to_circles()` (#473)
LucaMarconato Mar 13, 2024
0c1e339
deedcopy() utils function (#480)
LucaMarconato Mar 13, 2024
9c20ee6
fix bug deepcopy() of points wrong columns order
LucaMarconato Mar 14, 2024
b6897a8
workaround wrong order points columns after deepcopy
LucaMarconato Mar 14, 2024
09e339e
rechunking raster data after spatial query (#479)
LucaMarconato Mar 14, 2024
a2970d3
Test joins with string indices and instance id (#485)
melonora Mar 14, 2024
e67ab47
cleanup tests
melonora Mar 14, 2024
57e9f61
remove comments
melonora Mar 14, 2024
56 changes: 49 additions & 7 deletions CHANGELOG.md
@@ -12,27 +12,69 @@ and this project adheres to [Semantic Versioning][].

### Added

#### Major

- Implemented support in SpatialData for storing multiple tables. A table can annotate a SpatialElement, but it does
not have to.
- Added SQL-like joins that can be executed by calling one public function, `join_sdata_spatialelement_table`. The
following joins are supported: `left`, `left_exclusive`, `right`, `right_exclusive` and `inner`. The function has
an option to match rows: for a `left` join only `left` matching of rows is supported, and for a `right` join only
`right` matching of rows is supported. Not all joins are supported for `Labels` elements.
- Added function `match_element_to_table` which allows the user to perform a right join of `SpatialElement`(s) with a
table with rows matching the row order in the table.
- Increased in-memory vs on-disk control: changes performed in-memory (e.g. adding a new image) are not automatically
performed on-disk.

#### Minor

- Added public helper function get_table_keys in spatialdata.models to retrieve annotation information of a given
table.
- Added public helper function check_target_region_column_symmetry in spatialdata.models to check whether annotation
metadata in table.uns['spatialdata_attrs'] corresponds with respective columns in table.obs.
- Added function validate_table_in_spatialdata in SpatialData to validate the annotation target of a table being
present in the SpatialData object.
- Added function get_annotated_regions in SpatialData to get the regions annotated by a given table.
- Added function get_region_key_column in SpatialData to get the region_key column in table.obs.
- Added function get_instance_key_column in SpatialData to get the instance_key column in table.obs.
- Added function set_table_annotates_spatialelement in SpatialData to either set or change the annotation metadata of
a table in a given SpatialData object.
- Added table_name parameter to the aggregate function to allow users to give a custom table name to table resulting
from aggregation.
- Added table_name parameter to the get_values function.
- Added tables property in SpatialData.
- Added tables setter in SpatialData.
- Added gen_spatial_elements generator in SpatialData to generate the SpatialElements in a given SpatialData object.
- Added gen_elements generator in SpatialData to generate elements of a SpatialData object including tables.
- added SpatialData.subset() API
- added SpatialData.locate_element() API
- added transform_to_data_extent()
- added utils function: transform_to_data_extent()
- added utils function: are_extents_equal()
- added utils function: postpone_transformation()
- added utils function: remove_transformations_to_coordinate_system()
- added utils function: get_centroids()
- added utils function: deepcopy()
- added operation: to_circles()
- added testing utilities: assert_spatial_data_objects_are_identical(), assert_elements_are_identical(),
assert_elements_dict_are_identical()

### Minor
### Changed

- improved usability and robustness of sdata.write() when overwrite=True @aeisenbarth
#### Major

- refactored data loader for deep learning

#### Minor

- Changed the string representation of SpatialData to reflect the changes in regard to multiple tables.

### Fixed

#### Major

- improved usability and robustness of sdata.write() when overwrite=True @aeisenbarth
- generalized queries to any combination of 2D/3D data and 2D/3D query region #409
- fixed warnings for categorical dtypes in tables in TableModel and PointsModel

#### Minor

- refactored data loader for deep learning

## [0.0.14] - 2023-10-11

### Added
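The join types named in the CHANGELOG entry above (`left`, `left_exclusive`, `right`, `right_exclusive`, `inner`) can be illustrated on plain lists of instance IDs. This is a minimal sketch that assumes the usual SQL meaning of each join; `join_instance_ids` is a hypothetical helper written for this illustration, not the signature or implementation of `join_sdata_spatialelement_table`.

```python
from __future__ import annotations


def join_instance_ids(
    element_ids: list[int], table_ids: list[int], how: str
) -> tuple[list[int], list[int]]:
    """Return (kept element IDs, kept table IDs) for a SQL-like join.

    Hypothetical helper for illustration only; it is not part of spatialdata.
    """
    element_set = set(element_ids)
    table_set = set(table_ids)
    if how == "left":
        # Keep every element row; keep only table rows that annotate one of them.
        return list(element_ids), [i for i in table_ids if i in element_set]
    if how == "left_exclusive":
        # Keep only element rows that no table row annotates; no table rows remain.
        return [i for i in element_ids if i not in table_set], []
    if how == "inner":
        # Keep only rows present on both sides.
        return (
            [i for i in element_ids if i in table_set],
            [i for i in table_ids if i in element_set],
        )
    if how == "right":
        # Keep every table row; keep only element rows that are annotated.
        return [i for i in element_ids if i in table_set], list(table_ids)
    if how == "right_exclusive":
        # Keep only table rows that annotate no element row; no element rows remain.
        return [], [i for i in table_ids if i not in element_set]
    raise ValueError(f"Unsupported join type: {how!r}")


element_ids = [0, 1, 2, 3]  # instances in the element
table_ids = [2, 3, 4]       # instances referenced by the table
for how in ("left", "left_exclusive", "inner", "right", "right_exclusive"):
    print(how, join_instance_ids(element_ids, table_ids, how))
```

The exclusive variants return an empty list for the side that has nothing to contribute, which is one way to model a join whose other half is absent.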
18 changes: 18 additions & 0 deletions docs/api.md
@@ -28,10 +28,14 @@ Operations on `SpatialData` objects.
get_values
get_extent
get_centroids
join_sdata_spatialelement_table
match_element_to_table
get_centroids
match_table_to_element
concatenate
transform
rasterize
to_circles
aggregate
```

@@ -43,6 +47,7 @@ Operations on `SpatialData` objects.

unpad_raster
are_extents_equal
deepcopy
```

## Models
@@ -139,3 +144,16 @@ The transformations that can be defined between elements and coordinate systems
save_transformations
get_dask_backing_files
```

## Testing utilities

```{eval-rst}
.. currentmodule:: spatialdata.testing

.. autosummary::
:toctree: generated

assert_spatial_data_objects_are_identical
assert_elements_are_identical
assert_elements_dict_are_identical
```
9 changes: 5 additions & 4 deletions docs/conf.py
@@ -133,10 +133,11 @@
html_title = project_name
html_logo = "_static/img/spatialdata_horizontal.png"

# html_theme_options = {
# "repository_url": repository_url,
# "use_repository_button": True,
# }
html_theme_options = {
"navigation_with_keys": True,
# "repository_url": repository_url,
# "use_repository_button": True,
}

pygments_style = "default"

4 changes: 3 additions & 1 deletion pyproject.toml
@@ -41,7 +41,8 @@ dependencies = [
"xarray-spatial>=0.3.5",
"tqdm",
"fsspec<=2023.6",
"dask<=2024.2.1"
"dask<=2024.2.1",
"pooch",
]

[project.optional-dependencies]
@@ -58,6 +59,7 @@ docs = [
# For notebooks
"ipython>=8.6.0",
"sphinx-copybutton",
"sphinx-pytest",
]
test = [
"pytest",
13 changes: 12 additions & 1 deletion src/spatialdata/__init__.py
@@ -17,11 +17,14 @@
"dataloader",
"concatenate",
"rasterize",
"to_circles",
"transform",
"aggregate",
"bounding_box_query",
"polygon_query",
"get_values",
"join_sdata_spatialelement_table",
"match_element_to_table",
"match_table_to_element",
"SpatialData",
"get_extent",
@@ -31,17 +34,25 @@
"save_transformations",
"get_dask_backing_files",
"are_extents_equal",
"deepcopy",
]

from spatialdata import dataloader, models, transformations
from spatialdata._core._deepcopy import deepcopy
from spatialdata._core.centroids import get_centroids
from spatialdata._core.concatenate import concatenate
from spatialdata._core.data_extent import are_extents_equal, get_extent
from spatialdata._core.operations.aggregate import aggregate
from spatialdata._core.operations.rasterize import rasterize
from spatialdata._core.operations.transform import transform
from spatialdata._core.operations.vectorize import to_circles
from spatialdata._core.query._utils import circles_to_polygons, get_bounding_box_corners
from spatialdata._core.query.relational_query import get_values, match_table_to_element
from spatialdata._core.query.relational_query import (
get_values,
join_sdata_spatialelement_table,
match_element_to_table,
match_table_to_element,
)
from spatialdata._core.query.spatial_query import bounding_box_query, polygon_query
from spatialdata._core.spatialdata import SpatialData
from spatialdata._io._utils import get_dask_backing_files, save_transformations
105 changes: 105 additions & 0 deletions src/spatialdata/_core/_deepcopy.py
@@ -0,0 +1,105 @@
from __future__ import annotations

from copy import deepcopy as _deepcopy
from functools import singledispatch

from anndata import AnnData
from dask.array.core import Array as DaskArray
from dask.array.core import from_array
from dask.dataframe.core import DataFrame as DaskDataFrame
from geopandas import GeoDataFrame
from multiscale_spatial_image import MultiscaleSpatialImage
from spatial_image import SpatialImage

from spatialdata._core.spatialdata import SpatialData
from spatialdata._utils import multiscale_spatial_image_from_data_tree
from spatialdata.models._utils import SpatialElement
from spatialdata.models.models import Image2DModel, Image3DModel, Labels2DModel, Labels3DModel, PointsModel, get_model


@singledispatch
def deepcopy(element: SpatialData | SpatialElement | AnnData) -> SpatialData | SpatialElement | AnnData:
"""
Deepcopy a SpatialData or SpatialElement object.

Deepcopy will load the data in memory. Using this function for large Dask-backed objects is discouraged. In that
case, please save the SpatialData object to a different disk location and read it back again.

Parameters
----------
element
The SpatialData or SpatialElement object to deepcopy

Returns
-------
A deepcopy of the SpatialData or SpatialElement object

Notes
-----
The order of the columns for a deepcopied points element may differ from the original one; for details see:
https://github.com/scverse/spatialdata/issues/486
"""
raise RuntimeError(f"Wrong type for deepcopy: {type(element)}")


# In the implementations below, when the data is loaded from Dask, we first use compute() and then we deepcopy the data.
# This leads to double copying the data, but since we expect the data to be small, this is acceptable.
@deepcopy.register(SpatialData)
def _(sdata: SpatialData) -> SpatialData:
elements_dict = {}
for _, element_name, element in sdata.gen_elements():
elements_dict[element_name] = deepcopy(element)
return SpatialData.from_elements_dict(elements_dict)


@deepcopy.register(SpatialImage)
def _(element: SpatialImage) -> SpatialImage:
model = get_model(element)
if isinstance(element.data, DaskArray):
element = element.compute()
if model in [Image2DModel, Image3DModel]:
return model.parse(element.copy(deep=True), c_coords=element["c"]) # type: ignore[call-arg]
assert model in [Labels2DModel, Labels3DModel]
return model.parse(element.copy(deep=True))


@deepcopy.register(MultiscaleSpatialImage)
def _(element: MultiscaleSpatialImage) -> MultiscaleSpatialImage:
# the complexity here is due to the fact that the parsers don't accept MultiscaleSpatialImage types and that we need
# to convert the DataTree to a MultiscaleSpatialImage. This will be simplified once we support
# multiscale_spatial_image 1.0.0
model = get_model(element)
for key in element:
ds = element[key].ds
assert len(ds) == 1
variable = ds.__iter__().__next__()
if isinstance(element[key][variable].data, DaskArray):
element[key][variable] = element[key][variable].compute()
msi = multiscale_spatial_image_from_data_tree(element.copy(deep=True))
for key in msi:
ds = msi[key].ds
variable = ds.__iter__().__next__()
msi[key][variable].data = from_array(msi[key][variable].data)
element[key][variable].data = from_array(element[key][variable].data)
assert model in [Image2DModel, Image3DModel, Labels2DModel, Labels3DModel]
model().validate(msi)
return msi


@deepcopy.register(GeoDataFrame)
def _(gdf: GeoDataFrame) -> GeoDataFrame:
new_gdf = _deepcopy(gdf)
# temporary fix for https://github.com/scverse/spatialdata/issues/286.
new_attrs = _deepcopy(gdf.attrs)
new_gdf.attrs = new_attrs
return new_gdf


@deepcopy.register(DaskDataFrame)
def _(df: DaskDataFrame) -> DaskDataFrame:
return PointsModel.parse(df.compute().copy(deep=True))


@deepcopy.register(AnnData)
def _(adata: AnnData) -> AnnData:
return adata.copy()
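The dispatch pattern used in `_deepcopy.py` can be shown in isolation with the standard library alone. The sketch below is an assumption-laden illustration: `LazyArray` and `Frame` are hypothetical stand-ins for a Dask-backed array and a dataframe carrying `attrs`, and `deep_copy` is not the spatialdata function. It shows a `singledispatch` base function that rejects unknown types, a handler that materializes lazy data before copying, and a handler that copies `attrs` explicitly.

```python
from copy import deepcopy as _deepcopy
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class LazyArray:
    """Hypothetical stand-in for a lazily evaluated (e.g. Dask-backed) array."""

    values: list

    def compute(self) -> list:
        # Materialize the data in memory.
        return list(self.values)


@dataclass
class Frame:
    """Hypothetical stand-in for a dataframe that carries metadata in `attrs`."""

    rows: list
    attrs: dict = field(default_factory=dict)


@singledispatch
def deep_copy(obj):
    # Fallback: reached only for types without a registered handler.
    raise RuntimeError(f"Wrong type for deep_copy: {type(obj)}")


@deep_copy.register(LazyArray)
def _(obj: LazyArray) -> LazyArray:
    # Load the data first, then copy it so the result shares nothing with the input.
    return LazyArray(_deepcopy(obj.compute()))


@deep_copy.register(Frame)
def _(obj: Frame) -> Frame:
    new = Frame(_deepcopy(obj.rows))
    # Copy the metadata explicitly so nested dictionaries are not shared.
    new.attrs = _deepcopy(obj.attrs)
    return new


frame = Frame([[1, 2]], {"transform": {"global": "identity"}})
copied = deep_copy(frame)
copied.attrs["transform"]["global"] = "scale"
print(frame.attrs["transform"]["global"])  # the original is unchanged
```

Registering one handler per type keeps each copy strategy local to the type it serves, at the cost of a runtime error rather than a static one for unsupported inputs.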
12 changes: 12 additions & 0 deletions src/spatialdata/_core/_elements.py
@@ -7,6 +7,7 @@
from typing import Any
from warnings import warn

from anndata import AnnData
from dask.dataframe.core import DataFrame as DaskDataFrame
from datatree import DataTree
from geopandas import GeoDataFrame
@@ -20,6 +21,7 @@
Labels3DModel,
PointsModel,
ShapesModel,
TableModel,
get_axes_names,
get_model,
)
@@ -103,3 +105,13 @@ def __setitem__(self, key: str, value: DaskDataFrame) -> None:
raise TypeError(f"Unknown element type with schema: {schema!r}.")
PointsModel().validate(value)
super().__setitem__(key, value)


class Tables(Elements):
def __setitem__(self, key: str, value: AnnData) -> None:
self._check_key(key, self.keys(), self._shared_keys)
schema = get_model(value)
if schema != TableModel:
raise TypeError(f"Unknown element type with schema: {schema!r}.")
TableModel().validate(value)
super().__setitem__(key, value)
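The `Tables` container above validates every value on assignment and checks the key against names used by other containers. A minimal sketch of that validate-on-`__setitem__` idea follows, using only `collections.UserDict`. `ValidatedDict`, its constructor arguments, and the overwrite behavior are assumptions made for this illustration; they are not the `Elements` base class from spatialdata.

```python
from collections import UserDict


class ValidatedDict(UserDict):
    """Hypothetical container that validates values and keys on assignment."""

    def __init__(self, expected_type: type, shared_keys: set) -> None:
        self._expected_type = expected_type
        # Set of names shared by all containers, so a name is used only once.
        self._shared_keys = shared_keys
        super().__init__()

    def __setitem__(self, key: str, value: object) -> None:
        # Reject a name already taken by another container; overwriting an
        # existing entry of this container is allowed (an assumption here).
        if key in self._shared_keys and key not in self.data:
            raise KeyError(f"Key `{key}` already exists.")
        if not isinstance(value, self._expected_type):
            raise TypeError(f"Unknown element type: {type(value)!r}.")
        super().__setitem__(key, value)
        self._shared_keys.add(key)


shared: set = set()
images = ValidatedDict(bytes, shared)
tables = ValidatedDict(dict, shared)
tables["table"] = {"obs": []}
print(sorted(shared))
```

Because validation lives in `__setitem__`, an invalid object can never enter the container through item assignment.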
22 changes: 22 additions & 0 deletions src/spatialdata/_core/_utils.py
@@ -0,0 +1,22 @@
from spatialdata._core.spatialdata import SpatialData


def _find_common_table_keys(sdatas: list[SpatialData]) -> set[str]:
"""
Find table keys present in all the given SpatialData objects.

Parameters
----------
sdatas
A list of SpatialData objects.

Returns
-------
A set of common keys that are present in the tables of every SpatialData object in the list.
"""
common_keys = set(sdatas[0].tables.keys())

for sdata in sdatas[1:]:
common_keys.intersection_update(sdata.tables.keys())

return common_keys
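The helper above relies on `set.intersection_update`, which keeps only the keys found in every object. The sketch below uses plain sets instead of `SpatialData` objects and contrasts that behavior with counting keys that occur in at least two objects. Both function names are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from collections import Counter


def keys_in_all(key_sets: list[set[str]]) -> set[str]:
    """Keys present in every set (same logic as repeated intersection_update)."""
    common = set(key_sets[0])
    for keys in key_sets[1:]:
        common.intersection_update(keys)
    return common


def keys_in_more_than_one(key_sets: list[set[str]]) -> set[str]:
    """Keys present in at least two sets."""
    counts = Counter(key for keys in key_sets for key in keys)
    return {key for key, n in counts.items() if n > 1}


two = [{"table", "a"}, {"table", "b"}]
three = [{"table", "a"}, {"table", "b"}, {"a", "c"}]

# With two inputs the two definitions agree.
print(keys_in_all(two), keys_in_more_than_one(two))
# With three inputs they can differ: no key is in all three sets here.
print(keys_in_all(three), sorted(keys_in_more_than_one(three)))
```

For exactly two inputs the two notions coincide, so the distinction matters only when three or more objects are compared.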