Refactor polygon_query() #422

Merged
LucaMarconato merged 29 commits into main from refactor_polygon_bb
Feb 9, 2024

Conversation

Member

@LucaMarconato LucaMarconato commented Jan 7, 2024

Made on top of #409, which should be merged first.

Closes #255.

The PR does the following:

  • improved comments
  • polygon_query no longer transforms the data; instead it back-transforms the polygon/multipolygon. This is essential for good performance with multiscale raster data.
  • unifies the polygon_query and bounding_box_query APIs. In particular:
    • uses single dispatch for polygon_query()
    • adds sdata.query.polygon_query
  • makes the functionality of polygon_query complete
  • unifies and adds more tests, in particular:
    • tests polygon_query and bounding_box_query within the same test
    • checks that the results are identical
    • tests 3D bounding boxes with polygons
    • tests multipolygons
    • tests different coordinate systems via affine transformations
    • tests Dask dataframes with multiple partitions.
  • bounding_box_query for points no longer resets indices
  • the transformation of queried elements was passed by reference rather than copied; this is now fixed
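
The back-transformation point in the list above can be illustrated with a minimal, pure-Python sketch. This is not the spatialdata implementation: it assumes a 2D affine map from an element's intrinsic coordinates to the target coordinate system, and `apply_affine`, `invert_affine`, and `point_in_polygon` are hypothetical helpers written only for this illustration. It shows that transforming every point and testing it against the query polygon selects the same points as mapping only the polygon's vertices back and testing the untouched points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical affine encoding: x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def apply_affine(m: Affine, p: Point) -> Point:
    a, b, tx, c, d, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def invert_affine(m: Affine) -> Affine:
    a, b, tx, c, d, ty = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("The affine transformation is not invertible.")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, -(ia * tx + ib * ty), ic, id_, -(ic * tx + id_ * ty))


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Intrinsic -> target: scale by 2, then translate by (10, 20).
to_target: Affine = (2.0, 0.0, 10.0, 0.0, 2.0, 20.0)
points_intrinsic: List[Point] = [(1.0, 1.0), (4.0, 4.0), (0.5, 3.0)]
# Query polygon, expressed in the target coordinate system.
polygon_target: List[Point] = [(11.0, 21.0), (15.0, 21.0), (15.0, 25.0), (11.0, 25.0)]

# Option 1: transform all the data, then query.
mask_transform_points = [
    point_in_polygon(apply_affine(to_target, p), polygon_target)
    for p in points_intrinsic
]

# Option 2: back-transform only the polygon vertices, then query the original data.
to_intrinsic = invert_affine(to_target)
polygon_intrinsic = [apply_affine(to_intrinsic, v) for v in polygon_target]
mask_backtransform_polygon = [
    point_in_polygon(p, polygon_intrinsic) for p in points_intrinsic
]
```

In this toy case both masks agree. The second option only touches the polygon's few vertices, which is presumably why it matters for multiscale raster data, where transforming the data itself would be far more expensive.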

Minor:

  • PointsModel.validate() (which is also called by the parser) now checks that no radius value is <= 0. A test for this was also added.
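
As a rough illustration of the radius check described above, here is a minimal sketch. `validate_radii` is a hypothetical standalone helper, not the function used in spatialdata's models, and the error message is made up for this example.

```python
from typing import Iterable


def validate_radii(radii: Iterable[float]) -> None:
    """Raise ValueError if any radius is not strictly positive.

    `not r > 0` is used instead of `r <= 0` so that NaN values are rejected too
    (an extra assumption of this sketch, not something stated in the PR).
    """
    bad = [r for r in radii if not r > 0]
    if bad:
        raise ValueError(
            f"Radii must be > 0; found {len(bad)} invalid value(s), e.g. {bad[0]!r}."
        )


# Valid radii pass silently.
validate_radii([1.0, 2.5])
```

Calling `validate_radii([1.0, 0.0])` raises a `ValueError` under these assumptions.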

@LucaMarconato LucaMarconato changed the title wip Refactor polygon_query() Jan 7, 2024

codecov bot commented Jan 7, 2024

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (9f188f9) 91.78% compared to head (9db42a8) 92.13%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #422      +/-   ##
==========================================
+ Coverage   91.78%   92.13%   +0.34%     
==========================================
  Files          37       37              
  Lines        5052     5073      +21     
==========================================
+ Hits         4637     4674      +37     
+ Misses        415      399      -16     
Files Coverage Δ
src/spatialdata/_core/spatialdata.py 93.40% <100.00%> (+0.05%) ⬆️
src/spatialdata/models/models.py 86.36% <88.88%> (+0.14%) ⬆️
src/spatialdata/_core/query/spatial_query.py 95.73% <94.76%> (+2.81%) ⬆️

... and 2 files with indirect coverage changes

@LucaMarconato LucaMarconato self-assigned this Jan 12, 2024
@LucaMarconato LucaMarconato marked this pull request as ready for review January 16, 2024 10:56
Member

@giovp giovp left a comment

small things around documentation

Comment on lines 199 to 207
axes_only_in_bb = set(axes) - set(output_axes_without_c)
axes_only_in_output = set(output_axes_without_c) - set(axes)

# let's remove from the bounding box whose axes that are not in the output axes (e.g. querying 2D points with a
# 3D bounding box)
indices_to_remove_from_bb = [axes.index(ax) for ax in axes_only_in_bb]
axes = tuple([ax for ax in axes if ax not in axes_only_in_bb])
min_coordinate = np.delete(min_coordinate, indices_to_remove_from_bb)
max_coordinate = np.delete(max_coordinate, indices_to_remove_from_bb)
Member

min_coordinate and max_coordinate are with respect to axes right? If so

idx_bb = np.in1d(axes, output_axes_without_c)
idx_out = np.in1d(output_axes_without_c, axes)

min_coordinate = min_coordinate[idx_out]
max_coordinate = max_coordinate[idx_out]

Member

if yes, I would change axes to axes_bb and maybe output_axes_without_c to axes_out_without_x to anyway make them consistent

Member

also, do you need to check (maybe this is done outside) that the min/max coordinates match in length with some axes (either bb or out)?

Member Author

min_coordinate and max_coordinate are with respect to axes right? If so

idx_bb = np.in1d(axes, output_axes_without_c)
idx_out = np.in1d(output_axes_without_c, axes)

this still returns a boolean mask, so it's basically equivalent to what I wrote; I'll keep the old code.

min_coordinate = min_coordinate[idx_out]
max_coordinate = max_coordinate[idx_out]
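
To make the equivalence discussed in this thread concrete, here is a small sketch comparing the `np.delete` approach from the diff with a boolean-mask approach. The axes and coordinate values are made up for illustration. The sketch uses `np.isin`, which the NumPy documentation recommends over `np.in1d` for new code. As far as I can tell, the mask has to be computed over the bounding-box axes (the `idx_bb` of the suggestion), because the coordinates have one entry per bounding-box axis.

```python
import numpy as np

# Made-up example: a 3D bounding box used to query a 2D element.
axes = ("x", "y", "z")
output_axes_without_c = ("x", "y")
min_coordinate = np.array([0.0, 1.0, 2.0])
max_coordinate = np.array([10.0, 11.0, 12.0])

# Approach from the diff: delete the entries of axes that exist only in the bounding box.
axes_only_in_bb = set(axes) - set(output_axes_without_c)
indices_to_remove_from_bb = [axes.index(ax) for ax in axes_only_in_bb]
min_deleted = np.delete(min_coordinate, indices_to_remove_from_bb)
max_deleted = np.delete(max_coordinate, indices_to_remove_from_bb)

# Boolean-mask approach: keep the bounding-box axes that also appear in the output.
keep_in_bb = np.isin(axes, output_axes_without_c)
min_masked = min_coordinate[keep_in_bb]
max_masked = max_coordinate[keep_in_bb]
```

Both approaches keep the `x` and `y` entries and drop `z`, so in this example they produce the same arrays.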

Member Author

if yes, I would change axes to axes_bb and maybe output_axes_without_c to axes_out_without_x to anyway make them consistent

good idea, I have renamed these two variables

Member Author

yes, we check that it matches with bb, and we do it in BoundingBoxRequest.__post_init__()
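
For readers unfamiliar with the pattern, the kind of check mentioned here can be sketched with a dataclass. `BoundingBoxRequestSketch` is a hypothetical stand-in written for this illustration. Its fields and error messages are assumptions and do not reproduce the real `BoundingBoxRequest`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBoxRequestSketch:
    """Hypothetical request object; validation runs right after __init__."""

    axes: Tuple[str, ...]
    min_coordinate: Tuple[float, ...]
    max_coordinate: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Both corners must have exactly one value per bounding-box axis.
        if len(self.min_coordinate) != len(self.axes):
            raise ValueError("min_coordinate must have one value per axis.")
        if len(self.max_coordinate) != len(self.axes):
            raise ValueError("max_coordinate must have one value per axis.")


request = BoundingBoxRequestSketch(
    axes=("x", "y"),
    min_coordinate=(0.0, 0.0),
    max_coordinate=(1.0, 1.0),
)
```

Because the check lives in `__post_init__`, an inconsistent request fails at construction time, before any query code sees it.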

Collaborator

@kevinyamauchi kevinyamauchi left a comment

Thanks, @LucaMarconato. I think this looks pretty good. I've left some comments below, mostly related to code clarity.

I am okay with the refactors in test_spatial_query. I see that they likely reduce the amount of code. However, I think this is at the cost of being able to quickly understand the intention of the test. If that's the trade-off we want, I am fine with it. I would add some comments and docstrings to make it easier to understand though.

Comment on lines +1214 to +1215
Please see
:func:`spatialdata.bounding_box_query` for the complete docstring.
Collaborator

what is the reason for this change? so you don't have to maintain two docstrings? I think it's fine, but I'm curious

Collaborator

I see you did the same below for polygon. I would keep in mind that this is going to be a really common way that users interact with the query interface. With this change, I don't think they will have access to the docstring when they are using an IDE (e.g., inspecting sdata.query.polygon()).

Member Author

Exactly, I would like to have it in only one place because it's easy to forget to update the other (this has happened). Is there any Sphinx trick we can use here? Asking also @giovp. For this PR I'll keep it like this because I'd like to merge some other PRs that depend on this one.

Member Author

I see also torch has something similar to what is in this PR: https://pytorch.org/docs/stable/generated/torch.Tensor.sum.html
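
One possible way to keep a single docstring while still exposing it on the method, sketched here purely in Python rather than as a Sphinx feature: copy `__doc__` from the function to the method with a small decorator. All names below (`copy_docstring_from`, `polygon_query_sketch`, `QuerySketch`) are hypothetical, and this is not what the PR does.

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def copy_docstring_from(source: Callable[..., Any]) -> Callable[[F], F]:
    """Return a decorator that reuses the docstring of `source`."""

    def decorator(target: F) -> F:
        target.__doc__ = source.__doc__
        return target

    return decorator


def polygon_query_sketch(element: Any, polygon: Any) -> Any:
    """Query `element` with `polygon` (the single, complete docstring lives here)."""
    return element  # placeholder body, for illustration only


class QuerySketch:
    @copy_docstring_from(polygon_query_sketch)
    def polygon(self, element: Any, polygon: Any) -> Any:
        # Thin wrapper that delegates to the function holding the docstring.
        return polygon_query_sketch(element, polygon)
```

With this approach `help(QuerySketch.polygon)` shows the same text as `help(polygon_query_sketch)` at runtime. Whether a given IDE picks it up depends on whether it inspects the code statically or at runtime, so this may only partly address the concern above.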

@LucaMarconato LucaMarconato merged commit e0754fb into main Feb 9, 2024
9 checks passed
@LucaMarconato LucaMarconato deleted the refactor_polygon_bb branch February 9, 2024 12:55
Development

Successfully merging this pull request may close these issues.

Refactoring polygon_query() and bounding_box_query()
3 participants