Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query.bounding_box - differing axes sizes when querying #285

Closed
dylanclam12 opened this issue Jun 5, 2023 · 7 comments · Fixed by #409
Closed

query.bounding_box - differing axes sizes when querying #285

dylanclam12 opened this issue Jun 5, 2023 · 7 comments · Fixed by #409

Comments

@dylanclam12
Copy link
Contributor

I recently opened the Xenium rep 1 data and attempted to do a bounding box query for a cropped sdata and noticed that the "transcripts" Points element was not being cropped correctly.

cropped_sdata = sdata.query.bounding_box(axes=["x", "y"], min_coordinate= [12000, 12000], max_coordinate= [16000, 16000], target_coordinate_system="global")

Tracing through the bounding box code I found that this error occurred because the transcripts had 3D axes while the axes I inputted (axes=["x", "y"]) were in 2D. Because of this when the bounding box querying for Points found the intrinsic_axes from _get_bounding_box_corners_in_intrinsic_coordinates, it returned ["x", "y", "z"].

(intrinsic_bounding_box_corners, intrinsic_axes) = _get_bounding_box_corners_in_intrinsic_coordinates( element=points, axes=axes, min_coordinate=min_coordinate, max_coordinate=max_coordinate, target_coordinate_system=target_coordinate_system, )

It then filtered the points by x between 12000 and 16000, y between 12000 and 16000, and z between 0 to 0 with _bounding_box_mask_points which filtered out all points since all the z coordinates were not between 0 and 0.

in_intrinsic_bounding_box = _bounding_box_mask_points( points=points, axes=intrinsic_axes, min_coordinate=min_coordinate_intrinsic, max_coordinate=max_coordinate_intrinsic, )

simple fix for this would be to check the intrinsic axes against the inputed axes and, if there is a discrepancy, accept all values for an axis that is found in the intrinsic axes but not in the inputed axes.

@LucaMarconato
Copy link
Member

LucaMarconato commented Jun 5, 2023

Hi @dylanclam12, we noticed the bug and tracked it here: #231 but AFAIK nobody looked into into it yet, so thanks for reporting and in particular for the detailed explanation; we will now prioritize it.

@kevinyamauchi
Copy link
Collaborator

kevinyamauchi commented Jun 6, 2023

Thank you for the clear report, @dylanclam12 and for linking the issue, @LucaMarconato.

It's a bit unclear to me what the expected behavior is. Should we assume the axis labels are aligned? That is, in the example above, we would assume x_global -> x_intrinsic and y_global -> y_intrinsic, so we set the bounding box in the to span min(z_intrinsic) to max(z_intrinsic)? This works when the global axes and intrinsic axes are aligned , but doesn't when there is a rotation.

One option is to do the following for now to be able to move forward and then figure out what to do for that rotation case:

  1. expand the "unmapped" axes as described above when there isn't any rotation in the transformation
  2. raise an error when there is a rotation

Thoughts?

edit: another option is to add check for any dimensions of the bounding box with size 0 and then expand to have size 1.

@LucaMarconato
Copy link
Member

LucaMarconato commented Jun 6, 2023

Thanks @kevinyamauchi for looking into this. Since the bounding box is specified on the target space where the points are already transformed and not in the intrinsic space, I think that it's fine to interpret a missing axis as "keep everything in that axis", so if I am not missing something, I believe that having or not a rotation would not make a difference here and we can thus deal with a xy bounding box request as something that operates independently on the z.

Also, the polygon_query() function behaves exactly as described above (because geopandas ignores the z component).

A criticism of this it that it's maybe a bit ugly that the user can omit the z when querying 3D points, while for instance not being allowed to omit the x, but I think it's still fine to have this because the 3D data for points is often ignored in analyses anyway, so it's a "special" case.

@kevinyamauchi
Copy link
Collaborator

Okay, I can implement it to take the full range of any axes of the bounding box with size equal to zero. I am not sure this is the API we want to support in the long run, but if it gets us unstuck, then I'm okay with it for now.

Since the bounding box is specified on the target space where the points are already transformed and not in the intrinsic space, I think that it's fine to interpret a missing axis as "keep everything in that axis", so if I am not missing something, I believe that having or not a rotation would not make a difference here and we can thus deal with a xy bounding box request as something that operates independently on the z.

Basically, if there is a rotation between the query space axes and the intrinsic space axes, the 2D plane of the query space may have have of plane components in the intrinsic space (z axis in your example), which makes for what I find to be a non-intuitive bounding box. however, I don't think there is a clearly correct solution here, so we can just document

@berl
Copy link

berl commented Oct 26, 2023

@kevinyamauchi @LucaMarconato I also just ran into this in some Xenium data. My situation is a bit worse because I have 2D images, so if I define my bounding box with axes = ["x","y","z"] and appropriate min max values, it successfully subsets the points but fails on the images because ValueError: Invalid case. The bounding box axes are ['x', 'y', 'z'],the spatial axes in global are('y', 'x')

I think a good fix would avoid expanding the dimensionality of any element. depending on implementation, @kevinyamauchi's suggestion above could do it.

@LucaMarconato
Copy link
Member

LucaMarconato commented Jan 7, 2024

@dylanclam12 @berl this issue went out of my radar, but I just made a fix for it.

I covered all the cases: points, single/multiscale labels, single/multiscale images, with data being 2D/3D and bounding boxes being 2D/3D.

@LucaMarconato
Copy link
Member

Related, the follow up PR #422 also tests this for shapes, and greatly improves functionalities and robustness of polygon_query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants