Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling a ROI when loading raster data #642

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

vschaffn
Copy link
Contributor

@vschaffn vschaffn commented Dec 17, 2024

Resolves GlacioHack/xdem#602
Resolves #645

Description

This PR introduces the functionality for handling a Region of Interest (ROI) in pixels when loading raster data.

Changes

  • Added Raster.icrop() method to handle ROI in pixels.

Test

  • Added test_icrop() in test_raster to ensure the good behavior when cropping a raster

Doc

  • Added Raster.icrop() along with Raster.crop() in doc, and making it clear that the crops are also functional when the Raster is not loaded into memory.

@rhugonnet
Copy link
Member

Great addition, thanks @vschaffn! 🙂

I have a main remark about the roi addition to _load_rio() specifically:

As it happens, the crop() function already worked with the data not loaded on disk, by reading the ROI from a window in the same way you did (but it wasn't very clear from the documentation), see here:

window=final_window_disk,

In short, doing:

rst = Raster("file.tif")
rst_cropped = rst.crop(my_roi)

worked without loading the full dataset.

We had been debating about adding the windowed reading of crop directly to _load_rio() to avoid the loading of crop being separate from all other loading of the Raster class. This would allow to use an roi/crop_geom from __init__() or from_array() directly, just as you did.

Maybe this is the occasion to merge things together?
We could modify _crop to call your new _load_rio(roi=) (might require some small modifs following the code in _crop), and merge the roi/crop_geom arguments into one, to have everything consistent?

@rhugonnet
Copy link
Member

Additionally, by merging like this, all the existing tests we have for crop() would also help ensure it works properly through __init__ and from_array! So less work on tests 😛

@vschaffn
Copy link
Contributor Author

vschaffn commented Jan 23, 2025

Great addition, thanks @vschaffn! 🙂

I have a main remark about the roi addition to _load_rio() specifically:

As it happens, the crop() function already worked with the data not loaded on disk, by reading the ROI from a window in the same way you did (but it wasn't very clear from the documentation), see here:

window=final_window_disk,

In short, doing:

rst = Raster("file.tif")
rst_cropped = rst.crop(my_roi)

worked without loading the full dataset.

We had been debating about adding the windowed reading of crop directly to _load_rio() to avoid the loading of crop being separate from all other loading of the Raster class. This would allow to use an roi/crop_geom from __init__() or from_array() directly, just as you did.

Maybe this is the occasion to merge things together? We could modify _crop to call your new _load_rio(roi=) (might require some small modifs following the code in _crop), and merge the roi/crop_geom arguments into one, to have everything consistent?

@rhugonnet I had not noticed that the crop() method already worked with the data not loaded on disk. After studying _crop() function, I think the simplest thing is to restore _load_rio() without ROI, and to use crop, as it is, before _load_rio() if a ROI is passed when the raster is initialised. This will effectively mean that there is only one function dealing with the ROI, and therefore simplified code maintenance 😃

@rhugonnet
Copy link
Member

@vschaffn Yes that's true, this would work as well 🙂. The difference would be only internal to make _crop depend on _load_rio(), so we don't need to do it (could at a later stage if we feel it is useful to avoid having independent loading calls, but that's not a priority).

@adebardo
Copy link

adebardo commented Jan 27, 2025

Okay for me, we can open an issue for the _crop dependance. #645

@adehecq
Copy link
Member

adehecq commented Jan 29, 2025

Hi all,
Thanks for implementing this!

I'm slowly trying to catch up, and sorry if some of it should have come up earlier during the conception...
As raised by rhugonnet, it would have helped if you had known that crop was already working on the unloaded dataset (in general, we only load the data when strictly needed, i.e., for printing the data, changing the data or plotting, otherwise only metadata are loaded). Sorry, we could have clarified this earlier if we had been more available.
Considering this, I wonder if it would not be easier to update the method crop to take a geometry in pixels? Then either simply suggest people to use the crop method directly, or have an argument crop to Raster to enable direct cropping. In either case, I am a bit concerned that having an "ROI" functionality and "crop" functionality may create some confusion as both are somewhat redundant. What do you think @rhugonnet ?

One more thing to consider. We should make sure that crop and ROI (if kept separate) behave the same way when the ROI is partially or fully out of the raster bounds. I had to check with crop, but it looks like it does not raise any warning or error if the bound is larger (but silently crops to the maximum extent) and raises an error if there is no intersection at all between the two. See example below. Maybe we should raise a warning when the requested crop partially overlap and an error when it does not overlap at all?
I made a comment directly in the code, but I think these edge cases should be tested in the tests, to make sure we don't end up with unexpected errors.

import geoutils as gu
img = gu.Raster(gu.examples.get_path("everest_landsat_b4"))
print(img.bounds)
# -> BoundingBox(left=478000.0, bottom=3088490.0, right=502000.0, top=3108140.0)

# Test with crop geometry too large on the right
img_crop = img.crop((478000, 3088490, 502100, 3108140), inplace=False)
print(img_crop.bounds)
# -> same extent as original

# Test with crop geometry outside of bounds 
img_crop = img.crop((502000, 3088490, 502500, 3108140), inplace=False)
# -> raises a rasterio WindowError -> we could actually catch this earlier and raise a clearer error 

@rhugonnet
Copy link
Member

What do you think @rhugonnet ?

I agree, we should make the naming and behaviour fully consistent between Raster(roi=) and Raster.crop(crop_geom=), as they are the same functionality.
I'd suggest simply renaming roi into crop_geom in Raster.__init__. Then, as @vschaffn already proposed, calling crop(crop_geom=) directly. This will ensure everything is the same, and rely on all existing tests to reduce maintenance of this addition 😉 .

@rhugonnet
Copy link
Member

rhugonnet commented Feb 10, 2025

Link towards Rioxarray nomenclature for the same functionality clip and clip_box: https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.clip

This could be the occasion to define a __geo_interface__ for the Raster, Vector objects etc to always consistently access the bounding box.
See reference here for the geo interface: https://gist.github.com/sgillies/2217756

@adebardo
Copy link

image

Copy link

@adebardo adebardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can put the copyright CNES.

Copy link

@adebardo adebardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THE CI is failing because of the documentation

Copy link

@adebardo adebardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the copyright CNES is added, the CI working and @adehecq or @rhugonnet agreed the changes please feel free to merge during this week.

@vschaffn
Copy link
Contributor Author

@adebardo I fixed the documentation issue in the CI, and I added CNES copyright to raster.py file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[POC] Refactor CROP and load_rio() [POC] Simplify the creation of an ROI for a user.
4 participants