🚸 Walkthrough on creating batches of data chips #20

Merged: 5 commits merged on Jul 16, 2022

weiji14 (Owner) commented on Jun 18, 2022

Initial tutorial on creating batches of chipped data (256x256 pixels) from full-size satellite scenes (10000x10000 pixels) that have different coordinate reference systems! It uses example Sentinel-1 GRD GeoTIFFs over Osaka and Tokyo, Japan.
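
The chipping arithmetic can be sketched in plain Python. This is a minimal illustration, not zen3geo's or xbatcher's code. The helper `chip_windows` is hypothetical, and it assumes that each window advances by `size - overlap` pixels and that incomplete windows at the far edge of the scene are discarded.

```python
from __future__ import annotations


def chip_windows(length: int, size: int, overlap: int = 0) -> list[slice]:
    """Return the slices that cut one dimension of `length` pixels into
    chips of `size` pixels, with neighbouring chips sharing `overlap` pixels.

    Hypothetical helper: incomplete chips at the far edge are dropped.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    stride = size - overlap  # how far each window advances
    return [slice(start, start + size) for start in range(0, length - size + 1, stride)]


# A 10000 x 10000 scene cut into non-overlapping 256 x 256 chips
rows = chip_windows(length=10000, size=256)
cols = chip_windows(length=10000, size=256)
print(len(rows), len(cols), len(rows) * len(cols))  # 39 39 1521
print(rows[-1])  # slice(9728, 9984, None)

# 512 x 512 chips that overlap their neighbours by 64 pixels (stride 448)
print(len(chip_windows(length=10000, size=512, overlap=64)))  # 22
```

Under these assumptions the last 16 pixels (10000 - 9984) of each dimension are never covered. Padding the scene, or shifting the final window so it ends flush with the edge, are common alternatives.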

Preview at https://zen3geo--20.org.readthedocs.build/en/20/chipping.html

Images: Osaka (Sentinel-1 image over Osaka, Japan on 20220614) and Tokyo (Sentinel-1 image over Tokyo, Japan on 20220616)

https://planetarycomputer.microsoft.com/explore?c=137.1529%2C35.0944&z=7.94&v=2&d=sentinel-1-grd&s=false%3A%3A100%3A%3Atrue&ae=0&m=cql%3A08211c0dd907a5066c41422c75629d5f&r=VV%2C+VH+False-color+composite

TODO:

Initial draft tutorial on creating batches of chipped data from full-size satellite scenes! Will be working with Sentinel-1 GRD GeoTIFFs, let's see how far this will go.
weiji14 added the documentation label (Improvements or additions to documentation) on Jun 18, 2022
weiji14 self-assigned this on Jun 18, 2022
weiji14 added this to the 0.2.0 milestone on Jun 18, 2022
Bump minimum pyogrio version from 0.4.0a1 to 0.4.0 and include new XbatcherSlicer feature! Also get the refactored RioXarrayReader in.
Walk through how to cut up a large satellite scene into multiple smaller chips of 512 by 512 pixels. The heavy lifting is done by xbatcher, which handles slicing along dimensions and overlapping strides. Needed a hacky workaround in XbatcherSlicer to fix a ValueError caused by the xarray.DataArray name not being set (though it should be).
Fix readthedocs build failure because xbatcher was not installed.
Finalize tutorial by converting chips from xarray.Dataset to torch.Tensor and stacking them per mini-batch! Debated whether to put the xarray collate function in the codebase, but let's wait for updates on xbatcher's end (xarray-contrib/xbatcher#71). Also renamed the tutorial file from batching to chipping and added more emojis to the intro section.
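
The batching and collation step can be sketched without PyTorch. This is a stand-in, not the tutorial's code: chips are nested Python lists instead of xarray.Dataset objects, and `batch`, `shape` and `collate` are hypothetical helpers. Stacking equal-shaped chips along a new leading axis mirrors what `torch.stack` does for tensors.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator

Chip = list[list[float]]  # one 2-D chip: rows of pixel values


def batch(chips: Iterable[Chip], batch_size: int) -> Iterator[list[Chip]]:
    """Group consecutive chips into mini-batches (the last one may be smaller)."""
    it = iter(chips)
    while group := list(islice(it, batch_size)):
        yield group


def shape(chip: Chip) -> tuple[int, int]:
    """Return (height, width) of a 2-D chip."""
    return (len(chip), len(chip[0]))


def collate(mini_batch: list[Chip]) -> tuple[list[Chip], tuple[int, int, int]]:
    """Stack equal-shaped chips along a new leading (batch) axis.

    Returns a copy of the stacked data and its (batch, height, width) shape.
    """
    first = shape(mini_batch[0])
    if any(shape(chip) != first for chip in mini_batch):
        raise ValueError("all chips in a mini-batch must share the same shape")
    stacked = [[row[:] for row in chip] for chip in mini_batch]
    return stacked, (len(mini_batch), *first)


chips = [[[float(i)] * 4 for _ in range(4)] for i in range(10)]  # ten 4x4 chips
for mini_batch in batch(chips, batch_size=4):
    stacked, dims = collate(mini_batch)
    print(dims)
# (4, 4, 4)
# (4, 4, 4)
# (2, 4, 4)
```

The shape check is the reason fixed-size chips matter: `torch.stack` likewise requires every tensor in a mini-batch to have the same shape, so a ragged edge chip would fail at this step.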
weiji14 changed the title from "🚸 Walkthrough on creating batches of data" to "🚸 Walkthrough on creating batches of data chips" on Jul 16, 2022
weiji14 marked this pull request as ready for review on July 16, 2022 04:12
weiji14 merged commit e89820a into main on Jul 16, 2022
weiji14 deleted the batching branch on July 16, 2022 04:18
weiji14 added a commit that referenced this pull request Sep 7, 2022
Just a random collection of mostly documentation-related patches. Patches type hints in #52, sorts imports from #35 with isort, mentions the functional name of IterDataPipe in the walkthroughs from #8 and #20, and removes the mention of a returned tuple to patch #33.

* 🏷️ Add specific type hints for mask_datapipe in geopandas.py

Should be either an xarray.DataArray or xarray.Dataset.

* 🚨 Sort spatialpandas imports in datashader.py

Ran isort to sort spatialpandas.geometry imports alphabetically. Also intersphinx linked the `.crs` attribute to geopandas.GeoDataFrame.crs.

* 💬 Mention functional name of IterDataPipe in walkthroughs

So people don't get confused about why the class form (e.g. `Collator`) is mentioned but `.collate` is used instead.

* 📝 Remove mention of tuple being returned in test_pyogrio_reader

Forgot to edit the unit test's docstring. Patches #33.

* 🍻 It's GeoPackage and GeoDataFrame, not GeoTIFF and DataArray

Need to be more careful when copying and pasting stuff.
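
The class-form versus functional-form naming (`Collator` versus `.collate`) can be illustrated with a simplified, pure-Python imitation of registering a data pipe class under a method name. This is not torchdata's implementation: `Pipe`, `ListPipe`, `register_functional` and this `Collator` are hypothetical stand-ins, while the real library uses its own `functional_datapipe` decorator.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


class Pipe:
    """Minimal stand-in for an iterable data pipe (hypothetical)."""

    def __iter__(self) -> Iterator:
        raise NotImplementedError


def register_functional(name: str):
    """Class decorator: expose `cls(source, ...)` as `source.<name>(...)`."""

    def decorator(cls: type) -> type:
        def method(self: Pipe, *args, **kwargs):
            return cls(self, *args, **kwargs)

        setattr(Pipe, name, method)  # every Pipe now has a `.<name>` method
        return cls

    return decorator


class ListPipe(Pipe):
    """Source pipe that yields items from a list."""

    def __init__(self, items: Iterable) -> None:
        self.items = list(items)

    def __iter__(self) -> Iterator:
        return iter(self.items)


@register_functional("collate")
class Collator(Pipe):
    """Apply `collate_fn` to every element of the source pipe."""

    def __init__(self, source: Pipe, collate_fn: Callable) -> None:
        self.source = source
        self.collate_fn = collate_fn

    def __iter__(self) -> Iterator:
        for item in self.source:
            yield self.collate_fn(item)


source = ListPipe([[1, 2], [3, 4]])
class_form = list(Collator(source, collate_fn=sum))
functional_form = list(source.collate(collate_fn=sum))
print(class_form, functional_form)  # [3, 7] [3, 7]
```

Both forms build the same object; the functional form simply reads better when several pipes are chained one after another.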