GridGeoSampler may never reach end of image #431

remtav · 2022-02-23T19:56:50Z

In ran into this issue after running multiple tests and double-checking why my inference script using GridGeoSampler never reached the end (right and bottom) of an image.

I'm using GridGeoSampler with an image of size 1591 x 2187 pixels, a chip size of 512 and stride of 256. For the sake of clarity, let's just say my image is 600 x 600 (pixel units or crs units doesn't matter here -> just assume minimum bounds in x and y are 0 if you're thinking crs units):

I want GridGeoSampler to sample my 600 x 600 image in chips of 512x512, with a stride of 256.
The total number of row and columns as calculated in the __init__ and __iter_ of GridGeoSampler will be:
rows = columns = (600 - 512) // 256 +1 = 1

Therefore, the __iter__ function will use two for loops with range(rows) and range (columns). Since those two variables are 1, only 1 512x512 sample will be returned by the sampler, thus ignoring 88 (600-512) pixels of width and height where valid values existed in my image.

To adapt the current implementation, I believe rows and columns should be calculated this way:
( image width - chip size + stride size ) // stride size + 1
or in the code, for rows:
rows = int((bounds.maxy - bounds.miny - self.size[0] + self.stride[0]) // self.stride[0]) + 1

The text was updated successfully, but these errors were encountered:

adamjstewart · 2022-02-24T05:15:53Z

As a result of #376, GridGeoSampler will also skip areas smaller than the patch size entirely.

I think the issue you reported is easy to fix, we just have to be careful not to go out of bounds of the image. So the minx/miny will need to be adjusted with a smaller stride for the last sample per row/col.

remtav · 2022-02-24T13:39:53Z

As a result of #376, GridGeoSampler will also skip areas smaller than the patch size entirely.

I think the issue you reported is easy to fix, we just have to be careful not to go out of bounds of the image. So the minx/miny will need to be adjusted with a smaller stride for the last sample per row/col.

Hadn't though of the "smaller stride at the end of image" solution. I guess this ensures the chip_size is respected and could definitely work. The only drawback is that this changes the expected behaviour of the sampler since the stride may not always be the size that was initially set (in my dummy example, the last stride would be 88 rather than the expected 256).

If a user (like me :) ) is stitching inference chips into a large array assuming the stride is constant (let's say he's using pixel units from #279 ), the result of the "smaller stride at the end of image" solution may be surprising. If this solution is chosen, I guess the user will have to make sure he/she is not naively assuming the stride will always be constant and make sure to rely on the bbox information returned by the sampler either directly from the __iter__ function of the sampler of from a __getitem__ as in RasterDataset.

An alternative that I'd consider is to keep a constant stride, but return a sample that is truncated (in my example: a 88 x 512 sample for the last column and vice versa for last row), leaving it to the user to use a pad_to function as is done in test_transforms on the Chesapeake dataset.

To synthesize:
Solution 1: "smaller stride at the end of image, but no change in chip size value"
Solution 2: "truncated chip at the end of image, but no change to stride size value"

In both cases, I'd issue a warning since either (1) the stride's value or (2) the chip size's value as inputted by the user will be forced to change at the limits of an image.

EDIT:
Just checked the discussion on issue #319. The conclusions of that discussion would logically lead to solution 1 rather than 2. I guess we'll have to assume the user is using the bbox returned by the sampler to perform stitching. I also see one potential flaw with 2: the grid sampling with GridGeoSampler may sample from right to left or bottom to top of an image if using CRS units. Therefore, the limit could potentially (and quite frequently) be at the top and left of image. If using a pad_to function as mentionned above, the padding function won't "know" the border the sample to pad is coming from and would naively assume (at least in the Cheseapeake implementation) last samples would be right and bottom borders. This, again, would cause problems at stitching since the padding value (generally 0) could overlap a region where valid values were predicted.

To help illustrate, I'll borrow from @adamjstewart's artwork :). Given two chips with a one pixel overlap, ie stride = chip_size - 1, where L: last, SL: second to last and 0: padding. Suppose these are being sampled from CRS units where maxx is on the left (pixel wise) of minx.

+-----+
|    0|
|  L 0|
|    0|
+-----+
+-----+
|     |
|  SL |
|     |
+-----+
+----+-+----+
|    |x|    |
|  L |x| SL |  --> potential stitching problems
|    |x|    |
+----+-+----+

adamjstewart · 2022-02-24T22:04:16Z

Also see #30. Part of the reason that GeoDatasets return the CRS and bbox is so you can stitch things back together easily even if the stride changes for the last image. Of course, we still need to actually write stitching utilities to use this information.

calebrob6 · 2022-02-26T01:51:42Z

Thanks a ton for the bug report @remtav!

Here's a notebook that gives a self-contained example of:

running a model on a RasterDataset made up of a single tile with GridGeoSampler (the model is a pre-trained land cover model I have laying around and the imagery is NAIP 2013 imagery from Maryland)
stitching the results together in the inference loop
visualizing the input/output

https://gist.github.com/calebrob6/7b226eb73877187f85fb5e1621bb7971

There is definitely something wrong here, but I haven't parsed it yet.

add first draft of datasets/ccmeo.py and datamodules/ccmeo.py chesapeake_cvpr.yaml: add trainer parameters for testing segmentation.py: (WIP) add a BinarySemanticSegmentationTask tested with Spacenet single.py: add GridGeoSamplerPlus with temporary solution to microsoft#431

adamjstewart added the samplers Samplers for indexing datasets label Feb 24, 2022

adamjstewart added this to the 0.2.1 milestone Feb 24, 2022

remtav mentioned this issue Mar 2, 2022

GridGeoSampler: Sample beyond roi limit if needed #448

Closed

adamjstewart modified the milestones: 0.2.1, 0.2.2 Mar 19, 2022

This was referenced Jun 28, 2022

GridGeoSampler: Sample beyond roi limit if needed #629

Closed

GridGeoSampler: change stride of last patch to sample entire ROI #630

Merged

adamjstewart modified the milestones: 0.2.2, 0.3.0 Jul 2, 2022

adamjstewart modified the milestones: 0.3.0, 0.3.1 Jul 9, 2022

adamjstewart closed this as completed in #630 Sep 3, 2022

mpelchat04 mentioned this issue Oct 6, 2023

GridGeoSampler on areas smaller than patch size #1630

Open

DimitrisMantas mentioned this issue Apr 10, 2024

Add Inference Example #1991

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GridGeoSampler may never reach end of image #431

GridGeoSampler may never reach end of image #431

remtav commented Feb 23, 2022 •

edited

Loading

adamjstewart commented Feb 24, 2022

remtav commented Feb 24, 2022 •

edited

Loading

adamjstewart commented Feb 24, 2022

calebrob6 commented Feb 26, 2022 •

edited

Loading

GridGeoSampler may never reach end of image #431

GridGeoSampler may never reach end of image #431

Comments

remtav commented Feb 23, 2022 • edited Loading

adamjstewart commented Feb 24, 2022

remtav commented Feb 24, 2022 • edited Loading

adamjstewart commented Feb 24, 2022

calebrob6 commented Feb 26, 2022 • edited Loading

remtav commented Feb 23, 2022 •

edited

Loading

remtav commented Feb 24, 2022 •

edited

Loading

calebrob6 commented Feb 26, 2022 •

edited

Loading