Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sample in pixel units, not CRS units #279

Closed
RitwikGupta opened this issue Dec 13, 2021 · 6 comments · Fixed by #294
Closed

Sample in pixel units, not CRS units #279

RitwikGupta opened this issue Dec 13, 2021 · 6 comments · Fixed by #294
Labels
datasets Geospatial or benchmark datasets samplers Samplers for indexing datasets
Milestone

Comments

@RitwikGupta
Copy link
Contributor

Related to #278, once our datasets are projected to UTM with fixed xres and yres, I want to sample a set amount of pixels, say 1024 x 1024. Current samplers sample in the unit of the CRS (in this case meters). Therefore, I have to set the sample size to desired_chip_size / xres_yres_of_imagery. While this works, a flag which lets the user choose which unit to sample in would be great.

@adamjstewart adamjstewart added datasets Geospatial or benchmark datasets samplers Samplers for indexing datasets labels Dec 13, 2021
@calebrob6
Copy link
Member

I'm fine with adding this as a feature -- this is especially annoying with datasets in units of degrees (we do this here https://github.com/microsoft/torchgeo/blob/main/benchmark.py#L143)

@RitwikGupta
Copy link
Contributor Author

Perhaps you can help me understand what the intended behavior is, based on that benchmark. When passing a size to the sampler, does it give you a patch that is (size, size) pixels large (so a Tensor that is [1, 1, size, size]), or is the intended behavior that it will give you a patch that is (size / res, size / res) pixels large?

Currently, I am getting the latter behavior (which seems to be the intended one.

from datasets import WatchRaster
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torchgeo.samplers import RandomGeoSampler
from torchgeo.datasets import stack_samples
import torch

ds = WatchRaster()
print(f'Dataset resolution: {ds.res}')
sampler = RandomGeoSampler(ds, size=512, length=5)
dl = DataLoader(ds, sampler=sampler, collate_fn=stack_samples)

for sample in dl:
    image = sample['image']
    print(image.shape)
Dataset resolution: 2.4
torch.Size([1, 1, 214, 214])
torch.Size([1, 1, 214, 214])
torch.Size([1, 1, 214, 214])
torch.Size([1, 1, 213, 213])
torch.Size([1, 1, 213, 213])

The proposed change could be to modify get_random_bounding_box in utils to add a pixelspace boolean flag. This would change maxx and maxy to be minx + t_size[1] * res.

@calebrob6
Copy link
Member

Yes, latter behavior is correct, and proposed change is correct! We may want to think about how we name the args in the samplers, options:

  • mutually exclusive size_in_pixels and size_in_crs_units args
  • size arg and pixelspace flag

@RitwikGupta
Copy link
Contributor Author

I like former better. Make it a required arg so that the user is forced to choose. What's the voting process for these changes? @adamjstewart

@RitwikGupta
Copy link
Contributor Author

The former can be implemented as an enum to make it explicit in the code. sampler(sample_mode=utils.SIZE_IN_PIXELS)

@adamjstewart
Copy link
Collaborator

I feel like there was a reason I originally chose to do things in CRS units, but I can't remember what that reason was. I'm fine with whatever. We could just only support sampling in pixel units.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets samplers Samplers for indexing datasets
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants