Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dtype validation when rescale=True #209

Merged
merged 13 commits into from
Sep 19, 2023

Conversation

RSchueder
Copy link
Contributor

This MR adds validation of the inputs to stackstac.stack to ensure that the output dtype is compatible when rescale=True. Rescaling requires that the output dtype is floating-point.

@RSchueder RSchueder changed the title dtype validation when rescale=True Add dtype validation when rescale=True Jul 28, 2023
@RSchueder
Copy link
Contributor Author

RSchueder commented Jul 28, 2023

@gjoseph92 I would like to assign you as a reviewer. I am hoping this mention enables this as per the solution here (I am more of a GitLab person myself).

Copy link
Owner

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RSchueder! This looks like a good start. I was thinking to make the check conditional on the scale and offset values we actually read from STAC metadata, though. If you specify dtype=int, rescale=True, but all assets have scale=1, offset=0 (or no scale and offset at all), that shouldn't cause this error. Probably right after we pull out the values from metadata here:

asset_offset = raster_bands[0].get('offset', 0)

An easy check (for v in [scale, offset]) would just be np.can_cast(v, dtype). I think should catch:

  • Wrong dtype kind (int vs float)
  • Wrong dtype size (uint8 vs uint64)
  • Wrong dtype signing (uint8 vs int8) (this one is a very rare case, but also slightly tricky, since I suppose offset=-1 doesn't necessarily mean the array needs to be signed, so long as all data values are >0)

@@ -276,6 +276,9 @@ def stack(
The size of ``y`` and ``x`` will be determined by ``resolution`` and ``bounds``, which in many cases are
automatically computed from the items you pass in.
"""
if rescale:
assert "float" in np.dtype(dtype).name, "The requested dtype is incompatible with rescale=True, which requires a floating-point dtype."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert "float" in np.dtype(dtype).name, "The requested dtype is incompatible with rescale=True, which requires a floating-point dtype."
assert np.dtype(dtype).kind in ("f", "c"), "The requested dtype is incompatible with rescale=True, which requires a floating-point dtype."

nit: using dtype.kind is probably the more proper way to write this test. Also, complex floats would be valid too (could come up for SAR data).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I was searching for a good way to do this and could not find one, this is handy.

@RSchueder
Copy link
Contributor Author

@gjoseph92 I am not sure we can ever have a (scale, offset) of dtype other than float because of:

https://github.com/RSchueder/stackstac/blob/20f7dbbfc5e7cd29e3b9faec20e8b16aea64a2d8/stackstac/prepare.py#L30

If this is true, isn't actually checking the value a bit redundant since we know it will be a float?

@RSchueder RSchueder requested a review from gjoseph92 August 8, 2023 18:42
@RSchueder
Copy link
Contributor Author

Okay, so even if the (scale, offset) is always technically (float, float), we only apply the value if it != (1,0). Therefore, I now only do the check on scale != 1 and offset != 0. Hopefully this makes sense.

@gjoseph92
Copy link
Owner

Sorry for the delay @RSchueder. I'd still like to move the check around here

asset_offset = raster_bands[0].get('offset', 0)

That way, you get the error immediately, instead of having to wait until compute.

@RSchueder
Copy link
Contributor Author

RSchueder commented Sep 19, 2023

Now worries @gjoseph92!

Gotcha, I didn't want to modify the signature of prepare_items but I have now implemented the change in prepare_items.

Copy link
Owner

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RSchueder! I appreciate thinking about not modifying the signature. Since that's an internal function, it didn't matter much in this case.

I just pushed up a slight clarification to the error message you wrote. Looks good!

@gjoseph92 gjoseph92 merged commit 705fd70 into gjoseph92:main Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants