From looking at some examples, it appears that data is always loaded into float64 arrays. For example, in https://github.com/gjoseph92/stackstac/blob/5f984b211993380955b5d3f9eba3f3e285f6952c/examples/show.ipynb, loading the RGB bands of a Sentinel-2 asset (rgb = stack.sel(band=["B04", "B03", "B02"]).persist()) creates an xarray DataArray of type float64. It seems to me that you could improve performance (or at least reduce memory usage) by using a smaller data type when possible.
You could look at the raster:bands object, if it exists, to choose the xarray data type. If the extension isn't present, or if the bands have mixed dtypes, then fall back to float64?
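To make the proposal concrete, here is a rough sketch of the fallback logic I have in mind. It only works on plain asset dictionaries as they would appear in STAC item JSON; `pick_common_dtype` and `FALLBACK_DTYPE` are hypothetical names, not part of stackstac, and it assumes the `data_type` strings can be used as dtype names directly (which would not hold for values like `cint16` or `other`).

```python
FALLBACK_DTYPE = "float64"  # the current stackstac default


def pick_common_dtype(assets):
    """Return the data_type shared by every band of every asset.

    `assets` is an iterable of STAC asset dicts. If any asset lacks
    `raster:bands`, any band lacks `data_type`, or the bands disagree,
    fall back to float64.
    """
    seen = set()
    for asset in assets:
        bands = asset.get("raster:bands")
        if not bands:
            # Extension not present on this asset: we cannot know the dtype.
            return FALLBACK_DTYPE
        for band in bands:
            data_type = band.get("data_type")
            if data_type is None:
                return FALLBACK_DTYPE
            seen.add(data_type)
    # Exactly one distinct dtype means it is safe to use it for the whole stack.
    if len(seen) == 1:
        return seen.pop()
    return FALLBACK_DTYPE


if __name__ == "__main__":
    uint16_asset = {"raster:bands": [{"data_type": "uint16"}]}
    uint8_asset = {"raster:bands": [{"data_type": "uint8"}]}
    print(pick_common_dtype([uint16_asset, uint16_asset]))
    print(pick_common_dtype([uint16_asset, uint8_asset]))
```

This deliberately ignores scale and offset, so it only answers "what is the native dtype", not "what dtype will the data have after rescaling".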
Using float64 by default was an intentional choice, for two reasons:
raster:bands didn't exist when I wrote everything a few months ago, so there was no way to know the native dtype of an asset without actually fetching data. But we have to know the dtype ahead of time to correctly construct the dask array. So float64 seemed like the safest default, since anything else could lose precision.
rescale=True is the default, which uses the scale_offset metadata defined within each GeoTIFF (not known from the STAC metadata) to apply rescaling. So even if the asset were uint16 to begin with, it could become float64 after rescaling, which is yet another reason why that default made sense.
However, from what I've seen, nobody really sets the scale_offset metadata at the GeoTIFF level, so I think rescaling might be reasonable to remove. It would make thinking about dtypes a lot easier.
Note that you can control the dtype using the dtype= parameter to stackstac.stack. You'll also want to set rescale=False if doing this, as noted in the docs.
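For example, something along these lines should keep Sentinel-2 data as uint16. This is a sketch, not tested against a particular stackstac version: `stack_as_uint16` is a hypothetical wrapper, `items` stands for whatever STAC items you already have, and the choice of 0 as the fill value is an assumption (the default fill value is NaN, which can't be represented in an integer dtype, so it has to be changed to something compatible).

```python
def stack_as_uint16(items, assets=("B04", "B03", "B02")):
    """Stack STAC items without promoting the pixels to float64.

    `items` is a collection of STAC items accepted by stackstac.stack.
    Imports are done lazily so this module can be loaded without stackstac.
    """
    import numpy as np
    import stackstac

    return stackstac.stack(
        items,
        assets=list(assets),
        dtype=np.dtype("uint16"),  # output dtype instead of the float64 default
        fill_value=np.uint16(0),   # assumed nodata; must be castable to dtype
        rescale=False,             # skip scale/offset, which could produce floats
    )
```

With uint16 instead of float64, each pixel takes 2 bytes rather than 8, so the persisted array should need roughly a quarter of the memory.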
I'd really like to make this automatic though. I think raster:bands is the missing link to allow us to do that. Having data_type, scale, offset, and nodata in metadata really changes the game!
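Roughly, the decision for a single band could look like the sketch below. Nothing here exists in stackstac today; `output_dtype_for_band` is a hypothetical helper, and treating a missing scale as 1 and a missing offset as 0 is my assumption about how absent fields should be read.

```python
def output_dtype_for_band(band, rescale=True):
    """Guess the output dtype for one raster:bands entry.

    `band` is a dict with optional `data_type`, `scale`, `offset`, `nodata`.
    If rescaling would actually change the values, the result is float64;
    otherwise the native data_type is kept (float64 if it is unknown).
    """
    scale = band.get("scale", 1)    # assumed default: no scaling
    offset = band.get("offset", 0)  # assumed default: no offset
    if rescale and (scale != 1 or offset != 0):
        # value * scale + offset generally won't fit the integer dtype.
        return "float64"
    return band.get("data_type", "float64")


if __name__ == "__main__":
    band = {"data_type": "uint16", "scale": 0.0001, "offset": 0, "nodata": 0}
    print(output_dtype_for_band(band))
    print(output_dtype_for_band(band, rescale=False))
```

The nodata field would matter too, since the fill value has to be representable in whatever dtype is chosen, but that is left out here.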