Use smaller internal data format when possible #63

kylebarron · 2021-06-25T17:02:33Z

From looking at some examples, it appears that data is always loaded into float64 arrays. For example, in https://github.com/gjoseph92/stackstac/blob/5f984b211993380955b5d3f9eba3f3e285f6952c/examples/show.ipynb, loading the RGB bands of a Sentinel-2 asset (rgb = stack.sel(band=["B04", "B03", "B02"]).persist()) creates an xarray DataArray with dtype float64. It seems to me that you could improve performance, or at least memory usage, by using a smaller data type when possible.
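A back-of-the-envelope sketch of the memory difference. It assumes a full Sentinel-2 10 m tile (10980 x 10980 pixels) and the three RGB bands. The shape is an illustration and is not taken from the notebook:

```python
import numpy as np

# Assumed shape: 3 bands (B04, B03, B02) of one 10980 x 10980 Sentinel-2 tile
shape = (3, 10980, 10980)
n_values = int(np.prod(shape))

# Bytes needed to hold the stack in memory for a few candidate dtypes
sizes = {
    dtype: n_values * np.dtype(dtype).itemsize
    for dtype in ("float64", "float32", "uint16")
}

for dtype, nbytes in sizes.items():
    print(f"{dtype:>8}: {nbytes / 2**30:.2f} GiB")
```

That works out to roughly 2.7 GiB as float64 versus about 0.67 GiB as uint16 (the type Sentinel-2 L2A reflectance bands are typically stored in), a 4x difference for each loaded tile.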

You could look at the raster:bands object, if it exists, to choose the xarray dtype. If the extension doesn't exist, or if the bands have mixed dtypes, fall back to float64?
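A minimal sketch of that suggestion, assuming assets are plain dicts shaped like STAC asset JSON. infer_stack_dtype and the example asset keys are hypothetical, not part of stackstac:

```python
import numpy as np

# raster:bands data_type values that map one-to-one onto NumPy dtype names
# (the extension also allows complex types such as "cint16", plus "other")
REAL_DATA_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
}
FALLBACK = np.dtype("float64")


def infer_stack_dtype(assets: dict, asset_keys: list) -> np.dtype:
    """Pick one dtype for the stacked array from raster:bands, else float64."""
    seen = set()
    for key in asset_keys:
        bands = assets[key].get("raster:bands")
        if not bands:
            return FALLBACK  # extension missing for this asset
        for band in bands:
            data_type = band.get("data_type")
            if data_type not in REAL_DATA_TYPES:
                return FALLBACK  # unset, complex, or "other" data_type
            seen.add(data_type)
    # A single shared dtype can be used directly; mixed dtypes fall back
    return np.dtype(seen.pop()) if len(seen) == 1 else FALLBACK


# Hypothetical assets for illustration
assets = {
    "B04": {"raster:bands": [{"data_type": "uint16", "nodata": 0}]},
    "B03": {"raster:bands": [{"data_type": "uint16", "nodata": 0}]},
    "SCL": {"raster:bands": [{"data_type": "uint8"}]},
}
print(infer_stack_dtype(assets, ["B04", "B03"]))  # uint16
print(infer_stack_dtype(assets, ["B04", "SCL"]))  # float64
```

Mixed integer bands could be handled less conservatively with np.result_type (uint8 and uint16 combine losslessly to uint16), but falling back to float64 matches the suggestion above.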


gjoseph92 · 2021-06-25T17:36:24Z

Using float64 by default was an intentional choice, for two reasons:

raster:bands didn't exist when I wrote everything a few months ago, so there was no way to know an asset's native dtype without actually fetching data. But the dtype has to be known ahead of time to construct the dask array correctly. So float64 seemed like the safest default, since anything else could lose precision.
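The "anything else could lose precision" point can be checked with NumPy's casting rules. This sketch (not stackstac code) lists which common native raster types convert exactly into float32 and float64:

```python
import numpy as np

native_types = ["uint8", "int8", "uint16", "int16", "uint32", "int32", "float32"]

# For each candidate output dtype, keep the native types that cast without loss
lossless = {
    target: [t for t in native_types if np.can_cast(t, target, casting="safe")]
    for target in ("float32", "float64")
}
for target, types in lossless.items():
    print(f"{target} holds exactly: {types}")

# float32 has a 24-bit significand, so some large uint32 values get rounded
big = np.uint32(2**24 + 1)
print(int(np.float32(big)), int(np.float64(big)))  # 16777216 16777217
```

64-bit integers are left out on purpose: NumPy's casting table treats int64 to float64 as "safe" even though integers above 2**53 cannot all be represented exactly, so float64 is only a lossless default for types up to 32 bits.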
rescale=True is the default, and it uses the scale_offset metadata defined within each GeoTIFF (which isn't available in the STAC metadata) to rescale values. So even if an asset were uint16 to begin with, it could become float64 after rescaling, yet another reason why that default made sense.
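A minimal sketch of why rescaling forces a float output. The scale and offset below are hypothetical; a real asset's values would come from the GeoTIFF (or, where present, from raster:bands):

```python
import numpy as np

# Raw digital numbers as they might be stored in a uint16 GeoTIFF
raw = np.array([[0, 1234], [5000, 10000]], dtype=np.uint16)

# Hypothetical scale/offset, e.g. to convert digital numbers to reflectance
scale, offset = 0.0001, 0.0

rescaled = raw * scale + offset
print(raw.dtype, "->", rescaled.dtype)  # uint16 -> float64

# Forcing the result back into uint16 truncates the fractional reflectances
print(rescaled.astype(np.uint16))
```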

However, from what I've seen, nobody really sets the scale_offset metadata at the GeoTIFF level, so I think it might be reasonable to remove this behavior. It would make reasoning about dtypes a lot easier.

Note that you can control the dtype using the dtype= parameter to stackstac.stack. You'll also want to set rescale=False if doing this, as noted in the docs.
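One related detail: stackstac.stack's fill_value defaults to NaN, which has no integer representation, so requesting an integer dtype generally also means passing an integer fill value, for example stackstac.stack(items, dtype="uint16", rescale=False, fill_value=0). The helper below is a hypothetical pre-check (fill_value_fits is not part of stackstac) for whether a fill value survives unchanged in the requested dtype:

```python
import math

import numpy as np


def fill_value_fits(fill_value, dtype) -> bool:
    """Return True if fill_value can be stored in a real-valued dtype unchanged."""
    dtype = np.dtype(dtype)
    if isinstance(fill_value, float) and math.isnan(fill_value):
        # NaN only exists in floating-point (and complex) dtypes
        return dtype.kind in "fc"
    if dtype.kind in "iu":
        info = np.iinfo(dtype)
        return float(fill_value).is_integer() and info.min <= fill_value <= info.max
    # Floating-point targets: compare in Python floats to detect rounding
    return float(np.asarray(fill_value, dtype=dtype)) == float(fill_value)


print(fill_value_fits(np.nan, "uint16"))  # False
print(fill_value_fits(0, "uint16"))  # True
```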

I'd really like to make this automatic though. I think raster:bands is the missing link to allow us to do that. Having data_type, scale, offset, and nodata in metadata really changes the game!
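A sketch of how those four fields could drive the output type of a single band, under an assumed policy: keep the native dtype when no scale/offset is declared and an integer nodata value can serve as the fill value; otherwise fall back to float64 with NaN. plan_band is a hypothetical helper, not stackstac's API; the field names come from the STAC raster extension:

```python
import math

import numpy as np

REAL_DATA_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
}


def plan_band(band: dict) -> tuple:
    """Choose (output dtype, fill value) for one raster:bands entry."""
    data_type = band.get("data_type")
    if data_type not in REAL_DATA_TYPES:
        return np.dtype("float64"), math.nan  # unknown type: keep the safe default

    native = np.dtype(data_type)
    if band.get("scale", 1) != 1 or band.get("offset", 0) != 0:
        # Rescaled values are fractional, so promote to a float type
        return np.result_type(native, np.float64), math.nan

    nodata = band.get("nodata")
    if native.kind in "iu":
        if isinstance(nodata, (int, float)) and float(nodata).is_integer():
            return native, nodata  # declared nodata doubles as the fill value
        return np.dtype("float64"), math.nan  # no integer fill value available
    # Float bands: nodata may be a number or the strings "nan", "inf", "-inf"
    return native, math.nan if nodata is None else float(nodata)


print(plan_band({"data_type": "uint16", "nodata": 0}))
print(plan_band({"data_type": "uint16", "nodata": 0, "scale": 0.0001}))
```

Without a nodata value an integer band has no spare value to mark missing pixels, so this policy prefers float64 there rather than guessing one.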

gjoseph92 mentioned this issue Nov 29, 2021: "Use data_type and nodata from raster extension if present" #91 (Open)

gjoseph92 mentioned this issue Dec 5, 2022: "Assets must have exactly 1 band" #193 (Closed)

gjoseph92 mentioned this issue Jun 21, 2023: "Scale offset from item asset" #202 (Merged)
