Zarr custom Reader? #453

Closed
vincentsarago opened this issue Nov 9, 2021 · 6 comments

Comments

@vincentsarago
Member

With the release of GDAL 3.4, the Zarr driver is now available by default, which means that rasterio/rio-tiler can now open and read Zarr files.

Why do we need a custom reader

  • The Zarr format can be quite different from a COG or other raster formats because it's not restricted to 2D datasets.

If the Zarr dataset contains one single array with 2 dimensions, it will be exposed as a regular GDALDataset when using the classic raster API. If the dataset contains more than one such single array, or arrays with 3 or more dimensions, the driver will list subdatasets to access each array and/or 2D slices within arrays with 3 or more dimensions.

ref: https://gdal.org/drivers/raster/zarr.html#particularities-of-the-classic-raster-api

  • No Overviews

Because there are no overviews (support is currently in development), we cannot fetch lower-resolution versions of the data and should restrict the reader to one zoom level.
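
To make the "one zoom level" restriction concrete, here is a rough sketch of how that single zoom could be estimated from the pixel size. It assumes the data is in (or has been reprojected to) Web Mercator and is served as 256-pixel tiles. The function name and the 3 km sample resolution are illustrative only, and this is not meant to mirror how rio-tiler computes zooms internally.

```python
import math

# Resolution (metres per pixel) of a single 256-pixel Web Mercator tile at zoom 0.
EARTH_RADIUS_M = 6378137.0
ZOOM0_RESOLUTION = 2 * math.pi * EARTH_RADIUS_M / 256


def native_zoom(pixel_size_m: float, max_zoom: int = 24) -> int:
    """Return the smallest zoom whose tile resolution is at least as fine as the data."""
    if pixel_size_m <= 0:
        raise ValueError("pixel size must be positive")
    zoom = math.ceil(math.log2(ZOOM0_RESOLUTION / pixel_size_m))
    # Clamp to the valid zoom range.
    return min(max(zoom, 0), max_zoom)


# Without overviews, a reader could advertise minzoom == maxzoom.
zoom = native_zoom(3000.0)  # hypothetical 3 km grid
print({"minzoom": zoom, "maxzoom": zoom})
```

Requests for lower zooms would still work in principle, but each one would have to read and downsample full-resolution data.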

@vincentsarago
Member Author

Some notes after some good talks with Even at FOSS4G.

Supporting Zarr datasets through rasterio seems quite complicated because rasterio doesn't implement the GDAL multidimensional API. If we end up using the rasterio raster API, the user experience will not be great, and reading a Zarr dataset might result in a lot of unnecessary dataset reads.

It might be better to help create a rio-tiler plugin that could use the zarr Python library or something else.

@adair-kovac

Hi, I got to this thread because I was wondering if I could use titiler with a custom rio-tiler reader to expose some data that's stored in zarr on S3.

In my use case, the data isn't huge (x: 1799, y: 1059). Do you think having or not having different zoom levels would be important with that size of data?

When it comes to the dimensionality of the data, I've used rioxarray to convert this 2D weather analysis data and the 3D forecast versions to GeoTIFF. For the 3D data, it automatically puts the time dimension into different bands. I would naively expect rio-tiler working on 2D or 3D Zarr arrays to behave the same as if I had first used rioxarray to create GeoTIFFs and then used rio-tiler on those.

Do you think it would be possible for me to write a custom reader that handles these simpler 2D/3D, single-variable zarr arrays, without worrying about all the multidimensionality possible in zarr?
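
For a sense of scale on the zoom-level question, here is a back-of-the-envelope sketch for the 1799 × 1059 grid. It assumes 4-byte (float32) values and 256-pixel tiles, neither of which is stated above, and the function names are made up for the example.

```python
import math


def decimation_levels(width: int, height: int, tile_size: int = 256) -> int:
    """Number of successive 2x reductions needed before the grid fits in one tile."""
    longest = max(width, height)
    if longest <= tile_size:
        return 0
    return math.ceil(math.log2(longest / tile_size))


def full_read_megabytes(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Uncompressed size of one 2D slice, in megabytes (10**6 bytes)."""
    return width * height * bytes_per_pixel / 1e6


print(decimation_levels(1799, 1059))              # 3
print(round(full_read_megabytes(1799, 1059), 1))  # 7.6
```

Under those assumptions the grid is only about three halvings away from fitting in a single tile, and one full slice is roughly 7.6 MB uncompressed, so the lack of overviews may matter much less here than it would for a large mosaic.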

@vincentsarago
Member Author

👋

Do you think it would be possible for me to write a custom reader that handles these simpler 2D/3D, single-variable zarr arrays, without worrying about all the multidimensionality possible in zarr?

That's totally possible, but maybe not using GDAL/Rasterio (because rasterio doesn't support the new GDAL multidimensional API).

I would be interested to see if we can use the native Zarr library to read Zarr, but I'm not sure about their geospatial support!

FYI: if you use the latest rasterio (1.3) wheels (with GDAL 3.5) you should be able to read Zarr natively:

  • if your data is 2D (x, y), you can use it directly with COGReader
  • if your data is 3D (x, y, z), then each Z dimension is represented as a subdataset in rasterio, so you need to pass the subdataset URL ZARR:"myfile.zarr":/var1
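
A minimal sketch of the second case, assuming a rasterio build whose GDAL includes the Zarr driver. The helper names are hypothetical; only the ZARR:"myfile.zarr":/var1 form comes from the comment above.

```python
def zarr_subdataset_uri(store: str, array: str) -> str:
    """Build a GDAL Zarr subdataset name such as ZARR:"myfile.zarr":/var1."""
    if '"' in store:
        raise ValueError("store path must not contain double quotes")
    return f'ZARR:"{store}":/{array.lstrip("/")}'


def read_first_band(store: str, array: str):
    """Open one array through rasterio and return band 1 as a NumPy array.

    Assumes rasterio is installed and its GDAL was built with the Zarr driver.
    """
    import rasterio  # imported lazily so the URI helper works without it

    with rasterio.open(zarr_subdataset_uri(store, array)) as src:
        return src.read(1)


print(zarr_subdataset_uri("myfile.zarr", "var1"))  # ZARR:"myfile.zarr":/var1
```

The string returned by the helper is what you would pass wherever a dataset path is expected.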

@rabernat
Contributor

There are (at least) two ways to read Zarr from Python without GDAL / rasterio:

  • Just "raw" zarr-python
  • Via Xarray

but I'm not sure about their geospatial support!

What does "geospatial support" mean here? What specific features are needed? In general Zarr gives you access to the metadata. If that includes things like CRS, etc. it is accessible. It obviously doesn't implement any coordinate transforms or anything like that.
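
As a small illustration of what "access to the metadata" looks like, the sketch below reads it with nothing but the standard library. It assumes a Zarr v2 directory store, where each array directory holds a `.zarray` JSON document (shape, chunks, dtype, and so on) and an optional `.zattrs` JSON document with user attributes. The `crs` attribute name is invented for the example; where (and whether) a store records its CRS depends on the convention the writer followed. In practice zarr-python or Xarray would do this parsing for you.

```python
import json
import tempfile
from pathlib import Path


def read_array_metadata(array_dir: Path) -> dict:
    """Return the layout (.zarray) and user attributes (.zattrs) of a Zarr v2 array."""
    layout = json.loads((array_dir / ".zarray").read_text())
    attrs_file = array_dir / ".zattrs"
    attrs = json.loads(attrs_file.read_text()) if attrs_file.exists() else {}
    return {"shape": layout["shape"], "chunks": layout["chunks"], "attrs": attrs}


# Build a tiny, data-free array directory just to exercise the function.
# A real .zarray carries more keys (dtype, compressor, ...); only those read here are written.
with tempfile.TemporaryDirectory() as tmp:
    array_dir = Path(tmp) / "var1"
    array_dir.mkdir()
    (array_dir / ".zarray").write_text(json.dumps({"shape": [1059, 1799], "chunks": [256, 256]}))
    # "crs" is a made-up attribute name; real stores follow their own convention.
    (array_dir / ".zattrs").write_text(json.dumps({"crs": "EPSG:4326"}))
    meta = read_array_metadata(array_dir)

print(meta)
```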

An interesting idea would be just to implement a generic Xarray reader. Then any format readable by Xarray (including Zarr, NetCDF, GRIB, etc.) would be possible.
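
A very rough sketch of what the entry point of such a generic reader might look like. The extension-to-engine mapping, the function names, and the example path are all assumptions made for illustration; each engine also needs its own backend package installed alongside Xarray.

```python
from pathlib import PurePosixPath

# Extension -> xarray backend name. The mapping is an assumption of this sketch;
# each backend needs its own package installed (zarr, netCDF4, cfgrib).
ENGINES = {".zarr": "zarr", ".nc": "netcdf4", ".grib": "cfgrib", ".grib2": "cfgrib"}


def pick_engine(path: str) -> str:
    """Choose an xarray engine from the file extension."""
    suffix = PurePosixPath(path.rstrip("/")).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"no engine known for {path!r}") from None


def open_variable(path: str, variable: str):
    """Open one variable lazily as an xarray.DataArray."""
    import xarray as xr  # lazy import: pick_engine stays usable without xarray

    return xr.open_dataset(path, engine=pick_engine(path))[variable]


print(pick_engine("s3://bucket/forecast.zarr"))  # hypothetical path
```

Everything format-specific stays behind `open_variable`, so tiling logic written against the resulting DataArray would not need to know which format it came from.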

@sgillies
Contributor

@vincentsarago I'm giving up on hierarchical data in rasterio. See rasterio/rasterio#1759 (comment). It seems like a lot of wasted effort on redundant Python software. I know I have flip-flopped on this in the past, but now I believe that using zarr (python) or h5py is the way to go.

@vincentsarago
Member Author

Thanks for trying, @sgillies. I commented on the issue directly.
