Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate with rioxarray and implement GMT_IMAGE to read RGB GeoTIFFs #1555

Open
2 of 5 tasks
weiji14 opened this issue Oct 1, 2021 · 5 comments
Open
2 of 5 tasks
Assignees
Labels
feature request New feature wanted help wanted Helping hands are appreciated

Comments

@weiji14
Copy link
Member

weiji14 commented Oct 1, 2021

Description of the desired feature

This is a parallel issue to #578 and #1442. While researching how to read GeoTIFFs into an xarray.DataArray, specifically GeoTIFFs with RGB bands stored in a single band but with a color lookup table, I discovered that there wasn't really a straightforward solution 😅 So this issue is mainly to jot down some notes.

1) Reading GeoTiff into xarray.DataArray

Method A: rioxarray.open_rasterio

Pros: Reads in data with proper dtype (e.g. uint8) and proper spatial_ref
Cons: rioxarray is a bit of a heavy dependency as it requires scipy (edit: may be resolved with corteva/rioxarray#413, edit2: rioxarray=0.8.0 has made scipy optional!)

import pygmt
import rioxarray

filename_or_obj = pygmt.which(fname="@earth_day_01d", download="a")

with rioxarray.open_rasterio(filename_or_obj) as dataarray:
    print(dataarray)

gives

<xarray.DataArray (band: 1, y: 180, x: 360)>
array([[[251, 251, ..., 251, 251],
        [251, 252, ..., 251, 252],
        ...,
        [ 68,  16, ...,  68, 217],
        [149, 149, ..., 149,  16]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * y            (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
    spatial_ref  int64 0
Attributes:
    scale_factor:  1.0
    add_offset:    0.0

Method B: xr.open_dataarray(..., engine="rasterio")

Pros: Requires only rasterio (a Pythonic GDAL wrapper) to be installed, which keeps things lightweight Easier to put into pygmt.load_dataarray as it only requires the 'engine' argument
Cons: Seems to read in data as float32 by default

Note: There was an xr.open_rasterio function in xarray v0.19.0, but it is pending deprecation, see pydata/xarray#4697, and removal PR at pydata/xarray#5808.

with xr.open_dataarray(filename_or_obj) as dataarray:
    print(dataarray)

produces

<xarray.DataArray 'band_data' (band: 1, y: 180, x: 360)>
array([[[251., 251., ..., 251., 251.],
        [251., 252., ..., 251., 252.],
        ...,
        [ 68.,  16., ...,  68., 217.],
        [149., 149., ..., 149.,  16.]]], dtype=float32)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * y            (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
    spatial_ref  int64 ...

TLDR My preference is to use (A) rioxarray.open_rasterio because it reads in data properly as uint8, with a fallback to (B) xr.open_dataarray(..., engine=rasterio) if only rasterio is installed.

2) Reading RGB colors from a GeoTIFF

Here's where things get a bit tricky. Neither rioxarray.open_rasterio nor xarray.open_dataarray reads in GeoTIFFs with color interpretation, i.e. RGB colors stored in a GeoTIFF will be read as a single band rather than 3 separate bands! This includes the @earth_day_01d and @earth_night_01d imagery. To be clear, non-color GeoTIFFs (e.g. DEMs, single band Landsat/Sentinel-2 images) are fine.

Good news is that the color interpretation mechanism is present in rasterio see https://github.com/mapbox/rasterio/blob/1.2.8/docs/topics/color.rst and rasterio/rasterio#100. This is wrapping around the GDAL Color Table stuff (https://gdal.org/user/raster_data_model.html#color-table)

Bad news is that we'll need to have that mechanism in rioxarray or xarray in order to use it (unless we want to go with a pure rasterio route). There is an issue open though at corteva/rioxarray#191, so it might be a matter of helping to open a Pull Request in those libraries to implement the feature.

3) Implementing GMT_IMAGE in PyGMT

So once we've solved (1) and (2) above, the next step would be, how to pass that 3-band RGB color xarray.DataArray into GMT? Max mentioned at #578 (comment) that we might need to use the GMT_IMAGE resource, but that hasn't been added into pygmt.clib as of PyGMT v0.4.1.

Probably need something like a virtualfile_from_image function, similar to virtualfile_from_grid (done in #159) at https://github.com/GenericMappingTools/pygmt/blame/v0.4.1/pygmt/clib/session.py#L1278

Are you willing to help implement and maintain this feature? Something to do for PyGMT v0.6.0 I think.

The steps I think would be to:

References:

@snowman2
Copy link

snowman2 commented Oct 1, 2021

Side note: Both A & B require rioxarray ref.

@weiji14
Copy link
Member Author

weiji14 commented Oct 1, 2021

Thanks for chiming in @snowman2! I see you've opened an issue with making scipy optional in rioxarray at corteva/rioxarray#413, that will really help slim things down since SciPy is a ~30MB dependency by itself.

Side note: Both A & B require rioxarray ref.

I stand corrected, it's only the xr.open_rasterio function which requires rasterio, but that looks to be deprecated soon following pydata/xarray#5808 so we won't use that.

@weiji14 weiji14 mentioned this issue Oct 23, 2021
18 tasks
@weiji14 weiji14 self-assigned this Nov 7, 2021
@weiji14 weiji14 modified the milestones: 0.6.0, 0.7.0 Mar 13, 2022
@seisman seisman modified the milestones: 0.7.0, 0.8.0 Jun 16, 2022
@maxrjones
Copy link
Member

@joa-quim how do you link the data to the GMT_IS_IMAGE family in GMT.jl? Is this done via GMT_Put_Matrix in the same way as for GMT_IS_GRID?

@joa-quim
Copy link
Member

I don't use GMT_Put_Matrix for grids (well maybe sometimes under certain circumstances but don't remember now). What I do for all types is to wrap the GMT type interface for each of them. So for images I do this, which wraps an image type. Images and grids differ for very little but aren't equal either. Again, interfacing with GMT native types: grids, images, datasets, cpts, ps, etc... is how we do the IO business in MEX and Julia, which has the further advantage of not requiring any writing/reading of temporary files.

@maxrjones
Copy link
Member

That makes sense, thanks for the explanation and sharing the link to the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature wanted help wanted Helping hands are appreciated
Projects
None yet
Development

No branches or pull requests

5 participants