Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential issues in STAC items for USGS 3DEP-seamless dataset #126

Open
vlandau opened this issue Nov 8, 2022 · 8 comments
Open

Potential issues in STAC items for USGS 3DEP-seamless dataset #126

vlandau opened this issue Nov 8, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@vlandau
Copy link

vlandau commented Nov 8, 2022

I'm noticing inconsistencies with the USGS 3DEP elevation data when trying to load tiles into DataArrays via stackstac. I posted an issue in the stackstac repo but I'm starting to think the issue might be in the item entries in the STAC itself.

I'm encountering an issue where stackstac throws an error (which appears to be because it can't find the resolution in the item). Specifying a resolution the the stackstac.stack call manually fixes that issue, but then I end up empty (0-length) indices for band and time, and a few other attributes are missing as well.

Any idea what might be going on? MWEs are below.

To create the wonky DataArray:

import planetary_computer as pc
import stackstac
from pystac_client import Client as stac_client

bbox = (-104.79075, 30.86168, -104.77074999999999, 30.881680000000003)

# Open the catalog with pystac_client  
catalog = stac_client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search the catalog
search = catalog.search(collections=["3dep-seamless"], bbox=bbox)
items = list(search.items())

# We only want the high res layers
items_high_res = [
    pc.sign(item).to_dict()
    for item in items
    if item.properties["gsd"] == 10
]

# Lazily initialize our data layers
elevation = stackstac.stack(
    items_high_res,
    bounds=bbox,
    chunksize=64,
    resolution=(9.2592593e-05, 9.2592593e-05) # This is the native resolution, hard coded 
                                # as a workaround for a bug
).squeeze()

elevation

Output:

>>> elevation
<xarray.DataArray 'stackstac-2f63e84b98241f78db41c110675eccdc' (time: 0,
                                                                band: 0,
                                                                y: 217, x: 217)>
dask.array<fetch_raster_window, shape=(0, 0, 217, 217), dtype=float64, chunksize=(0, 0, 64, 64), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 
    id       (time) float64 
  * band     (band) float64 
  * x        (x) float64 -104.8 -104.8 -104.8 -104.8 ... -104.8 -104.8 -104.8
  * y        (y) float64 30.88 30.88 30.88 30.88 ... 30.86 30.86 30.86 30.86
    epsg     int64 5498
Attributes:
    spec:        RasterSpec(epsg=5498, bounds=(-104.790833794413, 30.86166680...
    crs:         epsg:5498
    transform:   | 0.00, 0.00,-104.79|\n| 0.00,-0.00, 30.88|\n| 0.00, 0.00, 1...
    resolution:  9.2592593e-05

To create a normal DataArray with a different bbox

import planetary_computer as pc
import stackstac
from pystac_client import Client as stac_client

## This bbox line is the only difference from the above code block
bbox = (-97.73474, 33.785180000000004, -97.71473999999999, 33.80518)

# open the catalog with pystac_client  
catalog = stac_client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search the catalog
search = catalog.search(collections=["3dep-seamless"], bbox=bbox)
items = list(search.items())

# We only want the high res layers
items_high_res = [
    pc.sign(item).to_dict()
    for item in items
    if item.properties["gsd"] == 10
]

# Lazily initialize our data layers
elevation = stackstac.stack(
    items_high_res,
    bounds=bbox,
    chunksize=64,
    resolution=(9.2592593e-05, 9.2592593e-05) # This is the native resolution, hard coded 
                                # as a workaround for a bug
).squeeze()

elevation

output:

>>> elevation
<xarray.DataArray 'stackstac-bb3aa93860305fb871c9f6e04bdf9be2' (y: 217, x: 217)>
dask.array<getitem, shape=(217, 217), dtype=float64, chunksize=(64, 64), chunktype=numpy.ndarray>
Coordinates: (12/15)
    time             datetime64[ns] 2019-11-25
    id               <U10 'n34w098-13'
    band             <U4 'data'
  * x                (x) float64 -97.73 -97.73 -97.73 ... -97.72 -97.71 -97.71
  * y                (y) float64 33.81 33.81 33.81 33.8 ... 33.79 33.79 33.79
    proj:shape       object {10812}
    ...               ...
    gsd              int64 10
    start_datetime   <U20 '2018-03-09T00:00:00Z'
    threedep:region  <U7 'n30w100'
    title            <U40 'USGS 1/3 arc-second n34w098 1 x 1 degree'
    description      <U1849 'This tile of the 3D Elevation Program (3DEP) sea...
    epsg             int64 5498
Attributes:
    spec:        RasterSpec(epsg=5498, bounds=(-97.734815244848, 33.785092741...
    crs:         epsg:5498
    transform:   | 0.00, 0.00,-97.73|\n| 0.00,-0.00, 33.81|\n| 0.00, 0.00, 1.00|
    resolution:  9.2592593e-05
@TomAugspurger
Copy link

Thanks for the report. I'll take a closer look when I get a chance, but a couple quick notes:

  1. I'm not super familiar with which properties of the STAC item stackstac uses for resolution, but presumably it's using proj:transform?
  2. Maybe it makes sense to get the links for a "good" and "bad" STAC item and check the
  • STAC metadata for each
  • The values in the COGs for each (using e.g. gdalinfo)

@vlandau
Copy link
Author

vlandau commented Nov 9, 2022

Thanks @TomAugspurger. I'll get the links for each STAC item and will take a look. I'll also post those links here in case you get a change to look at it as well. Will report back here shortly with those links.

@vlandau
Copy link
Author

vlandau commented Nov 9, 2022

@vlandau
Copy link
Author

vlandau commented Nov 9, 2022

Good item:

⟩ gdalinfo data/USGS_13_n34w098.tiff
Driver: GTiff/GeoTIFF
Files: data/USGS_13_n34w098.tiff
Size is 10812, 10812
Coordinate System is:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101004,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
Origin = (-98.000555556293307,34.000555555795103)
Pixel Size = (0.000092592592692,-0.000092592592692)
Metadata:
  AREA_OR_POINT=Area
  BandDefinitionKeyword=*
  DataType=*
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  PREDICTOR=3
Corner Coordinates:
Upper Left  ( -98.0005556,  34.0005556) ( 98d 0' 2.00"W, 34d 0' 2.00"N)
Lower Left  ( -98.0005556,  32.9994444) ( 98d 0' 2.00"W, 32d59'58.00"N)
Upper Right ( -96.9994444,  34.0005556) ( 96d59'58.00"W, 34d 0' 2.00"N)
Lower Right ( -96.9994444,  32.9994444) ( 96d59'58.00"W, 32d59'58.00"N)
Center      ( -97.5000000,  33.5000000) ( 97d30' 0.00"W, 33d30' 0.00"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  Description = Layer_1
  Min=142.573 Max=403.063 
  Minimum=142.573, Maximum=403.063, Mean=262.266, StdDev=44.176
  NoData Value=-9999
  Overviews: 5406x5406, 2703x2703, 1352x1352, 676x676, 338x338
  Metadata:
    LAYER_TYPE=athematic
    RepresentationType=*
    STATISTICS_MAXIMUM=403.06253051758
    STATISTICS_MEAN=262.2663938277
    STATISTICS_MINIMUM=142.57325744629
    STATISTICS_STDDEV=44.175693340491
    STATISTICS_VALID_PERCENT=100

"Bad" item:

⟩ gdalinfo data/USGS_13_n31w103.tiff                                                                                                (soil_moisture) 
Driver: GTiff/GeoTIFF
Files: data/USGS_13_n31w103.tiff
Size is 10812, 10812
Coordinate System is:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101004,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
Origin = (-103.000555555555565,31.000555555555557)
Pixel Size = (0.000092592592593,-0.000092592592593)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  PREDICTOR=3
Corner Coordinates:
Upper Left  (-103.0005556,  31.0005556) (103d 0' 2.00"W, 31d 0' 2.00"N)
Lower Left  (-103.0005556,  29.9994444) (103d 0' 2.00"W, 29d59'58.00"N)
Upper Right (-101.9994444,  31.0005556) (101d59'58.00"W, 31d 0' 2.00"N)
Lower Right (-101.9994444,  29.9994444) (101d59'58.00"W, 29d59'58.00"N)
Center      (-102.5000000,  30.5000000) (102d30' 0.00"W, 30d30' 0.00"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  Description = Layer_1
  Min=560.307 Max=1660.885 
  Minimum=560.307, Maximum=1660.885, Mean=962.446, StdDev=176.971
  NoData Value=-999999
  Overviews: 5406x5406, 2703x2703, 1352x1352, 676x676, 338x338
  Metadata:
    LAYER_TYPE=athematic
    STATISTICS_MAXIMUM=1660.8853759766
    STATISTICS_MEAN=962.4457942364
    STATISTICS_MINIMUM=560.30670166016
    STATISTICS_STDDEV=176.97062422337
    STATISTICS_VALID_PERCENT=100

Nothing seems funny here. I'll look into the actual stac metadata now -- I didn't initially see any differences after comparing the outputs of to_dict(), but I'll take a closer look.

@vlandau
Copy link
Author

vlandau commented Nov 9, 2022

Ah, looks like the transform entry in the bad item is just wrong. The math also doesn't make sense for the bounding box, transform, and dimensions (ROWxCOL).

transform for "bad" item (item.properties["transform"]):

'proj:transform': [1e-05,
     0.0,
     -103.0005555556,
     0.0,
     -1e-05,
     31.00055555556,
     0.0,
     0.0,
     1.0]

transform, and dimensions (ROWxCOL)

transform for "good" item ( (item.properties["transform"]):

'proj:transform': [9.2592593e-05,
     0.0,
     -98.0005555563,
     0.0,
     -9.2592593e-05,
     34.0005555558,
     0.0,
     0.0,
     1.0]

You'll notice that these don't line up with the pixel sizes that were output from gdalinfo above, so it looks like the data are correct, but the stac metadata is not.

There are several items where this problem is arising. Based on an admittedly superficial look, all of the problem items also seem to be dated 2013.

@TomAugspurger
Copy link

Thanks for tracking that down. We'll look into it.

@TomAugspurger
Copy link

Sorry for failing to update the status here. In stactools-packages/threedep#12, we fixed the upstream stactools package used to generate these items.

We'll need to pull that into our pipeline and regenerate all these items.

@ghidalgo3 ghidalgo3 added the bug Something isn't working label Nov 8, 2024
@jeremyg19
Copy link

I am having similar issues, thought using odc-stac rather than stackstac. The problem seems to arise from the 2013 data with an improper resolution value in the proj:transform. Even specifying to use the proper value in resolution in the .stac_load results in invalid data.

Recreate the issue

import planetary_computer
import pystac_client
from pystac import ItemCollection
import odc.stac
import numpy as np

catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace,
    )

bbox = (-79.88555907199995, 39.237380310000034, -78.87994380799995, 40.375348947000106)


search = catalog.search(collections=["3dep-seamless"], bbox=bbox)
items_all = search.item_collection()

# Can return the high 10m or low 30 m res
items = ItemCollection([item for item in items_all if item.properties["gsd"] == 10])

for item in items:
    print(item.self_href)
    print(item.id, item.properties)

xdem_bad_res_spec = odc.stac.stac_load([items[2]], bbox=bbox, resolution=stac_res, chunks={'x': 1024, 'y': 1024})
xdem_bad_no_res_spec = odc.stac.stac_load([items[2]], bbox=bbox, chunks={'x': 1024, 'y': 1024})
# Note the large (46Gb) size using the default value

xdem_good_res_spec = odc.stac.stac_load([items[0]], bbox=bbox, resolution=stac_res, chunks={'x': 1024, 'y': 1024})
xdem_good_no_res_spec = odc.stac.stac_load([items[0]], bbox=bbox, chunks={'x': 1024, 'y': 1024})

xdem_bad_res_spec_da = xdem_bad_res_spec.to_array().compute().squeeze()
# xdem_bad_no_res_spec_da = xdem_bad_no_res_spec.to_array().compute().squeeze()  # Compute at your peril!

xdem_good_res_spec_da = xdem_good_res_spec.to_array().compute().squeeze()
xdem_good_no_res_spec_da = xdem_good_no_res_spec.to_array().compute().squeeze()

np.all(np.isnan(xdem_bad_res_spec_da.values))  # Results in a full nan array
np.all(np.isnan(xdem_good_res_spec_da.values))
np.all(np.isnan(xdem_good_no_res_spec_da.values))

Any help is appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants