STAC should express original path to odc_yaml, to preserve relative links #326

Closed
benjimin opened this issue Oct 8, 2021 · 1 comment

benjimin commented Oct 8, 2021

Explorer has access to both the original .yaml location (in an S3 bucket alongside the raster files) and the API endpoint symlink http://explorer/dataset/<uuid>. If the STAC odc_yaml link pointed to the former instead of the latter, indexing would be easier.


Using the STAC API to query for an EO dataset:

https://explorer-aws.dea.ga.gov.au/stac/search?collections=wofs_albers&datetime=2000-01-01T00:00:00Z/2001-01-01T00:00:00Z&bbox=137.01,-28.76,137.02,-28.75&limit=1

The result looks like:

{
  "features": [
    {
      "id": "aeae0e2d-2421-4956-9a14-fad9cc9238cc",
      "links": [
        {
          "rel": "self",
          "href": "https://explorer-aws.dea.ga.gov.au/stac/collections/wofs_albers/items/aeae0e2d-2421-4956-9a14-fad9cc9238cc", ..
        },
        {
          "rel": "odc_yaml",
          "href": "https://explorer-aws.dea.ga.gov.au/dataset/aeae0e2d-2421-4956-9a14-fad9cc9238cc.odc-metadata.yaml",
          "title": "ODC Dataset YAML", ..
        },
        {
          "rel": "alternative",
          "href": "https://explorer-aws.dea.ga.gov.au/dataset/aeae0e2d-2421-4956-9a14-fad9cc9238cc",
          "title": "ODC Dataset Overview", ..
        }
      ],
      "assets": {
        "water": {
          "href": "s3://dea-public-data/WOfS/WOFLs/v2.1.5/combined/x_4/y_-32/2000/01/01/LS_WATER_3577_4_-32_20000101003706500000_v1526711988_water.tif",
          ...

Trying to index the above "ODC Dataset YAML" (e.g. using datacube dataset add) leads to failures, not at index time but later, on data access through the API. The yaml contains only a relative reference to the .tif imagery location, and that reference is valid only relative to the original yaml location, not relative to the Explorer API endpoint referred to here.

The "ODC Dataset Overview" above is an HTML page containing (in its dataset-location div) a hyperlink to the original yaml path:

https://dea-public-data.s3.ap-southeast-2.amazonaws.com/WOfS/WOFLs/v2.1.5/combined/x_4/y_-32/2000/01/01/LS_WATER_3577_4_-32_20000101003706500000_v1526711988.yaml

It looks as if indexing this yaml path should work fine (the relative path to the raster data will be valid).
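
To make the relative-path problem concrete, here is a minimal sketch using Python's standard urljoin. The two yaml URLs are copied from this issue. The relative file name is an assumption inferred from the asset href in the STAC response (the yaml's actual contents are not shown here), and resolve is a hypothetical helper, not part of datacube.

```python
from urllib.parse import urljoin

# Both yaml URLs are copied from the issue text.
EXPLORER_YAML = (
    "https://explorer-aws.dea.ga.gov.au/dataset/"
    "aeae0e2d-2421-4956-9a14-fad9cc9238cc.odc-metadata.yaml"
)
ORIGINAL_YAML = (
    "https://dea-public-data.s3.ap-southeast-2.amazonaws.com/WOfS/WOFLs/v2.1.5/"
    "combined/x_4/y_-32/2000/01/01/"
    "LS_WATER_3577_4_-32_20000101003706500000_v1526711988.yaml"
)

# ASSUMPTION: the yaml refers to the raster by a bare file name in the same
# directory. The name is taken from the "water" asset href in the STAC result.
RELATIVE_TIF = "LS_WATER_3577_4_-32_20000101003706500000_v1526711988_water.tif"


def resolve(yaml_url, relative_path):
    """Resolve a path found inside a yaml against the URL the yaml came from."""
    return urljoin(yaml_url, relative_path)


if __name__ == "__main__":
    # Resolved against the Explorer endpoint: points at a non-existent file.
    print(resolve(EXPLORER_YAML, RELATIVE_TIF))
    # Resolved against the original location: sits next to the yaml in the bucket.
    print(resolve(ORIGINAL_YAML, RELATIVE_TIF))
```

Resolving against the Explorer URL yields a path under https://explorer-aws.dea.ga.gov.au/dataset/, where no raster exists, whereas resolving against the original location stays in the bucket directory that holds the imagery.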

If the "ODC Dataset YAML" (odc_yaml) STAC link above instead expressed this yaml path, then I think it would be quite straightforward to index from STAC searches:

curl -s "https://explorer-aws.dea.ga.gov.au/stac/search?collections=wofs_albers&datetime=2000-01-01T00:00:00Z/2001-01-01T00:00:00Z&bbox=137.01,-28.76,137.02,-28.75&limit=10" \
| jq -r '.features[].links[] | select(.rel == "odc_yaml") | .href' \
| xargs datacube dataset add
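
The link-selection step of that pipeline can also be sketched in Python, which makes it easier to inspect what would be handed to datacube dataset add. This is only an illustration: link_hrefs is a hypothetical helper, and SAMPLE is a trimmed copy of the STAC response shown earlier in this issue rather than a live query.

```python
def link_hrefs(feature_collection, rel):
    """Return the href of every link with the given rel, across all features."""
    return [
        link["href"]
        for feature in feature_collection.get("features", [])
        for link in feature.get("links", [])
        if link.get("rel") == rel
    ]


# Trimmed copy of the search result quoted in this issue.
SAMPLE = {
    "features": [
        {
            "id": "aeae0e2d-2421-4956-9a14-fad9cc9238cc",
            "links": [
                {
                    "rel": "self",
                    "href": "https://explorer-aws.dea.ga.gov.au/stac/collections/"
                            "wofs_albers/items/aeae0e2d-2421-4956-9a14-fad9cc9238cc",
                },
                {
                    "rel": "odc_yaml",
                    "href": "https://explorer-aws.dea.ga.gov.au/dataset/"
                            "aeae0e2d-2421-4956-9a14-fad9cc9238cc.odc-metadata.yaml",
                },
            ],
        }
    ]
}

if __name__ == "__main__":
    # One URL per line, suitable for piping into xargs.
    for href in link_hrefs(SAMPLE, "odc_yaml"):
        print(href)
```

Selecting by rel keeps the self and alternative links out of the list, so only dataset documents reach the indexing step.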

(The STAC JSON does contain an absolute reference to the image asset, so this would also be solved if the datacube CLI supported indexing STAC JSON. Alternatively, it may be solvable by modifying the yaml files to include absolute location fields, e.g. by upgrading to the EO3 metadata format.)

benjimin commented

This is hopefully resolved by #332.

(@jeremyh notes that this proposal may not have worked in general, as the yaml sourced by Explorer could be embedded in a NetCDF file rather than stored at a simple location. Hence the solution is to inject absolute references into the yaml, as was already being done for EO3 datasets.)
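
As a rough illustration of what injecting absolute references means, the sketch below rewrites relative measurement paths in a dataset document against a known base URL. It is not the code from #332. It assumes an EO3-style layout in which each entry under measurements has a path key, and absolutise_paths, the example document, and the example.invalid URL are all hypothetical.

```python
from urllib.parse import urljoin, urlparse


def absolutise_paths(dataset_doc, base_url):
    """Return a copy of dataset_doc with relative measurement paths made absolute.

    ASSUMPTION: EO3-style layout, i.e. dataset_doc["measurements"][name]["path"].
    Paths that already carry a URL scheme (https://, s3://, ...) are left alone.
    """
    result = dict(dataset_doc)
    if "measurements" not in dataset_doc:
        return result

    measurements = {}
    for name, measurement in dataset_doc["measurements"].items():
        measurement = dict(measurement)  # copy so the input is not mutated
        path = measurement.get("path")
        if path and not urlparse(path).scheme:
            measurement["path"] = urljoin(base_url, path)
        measurements[name] = measurement
    result["measurements"] = measurements
    return result


if __name__ == "__main__":
    # Hypothetical document and base location, for illustration only.
    doc = {"id": "example", "measurements": {"water": {"path": "tile_water.tif"}}}
    fixed = absolutise_paths(doc, "https://example.invalid/data/tile/meta.yaml")
    print(fixed["measurements"]["water"]["path"])
```

The base URL here is deliberately an https one: Python's urljoin returns the relative part unchanged for schemes it does not treat as hierarchical, such as s3, so an s3:// base would need separate handling.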
