Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate_all throws KeyError when validating Microsoft Planetary Computer STAC API #343

Closed
philvarner opened this issue May 17, 2021 · 10 comments
Labels
bug Things which are broken
Milestone

Comments

@philvarner
Copy link
Collaborator

philvarner commented May 17, 2021

The code below throws a KeyError instead of giving a STACValidationError. I think it needs to be more defensive so if the "id" field is missing, and convert the KeyError into a STACValidationError. It's also possible that it's trying to process an entity that's not a STAC object and has no "id" field.

import pystac_client

pystac_client.Client.open(
    'https://planetarycomputer.microsoft.com/api/stac/v1/').validate_all()

The exception, from pystac/catalog.py:

Traceback (most recent call last):
  File "/Users/philvarner/code/stac-api-validation-suite/stac_api_validation_suite/validate_all_error.py", line 3, in <module>
    pystac_client.Client.open(
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/catalog.py", line 679, in validate_all
    child.validate_all()
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/catalog.py", line 680, in validate_all
    for item in self.get_items():
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/stac_object.py", line 343, in get_stac_objects
    link.resolve_stac_object(root=self.get_root())
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/link.py", line 146, in resolve_stac_object
    obj = STAC_IO.read_stac_object(target_href, root=root)
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/stac_io.py", line 131, in read_stac_object
    return cls.stac_object_from_dict(d, href=uri, root=root)
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/serialization/__init__.py", line 37, in stac_object_from_dict
    return Catalog.from_dict(d, href=href, root=root)
  File "/Users/philvarner/.local/share/virtualenvs/stac-api-validation-suite-tzI1nfla/lib/python3.9/site-packages/pystac/catalog.py", line 790, in from_dict
    id = d.pop('id')
KeyError: 'id'
@duckontheweb
Copy link
Contributor

This is related to #340 and should raise a more reasonable exception when it encounters a non-STAC object.

It looks like this is failing because the Collection at "https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t/items" has an item link that actually points to an endpoint that returns an ItemCollection.

curl https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t -s | jq ".links[2]"
{
  "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t/items",
  "rel": "item",
  "type": "application/geo+json"
}

Do we define the rel type that should be used for the /collections/{collectionId}/items anywhere in the API spec?

@lossyrob

@lossyrob
Copy link
Member

I can't find a spec mention, but I've seen elsewhere a rel="items" point to the items endpoint. The 'item' reltype is a typo; I'll fix that in a bit!

@philvarner
Copy link
Collaborator Author

I can't find a spec mention, but I've seen elsewhere a rel="items" point to the items endpoint. The 'item' reltype is a typo; I'll fix that in a bit!

OGC Part 1
https://docs.opengeospatial.org/is/17-069r3/17-069r3.html#_link_relations


In addition the following link relation types are used for which no applicable registered link relation type could be identified.

    items: Refers to a resource that is comprised of members of the collection represented by the link’s context.

    conformance: Refers to a resource that identifies the specifications that the link’s context conforms to.

    data: Indicates that the link’s context is a distribution of a dataset that is an API and refers to the root resource of the dataset in the API.

@duckontheweb
Copy link
Contributor

I can't find a spec mention, but I've seen elsewhere a rel="items" point to the items endpoint. The 'item' reltype is a typo; I'll fix that in a bit!

OGC Part 1
https://docs.opengeospatial.org/is/17-069r3/17-069r3.html#_link_relations


In addition the following link relation types are used for which no applicable registered link relation type could be identified.

    items: Refers to a resource that is comprised of members of the collection represented by the link’s context.

    conformance: Refers to a resource that identifies the specifications that the link’s context conforms to.

    data: Indicates that the link’s context is a distribution of a dataset that is an API and refers to the root resource of the dataset in the API.

@philvarner Maybe we should add a "Relation types" section to the STAC API - Features docs similar to what we have for Using Relation Types in the core best practices. I think it would be helpful for backend and client implementors to have those collected in one place.

@philvarner
Copy link
Collaborator Author

@duckontheweb radiantearth/stac-api-spec#142

@duckontheweb
Copy link
Contributor

I think it needs to be more defensive so if the "id" field is missing, and convert the KeyError into a STACValidationError.

This still seems like something that should be addressed in PySTAC despite the root cause being a bad rel type in the API. Related to #340 , it seems like pystac.serialization.identify_stac_object should raise a STACTypeException for non-STAC entities, and Catalog.validate_all should catch these exceptions and convert them to STACValidationErrors.

@philvarner
Copy link
Collaborator Author

I think it needs to be more defensive so if the "id" field is missing, and convert the KeyError into a STACValidationError.

This still seems like something that should be addressed in PySTAC despite the root cause being a bad rel type in the API. Related to #340 , it seems like pystac.serialization.identify_stac_object should raise a STACTypeException for non-STAC entities, and Catalog.validate_all should catch these exceptions and convert them to STACValidationErrors.

Agree. I'm not that familiar with the API, so forgive my hand-waviness here, but I would expect that if I tried to get that specific rel=item that I'd get an exception, and if I was trying to iterate over all the items that the object has rels to that the bad one would be ignored and not included in the iterator (Ideally raising a warning that there as a bad link, e.g., something more complicated like the iterator being Union[Item, StacValidationError] or Tuple[Optional[Item], Optional[SVE]] or something).

@duckontheweb
Copy link
Contributor

I would expect that if I tried to get that specific rel=item that I'd get an exception

This is fixed via #402

and if I was trying to iterate over all the items that the object has rels to that the bad one would be ignored and not included in the iterator (Ideally raising a warning that there as a bad link, e.g., something more complicated like the iterator being Union[Item, StacValidationError] or Tuple[Optional[Item], Optional[SVE]] or something).

@philvarner Maybe this should just raise the same exception. The problem I see with skipping them and raising warnings is that in some automated systems those warnings would just get ignored, which might gloss over problems in a catalog. If we raise the exception downstream applications can just handle that exception in their iterators however they want.

@philvarner
Copy link
Collaborator Author

That sounds fine!

@duckontheweb
Copy link
Contributor

In that case this is closed via #402

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Things which are broken
Projects
None yet
Development

No branches or pull requests

3 participants