Issue with get_collections
#320
Comments
I think that https://cmr.earthdata.nasa.gov/stac/OB_DAAC is missing a If |
For APIs that don't advertise their collections endpoint, you can use the --ignore-conformance flag. I tested this out on the command line and here are the results. First, without --ignore-conformance:
$ time stac-client collections https://cmr.earthdata.nasa.gov/stac/OB_DAAC | jq '. | length'
10
stac-client collections https://cmr.earthdata.nasa.gov/stac/OB_DAAC 0.35s user 0.09s system 0% cpu 59.677 total
Then, with --ignore-conformance:
$ time stac-client collections --ignore-conformance https://cmr.earthdata.nasa.gov/stac/OB_DAAC | jq '. | length'
897
stac-client collections --ignore-conformance 1.33s user 0.13s system 5% cpu 28.014 total
It may be reasonable for pystac-client to (e.g.) throw a warning when iterating collections without the
EDIT: @TomAugspurger we may maintain that endpoint but I don't know, I'm asking around. |
@TomAugspurger that endpoint is maintained by the NASA CMR team. |
Thanks. I opened nasa/cmr-stac#236 to track that. We can leave this open to address whether pystac-client should warn when taking the slow path. I don't have a strong preference, as long as there's a relatively easy way to silence those warnings (like a parameter when creating the catalog). |
I think a warning would be a good idea. APIs that implement only Core & Item Search do exist, so calling collections on such an API could be surprisingly slow for the user if there's a deeply-nested tree of collections. |
@gadomski Thanks, adding @TomAugspurger Thanks for opening an issue on NASA CMR. |
I'll also say that the meaning of Obviously some issues with changing the semantics now, but I think these heuristic behaviors (like /collections vs child links) are problematic because they do things that aren't clear to the user. If we didn't already have the behavior, I'd say that it should use the Features or Collections conformance class by default or if there's a |
Agreed. In general, I'd like to see granular configuration (probably in a config-style object) where you can enable/disable(/warn/error) various heuristics, since this type of issue seems to keep popping up (e.g. related to #136 you could force the client to use the items endpoint even if it's not advertised). |
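To make the config-object idea above concrete, here is a rough sketch of what per-heuristic policies could look like. Everything in it (`Policy`, `HeuristicConfig`, `HeuristicWarning`, `HeuristicError`, and the two field names) is hypothetical and invented for illustration. None of it is pystac-client API.

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """Hypothetical per-heuristic policy: enable / disable / warn / error."""

    ALLOW = "allow"      # apply the heuristic silently
    DISABLE = "disable"  # silently skip the heuristic
    WARN = "warn"        # apply the heuristic, but emit a warning
    ERROR = "error"      # refuse to apply the heuristic and raise


class HeuristicWarning(UserWarning):
    """Hypothetical warning category used by this sketch."""


class HeuristicError(RuntimeError):
    """Hypothetical error raised when a heuristic is forbidden."""


@dataclass
class HeuristicConfig:
    # Walk child links when the collections endpoint is not advertised.
    child_link_fallback: Policy = Policy.WARN
    # Use the items endpoint even if it is not advertised.
    unadvertised_items_endpoint: Policy = Policy.ERROR

    def check(self, name: str) -> bool:
        """Return True if the named heuristic may be applied."""
        policy = getattr(self, name)
        if policy is Policy.ERROR:
            raise HeuristicError(f"heuristic {name!r} is not allowed")
        if policy is Policy.DISABLE:
            return False
        if policy is Policy.WARN:
            warnings.warn(f"applying heuristic {name!r}", HeuristicWarning, stacklevel=2)
        return True


# Usage: a caller that never wants the slow path can opt out explicitly.
config = HeuristicConfig(child_link_fallback=Policy.DISABLE)
print(config.check("child_link_fallback"))
```

A single object like this would keep each heuristic's behavior visible and overridable in one place, which seems to be the point of the suggestion.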
Ok just to make sure this comes full circle now that #480 is in (and there should probably be more docs on this). First you try running it naively:
In [1]: from pystac_client import Client
...:
...: STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
...: catalog = Client.open(f'{STAC_URL}/OB_DAAC')
...:
...: for c in catalog.get_collections():
...:     print(c)
...:
/home/jsignell/pystac-client/pystac_client/client.py:399: DoesNotConformTo: Server does not conform to COLLECTIONS, FEATURES
self._warn_about_fallback("COLLECTIONS", "FEATURES")
/home/jsignell/pystac-client/pystac_client/client.py:399: FallbackToPystac: Falling back to pystac. This might be slow.
self._warn_about_fallback("COLLECTIONS", "FEATURES")
<CollectionClient id=Turbid9.v0>
<CollectionClient id=GreenBay.v0>
<CollectionClient id=Catlin_Arctic_Survey.v0>
... # this goes slowly
Then you notice the warnings about conformance and falling back to pystac, so you do:
In [2]: from pystac_client import Client
...:
...: STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
...: catalog = Client.open(f'{STAC_URL}/OB_DAAC')
...: catalog.add_conforms_to("COLLECTIONS")
...:
...: for c in catalog.get_collections():
...:     print(c)
...:
/home/jsignell/pystac-client/pystac_client/client.py:601: MissingLink: No link with rel='data' could be found on this Client.
href = self._get_href("data", data_link, "collections")
<CollectionClient id=Turbid9.v0>
<CollectionClient id=GreenBay.v0>
<CollectionClient id=Catlin_Arctic_Survey.v0>
... # this goes fast |
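Since the fallback notices in the session above are ordinary Python warnings, the standard `warnings` module should be enough to silence or escalate them per category. The sketch below runs without pystac-client installed: `FallbackToPystac` and `DoesNotConformTo` are local stand-in classes named after the categories in the output, and `slow_get_collections` is a hypothetical function that only mimics the fallback path. With the real library you would import the categories from wherever it defines them, which the thread does not show.

```python
import warnings


# Stand-ins for the warning categories seen in the output above (assumption:
# the real ones are also Warning subclasses, as the output format suggests).
class DoesNotConformTo(UserWarning):
    pass


class FallbackToPystac(UserWarning):
    pass


def slow_get_collections():
    """Hypothetical function that mimics the warnings of the fallback path."""
    warnings.warn("Server does not conform to COLLECTIONS, FEATURES", DoesNotConformTo)
    warnings.warn("Falling back to pystac. This might be slow.", FallbackToPystac)
    return ["Turbid9.v0", "GreenBay.v0"]


# Silence only the fallback notice, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FallbackToPystac)
    ids = slow_get_collections()

# Or turn the fallback notice into an exception to catch the slow path early.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=FallbackToPystac)
    try:
        slow_get_collections()
    except FallbackToPystac as exc:
        print(f"refusing slow path: {exc}")
```

Filtering by category rather than globally keeps the conformance warning visible while the fallback notice is handled separately.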
I am seeing a problem where, if I call get_collections directly, it is very slow and only the first page of collections is returned. If I reimplement the get_collections function, it runs almost instantaneously and returns the entire list of collections.
Here is a minimal example:
This takes almost 30 seconds to run and returns only the first page of collections (10 in total).
This is the re-implementation which runs as expected:
This implementation is basically copy pasted from the code, so I am not sure why calling it directly on the class doesn't provide the same performance or output.
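For readers unfamiliar with why a client might see only "the first page", here is a generic sketch of paging through a collections listing by following rel="next" links. It is not the re-implementation mentioned above and not pystac-client's code. The URLs, the in-memory `FAKE_PAGES` table and the `iter_collections` helper are all made up for illustration, and the response shape (a "collections" array plus a "links" array) is assumed.

```python
from typing import Callable, Iterator, Optional

# Hypothetical in-memory "server": URL -> JSON body of a collections page.
FAKE_PAGES = {
    "https://example.com/collections": {
        "collections": [{"id": "a"}, {"id": "b"}],
        "links": [{"rel": "next", "href": "https://example.com/collections?page=2"}],
    },
    "https://example.com/collections?page=2": {
        "collections": [{"id": "c"}],
        "links": [],
    },
}


def iter_collections(start_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield collections from every page, following rel="next" links."""
    url: Optional[str] = start_url
    seen = set()  # guard against a server that links back to an earlier page
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get("collections", [])
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )


first_page = FAKE_PAGES["https://example.com/collections"]["collections"]
all_ids = [c["id"] for c in iter_collections("https://example.com/collections", FAKE_PAGES.__getitem__)]
print(len(first_page), all_ids)  # prints: 2 ['a', 'b', 'c']
```

A code path that reads only the first response would stop at two collections here, while the paging loop sees all three.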