Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search API does not match a collection that is in the registry #239

Closed
jjacob7734 opened this issue Jan 25, 2023 · 15 comments
Closed

Search API does not match a collection that is in the registry #239

jjacob7734 opened this issue Jan 25, 2023 · 15 comments
Assignees
Labels

Comments

@jjacob7734
Copy link

jjacob7734 commented Jan 25, 2023

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

The endpoint https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:orex.ovirs:data_calibrated gave me a matching result, but the same LID in the endpoint https://pds.nasa.gov/api/search/1/classes/collections/urn:nasa:pds:orex.ovirs:data_calibrated gave me a 404 not found response.

🕵️ Expected behavior

I expected to get get a match with both endpoints.

📜 To Reproduce

1. Run the following and inspect the matching result:
curl 'https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:orex.ovirs:data_calibrated' | json_pp
2. Run the following and inspect the 404 not found response:
curl 'https://pds.nasa.gov/api/search/1/classes/collections/urn:nasa:pds:orex.ovirs:data_calibrated' | json_pp
...

🖥 Environment Info

- Queries sent to production registry deployed at https://pds.nasa.gov/api/search/1/
- Curl command on MacOSX 13.1 Ventura
...

📚 Version of Software Used

curl 7.85.0 (x86_64-apple-darwin22.0) libcurl/7.85.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.11 nghttp2/1.47.0
Release-Date: 2022-08-31

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

N/A

@jjacob7734 jjacob7734 added the bug Something isn't working label Jan 25, 2023
@jordanpadams
Copy link
Member

For some additional context here, I think all /classes/ endpoints do not work on the API. Is this just something in our AWS deployment or is this also a problem with our local deployments?

curl --get 'https://pds.nasa.gov/api/search/1/classes/bundles'     --header 'Accept: application/json'
{"timestamp":1674658125774,"status":500,"error":"Internal Server Error","path":"/classes/bundles"}

curl --get 'https://pds.nasa.gov/api/search/1/classes/collections/'     --header 'Accept: application/json'
{"timestamp":1674658164725,"status":500,"error":"Internal Server Error","path":"/classes/collections/"}

@alexdunnjpl
Copy link
Contributor

@jordanpadams I found this issue while looking for whether /classes/{class}/{identifier} was known to be not-implemented or known to be broken - it doesn't appear in the swagger docs and doesn't work on my local (which I was testing for an unrelated ticket)

@al-niessner @tloubrieu-jpl any positive memory of this path being already implemented?

@al-niessner
Copy link
Contributor

@jordanpadams I found this issue while looking for whether /classes/{class}/{identifier} was known to be not-implemented or known to be broken - it doesn't appear in the swagger docs and doesn't work on my local (which I was testing for an unrelated ticket)

@al-niessner @tloubrieu-jpl any positive memory of this path being already implemented?

@alexdunnjpl

Because it is just /products/(identifier}. Doing /classes/{class}/{identifier} is possible and only adds confusion to /products/{identifier} if the user does /classes/bundles/urn:pds:mybundle:mycollection because now you need to throw an error because the collection lidvid is not a bundle. So, /products/{identifier} should be used instead.

@alexdunnjpl
Copy link
Contributor

@al-niessner how does this gel with the fact that (for example) /collections/{identifier} is implemented/supported?

@al-niessner
Copy link
Contributor

@al-niessner how does this gel with the fact that (for example) /collections/{identifier} is implemented/supported?

@alexdunnjpl

Annoyingly, it throws an error when identifier is not a collection. It is clearer to the user than returning nothing because the identifier is not a collection. I had to put some gross blister code in the controller to check that the identifier, if found, was of the correct type to handle all those.

The original code was written to do /products/{id}/references/{class} or referenced-by for references and /classes/{class}/references/{id} again with references or referenced-by. All the current endpoints are some butchering of those 4 endpoints so it uses the referencing code but just bashes up the user context. The original 4 also prevent this crazy check of is this id of that class.

@alexdunnjpl
Copy link
Contributor

Thanks Al. Given that, @jordanpadams, what is the intended functionality here?

@jordanpadams
Copy link
Member

@alexdunnjpl I'm not sure I am entirely following here, but if we have an identifier collection_lid, then I am thinking /classes/collections/collection_lid and /products/collection_lid should return the same thing. I know it is duplicative, but it supports the different ways a user may be thinking about/accessing the API. if we have bundle_lid, then I am thinking /classes/collections/bundle_lid should return a 404 not found. I don't think we need to put anything specialized in there to handle this. I am not sure what is happening under the hood, but I imagine the different endpoints just add additional query parameters to the OpenSearch query?

@tloubrieu-jpl ☝️ thoughts?

@alexdunnjpl
Copy link
Contributor

@jordanpadams basically:
/collections/{identifier} is implemented, and works
/classes/collections/{identifier} does not appear in the swagger docs, and 404s per Joe's OP. It's not clear prima facie whether it is the route that is missing or the product, but if it's missing from the swagger docs I assume it's actually the latter.

The same is also probably true for the bundles equivalents.

So my question was/is "Were they ever actually implemented, or was it just /classes/collections/ and these were intentionally left out?

If I understand @al-niessner correctly, his position is that they aren't and shouldn't be implemented, and the fact that their equivalents were implemented in the deprecated version of the API is a bug rather than a feature, because they require an additional check to validate the result's collection-ness or bundle-ness (Al, is this a bad thing because such checks are hard to implement in the intended paradigm/architecture of the API? I'd have thought this was a pretty standard level of flexibility to require of an API)

So the decision to be made is what is supposed to be implemented and why/why-not.

@jjacob7734
Copy link
Author

jjacob7734 commented Feb 1, 2023

Just to chime in here, as somebody that is new to the API, my first impression is that the API is too complex. The /classes/ part of the api seems redundant and could be removed. I would also eliminate the explicit endpoint specification of .../members to get the members of a bundle or collection. In addition, I would eliminate the 404 errors that arise because of the identifier being of the wrong type (product, collection, bundle). To me a clear API would just have the following endpoints that can give info on any class of product in PDS, as well as drilling up to parent entity, or down to members:

  1. /info/<lidvid>: info about the identified product, collection, or bundle. I would return this info regardless of the class of the entity (product, collection, or bundle).
  2. /bundle/<lidvid>: This will always give information about a bundle regardless of the class of the identifier (could be a product, collection, or bundle lidvid). If the identifier is for a bundle, get info about that bundle (equivalent to /info/<bundle-lidvid>); else, if the identifier is for a collection or product, get information about the parent bundle (or bundles since a collection or product could conceivably be members of multiple bundles).
  3. /collection/<lidvid>: This will always give information about a collection or collections, regardless of the class of the identifier (could be a product, collection, or bundle lidvid). If the identifier is for a bundle, get all the member collections of that bundle; else, if the identifier is for a collection, get info about that collection (equivalent to /info/<collection-lidvid>); else, if the identifier is for a product, get information about the parent collection (or collections, since a product could conceivably be members of multiple collections).
  4. /product/<lidvid>: This will always give information about a product or products, regardless of the class of the identifier (could be a product, collection, or bundle lidvid). If the identifier is for a bundle or collection, get all the member products or that bundle or collection; else, if the identifier is for a product, get information about that product (equivalent to /info/<product-lidvid>).

With that, the user only has 4 endpoints to learn how to use.

@tloubrieu-jpl
Copy link
Member

Your proposal is interesting @jjacob7734, I like the idea and we should discuss that further. However I would left that discussion aside for now to simplify the discussion here (we could create a different ticket).

To @alexdunnjpl , I think this is my mistake to understand that /collections/ has been migrated to /classes/collections/. This is fine with me for now if it does not work and that we use /products/ instead.

@tloubrieu-jpl
Copy link
Member

If everyone agrees, and if I did not miss anything we can close this ticket with label wontfix

@alexdunnjpl
Copy link
Contributor

@tloubrieu-jpl my thought - pragmatism no object - is that we want these endpoints, if not because I think it's a preferable choice of functionality then because our spec for the PDS API mandates RESTful implementations, and for any endpoint returning a collection (of resources - not a PDS Collection) it's kind of a gimme that a minimal implementation will include support for "that endpoint URI path plus an identifier" which returns the single resource from that collection, having the given identifier.

In practice, it sounds like there's a good reason why it may be difficult to support this - reading between the lines it seems like maybe the existing implementation architecture of the API is too rigid to cleanly support the required behaviour (hence gross blister code)?

Perhaps @al-niessner can more-accurately speak to the "why" of that, otherwise if you think it's worth considering I'll dig in and try to figure that out myself.

@tloubrieu-jpl
Copy link
Member

@alexdunnjpl I am with you on the RestFul implementation, but for now I was rather looking easy way to close tickets which are not blocking bugs.
I will just change the labeling of the ticket for now, you can keep on working on it.

@tloubrieu-jpl tloubrieu-jpl added task p.should-have and removed bug Something isn't working s.high High severity labels Feb 1, 2023
@jordanpadams
Copy link
Member

@alexdunnjpl @tloubrieu-jpl just for my own clarification, the endpoint definition missing here is not actually /classes/collections/{identifier} but /classes/{class}/{identifier} where class supports some subset of Product_* from the PDS4 Information Model. In the future we want to support all Product_*, but for now, we can just support the existing subset defined.

@tloubrieu-jpl
Copy link
Member

I am closing ticket as not a bug since we want to deprecate anyway the /classes/{class}/{identifier} end-points in the API

@tloubrieu-jpl tloubrieu-jpl added the wontfix This will not be worked on label Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants