Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

registry-api does not return latest version of product metadata when multiple versions are harvested #224

Closed
alexdunnjpl opened this issue Dec 29, 2022 · 10 comments · Fixed by #235
Assignees
Labels
B13.1 bug Something isn't working i&t.done s.high High severity

Comments

@alexdunnjpl
Copy link
Contributor

🐛 Describe the bug

Given that v7.0 of the example product is harvested and set to archived status, if v14.0 is harvested and ingested:

  • Requesting the LID returns v7.0 metadata
  • Requesting the v14.0 LIDVID succeeds, but returns v7.0 metadata.

📜 To Reproduce

Steps to reproduce the behavior:

  1. Deploy docker-composed registry int-registry-batch-loader profile (which includes urn:nasa:pds:insight_rad:data_derived::7.0
  2. Run harvest against the latest version of urn:nasa:pds:insight_rad bundle
  3. Run registry-mgr against the LIDVID of the bundle or data_derived collection
  4. Run curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived' | json_pp, observe v7.0 metadata returned
  5. Run curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived::14.0' | json_pp (i.e latest version LIDVID), observe v7.0 metadata returned

🕵️ Expected behavior

Expect return of v14.0 metadata

📚 Version of Software Used

Docker image nasapds/registry-api-service:1.1.11

@alexdunnjpl
Copy link
Contributor Author

Per @jordanpadams, resolved in #193 - whether this should be closed immediately or confirmed as fixed once the changes make it into the docker cluster I'll leave up to him.

@jordanpadams
Copy link
Member

@alexdunnjpl can you verify if this is still an issue or not?

@alexdunnjpl
Copy link
Contributor Author

Will-do today

@jimmie
Copy link
Contributor

jimmie commented Jan 17, 2023

I just got info on how to access pdscloud-prod1 as pds4 so I'll be deploying the provenance script today - please wait until tomorrow to test the all/latest feature.

@alexdunnjpl
Copy link
Contributor Author

@jordanpadams I'm still experiencing this issue despite running provenance.py on my local instance and confirming that the db document was correctly updated with provenance information:

parallels@parallels-Parallels-Virtual-Platform:~$ curl -k -u admin:admin https://localhost:9200/registry/_doc/urn:nasa:pds:insight_rad:data_derived::7.0 --header 'application/vnd.nasa.pds.pds4+json' | json_pp | grep supersede
      "ops:Provenance/ops:superseded_by" : "urn:nasa:pds:insight_rad:data_derived::14.0",

curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived'
and
curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived::14.0'

both return version 7.0.

We have the provenance script, but was there actually ever a modification to the API to use the extra superseded_by metadata?

@jordanpadams
Copy link
Member

@al-niessner can you checkout what is going on here?

@jordanpadams
Copy link
Member

@alexdunnjpl ☝️

@jordanpadams
Copy link
Member

@al-niessner @alexdunnjpl whatever the outcome from this investigation, can we make sure we have some tests either in this repo or wherever we are maintaining our tests to make sure we check something like this? This should be a base case test we need.

@tloubrieu-jpl ☝️

@al-niessner
Copy link
Contributor

@alexdunnjpl @jimmie @jordanpadams @tloubrieu-jpl

This is ancient code and probably still in use:

public static String getLatestLidVidByLid(
ControlContext ctlContext,
RequestBuildContext reqContext,
String lid) throws IOException,LidVidNotFoundException
{
lid = LidVidUtils.extractLidFromLidVid(lid);
SearchRequest searchRequest = new SearchRequestFactory(RequestConstructionContextFactory.given("lid", lid, true), ctlContext.getConnection())
.build(RequestBuildContextFactory.given(true, "lidvid", reqContext.getPresetCriteria()), ctlContext.getConnection().getRegistryIndex());
SearchResponse searchResponse = ctlContext.getConnection().getRestHighLevelClient().search(searchRequest,
RequestOptions.DEFAULT);
if (searchResponse != null)
{
ArrayList<String> lidvids = new ArrayList<String>();
String lidvid;
for (SearchHit searchHit : searchResponse.getHits())
{
lidvid = (String)searchHit.getSourceAsMap().get("lidvid");;
lidvids.add(lidvid);
}
Collections.sort(lidvids);
if (lidvids.isEmpty()) throw new LidVidNotFoundException(lid);
else return lidvids.get(lidvids.size() - 1);
}
throw new LidVidNotFoundException(lid);
}

While the filtering for latest was added here:

public static void addHistoryStopband (BoolQueryBuilder boolQuery)
{
boolQuery.mustNot(QueryBuilders.existsQuery("ops:Provenance/ops:superseded_by"));
}

to fix many of the searches that do not specifically look for latest, like q or keyword, but none of the logic to fix those that specifically search for the latest lidvid which use the ancient code. As I have said previously, the line

would never work because 14 precedes 7 in a string sort. It is the problem you are seeing now.

In the end it is

that needs to rewritten to use the new superseded_by instead of the sorting approach.

I can do it if @alexdunnjpl is not already on it. I am just not interested in competitive speed typing so will wait to see who is going to do the rewrite.

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 18, 2023

Thanks for digging up all that context!

@jordanpadams I can do it if it's not something that needs to get sorted yesterday, otherwise @al-niessner would be the better candidate here. Actually, I'm almost certain I can get it done this week, at least the bare essentials.

Regarding versions/sorting, is not viable to do something like

  • define a class VID, wherein comparison operators are defined
  • define a function lidvidToVIdObj()
  • Collections.sort(lidvids, key=lidvidToVidObj)

Obviously the provenance metadata was put there in part so that we could use it here, but if you have a complete set of lidvids for a given lid just sitting there in a collection, seems like maybe the better option to just sort them without relying on the provenance state?

If we don't have a VID class already I'd argue we should anyway, for this reason and because I've found a similar need every time I've worked on software that has to touch lidvids.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B13.1 bug Something isn't working i&t.done s.high High severity
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants