Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

registry-api does not respect VID when a LIDVID is used as an id, instead returns latest version #234

Closed
alexdunnjpl opened this issue Jan 19, 2023 · 39 comments · Fixed by #236
Assignees
Labels
B13.1 bug Something isn't working s.high High severity

Comments

@alexdunnjpl
Copy link
Contributor

🐛 Describe the bug

Given multiple ingested versions of a product, requesting <someclass>/<someLIDVID> returns the document for the most-recent version of the product matching the LID, irrespective of which VID is specified

📜 To Reproduce

Steps to reproduce the behavior:

  1. Deploy docker-composed registry int-registry-batch-loader profile (which includes urn:nasa:pds:insight_rad:data_derived::7.0
  2. Run harvest against the latest version of urn:nasa:pds:insight_rad bundle
  3. Run registry-mgr against the LIDVID of the bundle or data_derived collection
  4. Run curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived::7.0' | json_pp, observe v14.0 metadata returned

🕵️ Expected behavior

Expect return of requested version, not latest

📚 Version of Software Used

Docker image nasapds/registry-api-service:1.1.11
registry-api version which includes commit 252bfbe and its dependencies

@alexdunnjpl
Copy link
Contributor Author

This appears to be caused by the default value of URIParameters.selector which is always ProductVersionSelector.LATEST.

My gut feeling is that the value of selector should depend on the value of lidvid and probably be set to ProductVersionSelector.TYPED if the supplied lidvid is actually a LIDVID and not just a LID, but I'm not yet sure of what side-effects this might cause.

@jimmie
Copy link
Member

jimmie commented Jan 19, 2023

arose from registry-api #223

@al-niessner
Copy link
Contributor

@alexdunnjpl @jimmie

No, this VID problem is a sorting problem as stated in yet another ticket that I cannot find but I think @alexdunnjpl implemented a more complex set of classes to be able to sort to fix it. It has nothing to do with URIParameters.

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

@al-niessner This is probably the ticket in question (hidden because it's closed)

That was a sorting issue, but if I understand what ProductVersionSelector is supposed to do, asking the API for urn:nasa:pds:insight_rad:data_derived::7.0 should result in a value of TYPED because the user typed the version they needed?

I still need to return to this issue for further investigation but it's not that I think the default is wrong, more that as far as I could see it selector should be set toTYPED during a call to LidVidUtils.resolve(), and for whatever reason was not.

I just took a look, and what I'm seeing is ostensibly that in URIParameters.setLidVid(), it resolves the identifier string to an explicit LID or LIDVID but in the event of the user having typed in a specific version (like 7.0) makes no attempt to actually set this.selector to TYPED, leaving it with its default value of LATEST, which then causes the query submitted to OpenSearch to grab the latest version, ignoring the v7.0 specification provided by the user. Adding this.selector = ProductVersionSelector.TYPED to URIParameters.setLidVid()` resolves the issue in this ticket, but only works in isolation and almost certainly breaks other cases than where the user specifies a VID.

Based on what I'm seeing, it seems reasonable to rename ProductVersionSelector.TYPED to ProductVersionSelector.SPECIFIC (so that it makes sense to set selector so, in setLidVid()) and adjust the logic it affects - WIP but just saving my thoughts here before dinner.

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

So after the most-recent batch of changes (see branch moar-lidvid-classes), everything is pretty-much working as expected, except for the fact that when querying by LID (without VID) or partial-LIDVID (something::, something:1, something:1.), it's throwing due to the presence of >1 hits.

That remains true having ensured that provenance is populated per the following:

parallels@parallels-Parallels-Virtual-Platform:~$ curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived::7.0' | json_pp | grep -A 2 supersede

      "ops:Provenance.ops:superseded_by" : [
         "urn:nasa:pds:insight_rad:data_derived::14.0"
      ],

parallels@parallels-Parallels-Virtual-Platform:~$ curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived::14.0' | json_pp | grep -A 2 supersede

<no output>

Given curl --get 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived' | json_pp, the query produced for submission to OpenSearch is as follows, so I'm confused why it's not working correctly:

{
  "bool" : {
    "must" : [
      {
        "term" : {
          "lid" : {
            "value" : "urn:nasa:pds:insight_rad:data_derived",
            "boost" : 1.0
          }
        }
      },
      {
        "terms" : {
          "ops:Tracking_Meta/ops:archive_status" : [
            "archived",
            "certified"
          ],
          "boost" : 1.0
        }
      }
    ],
    "must_not" : [
      {
        "exists" : {
          "field" : "ops:Provenance/ops:superseded_by",
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

Ohh, I see it now...

ops:Provenance/ops:superseded_by
ops:Provenance.ops:superseded_by

@al-niessner which is correct? I'm guessing the former?

EDIT: Now I'm totally confused - I thought the latter must be the result of a typo in provenance.py, but it's correct there

bulk.append (json.dumps({'doc':{'ops:Provenance/ops:superseded_by':supersede}}))

and a global string search in registry-api yields zero hits for ops:Provenance.ops:superseded_by or any additional hits for ops:Provenance, even.

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

Checking the OpenSearch document itself, the string is correct, so I have no idea why the API call for v7.0 is returning it with a period instead of a forward-slash (or, in redux, why the query is returning both v7.0 and v14.0)

curl -k -u admin:admin https://localhost:9200/registry/_doc/urn:nasa:pds:insight_rad:data_derived::7.0 --header 'application/vnd.nasa.pds.pds4+json' | json_pp | grep supersede

      "ops:Provenance/ops:superseded_by" : "urn:nasa:pds:insight_rad:data_derived::14.0",

@al-niessner
Copy link
Contributor

@alexdunnjpl @jimmie

Ah, / vs . is some good stuff. So, in opensearch ops:Provenance.ops:subperseded means something that either or both @jordanpadams or @tloubrieu-jpl do not like. Therefore we use / in the database. However, the general user community does not / and they want to use . which caused the creation and use of this utility.

Therefore, . is correct in the URI while / is correct in the database itself.

@al-niessner
Copy link
Contributor

al-niessner commented Jan 24, 2023

@alexdunnjpl @jimmie

Okay, I awoke in the middle of the night and realized that my glib answer that it was a sorting problems is not correct. You are more or less on the right track but because I had coffee in the middle of the night, drove into the office, and now short tempered, jittery, and foggy I am just going to outline it pedantically so that I do not get wrong yet again.

So the URI request comes in at registry,.controller (skipping all the empty prelude package names) and is used to build the URIParameters, which still have nothing to do with the problem. Note that in URIParameters the input value is considered an identifier. It the URIParameters are then first processed to determine if the identifier is a lid or lidvid or gibberish or whatever have you.

Since URIParameters is hidden by a "get only" interface registry.UserContext, this is the only place that the lidvid actually gets set. However, the work is done in two places. First, the way to look up the lidvid is done in URIParameters.setLidVid() where it tries to resolve it as typed.

Second, we are now in registry.model.LidVidUtils.resolve(). Since URIParameters is checking as typed and not latest, it returns either the latest if a lid is given or blindly accepts the identifier as a lidvid.

This get us back to the transmuter where we have to verify our class type and lidvid type. In other words, when the URL is of the form /classes/bundles/:lidvid then we have to check that the given lidvid is a bundle. Since everything is a product, including bundles and collections, then we do not have to do this check when the URL is of the form /products/:lidvid as is this case. That takes us handler for this type of request which should be registry.controller.Standard.

The handler for the endpoint is particularly thin and gets you to the beast registry.model.RequestAndResponseContext. There are several factory methods that allow for values to be defaulted to make the constructor happy. The registry.model.Standard.handler() uses the factory method . The factory methods move the ignore value logic from constructing classes to this class where it is needed. Still, it is the constructor that does all of the work. It is during the construction of this class, the user provided lidvid is further refined with output group contraints being the same as the resolution group constraints. They really only differ when you want parents, grandparents, children, or grandchildren. In these cases the output constraints like bundle is different than the resolution constraints (given lidvid group). The registry.controller.EndPointHander registry.controller.Member uses the other factory method for this reason.

Once again we are back at registry.model.LidVidUtils,resolve() with outPreset and resPreset being the same. Historically, the group constraints where called preset. Thus the horrible naming. Anyway, resolving the lidvid is done again but latest or as typed but is not resolved correct. An error in translation maybe at this commit 8fb68e9 might have introduced the bug. No matter where it was introduced, the fix would be to change

    	SearchRequest searchRequest = new SearchRequestFactory(RequestConstructionContextFactory.given("lidvid", identifier, true), ctlContext.getConnection())
    			.build(RequestBuildContextFactory.given(false, "lidvid", reqContext.getPresetCriteria()), ctlContext.getConnection().getRegistryIndex());
    	SearchResponse searchResponse = ctlContext.getConnection().getRestHighLevelClient().search(searchRequest,
    			RequestOptions.DEFAULT);

if (searchResponse == null || searchResponse.getHits().getTotalHits().value == 0)
   result = LidVidUtils.getLatestLidVidByLid(ctlContext, reqContext, lid);
else: result = identifier;

As a side note: you can change true to false and dispense with sorting etc below. True means to get just the latest and if this is not working then there another problem - most likely superseded_by is not in the registry index yet. Do need to make sure there is only one result and panic if there are more because it means the database is corrupt with more than one latest. All in all, it would be far more robust. Have to keep the sorting until superseded_by makes it to the index though.

One last note, if VID 14.0 is not archived or certified you will never be able to reach it via the registry-api. The dream is to have roles (logins) where it is not so hard coded as it is today. Not likely the case since superseded_by is set for 7.0 unless provenance.py is now broken too.

@alexdunnjpl
Copy link
Contributor Author

@al-niessner @jordanpadams so given

Do need to make sure there is only one result and panic if there are more because it means the database is corrupt with more than one latest. All in all, it would be far more robust. Have to keep the sorting until superseded_by makes it to the index though.

is the intention that registry API should

  • support provenance-based resolution of latest product versions when provenance data is available (i.e. the script has been run) and the OpenSearch instance is a version which fully supports it
  • NOT fatally break (as it does currently) if either of the previous conditions is not true, and fall back to resolving "latest LIDVID for the specified LID"?

@jordanpadams
Copy link
Member

@alexdunnjpl

support provenance-based resolution of latest product versions when provenance data is available (i.e. the script has been run) and the OpenSearch instance is a version which fully supports it

yes.

NOT fatally break (as it does currently) if either of the previous conditions is not true, and fall back to resolving "latest LIDVID for the specified LID"?

I think we should just throw an error if we have 2+ versions of a LID with no superseded_by specified. I am thinking it would be better for us to have these errors, track the down, and fix the issues, vs. return some value to the user and we think everything is working correctly when it is not. we are better off providing no data to the user instead of bad data.

@alexdunnjpl
Copy link
Contributor Author

@jordanpadams should I modify the error message (to paraphrase) "Error: More than one hit where there should only be one" to add "Is provenance metadata present and up-to-date?"

This seem likely to be a common dev error for a short while (because no-one's used to running the provenance script and it's not in the docker deployment yet afaik) so the extra information should save some time.

@al-niessner
Copy link
Contributor

@alexdunnjpl @jordanpadams

Tradition has it that it should end up as a 50x error that tells the user something useful to give to us that we can then go find in the log file. Telling them about hits is not going to help them. Pretend that they do no know nor want to know that oracle - no - that elasticsearch - no - that opensearch is backing registry-api. The translation of exceptions to http errors can be extended if you do not see something you can abuse now.

@alexdunnjpl
Copy link
Contributor Author

Okay, I'll modify it to be more informative (and unique).

@tloubrieu-jpl
Copy link
Member

tloubrieu-jpl commented Jan 24, 2023

@alexdunnjpl , @al-niessner , @jordanpadams

Should we manually, in the API java code, select the 'latest' VID when there are multiple lid without superseded by and add a WARNING or ERROR in the log, so that the user is not annoyed by that message which is for PDS EN operators/admins.

Another comment, I am wondering why superseded by is at stake when we just want to get a lidvid ? The user should be able to get any superseded version if he is requesting it, right ?

@alexdunnjpl
Copy link
Contributor Author

@tloubrieu-jpl re the first point, I think that's what Jordan's getting at with "we are better off providing no data to the user instead of bad data." (though per my understanding, it would be correct data for 99.9% of products which don't have multi-LID provenance chains).

Your second point is correct and matches the current behaviour, it's only relevant to the extent that I was manually testing queries by LID/partial-LIDVID after making some extensive changes that affect the whole API (see #236).

@al-niessner
Copy link
Contributor

@alexdunnjpl @tloubrieu-jpl

All queries must default to latest according to @jordanpadams like q= etc unless specifically told to find all of them (rarity supposedly). So, all searches are amended with a filter to only look at those that do not contain superseded_by. Tada, always the latest. Occasionally, need to find all or specific non-latest lidvids so need to turn it off in very few cases. Just weeding those out now (second point).

@alexdunnjpl
Copy link
Contributor Author

@al-niessner @jordanpadams

unless specifically told to find all of them (rarity supposedly)

or unless told to find a single, specific LIDVID (like some:product::7.0), right?

@al-niessner
Copy link
Contributor

@tloubrieu-jpl

Unless somebody sends up a flare, nobody will notice the bad database until performance fails because nobody sees the errors in the log like the ones there now. Give the user an error so that they send up a flare and good message in the log a developer can find and use to understand and debug the problem,

@alexdunnjpl

Yes, specific lidvid is one of the weeds.

@jordanpadams
Copy link
Member

Unless somebody sends up a flare, nobody will notice the bad database until performance fails because nobody sees the errors in the log like the ones there now. Give the user an error so that they send up a flare and good message in the log a developer can find and use to understand and debug the problem,

+1

@jordanpadams
Copy link
Member

@tloubrieu-jpl @al-niessner ☝️ I am thinking the same thing as Al. Let's send up a flare and figure out what is going wrong before it becomes pervasive.

@alexdunnjpl
Copy link
Contributor Author

Prospective fix here

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

With provenance metadata ironed out, LID and partial-LIDVID requests work, but requesting a specific, superseded LIDVID returns zero results.

Error shows that the superseded_by check is still applied when it should not be.

2023-01-24 14:21:10.675 ERROR 225087 --- [io-8080-exec-10] g.n.p.a.r.m.RequestAndResponseContext    : Got 0 hits for a query which should have returned a singular result. Is provenance metadata present and up-to-date? Query was {
  "bool" : {
    "must" : [
      {
        "term" : {
          "lidvid" : {
            "value" : "urn:nasa:pds:insight_rad:data_derived::7.0",
            "boost" : 1.0
          }
        }
      },
      {
        "terms" : {
          "ops:Tracking_Meta/ops:archive_status" : [
            "archived",
            "certified"
          ],
          "boost" : 1.0
        }
      },
      {
        "term" : {
          "product_class" : {
            "value" : "Product_Collection",
            "boost" : 1.0
          }
        }
      }
    ],
    "must_not" : [
      {
        "exists" : {
          "field" : "ops:Provenance/ops:superseded_by",
          "boost" : 1.0
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

That constraint is the direct result of checking the context (i.e. its selector) and finding ProductVersionSelector.LATEST.

That value is populated from the URIParameters.

So @al-niessner as far as I can see* the missing element here is still that URIParameters.setLidVid() doesn't set URIParameters.selector=ProductVersionSelector.LATEST in the case that the user has specified a LIDVID. This is hard to deal with in the everything-is-a-String version of the code, but I think it's solved by the existence of the new classes, by virtue of

public URIParameters setProductIdentifier(ControlContext control) throws IOException, LidVidNotFoundException
{
	this.productIdentifier = LidVidUtils.resolve(this.getIdentifier(), ProductVersionSelector.SPECIFIC,
			control, RequestBuildContextFactory.empty());
	if (this.productIdentifier instanceof PdsLidVid) {
		this.selector = ProductVersionSelector.SPECIFIC;
	}
	return this;
}

* I still need to re-re-read a couple of paragraphs from your previous reply to check that this isn't addressed.

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

@al-niessner just to confirm I didn't misunderstand your earlier email - currently, the provenance script doesn't yet support mixed-LID provenance histories (so I don't need to test whether I've broken that), correct?

@al-niessner
Copy link
Contributor

@al-niessner just to confirm I didn't misunderstand your earlier email - currently, the provenance script doesn't yet support mixed-LID provenance histories (so I don't need to test whether I've broken that), correct?

@alexdunnjpl - correct

@al-niessner
Copy link
Contributor

With provenance metadata ironed out, LID and partial-LIDVID requests work, but requesting a specific, superseded LIDVID returns zero results.

Error shows that the superseded_by check is still applied when it should not be.

Not correct at all. To determine if it is to be added or not it is done, or was done, in one and only one place:

if (context.justLatest())
{
ProductQueryBuilderUtil.addHistoryStopband (this.base);
log.debug("************ created and filled the stopband:" + this.base.mustNot().toString());
}

Note that context.justLatest() is NOT of UserContext but of RequestBuildContext. If you have the call stack or if we roughly trace it to LidVidUtils it is caused because the justLatest flag is being sent as true. I do not see what branch this is being pushed on so I cannot see your current code. Point is, it is NOT from the user input. Push your code on a branch so I can look at LidVidUtils and I will tell you what is wrong still. Did you flesh out resolve()?

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

it is done, or was done, in one and only one place

This is still true, and I can't see an inconsistency with what I asserted in the two paragraphs following the OpenSearch query dump (with all the code links).

My latest working state is on branch moar-lidvid-classes

The selector isn't set in setLidVid() (renamed setProductIdentifier() in my branch), but my take is that it seems like it should be (just a clarification on my "by virtue of" comment earlier).

EDIT: I see what you're saying now about the classes and I might have messed up the source of the selector value - will confirm now.

@al-niessner
Copy link
Contributor

Your problem is here:

case LATEST:
result = LidVidUtils.getLatestLidVidByLid(ctlContext, reqContext, productIdentifier.getLid().toString());
break;

You need to check that what you have is not already a valid lidvid (see previous post with some maybe working code suggestion) that says if what I got is a lidvid and it exists as is in all of history then use it otherwise find latest. If neither it nor its lid version exists then throw a lidvid non-existent exception (resolve() already does).

@alexdunnjpl
Copy link
Contributor Author

alexdunnjpl commented Jan 24, 2023

If a user has specified a non-latest full VID, why is execution making it to that case instead of SPECIFIC (nee TYPED)? I feel like I must still be fundamentally misunderstanding the purpose of the selectors.

Still gotta let that percolate, but @al-niessner @jordanpadams is the intended response to a request for A::2.0, given versions A::1.0, A::3.0, a 404 or an autoresolution to A::3.0?

I would think that any time a user specifies a full LIDVID, if it's not there it should tell the user it's not there, otherwise you risk giving them something they didn't want, but that conflicts with "if what I got is a lidvid and it exists as is in all of history then use it otherwise find latest".

@al-niessner
Copy link
Contributor

+1 for 404 no such lidvid for a partial/bad VID because it is just gibberish

@alexdunnjpl
Copy link
Contributor Author

So we want to remove that fuzzy lidvi support too? I thought that was an intentional choice (per "We don't punish users for [that]")

@al-niessner
Copy link
Contributor

al-niessner commented Jan 24, 2023

So we want to remove that fuzzy lidvi support too? I thought that was an intentional choice (per "We don't punish users for [that]")

It was an intentional choice but we should punish them. It was more of a slop in the code that made life easy and less a good design/implementation practice. As long as we are clear that the given lidvi could not be fully translated as is then they can correct it. My understanding from @jordanpadams is that almost all user will give us a lid expecting the latest. The database is almost entirely VID 1.0 (millions of just 1.0 and couple thousand not 1.0) so the slop lived out of convenience. I am voting to kill the slop but it is not a democracy.

@jordanpadams
Copy link
Member

+1 for 404 no such lidvid for a partial/bad VID because it is just gibberish

👍

So we want to remove that fuzzy lidvi support too? I thought that was an intentional choice (per "We don't punish users for [that]")

right. we shouldn't give the user back something they didn't ask for. if that returns 404, then they can try asking for just the LID. and then if they want all versions, then they can ask for the LID/all. If they really want to do fuzzier lidvid support (e.g. using *, they can use q= to search by lidvid.

the intent of those API endpoints is to really be a lid/lidvid resolver for the overall system. I don't think it really needs to support too much in terms of fuzziness at the moment. if we start seeing use cases where that is needed, we can figure it out.

@tloubrieu-jpl ☝️

@alexdunnjpl
Copy link
Contributor Author

Current state: need to fix local dev deployment to add superseded_by to OpenSearch index, then re-test, then if it's still broken loop back to looking at a bugfix like in #234 (comment)

@jordanpadams
Copy link
Member

@alexdunnjpl that may have something to do with this? NASA-PDS/registry#140

@alexdunnjpl
Copy link
Contributor Author

Ready for merge, since classes endpoints are either invalid or separately-bugged, per #239

alexdunnjpl added a commit that referenced this issue Jan 31, 2023
Add additional lidvid classes
fixes #234
@tloubrieu-jpl
Copy link
Member

Hi @alexdunnjpl , can you add an estimate for this bug fix ? Thanks

@alexdunnjpl
Copy link
Contributor Author

@tloubrieu-jpl done, though I'm curious what value it provides after the fact?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B13.1 bug Something isn't working s.high High severity
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants