API falsely reports 10000 hits for hits>10000 #343
Comments
@tloubrieu-jpl @jimmie @sjoshi-jpl @alexdunnjpl I will set up a working meeting to debug this. This bug has recurred a few times now, so I want to make sure we are covering all the bases. |
@jordanpadams N.B. I have a meeting now, but a cursory search shows that it may just be the hits count (not the actual results set) which is limited, in which case it may just be necessary to add This is supported by testing with |
Confirmed that appending Quick fix is to append this. If performance becomes an issue (it shouldn't), we can address that later by returning the relation object rather than the fully-resolved count. @jordanpadams could you please re-triage? |
@alexdunnjpl triage complete. |
@alexdunnjpl is there a code update associated with this ticket coming? |
@jordanpadams yes, eventually (or sooner, if you want to bump the priority back up) |
Copy, thanks @alexdunnjpl. We will keep it in the backlog until you complete your current task, and then we can re-evaluate. |
Nothing to be done here unless we are going to be happy with 10+ instead of 11 for total hits. The relation 'gte' is nifty if we could have state as we could dump the current results and move to a scroll window which we would be using already and avoid the problem outright. The Java object for total hits is Integer so 10+ really is not going to work there either. I guess, just nothing to be done really. |
@al-niessner per @alexdunnjpl, can we update the API to us |
Sure you can add another field in the summary to indicate that there are more hits that you can never fetch. The reason you get the 10000 is that it is some max_something_or_other opensearch server configuration value. Some of our PDS units are set to 2**32-1 or some other ridiculously large value. Others are the default 10000. Should be able to search the old tickets in this repo to find the exact parameter. Something about max and window. Anyway, it is all a server configuration problem. |
@al-niessner that configuration was changed, this |
@al-niessner here's the qparam in question https://opensearch.org/docs/1.2/opensearch/rest-api/search/#url-parameters (see: By default, opensearch will not return an exact value for hits when paging, presumably because computing the exact value takes a nontrivial amount of extra time. More relevant discussion/context from earlier in this ticket's chatter |
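For reference, a minimal sketch of how a client could tell an exact hit count from a lower bound. It assumes the OpenSearch search response reports the total as an object with `value` and `relation` fields, where `relation` is `eq` for an exact count and `gte` for a lower bound. The function name `describe_total` is hypothetical and is not part of the API code.

```python
def describe_total(total: dict) -> str:
    """Render an OpenSearch hits.total object as a human-readable count.

    `total` is assumed to look like {"value": 10000, "relation": "gte"}.
    A relation of "gte" means the real count is at least `value`.
    """
    value = total["value"]
    # Treat a missing relation as exact (assumption made for this sketch).
    relation = total.get("relation", "eq")
    if relation == "gte":
        return f"{value}+"
    return str(value)


if __name__ == "__main__":
    print(describe_total({"value": 10000, "relation": "gte"}))
    print(describe_total({"value": 11, "relation": "eq"}))
```

The `10000+` form is the "10+ instead of 11" style of answer discussed above; returning an exact integer requires asking OpenSearch for the exact total in the first place.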
We need to discuss this one because my experience with opensearch is exactly the opposite. For instance, if I do this search Also, the 10000 is just too coincidental to being the default window size for me not to be Missourian - show me state. Not disparaging you, but reverting back to my Hume readings that eyewitness testimony is the weakest form of evidence. Apparently Hume was from Missouri long before its founding. Now, maybe when a cluster or something has a node go missing and opensearch says I found what I found but there could be more on the missing node(s) then that would be nice. However, is that not an error rather than an obscure flag in the summary? Lastly, would it not be simpler to just set the URI parameter to true always rather than pass something back to the user? The parameter you named is an input to opensearch and not an output. Yeah, breakout for this ticket. |
@al-niessner I believe @alexdunnjpl was thinking we just include |
Double checking the URL in the first comment. Changed start to 10001 and left limit alone:
It returned a non-error, which indicates that the window is big, validating @alexdunnjpl's statements:
@jordanpadams |
If further validation is required, just take the API out of the loop and curl the OpenSearch directly, with and without the |
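A minimal sketch of the two direct requests to compare, assuming the query parameter is `track_total_hits` as named in the linked fix. The host `http://localhost:9200` and the index name `registry` are placeholders, not the real deployment, and `build_search_url` is a hypothetical helper. The resulting URLs can be passed to curl or any HTTP client.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, index: str, exact_total: bool, size: int = 0) -> str:
    """Build an OpenSearch _search URL, optionally requesting an exact hit count.

    size=0 asks for no documents, only the totals, which is enough to compare
    the reported hit count with and without the parameter.
    """
    params = {"size": size}
    if exact_total:
        # URL parameter that asks OpenSearch to count every matching document.
        params["track_total_hits"] = "true"
    return f"{base_url.rstrip('/')}/{index}/_search?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and index; substitute the real cluster values.
    print(build_search_url("http://localhost:9200", "registry", exact_total=False))
    print(build_search_url("http://localhost:9200", "registry", exact_total=True))
```

If the first URL reports 10000 and the second reports a larger number, the cap is on the hit count only and not on the result window.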
🎉 |
re-implement #343 - track_total_hits = true
Checked for duplicates
Yes
🐛 Describe the bug
When I requested https://pds.nasa.gov/api/search/1/products?limit=100&q=product_class%20eq%20%22Product_Observational%22&start=0, I noticed hits = 10000
🕵️ Expected behavior
I expected 1000000000 per https://github.com/NASA-PDS/registry-mgr/blob/main/src/main/resources/elastic/registry.json#LL5C36-L5C46
📜 To Reproduce
🖥 Environment Info
API v1.2
📚 Version of Software Used
No response
🩺 Test Data / Additional context
No response
🦄 Related requirements
No response
⚙️ Engineering Details
No response