-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17151 Check limits between calls to components in SearchHandler #2801
Conversation
Provides additional detail message from first failing shard Tests for this feature pass, still debugging 3 failing tests relating to debugMap and shardsInfo, so not quite ready yet.
… use cases such as tolerant search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -615,17 +615,23 @@ protected int groupedDistributedProcess(ResponseBuilder rb) { | |||
} | |||
|
|||
protected int regularDistributedProcess(ResponseBuilder rb) { | |||
if (rb.stage < ResponseBuilder.STAGE_PARSE_QUERY) return ResponseBuilder.STAGE_PARSE_QUERY; | |||
if (rb.stage < ResponseBuilder.STAGE_PARSE_QUERY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I never liked these one-liners.
shardInfo.add(shardInfoName, nl); | ||
rsp.getValues().add(ShardParams.SHARDS_INFO, shardInfo); | ||
private static String stageInEnglish(int nextStage) { | ||
// This should probably be a enum, but that change should be its own ticket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to both.
+ " in " | ||
+ components.stream() | ||
.map(SearchComponent::getName) | ||
.collect(Collectors.toList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should build a list of all component names in initComponents()
and keep it around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The supplier is only evaluated if the exception is thrown.
@@ -70,9 +70,12 @@ public void testPrefixQuery() throws Exception { | |||
|
|||
// this time we should get a query cache hit and hopefully no exception? this may change in the | |||
// future if time checks are put into other places. | |||
assertJQ(req("q", q, "timeAllowed", "1", "sleep", sleep), assertionString); | |||
// 2024-4-15: it did change..., and now this fails with 1 or 2 ms and passes with 3ms... I see | |||
// no way this won't be terribly brittle. Maybe TestInjection of some sort to bring this back? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use CallerSpecificQueryLimit
(example in TestQueryLimits.testQueryLimits()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that helps. The test wants to verify that the result was cached and is hoping that the request won't time out because of that, but that only worked because timeAllowed was only measured/triggered in a small area of code sensitive to the availability of the cache. Now we check it all over the place. If we raise the limit too much it's not a test, and if this test gets starved for resources it will fail spuriously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we really need for this test is a cache admin command that can list the contents of the cache (and one would expect a manual clear capability too, though that's not needed here) One might be able to get by with a metrics call, looking for an increment in cache hits, but that's pretty heavy reliance on coincidence rather than explicit testing of the actual desired behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at the ability to check directly. There is some hope since we could
SolrCores solrCores = ExitableDirectoryReaderTest.h.getCoreContainer().solrCores;
List<SolrCore> cores = solrCores.getCores();
for (SolrCore core : cores) {
// identify core we need here...
}
but assuming we can identify the appropriate core, there isn't even an internal api to check caches. Would look for something steming from core.getSearcher().get().???
but we would need to add an accessor for the queryCache and then on SolrCache a way to inspect the keys of the cache... that seems way beyond what this change should attempt.
throw new SolrServerException( | ||
"Time allowed to handle this request exceeded" + suffix, previousEx); | ||
"The processing limits for to this request were exceeded, see cause for details", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
Crave seems to be stalled entirely. Tests pass locally, merging |
…pache#2801) (cherry picked from commit 125fc3f)
https://issues.apache.org/jira/browse/SOLR-17151
This change seeks to ensure that a request that has exceeded a query limit will not continue processing (potentially expensive) components before returning partial results in response to a query. At the point of the creation of this PR, there's a couple failing tests to track down, but there's a good chance those will just be some places where the tests expect book-keeping in debugMap and shardsInfo that has now been short-circuited (and may or may not actually need to be retained). Happy to accept feedback while I track those down.