
Enable dynamic updates to max allowed boolean clauses #1527

Closed · wants to merge 4 commits

Conversation

@malpani (Contributor) commented Nov 10, 2021

Description

Make indices.query.bool.max_clause_count a dynamic setting
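A minimal sketch of what the description implies, assuming the setting keeps its existing key and default and simply gains the Dynamic property (the exact constant name and location are assumptions, not a quote of the PR diff):

```java
// org.opensearch.common.settings.Setting - node-level limit on boolean clauses.
// Adding Property.Dynamic lets the value be changed via the cluster settings API
// at runtime instead of only through opensearch.yml plus a restart.
public static final Setting<Integer> INDICES_MAX_CLAUSE_COUNT_SETTING = Setting.intSetting(
    "indices.query.bool.max_clause_count",
    1024,
    1,
    Integer.MAX_VALUE,
    Setting.Property.Dynamic,
    Setting.Property.NodeScope
);
```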

Issues Resolved

#1526

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot (Collaborator)
Can one of the admins verify this patch?

@opensearch-ci-bot (Collaborator)
✅ Gradle Wrapper Validation success 1c75ad30ac28c86a84b5b1673ab93655529a525b

@opensearch-ci-bot (Collaborator)
❌ Gradle Check failure 1c75ad30ac28c86a84b5b1673ab93655529a525b
Log 1024 · Reports 1024

@opensearch-ci-bot (Collaborator)
❌ Gradle Precommit failure 1c75ad30ac28c86a84b5b1673ab93655529a525b
Log 1530

@opensearch-ci-bot (Collaborator)
✅ Gradle Wrapper Validation success 42b864b0606160829dec8f65609eb043e4532ea1

@opensearch-ci-bot (Collaborator)
✅ Gradle Precommit success 42b864b0606160829dec8f65609eb043e4532ea1

@opensearch-ci-bot (Collaborator)
✅ Gradle Check success 42b864b0606160829dec8f65609eb043e4532ea1
Log 1025 · Reports 1025

@kkhatua (Member) left a comment

LGTM +1

@@ -302,6 +312,10 @@ public SearchService(

lowLevelCancellation = LOW_LEVEL_CANCELLATION_SETTING.get(settings);
clusterService.getClusterSettings().addSettingsUpdateConsumer(LOW_LEVEL_CANCELLATION_SETTING, this::setLowLevelCancellation);

BooleanQuery.setMaxClauseCount(INDICES_MAX_CLAUSE_COUNT_SETTING.get(settings));
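For orientation, a sketch of how the dynamic update could be wired in this constructor, following the LOW_LEVEL_CANCELLATION_SETTING pattern visible just above (the consumer registration line is an assumption about the approach, not necessarily the exact code in this change):

```java
// Apply the configured value at node startup ...
BooleanQuery.setMaxClauseCount(INDICES_MAX_CLAUSE_COUNT_SETTING.get(settings));
// ... then push subsequent cluster-settings updates into Lucene's static, JVM-wide limit.
clusterService.getClusterSettings()
    .addSettingsUpdateConsumer(INDICES_MAX_CLAUSE_COUNT_SETTING, BooleanQuery::setMaxClauseCount);
```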
Collaborator

I have concerns about the impact of this change. First of all, to acknowledge: making this property adjustable is useful. However, implementation-wise, since BooleanQuery::setMaxClauseCount is a static setting, changing it could impact queries which are already being submitted or executed in very weird ways. Any thoughts on that?

@malpani (Contributor, Author) Nov 11, 2021

The limits get checked at query parse/rewrite time - so at the shard level, a query will either succeed (below the clause limit) or fail (limit breached).

Here are the different scenarios:

Initial state: limit at 1024

  • A query with 1500 clauses will be rejected.

Transition state: limit raised to 2048 via the dynamic setting added here

  • Queries that are ongoing or submitted concurrently with the settings change will initially fail with "too many clauses" (as before).
  • Eventual consistency will prevail and all nodes will eventually apply the updated setting (within the cluster-state propagation lag).
  • For a query spanning multiple shards across nodes, there could be a scenario where shards on nodes that have applied the setting respond successfully while shards on nodes with the old setting return failures, so such queries will see partial success during the transition (instead of the full failure seen before).
  • Eventually, once the setting is applied everywhere, all queries with fewer than 2048 clauses will pass.

Does this address your concerns? If not, it would help to share the exact scenario you had in mind.
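For reference, a minimal Lucene-level illustration of where this limit bites (an editorial sketch, not code from the PR; the class and method names below are hypothetical, while the BooleanQuery/TooManyClauses behavior is standard Lucene 8.x):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

class MaxClauseIllustration {
    // Returns true if a boolean query with `clauses` SHOULD clauses can be built
    // under the current JVM-wide limit, false if BooleanQuery.TooManyClauses is hit.
    static boolean canBuild(int clauses) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        try {
            for (int i = 0; i < clauses; i++) {
                builder.add(new TermQuery(new Term("field", "v" + i)), BooleanClause.Occur.SHOULD);
            }
            return true;
        } catch (BooleanQuery.TooManyClauses e) {
            return false; // surfaces to the caller as a "too many clauses" shard failure
        }
    }
}
```

With the default limit of 1024, canBuild(1500) fails; once a node applies the raised limit of 2048, the same call succeeds there, which is what produces the partial-success window described above.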

Collaborator

Thanks @malpani, this "... eventual consistency ..." part is essentially what bothers me, not only between nodes but also between multiple shards on the same node: it may happen that, for the same search request, while it is being distributed across shards, some shards see the old setting whereas others see the new one.

Not sure it is solvable at all unless we take into account the state of search-related thread pools and delay the change; probably adding documentation notes about the impact of changing this setting on a running cluster would be sufficient.

Thank you.

Contributor Author

ack, I will tag you on the doc update PR.

Intra-node consistency can probably be achieved by making the field volatile in Lucene. We would still have inter-node eventual consistency, so I will document this clearly.

Member

Is there another example where we are adjusting a dynamic property in this way, or is it the first time? I'm all for YOLOing a setting at runtime, but maybe @nknize has a stronger opinion on this?

@setiah (Contributor) Nov 12, 2021

Haven't dug deeper here, but thinking out loud: if we could do such a check on the coordinator node that fans out the search request, we could probably avoid intra- and inter-node inconsistencies by rejecting limit breaches at the coordinator itself.

Contributor Author

@dblock - yes, this is a common approach for updating dynamic settings, for example https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/SearchService.java#L294. The slight uniqueness here is that it updates a static setting.

@setiah - Interesting thought - while this clause validation could be done on the coordinator side, I don't want to parse yet again just to check for this failure scenario and affect the happy path. Background: each shard query does a rewrite based on local stats for that shard (not available on the coordinator), so the per-shard rewrite is not avoidable today. One of the long-term exploration areas is better coordinator-side planning and query estimation, and this could be revisited then. I want to keep this change simple and don't really see a problem with eventual-consistency semantics here (this is in line with other setting updates).

.prepareUpdateSettings()
.setTransientSettings(Settings.builder().put(INDICES_MAX_CLAUSE_COUNT_SETTING.getKey(), 2048))
);
Thread.sleep(10);
Contributor

Can we check that the max_clause_count value has been updated to the new value (with smaller sleeps and a timeout) before proceeding?
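One possible way to do that, as a sketch (assuming the test extends OpenSearchIntegTestCase, which provides assertBusy, and that org.apache.lucene.search.BooleanQuery and java.util.concurrent.TimeUnit are imported):

```java
// Poll until this node's Lucene limit reflects the updated cluster setting,
// rather than relying on a fixed sleep.
assertBusy(() -> assertEquals(2048, BooleanQuery.getMaxClauseCount()), 10, TimeUnit.SECONDS);
```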

Contributor Author

Done - reduced the sleep from 10 ms to 1 ms (the ideal way is to track this via countdown latches, but all my local runs pass with even a 1 ms sleep, so I am making the simpler change).

Signed-off-by: Ankit Malpani <malpani@amazon.com>
Signed-off-by: Ankit Malpani <malpani@amazon.com>
@opensearch-ci-bot (Collaborator)
✅ Gradle Wrapper Validation success bb38d20

@opensearch-ci-bot (Collaborator)
✅ Gradle Wrapper Validation success 65b9934a34eba14bdaad488e5f2730b5b109e05f

@opensearch-ci-bot (Collaborator)
✅ Gradle Precommit success bb38d20

@opensearch-ci-bot (Collaborator)
✅ Gradle Precommit success 65b9934a34eba14bdaad488e5f2730b5b109e05f

@opensearch-ci-bot (Collaborator)
✅ Gradle Check success bb38d20
Log 1049 · Reports 1049

@opensearch-ci-bot (Collaborator)
❌ Gradle Check failure 65b9934a34eba14bdaad488e5f2730b5b109e05f
Log 1050 · Reports 1050

@malpani (Contributor, Author) commented Nov 11, 2021

I looked into the Gradle failure and it is coming from an unrelated area - it looks like the Docker infrastructure for the S3 repository tests did not come up cleanly. Here are the relevant logs - could one of the maintainers triage this or let me know what the next steps are?

> Task :test:fixtures:s3-fixture:composeUp

ERROR: for 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-ec2_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-repositories-metering_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-ecs_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-session-token_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
Creating 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-other_1                 ... done

ERROR: for s3-fixture-with-ec2  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for s3-fixture-repositories-metering  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for s3-fixture  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for s3-fixture-with-ecs  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

ERROR: for s3-fixture-with-session-token  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
Stopping 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-other_1 ... 

> Task :distribution:bwc:staged:buildBwcLinuxTar
 [1.2.0] > Task :distribution:archives:buildLinuxTar

> Task :test:fixtures:s3-fixture:composeUp FAILED
Stopping 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-other_1 ... done
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-session-token_1    ... 
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture_1                       ... 
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-ecs_1              ... 
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-repositories-metering_1 ... 
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-ec2_1              ... 
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-other_1                 ... 
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-ec2_1              ... done
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-repositories-metering_1 ... done
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture_1                       ... done
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-session-token_1    ... done
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-with-ecs_1              ... done
Removing 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__s3-fixture-other_1                 ... done
Removing network 56aa93ed60e3e11b06e2067e06b0dd8c_s3-fixture__default

@reta (Collaborator) commented Nov 11, 2021

@malpani yeah, it seems like a broad CI issue; I have seen it in #1500 (comment) as well.

@dblock (Member) commented Nov 11, 2021

start gradle check

@dblock (Member) commented Nov 11, 2021

@malpani amend your commits with -s, DCO check is failing

@opensearch-ci-bot (Collaborator)
✅ Gradle Check success 65b9934a34eba14bdaad488e5f2730b5b109e05f
Log 1055 · Reports 1055

Signed-off-by: Ankit Malpani <ankit.malpani@gmail.com>
Signed-off-by: Ankit Malpani <ankit.malpani@gmail.com>
@opensearch-ci-bot (Collaborator)
✅ Gradle Wrapper Validation success 67fee73

@opensearch-ci-bot (Collaborator)
✅ Gradle Wrapper Validation success ed0e6e4

@opensearch-ci-bot (Collaborator)
✅ Gradle Precommit success 67fee73

@opensearch-ci-bot (Collaborator)
✅ Gradle Precommit success ed0e6e4

@malpani (Contributor, Author) commented Nov 13, 2021

> @malpani amend your commits with -s, DCO check is failing

Fixed this now - the -s sign-off was mixed up between my work and gmail IDs.

@opensearch-ci-bot (Collaborator)
✅ Gradle Check success 67fee73
Log 1073 · Reports 1073

@opensearch-ci-bot (Collaborator)
❌ Gradle Check failure ed0e6e4
Log 1074 · Reports 1074

@malpani (Contributor, Author) commented Nov 14, 2021

The current Gradle check failure looks transient, as running ClusterHealthIT locally passes - relevant logs:

WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.ClusterHealthIT.testHealthOnMasterFailover" -Dtests.seed=FF43A248A83ACD45 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-NZ -Dtests.timezone=Etc/GMT+3 -Druntime.java=17

org.opensearch.cluster.ClusterHealthIT > testHealthOnMasterFailover FAILED
    java.util.concurrent.ExecutionException: RemoteTransportException[[node_s5][127.0.0.1:40395][cluster:monitor/health]]; nested: MasterNotDiscoveredException[NodeDisconnectedException[[node_s0][127.0.0.1:46309][cluster:monitor/health] disconnected]]; nested: NodeDisconnectedException[[node_s0][127.0.0.1:46309][cluster:monitor/health] disconnected];
        at __randomizedtesting.SeedInfo.seed([FF43A248A83ACD45:61AC8D65ABD0D1A3]:0)
        at org.opensearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:281)
        at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:268)
        at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:99)
        at org.opensearch.cluster.ClusterHealthIT.testHealthOnMasterFailover(ClusterHealthIT.java:393)

        Caused by:
        RemoteTransportException[[node_s5][127.0.0.1:40395][cluster:monitor/health]]; nested: MasterNotDiscoveredException[NodeDisconnectedException[[node_s0][127.0.0.1:46309][cluster:monitor/health] disconnected]]; nested: NodeDisconnectedException[[node_s0][127.0.0.1:46309][cluster:monitor/health] disconnected];

            Caused by:
            MasterNotDiscoveredException[NodeDisconnectedException[[node_s0][127.0.0.1:46309][cluster:monitor/health] disconnected]]; nested: NodeDisconnectedException[[node_s0][127.0.0.1:46309][cluster:monitor/health] disconnected];
                at org.opensearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:275)
                at org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:369)
                at org.opensearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:174)
                at org.opensearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:142)
                at org.opensearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.retry(TransportMasterNodeAction.java:258)
                at org.opensearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.retryOnMasterChange(TransportMasterNodeAction.java:239)
                at org.opensearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.access$200(TransportMasterNodeAction.java:143)
                at org.opensearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.handleException(TransportMasterNodeAction.java:224)
                at org.opensearch.transport.TransportService$6.handleException(TransportService.java:742)
                at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1357)
                at org.opensearch.transport.TransportService$9.run(TransportService.java:1214)
                at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:733)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
                at java.lang.Thread.run(Thread.java:833)

@nknize (Collaborator) left a comment

Some of my concerns were already raised on this PR.

  • eventual consistency for in-flight queries
  • overall performance degradation

In general I'm concerned about users being too cavalier with this setting. One misbehaved query may lead a user to blindly increase this setting instead of fully inspecting and reworking the query to improve performance, and suddenly long-term search latencies become a boiling frog. At least right now the cluster needs to be brought down before blindly changing the parameter.

What version is this targeting?

Lucene 9 refactored BooleanQuery.maxClauseCount to IndexSearcher.maxClauseCount, and there has been some discussion around further refactoring it to be per-searcher instead of global, so this could be converted to an IndexScope setting.

I'm all for progress over perfection, and it is true there are other settings (e.g., search.max_buckets) that can cause a similar issue, but I'm not sure that justifies adding another potential trap.

I think a better approach might be to contribute an upstream Lucene 9.1 change to refactor IndexSearcher.maxClauseCount to be per-index, and then change this PR to refactor the current NodeScope indices.query.bool.max_clause_count into an IndexScope index.query.bool.max_clause_count, released as an enhancement in OpenSearch 2.0. This has the benefit of minimizing the bwc blast radius. In the meantime, users can continue changing this setting with a full cluster restart, which I think is a better guardrail than adding this dynamic leniency.
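As a sketch of the index-scoped alternative described here (an assumption about naming, not code from this PR or from Lucene), the setting declaration would move to index scope while staying dynamic:

```java
// org.opensearch.common.settings.Setting - per-index, dynamically updatable clause
// limit; this would require Lucene's limit to be enforced per IndexSearcher rather
// than via a static field.
public static final Setting<Integer> INDEX_MAX_CLAUSE_COUNT_SETTING = Setting.intSetting(
    "index.query.bool.max_clause_count",
    1024,
    1,
    Integer.MAX_VALUE,
    Setting.Property.Dynamic,
    Setting.Property.IndexScope
);
```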

@malpani (Contributor, Author) commented Nov 17, 2021

Thanks @nknize for reviewing. Here are my responses:

> eventual consistency for in-flight queries

It is an artifact of distributed cluster state. It is the same with any other dynamic setting on OpenSearch - am I missing something?

> overall performance degradation ... In general I'm concerned about users being too cavalier with this setting. One misbehaved query may lead a user to blindly increase this setting instead of fully inspecting and reworking the query to improve performance, and suddenly long-term search latencies become a boiling frog. At least right now the cluster needs to be brought down before blindly changing the parameter.

I prefer users get to decide this, i.e. providing the tradeoff of a slow query as against today's behavior of a failing query with no option to get out of except a full cluster restart or reworking client logic. That is the intent behind making it dynamic.
Clear documentation that calls this an expert-only setting can give users enough information to experiment and decide the limits. Because it is not dynamic today, even experimentation is hard.

> What version is this targeting?

1.3 - existing behavior and defaults are unchanged, and it is just an extra convenience knob. We have some customers asking for this ability and today it is operationally heavy to update ymls and do rolling restarts as the only way to bump up clause limits.

> I think a better approach might be to contribute an upstream Lucene 9.1 change to refactor IndexSearcher.maxClauseCount to be per-index, and then change this PR to refactor the current NodeScope indices.query.bool.max_clause_count into an IndexScope index.query.bool.max_clause_count, released as an enhancement in OpenSearch 2.0.

Based on this suggestion, can I assume that you are not concerned about making the setting dynamic? :)

Index vs cluster level setting is an orthogonal discussion. I am not convinced that index-level granularity is needed up front, and hence prefer an incremental approach of rolling out the cluster-level setting in 1.3 (which uses Lucene 8.x and needs no changes to Lucene). If users ask for more granularity, a per-index option can be revisited in the next major version of OpenSearch.

@reta (Collaborator) commented Nov 17, 2021

@malpani I think setting maxClauseCount per-index (or even per-request) would be ideal, and as @nknize pointed out, contributing a Lucene change may get us there. I think if we settle on the current approach (with all the pros and cons discussed), it could be difficult to introduce more granular configuration without changing the semantics of this dynamic setting - just my 2 cents.

@nknize (Collaborator) commented Nov 17, 2021

> It is the same with any other dynamic setting on OpenSearch - am I missing something?

"...I'm not sure that justifies adding another potential trap."

> and today it is operationally heavy to update ymls and do rolling restarts as the only way to bump up clause limits

This was my point behind "...At least right now the cluster needs to be brought down before blindly changing the parameter." - a guard rail against being overly cavalier with this parameter. This limitation isn't a blocker today, it's just an inconvenience, and one I think is good for preventing silent harm to search performance.

> I prefer users get to decide this, i.e. providing the tradeoff of a slow query as against today's behavior of a failing query...

+100 to community decision. IMHO, failure in this case is better than a slow query. In the former there's no question as to the cause - TooManyClauses - at which point users can profile the offending query and rethink their query architecture to fix that failure. In the latter, users will need to investigate this additional knob as a "potential" source of performance degradation on unrelated queries (e.g., new parallel slow queries contend for the same resources as once-expedient queries). Those runtime performance issues take much more time to discover and usually result in higher support costs in the end.

> Based on this suggestion, can I assume that you are not concerned about making the setting dynamic? :)

If it's index-scoped, where behavior is impacted on a per-index basis as opposed to all queries, then yes, I'm not opposed to it being dynamic.

> Index vs cluster level setting is an orthogonal discussion.

Refactoring from a static to a runtime variable scoped per IndexSearcher is coming at the Lucene level regardless of what we decide here. We can influence and expedite that if we'd like and get it done for 9.1. The result is that this would have to wait for OpenSearch 2.0 at the earliest (assuming 9.1 is released before OpenSearch 2.0 - a safe bet, I think), which I think is a net positive to minimize bwc requirements.

> it could be difficult to introduce more granular configuration without changing the semantics of this dynamic setting

Precisely. We'd be shifting from the cluster-wide indices.query.bool.max_clause_count to an index-specific index.query.bool.max_clause_count, which would require a migration path for 1.x users that set this cluster-wide and then gain the ability to switch to index-specific. Which setting takes priority during the deprecation phase? Introducing this as a dynamic, per-index setting eliminates that headache; the only tradeoff is setting it through the yml and restarting the cluster (which also avoids the eventual-consistency concern).

I like the idea of contributing upstream to Lucene and introducing this as an index-level setting in OpenSearch 2.x.

@nknize closed this Nov 17, 2021
@nknize reopened this Nov 17, 2021
@malpani (Contributor, Author) commented Nov 17, 2021

Thanks, I agree that a future migration from a cluster-level to an index-level setting would become an overhead. I will close this PR for now.
