-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Test] Reduce concurrency when testing creation of security index #75293
[Test] Reduce concurrency when testing creation of security index #75293
Conversation
The internal cluster could get overwhelmed by many nodes and large number of concurrent putUser request. It can sometimes fail with confusing messages when JVM is under pressure. This PR reduces the concurrency so it has a better chance to succeed.
Pinging @elastic/es-security (Team:Security) |
final int numThreads = Math.min(50, scaledRandomIntBetween((processors + 1) / 2, 4 * processors)); // up to 50 threads | ||
final int maxNumRequests = 50 / numThreads; // bound to a maximum of 50 requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also choose to reduce the concurrency only when the test is inFipsJvm
since the build stats show that all failures so far are for FIPS jobs.
// TODO: We could also just assert future.actionGet does not throw since either created | ||
// or updated is sufficient for the test purpose, i.e. security index is created. | ||
for (ActionFuture<PutUserResponse> future : futures) { | ||
assertTrue(future.actionGet().created()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to add this comment to be able to comment this piece of code on the GitHub page. Since the test just needs to ensure security index is created when there are concurrent operations, either a "created" or "updated" user is sufficient. If we make this change, it has the benefit to not fail the test if the same edge case (#73779) happens. Note this change does not replace the reduction of concurrency because large number of threads can fail at prepration phase which is before the requests can be sent off at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make that change, given what the test is really trying to do.
It's unfortunate that PutUser might sometimes given a misleading response, but it's a natural consequence consequence consequence of not being able to guarantee exactly-once-delivery. Sometime an internal retry will produce a no-op update instead of a create.
But the point is that the index & alias are created and the Put operation succeeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Sorry this one got pushed down the list of reviews, and I forgot about it.
…astic#75293) The internal cluster could get overwhelmed by many nodes and large number of concurrent putUser request. It can sometimes fail with confusing messages when JVM is under pressure. This PR reduces the concurrency so it has a better chance to succeed. The test also no longer relies on the user being "created" instead of "updated". Since they do not make a difference for the purpose of this test.
💚 Backport successful
|
…5293) (#77836) The internal cluster could get overwhelmed by many nodes and large number of concurrent putUser request. It can sometimes fail with confusing messages when JVM is under pressure. This PR reduces the concurrency so it has a better chance to succeed. The test also no longer relies on the user being "created" instead of "updated". Since they do not make a difference for the purpose of this test.
* master: (185 commits) Implement get and containsKey in terms of the wrapped innerMap (elastic#77965) Adjust Lucene version and enable BWC tests (elastic#77933) Disable BWC to upgrade to Lucene-8.10-snapshot Reenable MlDistributedFailureIT [DOCS] Fix typo for `script.painless.regex.enabled` setting value (elastic#77853) Upgrade to Lucene-8.10.0-snapshot-bf2fcb53079 (elastic#77801) [DOCS] Fix ESS install lead-in (elastic#77887) Resolve thirdparty gradle plugin artifacts from mavencentral (elastic#77865) Reduce the number of times that `LifecycleExecutionState` is parsed when running a policy. (elastic#77863) Utility methods to add and remove backing indices from data streams (elastic#77778) Use Objects.equals() instead of == to compare strings (elastic#77840) [ML] prefer least allocated model when a new node is added to the cluster (elastic#77756) Deprecate ignore_throttled parameter (elastic#77479) Improve LifecycleExecutionState parsing. (elastic#77855) [DOCS] Removes deprecated word from HLRC title. (elastic#77851) Remove legacy geo code from AggregationResultUtils (elastic#77702) Adjust SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache (elastic#77758) Laxify SecureSM to allow creation of the JDK's innocuous threads (elastic#77789) [Test] Reduce concurrency when testing creation of security index (elastic#75293) Refactor metric PipelineAggregation integration test (elastic#77548) ... # Conflicts: # server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java
The internal cluster could get overwhelmed by many nodes and large
number of concurrent putUser request. It can sometimes fail with
confusing messages when JVM is under pressure. This PR reduces the
concurrency so it has a better chance to succeed.
Resolves: #73779